Senior Site Reliability Engineer

🕒 May 8

Megaport

201 - 500 employees

Founded 2013

📡 Telecommunications

Networking • Cloud Computing • Telecommunications

Megaport is a leading provider of global private connectivity solutions that enable simplified network interconnection. The company offers a platform for deploying secure, scalable, and agile networks that interconnect data centers, clouds, and virtual points of presence. Megaport's services allow users to create secure and dynamic network connections on demand, without hardware or long-term contracts, offering flexibility and speed to businesses. By partnering with global service providers, data center operators, and systems integrators, Megaport ensures robust and widespread network access across 930+ locations in 25 countries. Its smart software tools and APIs allow for easy network management, making it a trusted choice for cloud networking and hybrid cloud solutions.

📋 Description

• Improving production reliability and system resilience within an SRE scoped team
• Championing high standards of work and industry best practices
• Communicating with teams and stakeholders at all stages
• Bringing fresh ideas to the table and encouraging others
• Diving into complex technical problems with a can-do attitude
• Working across numerous technologies in a fast-changing industry
• Participating in on-call rotation, incident response, and blameless post-incident reviews
• Writing code, handling alerts, improving solutions, and supporting others
• Playing a crucial role in the success of your company and team

🎯 Requirements

• 5+ years administering Linux systems and related infrastructure in production environments
• A collaborative SRE mindset, with familiarity around SLIs/SLOs/SLAs, error budgets, blast radius, and blameless postmortems
• A focus on automation, reducing toil, and preventing problem recurrence
• A track record of writing runbooks that work for the broader team, not just yourself
• Strong Kubernetes and broader ecosystem fundamentals
• Cloud infrastructure experience; AWS strongly preferred and bare-metal is a bonus
• Strong tool development - Bash, plus either Python or Go preferred, or similar
• Infrastructure-as-code tooling experience - Terraform preferred
• CI/CD and version control, GitHub preferred
• Database experience - one of Postgres, Cassandra, or ClickHouse preferred
• Experience operating a production observability stack (metrics, logs, traces), with an eye for signal over noise
• Comfortable working on live production infrastructure, with strong troubleshooting instincts and ownership of incident response
• A history of continual professional development
• A self-directed style suited to an async, globally distributed team, and comfortable picking up adjacent work when the situation calls for it

🏖️ Benefits

• Flexible working environments
• Birthday Leave
• Generous study and training allowance + 5 days paid study leave
• Creative, fun, and contemporary workspaces
• Motivated team of industry experts and new talent
• Celebrated success with ‘Legend’ and ‘Kudos’ Awards
• Health and wellness program
