Senior Site Reliability Engineer

🔥 0 minutes ago

Mozilla

501 - 1000 employees

Founded 1998

👥 B2C

🔒 Cybersecurity

B2C • Cybersecurity • Software

Mozilla is a non-profit organization dedicated to promoting an open and accessible internet. It makes the Firefox browser, which emphasizes user privacy, speed, and control, and offers internet security and privacy products including Mozilla VPN, Firefox Relay, and Mozilla Monitor. The organization also contributes to open-source projects, works on AI innovation, and advocates for digital rights. Mozilla aims to empower users with trustworthy technology and policies that protect privacy, support open-source AI development, and foster accountability for tech companies.

📋 Description

• Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
• Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
• Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
• Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
• Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
• Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
• Participate in on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
• Contribute to runbooks, architecture documentation, and team processes.

🎯 Requirements

• 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
• Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
• Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
• Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
• Excellent async written communication skills; comfortable working with a geographically distributed team.
• Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
• Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

🏖️ Benefits

• Fully remote work & schedule flexibility
• Company-provided laptop
• Annual bonus program
• Monthly remote work stipend
• Annual professional development stipend
• Industry conferences
• Company all-hands and team gatherings
• 24 days PTO per year (prorated)
• Your birthday
• Year-end company shutdown
• 9 wellbeing days
• Public holidays
• Other paid leave
• Quarterly wellbeing stipend for personal / family activities
• RRSP contributions
• Health, dental, & vision insurance
• Disability insurance
• Life insurance
• Employee assistance program
• Paid parental leave
• Paid sick days
