Senior Site Reliability Engineer

🕒 3 days ago

Mozilla

501 - 1000 employees

Founded 1998

B2C • Cybersecurity • Software

Mozilla is a non-profit organization dedicated to promoting an open and accessible internet. They are the makers of the popular Firefox browser, which emphasizes user privacy, speed, and control. Mozilla also offers a range of products that focus on internet security and privacy, including Mozilla VPN, Firefox Relay, and Mozilla Monitor. Additionally, the organization is involved in open-source projects, AI innovation, and advocating for digital rights. Mozilla aims to empower users with trustworthy technology and policies that protect privacy, support open-source AI development, and foster accountability for tech companies.

📋 Description

• Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
• Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
• Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
• Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
• Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
• Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
• Participate in the on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
• Contribute to runbooks, architecture documentation, and team processes.

🎯 Requirements

• 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
• Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
• Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
• Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
• Excellent async written communication skills; comfortable working with a geographically distributed team.
• Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
• Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

🏖️ Benefits

• Fully remote work & schedule flexibility
• Company-provided laptop
• Annual bonus program
• Monthly remote work stipend
• Annual professional development stipend
• Industry conferences
• Company all-hands and team gatherings
• 24 days PTO per year (prorated)
• Your birthday off
• Year-end company shutdown
• 9 wellbeing days
• Public holidays
• Other paid leave
• Quarterly wellbeing stipend for personal/family activities
• RRSP contributions
• Health, dental, & vision insurance
• Disability insurance
• Life insurance
• Employee assistance program
• Paid parental leave
• Paid sick days

Similar Jobs

🕒 6 days ago

Minor Hotels Europe and Americas

10,000+ employees

👥 B2C

Software Change Management Consultant supporting application migration projects using IBM’s DBB/Git/IDD Solutions. Guiding clients through the conversion process and providing migration expertise and training.

🇨🇦 Canada – Remote

💵 $62.9k - $147.5k / year

💰 Post-IPO Equity on 2018-05

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

Groovy

🕒 6 days ago

Clic Santé

11 - 50

☁️ SaaS

🏛️ Government

🤝 B2B

DevOps/DevSecOps engineer managing cloud-native infrastructure on GCP and optimizing CI/CD and automation for a healthcare startup. Prioritizing security, performance, and resilience in a scalable environment.

🗣️🇫🇷 French Required

Cloud

Kubernetes

Terraform

🕒 June 4

Absorb Software

501 - 1000

☁️ SaaS

📚 Education

🏢 Enterprise

Senior DevOps Engineer at Absorb optimizing a cloud-based learning management system (LMS) and guiding operational strategy. Partnering with teams to improve system reliability and performance for end users.

AWS

Cloud

Distributed Systems

Prometheus

🕒 June 3

FreedX

11 - 50

₿ Crypto

💸 Finance

💳 Fintech

Senior DevOps Engineer responsible for infrastructure design and reliability at FreedX, a cryptocurrency exchange. Proposing solutions and leading technical discussions in a fast-paced environment.

Distributed Systems

DNS

Firewalls

Linux

TCP/IP

🕒 June 2

BrightOrder Inc.

51 - 200

🚗 Transport

☁️ SaaS

📡 Telecommunications

Full Stack Developer responsible for creating and scaling BrightOrder’s cloud-based platform. Collaborating with teams and automating processes for efficient system performance.

AWS

Cloud

Docker

EC2

Grafana

GraphQL

IoT

JavaScript

Kubernetes

Linux

Microservices

MongoDB

MS SQL Server

Oracle

Postgres

Prometheus

Python

RabbitMQ

React

Redis

SQL

Terraform

TypeScript

Go