Senior Site Reliability Engineer

Mozilla

501 - 1000 employees

Founded 1998

B2C • Cybersecurity • Software

Mozilla is a non-profit organization dedicated to promoting an open and accessible internet. They are the makers of the popular Firefox browser, which emphasizes user privacy, speed, and control. Mozilla also offers a range of products that focus on internet security and privacy, including Mozilla VPN, Firefox Relay, and Mozilla Monitor. Additionally, the organization is involved in open-source projects, AI innovation, and advocating for digital rights. Mozilla aims to empower users with trustworthy technology and policies that protect privacy, support open-source AI development, and foster accountability for tech companies.

📋 Description

• Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
• Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
• Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
• Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
• Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
• Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
• Participate in on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
• Contribute to runbooks, architecture documentation, and team processes.

🎯 Requirements

• 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
• Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
• Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
• Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
• Excellent async written communication skills; comfortable working with a geographically distributed team.
• Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
• Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

🏖️ Benefits

• Fully remote work & schedule flexibility
• Company-provided laptop
• Annual bonus program
• Monthly remote work stipend
• Annual professional development stipend
• Industry conferences
• Company all-hands and team gatherings
• 24 days PTO per year (prorated)
• Your birthday
• Year-end company shutdown
• 9 wellbeing days
• Public holidays
• Other paid leave
• Quarterly wellbeing stipend for personal / family activities
• RRSP contributions
• Health, dental, & vision insurance
• Disability insurance
• Life insurance
• Employee assistance program
• Paid parental leave
• Paid sick days

Similar Jobs

🕒 2 days ago

Minor Hotels Europe and Americas

10,000+ employees

👥 B2C

Software Change Management Consultant supporting application migration projects using IBM’s DBB/Git/IDD Solutions. Guiding clients through the conversion process and providing migration expertise and training.

🇨🇦 Canada – Remote

💵 $62.9k - $147.5k / year

💰 Post-IPO Equity on 2018-05

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 2 days ago

Clic Santé

11 - 50

☁️ SaaS

🏛️ Government

🤝 B2B

DevOps/DevSecOps managing cloud-native infrastructure on GCP, optimizing CI/CD and automation for a healthcare startup. Prioritizing security, performance, and resilience in a scalable environment.

🗣️🇫🇷 French Required

🕒 3 days ago

Absorb Software

501 - 1000

☁️ SaaS

📚 Education

🏢 Enterprise

Senior DevOps Engineer at Absorb optimizing cloud-based Learning Management System and guiding operational strategies. Partnering with teams to enhance system reliability and performance for user experience.

🇨🇦 Canada – Remote

💰 $59M Private Equity Round - Absorb LMS on 2017-09

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 5 days ago

FreedX

11 - 50

₿ Crypto

💸 Finance

💳 Fintech

Senior DevOps Engineer responsible for infrastructure design and reliability at FreedX, a cryptocurrency exchange. Proposing solutions and leading technical discussions in a fast-paced environment.

🕒 6 days ago

BrightOrder Inc.

51 - 200

🚗 Transport

☁️ SaaS

📡 Telecommunications

Full Stack Developer responsible for creating and scaling BrightOrder’s cloud-based platform. Collaborating with teams and automating processes for efficient system performance.