Manager – Site Reliability Operations

5001 - 10000 employees

Founded 1962

🚘 Automotive

💼 Consulting

📦 Logistics

Mercury Insurance is a leading provider of insurance products, focusing on protecting individuals and their assets with a commitment to privacy and customer service. The company operates through independent agents and offers a range of insurance services including auto, home, and other personal insurance products. Mercury Insurance prioritizes the security of personal information and compliance with privacy laws, ensuring that customer data is handled with care and only shared when necessary for account servicing or as legally required.

Manager – Site Reliability Operations

🕒 June 12

🏄 California – Remote

💵 $118.7k - $230.6k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Cloud

📋 Description

• Lead the Site Reliability Operations team, including the Network Operations Center (NOC), which is responsible for observability, real-time monitoring, incident response, and operational excellence for key enterprise services; set direction, priorities, and success metrics for the team.
• Partner with Product Management, Engineering, SRE, and the rest of the infrastructure team to embed CI/CD and release best practices into operations, including automated build/test/deploy, health checks, rollbacks, release monitoring via the NOC, and change-management guardrails.
• Oversee service reliability monitoring and incident management: ensure appropriate observability (metrics, logs, traces, dashboards), well-tuned alerting thresholds, clear escalation paths, and effective communication with stakeholders and leadership during incidents.
• Own and mature the Problem Management function for the team: drive root cause analysis (RCA) of recurring or high-severity incidents, standardize post-incident reviews, and ensure corrective actions and follow-ups are implemented and verified.
• Define, track, and report operational and reliability metrics (e.g., availability, MTTR, incident volume, change failure rate, deployment frequency, problem resolution time); provide regular insights and recommendations to Technology Operations leadership.
• Champion automation and “operations as code” (infrastructure as code, configuration as code, automated runbooks), working with engineering teams to reduce manual toil and improve the consistency, speed, and safety of operations and releases.
• Recruit, develop, coach, and evaluate team members; provide performance feedback, make salary and promotion recommendations, and foster a high-performing, collaborative culture aligned with Mercury’s core values.
• Provide leadership coverage for 24x7 mission-critical support through the NOC and on-call rotations; ensure sustainable on-call practices, high-quality runbooks, and continuous improvement of tooling and processes.

🎯 Requirements

• Minimum: Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field, or an equivalent combination of education and work experience.
• Minimum: 7+ years of experience in IT operations, SRE, DevOps, or related roles supporting mission-critical systems.
• 3+ years of experience in a lead or management role overseeing technical teams in a 24x7 environment.
• Preferred: Advanced coursework, certifications, or experience in Site Reliability Engineering, DevOps, cloud platforms, or ITIL.
• Strong understanding of CI/CD pipelines (build, test, security scanning, deployment, rollback) and how they support reliable operations.
• Solid knowledge of observability practices and tools (metrics, logs, traces, dashboards, alerts) and how to design actionable monitoring and alerting for production systems.
• Deep familiarity with incident and problem management processes, including root cause analysis methods and post-incident review facilitation.
• Working knowledge of DevOps/SRE concepts such as SLOs/SLIs, error budgets, resilience patterns, automation to reduce toil, and blameless culture.
• Demonstrated ability to lead and influence cross-functional teams, build relationships, and collaborate effectively with engineering, InfoSec, infrastructure, and business stakeholders.
• Excellent written and verbal communication skills; able to clearly communicate technical issues, risks, and recommendations to technical and non-technical audiences, including senior leadership.
• Strong analytical and problem-solving skills; able to analyze operational data and trends to identify risks, drive decisions, and prioritize improvements.
• Self-motivated, adaptable, and able to operate with minimal supervision in a fast-changing environment.
• Ability to work extended hours, nights, or weekends as needed to support critical releases or resolve major incidents.

🏖️ Benefits

• Competitive compensation
• Flexibility to work from anywhere in the United States for most positions
• Paid time off (vacation time, sick time, 9 paid Company holidays, volunteer hours)
• Incentive bonus programs (potential for holiday bonus, referral bonus, and performance-based bonus)
• Medical, dental, vision, life, and pet insurance
• 401(k) retirement savings plan with company match
• Engaging work environment
• Promotional opportunities
• Education assistance
• Professional and personal development opportunities
• Company recognition program
• Health and wellbeing resources, including free mental wellbeing therapy/coaching sessions, child and eldercare resources, and more
