Lead Site Reliability Developer – CSRE Consulting

🕒 May 1

Ticketmaster

10,000+ employees

Founded 1976

🛍️ eCommerce

⚽ Sports

eCommerce • Entertainment • Sports

Ticketmaster is a leading ticketing platform that facilitates the sale of tickets for concerts, sports events, theater performances, and other live entertainment. The platform offers a user-friendly experience for purchasing tickets, as well as managing events and finding popular shows and games. Ticketmaster serves as the official ticket marketplace for many major sports leagues and artist events, making it a key player in the live entertainment industry.

📋 Description

• Lead consulting work from discovery through delivery by aligning stakeholders on priorities, sequencing work, and communicating measurable outcomes.
• Establish working cadence and facilitate decision forums to surface risks, map dependencies, and drive clear ownership and timelines.
• Align product, platform, and engineering stakeholders on reliability targets and trade-offs using SLOs and error budgets.
• Partner regularly with Engineering Managers, product managers, Staff and Principal engineers, and platform leads to keep dependencies, decisions, and delivery aligned.
• Identify systemic risks across shared dependencies and coordinate remediation across multiple teams to reduce recurring incidents.
• Drive change adoption by embedding reliability mechanisms into partner team routines such as planning, PRRs, and on-call practices.
• Design and implement reusable reliability mechanisms, templates, and tooling that can be adopted across teams.
• Establish and evolve production readiness review practices with partner teams to improve launch quality and change safety.
• Drive observability strategy for partner domains by improving signal quality, alerting philosophy, and operational dashboards.
• Lead complex incident investigations and ensure learnings translate into durable fixes with clear owners and verification.
• Lead reliability-focused design and code reviews and guide teams toward simpler, safer architectures.
• Mentor Senior engineers and other consultants through pairing, reviews, and structured coaching to multiply impact.
• Partner with internal platform engineering to influence roadmaps and deliver shared capabilities that accelerate SRE adoption.
• Improve CSRE Consulting playbooks and operating practices based on repeated patterns observed across teams.

🎯 Requirements

• Deep practical understanding of SRE principles, including SLO governance and error budget policy in practice.
• Proven ability to lead cross-team technical work and influence without authority.
• Strong experience designing and troubleshooting distributed systems with cross-service failure modes.
• Experience shaping observability and alerting strategy and improving operational signal quality.
• Strong Kubernetes and AWS experience, including governance and cost trade-offs.
• Ability to design reliability automation and tooling that is reusable and adopted by multiple teams.
• Experience leading production readiness and resilience practices, including DR validation and controlled testing.
• Strong software engineering fundamentals with the ability to deliver and review high-quality changes in enterprise codebases.
• Advanced incident analysis skills focused on systemic risk reduction and organizational learning.
• Excellent communication skills, including exec-ready summaries and clear technical diagrams.

🏖️ Benefits

• Medical, vision, dental and mental health benefits for you and your family, with access to a health care concierge, and Flexible or Health Savings Accounts (FSA or HSA)
• Free concert tickets, generous paid time off including paid holidays, sick time, and personal days
• 401(k) program with company match, stock reimbursement program
• New parent programs including caregiver leave, plus fertility, adoption, foster, or surrogacy support
• Career and skill development programs with School of Live, tuition reimbursement, and student loan repayment
• Volunteer time off, crowdfunding match
