Staff SRE, Ads

Reddit, Inc.

501 - 1000 employees

Founded 2005

👥 B2C

📱 Media

🌍 Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

📋 Description

• Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
• Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
• Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
• Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
• Participate in on-call rotations, lead complex incident investigations and coordinate cross-functional response efforts during major production events.
• Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
• Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
• Mentor engineers and provide technical leadership across multiple teams.
• Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
• Strong experience supporting high-traffic, user-facing production environments.
• Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
• Experience designing highly available systems with strong operational and reliability practices.
• Strong understanding of observability systems including metrics, logging, tracing, and alerting.
• Good programming skills in languages such as Go, Python, or similar.
• Experience improving reliability through SLOs, automation, incident management, and performance optimization.
• Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
• Strong collaboration and communication skills with the ability to influence technical direction across teams.

🏖️ Benefits

• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Private Pension plan with Employer-matching
• 100% employer-sponsored group medical plan
• Income Replacement Programs
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave
