Staff SRE, Ads

Reddit, Inc.

501 - 1000 employees

Founded 2005

👥 B2C

📱 Media

🌍 Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

📋 Description

• Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
• Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
• Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
• Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
• Participate in on-call rotations, lead complex incident investigations and coordinate cross-functional response efforts during major production events.
• Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
• Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
• Mentor engineers and provide technical leadership across multiple teams.
• Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
• Strong experience supporting high-traffic, user-facing production environments.
• Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
• Experience designing highly available systems with strong operational and reliability practices.
• Strong understanding of observability systems including metrics, logging, tracing, and alerting.
• Good programming skills in languages such as Go, Python, or similar.
• Experience improving reliability through SLOs, automation, incident management, and performance optimization.
• Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
• Strong collaboration and communication skills with the ability to influence technical direction across teams.

🏖️ Benefits

• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Private Medical, Dental, and Vision Benefits
• Personal Retirement Savings Account with matching contribution
• Cycle to Work and Tax Saver schemes
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave
