Staff SRE, Ads

Reddit, Inc.

501 - 1000 employees

Founded 2005

👥 B2C

📱 Media

🌍 Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

📋 Description

• Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
• Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
• Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
• Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
• Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
• Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
• Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
• Mentor engineers and provide technical leadership across multiple teams.
• Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
• Strong experience supporting high-traffic, user-facing production environments.
• Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
• Experience designing highly available systems with strong operational and reliability practices.
• Strong understanding of observability systems including metrics, logging, tracing, and alerting.
• Good programming skills in languages such as Go, Python, or similar.
• Experience improving reliability through SLOs, automation, incident management, and performance optimization.
• Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
• Strong collaboration and communication skills with the ability to influence technical direction across teams.

🏖️ Benefits

• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Group Personal Pension Scheme with Employer match
• Private Medical and Dental Scheme
• Income Replacement Programs
• Bike to Work scheme
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave

Similar Jobs

🕒 June 11

Advanced Solutions International, Inc.

201 - 500

🤝 B2B

🤝 Non-profit

DevOps Reliability Engineer ensuring performance, scalability, and reliability of Azure-based SaaS platform at ASI. Collaborating with engineering teams to improve system efficiency and resilience.

🕒 June 2

Ohalo

11 - 50

📋 Compliance

🔐 Security

🏢 Enterprise

DevOps engineer optimizing deployment and CI/CD processes at Ohalo, a data protection startup. Collaborating with Engineering teams to enable better solutions and protect client data rights.

🕒 May 26

Intermedia Cloud Communications

1001 - 5000

🤝 B2B

🏢 Enterprise

☁️ SaaS

Principal DevOps Engineer serving as technical lead and architect for infrastructure, automation, and deployments in cloud communications provider. Focused on reliability, standards, and cross-platform initiatives.

🕒 May 12

Menlo Security Inc.

201 - 500

🔒 Cybersecurity

🏢 Enterprise

Principal Platform Infrastructure Engineer designing and operating Menlo Security's infrastructure platform across multiple environments. Collaborating with global teams and leveraging cloud-native technologies like Google Kubernetes Engine and Terraform.

🕒 April 28

Runware

11 - 50

🤖 Artificial Intelligence

🔌 API

📱 Media

Staff/Senior DevOps Engineer at Runware enhancing infrastructure for real-time AI inference. Focus on automation, observability, and scaling to meet growing demands.

DNS

Firewalls

Linux

TCP/IP