Senior SRE, Ads

Reddit, Inc.

501 - 1000 employees

Founded 2005

👥 B2C

📱 Media

🌍 Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

📋 Description

• Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving, auction, targeting, measurement, and billing systems.
• Design, build, and maintain infrastructure, tooling, and automation that improve service reliability and engineering productivity.
• Improve observability through monitoring, alerting, tracing, logging, and dashboards.
• Participate in on-call rotations and lead incident response efforts for critical production systems.
• Run root cause analysis and drive corrective actions following incidents.
• Collaborate with software engineers throughout the service lifecycle, from design reviews through production operations.
• Drive adoption of SRE best practices including SLIs, SLOs, error budgets, capacity planning, and operational readiness reviews.
• Reduce operational toil through automation and self-service tooling.
• Help define and measure advertiser-critical user journeys such as campaign creation, ad delivery, reporting, and billing.
• Scale Ads systems to support continued traffic growth, increased advertiser demand, and evolving business requirements.

🎯 Requirements

• 5+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
• Strong experience supporting high-traffic, user-facing production environments.
• Good understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
• Good programming skills in languages such as Go, Python, or similar.
• Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services.
• Experience with observability platforms, monitoring systems, alerting, and incident response.
• Experience driving automation and operational improvements.

🏖️ Benefits

• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Group Personal Pension Scheme with Employer match
• Private Medical and Dental Scheme
• Income Replacement Programs
• Bike to Work scheme
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave
