Senior SRE, Ads

Reddit, Inc.

501 - 1000 employees

Founded 2005

👥 B2C

📱 Media

🌍 Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

📋 Description

• Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving, auction, targeting, measurement, and billing systems.
• Design, build, and maintain infrastructure, tooling, and automation that improve service reliability and engineering productivity.
• Improve observability through monitoring, alerting, tracing, logging, and dashboards.
• Participate in on-call rotations and lead incident response efforts for critical production systems.
• Run root cause analysis and drive corrective actions following incidents.
• Collaborate with software engineers throughout the service lifecycle, from design reviews through production operations.
• Drive adoption of SRE best practices including SLIs, SLOs, error budgets, capacity planning, and operational readiness reviews.
• Reduce operational toil through automation and self-service tooling.
• Help define and measure advertiser-critical user journeys such as campaign creation, ad delivery, reporting, and billing.
• Scale Ads systems to support continued traffic growth, increased advertiser demand, and evolving business requirements.

🎯 Requirements

• 5+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
• Strong experience supporting high-traffic, user-facing production environments.
• Good understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
• Good programming skills in languages such as Go, Python, or similar.
• Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services.
• Experience with observability platforms, monitoring systems, alerting, and incident response.
• Experience driving automation and operational improvements.

🏖️ Benefits

• Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Private Pension plan with Employer-matching
• 100% employer-sponsored group medical plan
• Income Replacement Programs
• Flexible Vacation & Paid Volunteer Time Off
• Generous Paid Parental Leave
