Site Reliability Engineer, Production Reliability

🔥 20 hours ago

Yelp

1001 - 5000 employees

Founded 2004

Yelp is a platform that connects consumers with local businesses, allowing users to discover and review a wide variety of services including restaurants, home services, and automotive services. It aims to help consumers find trusted recommendations for goods, services, and experiences in their local area, while offering business owners tools to manage customer interactions and promote their offerings.

📋 Description

• Build and manage scalable, self-healing, globally distributed systems.
• Keep Yelp fast, available, and growing.
• Implement key parts of the core architecture and support developers.
• Empower Yelp: spinning up infrastructure should always be a git commit and a code review away, with automation and self-service at the core of what we do.
• Troubleshoot site issues using industry-leading tools like Splunk, Grafana, and Prometheus.
• Automate everything with Python, Puppet, Git, Jenkins, Terraform, and more!
• Develop custom tools when off-the-shelf solutions don’t work at our scale, and contribute upstream to open source projects.
• Design and implement new systems, tests, and procedures.
• Participate in light on-call rotations - we have geographically distributed SRE teams for follow-the-sun support.

🎯 Requirements

• Mastery of Linux (we use Ubuntu, but any distro is fine).
• Command of your favorite modern programming language, with an appreciation for delivering safe and secure services: Python, TypeScript, Ruby, Go, Rust, Java, C++, etc.
• A solid understanding of the fundamental technologies for delivering services on the Internet (TCP/IP, HTTP, DNS, etc.).
• Experience with public cloud platforms (we use AWS and GCP, but others are also fine) and related tooling (Terraform, Puppet, Chef, Ansible, etc.).
• Experience with Linux containerisation and orchestration (e.g., Docker, Podman, and Kubernetes).
• Self-motivation to investigate, fix, and improve Yelp in an ever-changing environment.
• Leading, collaborating on, and sharing technical activities with global teams.
• Ownership of the total lifecycle of a system.

🏖️ Benefits

• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development

Similar Jobs

🕒 4 days ago

Netomi

51 - 200

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Agentic AI Forward Deployment Engineering Lead at Netomi transforming enterprise customer requirements into production-grade AI solutions. Collaborating with teams to ensure successful deployments and measurable business outcomes.

Distributed Systems

🕒 5 days ago

Vista

5001 - 10000

🤝 B2B

🛍️ eCommerce

Site Reliability Engineer enhancing incident response and engineering practices for Vista's reliability. Focused on identifying failure patterns and implementing proactive improvements for operational excellence.

AWS

Azure

Cloud

Grafana

Java

Python

TypeScript

Go

🕒 6 days ago

Pragmatike

11 - 50

🎯 Recruiter

👥 HR Tech

🤝 B2B

SRE / Network Engineer focused on Metal-as-a-Service and bare-metal automation for innovative cloud infrastructure. Supporting core infrastructure systems and scalable networks in a remote environment.

Ansible

Grafana

Linux

OpenStack

Prometheus

Python

VMware

🕒 June 17

Pragmatike

11 - 50

🎯 Recruiter

👥 HR Tech

🤝 B2B

SRE / Network Engineer working remotely for a European deep-tech cloud computing company. Responsible for maintaining infrastructure systems and automating processes across distributed environments.

Ansible

Cloud

Grafana

Linux

OpenStack

Prometheus

Python

VMware

🕒 June 16

Intrahealth, a HEALWELL AI Company

51 - 200

⚕️ Healthcare Insurance

☁️ SaaS

🤖 Artificial Intelligence

DevOps Engineer at Intrahealth working on Kubernetes and CI/CD for healthcare data solutions. Focused on AI-augmented development and collaboration with global teams.

AWS

Azure

Cloud

DNS

Flux

Google Cloud Platform

Grafana

Kubernetes

Prometheus

Python

Terraform

Go