
1001 - 5000 employees
Founded 2004
Yelp is a platform that connects consumers with local businesses, allowing users to discover and review a wide variety of services including restaurants, home services, and automotive services. It aims to help consumers find trusted recommendations for goods, services, and experiences in their local area, while offering business owners tools to manage customer interactions and promote their offerings.
🕒 March 3
🇨🇦 Canada – Remote
💵 $135k - $185k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)

• Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments
• Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing
• Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact to critical services
• Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery
• Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses
• Participate in on-call rotations.
• Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, across hybrid or multi-cloud and Linux environments
• In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances
• Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation
• Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters
• Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink
• Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related)
• Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment.
• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Professional development opportunities
🕒 February 26
DevOps Engineer focusing on infrastructure and applications supporting valuations and trade data at S&P Global. Collaborating with Development, Testing and Client Services teams to improve service availability.
AWS
Chef
Cloud
DynamoDB
EC2
Java
JavaScript
Linux
MySQL
NoSQL
PHP
Postgres
Puppet
Python
SQL
Terraform
Unix
🕒 February 20
DevOps Engineer managing and scaling cloud infrastructure and services for a global technology organization. Collaborating with IT teams across multiple regions to ensure operational excellence.
AWS
Azure
Cloud
DNS
Firewalls
Linux
MacOS
Terraform
🕒 February 18
DevOps Engineer developing functional systems that improve customer experience for S&P Global's applications. Responsibilities include automation, monitoring and maintaining infrastructure using cutting-edge technologies.
AWS
Chef
Cloud
DynamoDB
EC2
Java
JavaScript
Linux
MySQL
NoSQL
PHP
Postgres
Puppet
Python
SQL
Terraform
Unix
🕒 February 4
Senior Site Reliability Engineer ensuring reliability and performance of Vantage’s services while collaborating across teams. Engaging in incident response and driving infrastructure improvements.
🇨🇦 Canada – Remote
💵 CA$150k - CA$175k / year
💰 Series unknown on 2016-02
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Ansible
AWS
Azure
Python
Terraform
🕒 January 13
Site Reliability Engineer joining Cohere to build and operate high-performance AI platforms for NLP applications. Collaborating with teams to deploy optimized models in production environments.
AWS
Azure
Cloud
Distributed Systems
Google Cloud Platform
Kubernetes
Linux
Go