Site Reliability Engineer, Core Streaming

🕒 March 3

Yelp

1001 - 5000 employees

Founded 2004

Yelp is a platform that connects consumers with local businesses, allowing users to discover and review a wide variety of services including restaurants, home services, and automotive services. It aims to help consumers find trusted recommendations for goods, services, and experiences in their local area, while offering business owners tools to manage customer interactions and promote their offerings.

📋 Description

• Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments
• Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing
• Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact to critical services
• Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery
• Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses
• Participate in on-call rotations

🎯 Requirements

• Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, across hybrid or multi-cloud and Linux environments
• In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances
• Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation
• Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters
• Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink
• Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related)
• Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Professional development opportunities
