Senior Site Reliability Engineer

51 - 200 employees

Founded 2016

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

ClickHouse is a fast, resource-efficient real-time data warehouse and open-source database designed to deliver high query performance for mission-critical and time-sensitive applications. It is available as a cloud service on AWS, GCP, and Azure, with a "Bring Your Own Cloud" option and a wide range of integrations for diverse tech stacks. ClickHouse is used for real-time analytics, machine learning, business intelligence, and observability, and is well suited to use cases in financial services, fraud detection, and gaming analytics. It supports developer-friendly SQL operations, offers cost-effective storage, and provides an open-source alternative to traditional databases. Companies like Sony, Lyft, Cisco, GitLab, and Twilio use ClickHouse for its scalability, efficiency, and ease of use.

🕒 March 12

🇺🇸 United States – Remote

💵 $141k - $208k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Ansible

AWS

Azure

Cloud

Docker

Google Cloud Platform

Kubernetes

Puppet

Python

SQL

Terraform

ClickHouse

📋 Description

• Collaborate with various engineering teams at ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
• Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
• Ensure all infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane, and ClickHouse Core) have monitoring and alerting in place so that incidents are detected and resolved promptly.
• Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
• Continuously improve the reliability and performance of our ClickHouse services.
• Plan, enable, and drive Chaos initiatives across Engineering teams, based on internal priorities.
• Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science or a related field.
• At least 8 years of experience in Site Reliability Engineering or a related field.
• Previous experience using ClickHouse in production.
• Hands-on experience with Go and/or Python.
• Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
• Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
• Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
• Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
• You are a strong problem solver and have solid production debugging skills.
• You are passionate about efficiency, availability, scalability, and data governance.
• You thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving the business forward.
• You have a high level of responsibility, ownership, and accountability.
• Excellent communication and interpersonal skills.

🏖️ Benefits

• Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
• Healthcare - Employer contributions towards your healthcare.
• Equity in the company - Every new team member who joins our company receives stock options.
• Time off - Flexible time off in the US, generous entitlement in other countries.
• A $500 home office setup if you’re a remote employee.
• Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.
