Database Reliability Engineer – Core Team

🕒 April 2

ClickHouse

51 - 200 employees

Founded 2016

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

ClickHouse is a fast, resource-efficient real-time data warehouse and open-source database designed to deliver superior query performance for mission-critical and time-sensitive applications. It is available as a cloud service on major platforms such as AWS, GCP, and Azure, with a "Bring Your Own Cloud" option and a wide range of integrations for operating within diverse tech stacks. ClickHouse excels at real-time analytics, machine learning, business intelligence, and observability, making it an ideal choice for use cases such as financial services, fraud detection, and gaming analytics. It supports developer-friendly SQL, offers cost-effective storage, and provides an open-source alternative to traditional databases. Companies such as Sony, Lyft, Cisco, GitLab, and Twilio use ClickHouse for its scalability, efficiency, and ease of use.

📋 Description

• Continuously improve the reliability and performance of ClickHouse core.
• Create and improve metrics and alerts for ClickHouse so that problems in production can be identified and prevented before they affect customers.
• Dig into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements.
• Enhance and refine incident response processes and post-mortem analysis for outages related to ClickHouse core, including working with the Support and Cloud teams to communicate with impacted customers.
• Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities.
• Manage on-call processes for responding to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize customer impact.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science or a related field.
• At least 5 years of experience in reliability engineering, QA, or customer-facing engineering.
• Previous experience operating ClickHouse or other SQL databases in production.
• Excellent understanding of distributed database internals and SQL; familiarity with ClickHouse in particular is a major plus.
• Scripting experience with Shell or Python, and the ability to read and understand C++ code.
• Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
• You are a strong problem-solver with solid production debugging skills.
• You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner to the business, sharing the goal of moving it forward.
• You have a high level of responsibility, ownership, and accountability.
• Excellent communication skills.

🏖️ Benefits

• Flexible work environment - ClickHouse is a globally distributed, remote-friendly company. We currently operate in 20 countries.
• Healthcare - Employer contributions towards your healthcare.
• Equity in the company - Every new team member receives stock options.
• Time off - Flexible time off in the US and a generous entitlement in other countries.
• A $500 home office setup if you’re a remote employee.
• Global gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

Similar Jobs

🕒 April 1

Prima Power

1001 - 5000

🚀 Aerospace

DevOps Engineer in the Infrastructure team, leveraging data and technology for innovative motor insurance solutions. Join over 300 engineers building impactful, scalable systems.

AWS

Distributed Systems

DNS

Kubernetes

Microservices

Python

Terraform

🕒 April 1

Prima Power

1001 - 5000

🚀 Aerospace

Senior Site Reliability Engineer shaping the future of motor insurance at a leading provider. Collaborating across engineering teams to build reliable and scalable systems.

AWS

Cloud

Distributed Systems

DNS

Kafka

Kubernetes

Microservices

Postgres

PySpark

Python

RabbitMQ

Redis

Terraform

🕒 April 1

Fortyx

1 - 10

Site Reliability Engineer optimizing reliability, scalability, and performance for Luupli's AWS cloud infrastructure. Collaborating with teams to enhance automation and incident management.

AWS

Cloud

EC2

Python

Terraform

🕒 March 31

RemoteStar

11 - 50

🤝 B2B

🎯 Recruiter

☁️ SaaS

Senior Site Reliability Engineer Manager ensuring infrastructure and service reliability. Leading SRE team and driving operational excellence in a B2B diamond marketplace.

AWS

Azure

Cloud

Google Cloud Platform

Grafana

Prometheus

Python

Go

🕒 March 31

Keywords Studios

10,000+ employees

🎮 Gaming

📱 Media

🤖 Artificial Intelligence

Azure DevOps Engineer supporting Azure services for Keywords Group in the global Video Game Industry. Managing cloud solutions and leading projects in a remote environment.

AWS

Azure

Cloud

SQL