Database Reliability Engineer – Core Team

51 - 200 employees

Founded 2016

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

SaaS • Enterprise • Artificial Intelligence

ClickHouse is a fast and resource-efficient real-time data warehouse and open-source database designed to deliver superior query performance for mission-critical and time-sensitive applications. It is available as a cloud service on major platforms such as AWS, GCP, and Azure, with a "Bring Your Own Cloud" option and a wide range of integrations for seamless operation within diverse tech stacks. ClickHouse excels in real-time analytics, machine learning, business intelligence, and observability, making it an ideal choice for use cases such as financial services, fraud detection, and gaming analytics. It supports developer-friendly SQL, offers cost-effective storage, and provides an open-source alternative to traditional databases. Companies such as Sony, Lyft, Cisco, GitLab, and Twilio use ClickHouse for its scalability, efficiency, and ease of use.

🕒 April 2

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Google Cloud Platform

Python

SQL

ClickHouse

📋 Description

• Continuously improve the reliability and performance of ClickHouse core.
• Improve existing metrics and alerts for ClickHouse, and create new ones, so that problems in production can be identified and prevented before they affect customers.
• Dig into the most common problems customers encounter in ClickHouse core, identify their root causes, and submit bug fixes, file issue reports, and suggest improvements.
• Enhance and refine incident response processes and post-mortem analysis for outages related to ClickHouse core, including working with the Support and Cloud teams to communicate with impacted customers.
• Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities.
• Manage on-call processes for responding to performance and reliability issues, and establish best practices for coordinating escalations so that issues are resolved and customer impact is minimized.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science or a related field.
• At least 5 years of experience in reliability engineering, QA, or customer-facing engineering.
• Previous experience operating ClickHouse or other SQL databases in production.
• Excellent understanding of distributed database internals and SQL; experience with ClickHouse in particular is a major plus.
• Scripting experience with shell or Python, and the ability to read and understand C++ code.
• Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
• You are a strong problem-solver with solid production debugging skills.
• You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner to the business, sharing the goal of moving it forward.
• You have a high level of responsibility, ownership, and accountability.
• Excellent communication skills.

🏖️ Benefits

• Flexible work environment - ClickHouse is a globally distributed, remote-friendly company, currently operating in 20 countries.
• Healthcare - Employer contributions towards your healthcare.
• Equity in the company - Every new team member who joins the company receives stock options.
• Time off - Flexible time off in the US and a generous entitlement in other countries.
• Home office - A $500 home office setup for remote employees.
• Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.