Senior Site Reliability Engineer

🕒 March 12

🇺🇸 United States – Remote

💵 $141k - $208k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor


ClickHouse

51 - 200 employees

Founded 2016

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence


ClickHouse is a fast and resource-efficient real-time data warehouse and open-source database designed to deliver superior query performance for mission-critical and time-sensitive applications. It is available as a cloud service on major platforms like AWS, GCP, and Azure, with a "Bring Your Own Cloud" option and a wide range of integrations for seamless operation within diverse tech stacks. ClickHouse excels in real-time analytics, machine learning, business intelligence, and observability, making it an ideal choice for use cases such as financial services, fraud detection, and gaming analytics. It supports developer-friendly SQL operations, offers cost-effective storage solutions, and provides an open-source alternative to traditional databases. Companies like Sony, Lyft, Cisco, GitLab, and Twilio leverage ClickHouse for its scalability, efficiency, and ease of use.

📋 Description

• Collaborate with engineering teams across ClickHouse to design and implement scalable, secure, and highly available systems.
• Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
• Ensure all infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane and ClickHouse Core) have monitoring and alerting in place so that incidents are detected and resolved promptly.
• Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
• Continuously improve the reliability and performance of our ClickHouse services.
• Plan, enable, and drive Chaos initiatives across Engineering teams, based on internal priorities.
• Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science or a related field.
• At least 8 years of experience in Site Reliability Engineering or a related field.
• Previous experience using ClickHouse in production.
• Hands-on experience with Go and/or Python.
• Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
• Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
• Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
• Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
• You are a strong problem solver and have solid production debugging skills.
• You are passionate about efficiency, availability, scalability, and data governance.
• You thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving the business forward.
• You have a high level of responsibility, ownership, and accountability.
• Excellent communication and interpersonal skills.

🏖️ Benefits

• Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
• Healthcare - Employer contributions towards your healthcare.
• Equity in the company - Every new team member who joins our company receives stock options.
• Time off - Flexible time off in the US, generous entitlement in other countries.
• A $500 home office setup if you’re a remote employee.
• Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

