Senior Site Reliability Engineer

51 - 200 employees

Founded in 2016

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

SaaS • Enterprise • Artificial Intelligence

ClickHouse is a fast, resource-efficient real-time data warehouse and open source database, designed to deliver superior query performance for mission-critical, time-sensitive applications. It is available as a cloud service on major platforms such as AWS, GCP, and Azure, with a "Bring Your Own Cloud" option and a wide range of integrations for smooth operation across different technology stacks. ClickHouse excels at real-time analytics, machine learning, business intelligence, and observability, making it an ideal choice for use cases such as financial services, fraud detection, and gaming analytics. It offers developer-friendly SQL operations, cost-effective storage solutions, and an open source alternative to traditional databases. Companies such as Sony, Lyft, Cisco, GitLab, and Twilio use ClickHouse for its scalability, efficiency, and ease of use.

🕒 March 13

🇨🇦 Canada – Remote

⏰ Full-time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

Ansible

AWS

Azure

Cloud

Docker

Google Cloud Platform

Kubernetes

Puppet

Python

SQL

Terraform

ClickHouse

Description

• Collaborate with various engineering teams in ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
• Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
• Ensure all the infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane, ClickHouse Core, etc.) have monitoring and alerting in place to ensure timely detection and resolution of incidents.
• Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
• Continuously improve the reliability and performance of our ClickHouse services.
• Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
• Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science or a related field.
• At least 8 years of experience in Site Reliability Engineering or a related field.
• Hands-on experience with Go and/or Python.
• Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
• Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
• Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
• Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
• You are a strong problem solver and have solid production debugging skills.
• You are passionate about efficiency, availability, scalability, and data governance.
• You thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving the business forward.
• You have a high level of responsibility, ownership, and accountability.
• Excellent communication and interpersonal skills.

🏖️ Benefits

• Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
• Healthcare - Employer contributions towards your healthcare.
• Equity in the company - Every new team member who joins our company receives stock options.
• Time off - Flexible time off in the US, generous entitlement in other countries.
• A $500 home office setup if you’re a remote employee.
• Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

Similar Jobs

Site Reliability Engineer, Core Streaming

🕒 March 3

Yelp

1001 - 5000

Site Reliability Engineer specializing in Kafka, managing Yelp’s data streaming infrastructure. Collaborating on projects to ensure the reliability and performance of critical services across hybrid and multi-cloud environments.

🇨🇦 Canada – Remote

💵 $135,000 - $185,000 / year

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

Apache

Cloud

Java

Kafka

Linux

Python

DevOps Engineer

🕒 February 26

S&P Global

10,000+ employees

💸 Finance

🏢 Enterprise

🤖 Artificial Intelligence

DevOps Engineer focusing on infrastructure and applications supporting valuations and trade data at S&P Global. Collaborating with Development, Testing and Client Services teams to improve service availability.

🇨🇦 Canada – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Chef

Cloud

DynamoDB

EC2

Java

JavaScript

Linux

MySQL

NoSQL

PHP

Postgres

Puppet

Python

SQL

Terraform

Unix

DevOps Engineer

🕒 February 20

Modaxo

1001 - 5000

🚗 Transportation

☁️ SaaS

🤝 B2B

DevOps Engineer managing and scaling cloud infrastructure and services for a global technology organization. Collaborating with IT teams across multiple regions to ensure operational excellence.

🇨🇦 Canada – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Azure

Cloud

DNS

Firewalls

Linux

MacOS

Terraform

DevOps Engineer

🕒 February 18

S&P Global

10,000+ employees

💸 Finance

🏢 Enterprise

🤖 Artificial Intelligence

DevOps Engineer developing functional systems that improve customer experience for S&P Global's applications. Responsibilities include automation, monitoring and maintaining infrastructure using cutting-edge technologies.

🇨🇦 Canada – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Chef

Cloud

DynamoDB

EC2

Java

JavaScript

Linux

MySQL

NoSQL

PHP

Postgres

Puppet

Python

SQL

Terraform

Unix

Site Reliability Engineer – Inference Infrastructure

🕒 January 13

Cohere

11 - 50

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Site Reliability Engineer joining Cohere to build and operate high-performance AI platforms for NLP applications. Collaborating with teams to deploy optimized models in production environments.

🇨🇦 Canada – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Azure

Cloud

Distributed Systems

Google Cloud Platform

Kubernetes

Linux