Staff Software Engineer I – SRE

🕒 January 28

🗣️🇺🇸🇬🇧 English required

Confluent

1001 - 5000 employees

Founded in 2014

🤖 Artificial Intelligence

☁️ SaaS

💰 Secondary Market in 2021-06

Artificial Intelligence • SaaS • Cloud Computing

Confluent is a company specializing in data streaming platforms that turn real-time data events into actionable outcomes. Its solutions enable the development of intelligent, real-time applications, empowering teams and systems to respond instantly to data. Confluent is building a new data category with real-world impact by providing the infrastructure for real-time data streaming, and it is recognized by and partners with major technology companies such as Google Cloud and Microsoft. The company maintains a predominantly remote work culture, hiring talent from more than 25 countries, and values diversity and inclusion in the workplace.

Description

• Analyze systemic failure patterns and design improvements that prevent incident recurrence
• Define and maintain SLO/SLA frameworks; use error budgets to guide reliability investments
• Build tooling and automation to reduce incident response toil and scale team impact
• Own Rootly configuration, workflows, and integrations with PagerDuty, Jira, Confluence, and Slack
• Analyze reliability data to identify systemic improvements; build dashboards that drive action
• Explore AI-assisted approaches to documentation quality and incident analysis
• Design scalable reliability standards that reduce reactive workload over time
• Own standards, practices, and continuous improvement of incident response
• Define incident commander eligibility criteria and manage the rotation
• Available as escalation IC when incidents exceed a team's management chain
• Develop and deliver training programs for engineering teams at all levels
• Coach teams through post-mortems and on developing actionable corrective actions
• Edit and review customer-facing incident documents to ensure quality and clarity
• Drive turnaround SLAs while maintaining technical accuracy
• Ensure clear explanation of what happened, why, and how we'll prevent recurrence
• Partner with engineering leaders to elevate reliability practices
• Be the expert who teams proactively engage for guidance

🎯 Requirements

• 10+ years in SRE, incident management, or reliability engineering
• Cloud experience with at least one of AWS, GCP, or Azure
• Deep expertise with incident management tooling (Rootly, PagerDuty, or similar platforms)
• Strong understanding of distributed systems and failure modes at scale; Kafka/event streaming expertise preferred, or demonstrated rapid mastery of complex systems
• Deep experience with observability (metrics, logging, tracing) and the ability to diagnose complex issues
• Kubernetes and container orchestration experience
• Understanding of CI/CD pipelines and release processes
• Systems thinking: understanding how infrastructure design choices affect failure modes and recovery
• Familiarity with SLO/SLA frameworks
• Track record as a trusted advisor across engineering organizations
• Experience driving org-wide process and cultural changes
• Strong written communication (design docs, one-pagers, runbooks)
• Post-mortem facilitation experience
• Experience with async collaboration across time zones
• Large company experience navigating reliability/incident programs at 500+ engineer organizations

🏖️ Benefits

• Belonging isn’t a perk here. It’s the baseline. We work across time zones and backgrounds, knowing the best ideas come from different perspectives. And we make space for everyone to lead, grow, and challenge what’s possible.
• We’re proud to be an equal opportunity workplace. Employment decisions are based on job-related criteria, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by law.
