Site Reliability Engineer

🕒 June 19

🗣️🇧🇷🇵🇹 Portuguese Required

Digibee

51 - 200 employees

Founded 2017

☁️ SaaS

🔌 API

🏢 Enterprise

💰 $60.5M Series B (June 2023)

Digibee is a cloud-native integration and automation platform that helps enterprises connect apps, legacy systems, data, and AI. It provides low-code/visual tools for building integrations, orchestrating workflows, deploying agents, and managing MCP/AI connectivity with built-in observability, security, and serverless scaling. Digibee is offered as a SaaS platform for IT teams, developers, and architects to accelerate digital transformation, modernize core systems, and automate high-volume, mission-critical processes.

📋 Description

• Own the technical direction of our observability stack (Dash0, OpenTelemetry, Elasticsearch/Logstash/Fluent Bit), defining instrumentation standards for Java and Node.js services and driving adoption of tracing, metrics, and structured logging.
• Establish meaningful SLIs, SLOs, and error budgets, and partner with engineering and product teams to use them to drive real engineering decisions.
• Lead major incident response as a senior incident commander, and run blameless postmortems with technical depth and real follow-through.
• Evolve our on-call program so it is humane and sustainable, driving down toil and alert noise as a first-class engineering priority.
• Influence architecture decisions across the platform, going deep where it matters: GKE, Kong, RabbitMQ, PostgreSQL, MongoDB Atlas, Redis, and MinIO.
• Mentor SREs and platform engineers, raise the technical bar through design and incident reviews, and grow the SRE discipline at Digibee.

🎯 Requirements

• 8+ years in SRE, infrastructure, or platform engineering, with meaningful time at Specialist or Principal level operating large-scale production systems (this is a mandatory requirement).
• Deep production experience with Kubernetes (preferably GKE), including real fluency debugging systems under pressure.
• Strong observability background with OpenTelemetry, Prometheus, distributed tracing, and centralized logging (Elasticsearch, Logstash, Fluent Bit, or similar). Experience with Dash0 is a strong plus.
• Hands-on experience operating stateful services in production: at least two of PostgreSQL, MongoDB Atlas, Redis, RabbitMQ, or object storage (MinIO/S3).
• Production experience instrumenting and troubleshooting Java services (JVM tuning, GC, thread dumps); familiarity with Node.js runtime characteristics is a plus.
• Proven track record leading incident response and SLO programs that actually changed engineering behavior, not dashboards nobody looks at.
• Demonstrated ability to mentor senior engineers and influence technical direction across teams without formal authority.
• Strong communication skills in both English and Portuguese (written and verbal), with proven ability to collaborate across cross-functional, remote-first teams.

🏖️ Benefits

• Health care
• Dental care
• R$ 1,400.00/month on Caju card (for food and meal allowance, mobility, home office supplies, culture, health, and education)
• Life insurance
• Child care assistance
• Equity (RSUs)
• Gympass
• English course: we have a partnership for group classes for R$ 100 monthly

Similar Jobs

🕒 June 18

Sicredi

10,000+ employees

🏦 Banking

💸 Finance

DevOps/SRE Analyst promoting continuous delivery at Sicredi. Working in a multi-cloud environment and integrating tools across teams.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Cloud

Consul

Docker

Grafana

Kubernetes

Linux

Node.js

PHP

Postgres

Prometheus

Python

Redis

Spring

Spring Boot

SQL

Terraform

Vault

🕒 June 18

CEA

201 - 500

🌾 Agriculture

🔧 Hardware

🤝 B2B

DevOps Specialist at C&A managing cloud platforms and CI/CD pipelines. Ensuring reliable operations and mentoring teams in a diverse and innovative environment.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Azure

Cloud

Docker

Google Cloud Platform

Grafana

GraphQL

Kubernetes

Linux

OpenShift

Prometheus

Splunk

Terraform

🕒 June 18

Sicredi

10,000+ employees

🏦 Banking

💸 Finance

DevOps/SRE Analyst position at Sicredi, focusing on multi-cloud technologies and improving service reliability. Engaging with various teams to advance digital transformation and service delivery.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Cloud

Consul

Docker

Grafana

Kubernetes

Linux

Node.js

PHP

Postgres

Prometheus

Python

Redis

Spring

Spring Boot

SQL

Terraform

Vault

🕒 June 18

Addvisor Group

201 - 500

☁️ SaaS

📋 Compliance

🏢 Enterprise

Mid-level SRE Analyst (Analista SRE PL) managing critical cloud infrastructure projects for strategic missions, working remotely in Brazil. Collaborating with development teams and ensuring best practices in cloud operations.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Cloud

Docker

EC2

Kubernetes

Linux

OpenShift

Oracle

Python

Terraform

🕒 June 17

Wizdaa

11 - 50

🎯 Recruiter

👥 HR Tech

☁️ SaaS

Senior Platform Engineer/SRE designing Infrastructure as Code and operating multi-tenant Kubernetes on AWS. Leading automation and reliability engineering projects across the platform.

AWS

Kubernetes