Senior Site Reliability Engineer, SRE

🔥 0 minutes ago

🗣️🇧🇷🇵🇹 Portuguese Required

Oowlish

51 - 200 employees

Founded 2017

🤝 B2B

💳 Fintech

B2B • Software Development • Fintech

Oowlish is a technology company that provides end-to-end solutions for businesses that want to innovate through digital products and services. Its mission is to democratize innovation by connecting companies of all sizes with highly skilled tech talent, especially from developing countries. It emphasizes collaboration through tailored software development, UX/UI design, product management, and agile methodologies. Oowlish also invests in startups through Oowlish Ventures, helping entrepreneurs co-design and scale their ideas into successful products.

📋 Description

• Design, implement, and improve Site Reliability Engineering practices across production environments.
• Define, manage, and continuously improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
• Lead and participate in incident response and incident command processes.
• Build and evolve observability strategies, including monitoring, logging, alerting, and distributed tracing.
• Improve system reliability, availability, scalability, and operational efficiency.
• Partner with engineering teams to improve application performance and production readiness.
• Develop automation solutions that reduce operational overhead and improve reliability.
• Participate in root cause analysis and post-incident reviews.
• Drive continuous improvement initiatives based on operational insights and incident learnings.
• Help establish reliability best practices across teams and services.

🎯 Requirements

• 5+ years of professional experience in Site Reliability Engineering, DevOps, or Production Engineering roles.
• Strong understanding of Site Reliability Engineering principles and best practices.
• Experience supporting and operating production systems at scale.
• Strong knowledge of monitoring, observability, and reliability engineering concepts.
• Experience working in cloud-based environments.
• Strong troubleshooting and problem-solving skills.
• Experience working with distributed systems and modern application architectures.
• Proven Site Reliability Engineering experience.
• Experience in defining and managing:
  • Service Level Objectives (SLOs)
  • Service Level Indicators (SLIs)
  • Error Budgets
• Experience leading or actively participating in Incident Command and Incident Response processes.
• Experience designing and implementing observability strategies.
• Hands-on experience with:
  • Monitoring
  • Logging
  • Alerting
  • Distributed Tracing
• Experience improving system reliability, availability, and operational excellence.
• Experience supporting mission-critical production environments.
• Experience with cloud platforms (AWS preferred).
• Strong automation mindset.
• Experience conducting root cause analysis and postmortems.
• Kubernetes experience.
• Terraform or Infrastructure as Code experience.
• CI/CD pipeline experience.
• Experience with containerized environments.
• Experience with distributed microservices architectures.
• Experience with performance engineering.
• Experience mentoring engineers on reliability practices.
• Multi-cloud experience.
• Experience working in highly regulated or high-availability environments.

🏖️ Benefits

• Home office
• Competitive compensation based on experience
• Career plans to allow for extensive growth in the company
• International projects
• Oowlish English Program (Technical and Conversational)
• Oowlish Fitness with Total Pass
• Games and competitions
