Cloud Reliability Engineer

1001 - 5000 employees

We relentlessly make supply chains better. For everyone. No matter your business size. Whatever your goal. No matter the challenge. No matter your starting point. We will meet you where you are to create the future you need.

🕒 March 4

🇧🇷 Brazil – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

AWS

Azure

Cloud

Distributed Systems

Google Cloud Platform

Kubernetes

Python

Terraform

Infios

📋 Description

• Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.
• Manage and optimize Kubernetes clusters: deployment, scaling, patching, and upgrades.
• Ensure system availability, scalability, and performance through proactive monitoring and optimization.
• Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.
• Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).
• Build and maintain automated pipelines for deployments, configuration, and remediation.
• Develop self-healing mechanisms to automatically detect and resolve common service issues.
• Participate in continuous improvement initiatives around reliability, performance, and efficiency.
• Implement SRE principles: define and track SLIs, SLOs, and error budgets.
• Perform incident analysis and postmortems to identify root causes and prevent recurrence.
• Design proactive monitoring, alerting, and observability dashboards (Dynatrace, DataDog).
• Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.
• Manage and optimize CI/CD pipelines to ensure reliable and consistent delivery.
• Support deployment strategies (blue/green, canary, rolling) to reduce downtime risk.
• Collaborate with Product and DevOps teams on release readiness and rollback automation.
• Monitor, troubleshoot, and resolve infrastructure and application issues.
• Respond to production incidents and ensure rapid mitigation and resolution.
• Troubleshoot complex cloud, container, and networking issues across distributed systems.
• Drive a culture of proactive monitoring, data-driven analysis, and preventive action.

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
• 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.
• Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).
• Strong knowledge of Kubernetes deployment, management, and troubleshooting.
• Solid understanding of observability and monitoring (e.g., Dynatrace, DataDog) and incident management platforms.
• Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible).
• Strong troubleshooting and analytical skills across infrastructure and applications.
• Experience with incident response, RCA, and postmortem processes.
• A mindset of continuous improvement, reliability, and self-healing automation.
• Understanding of SRE principles, SLAs/SLOs/SLIs, and chaos engineering practices.

🏖️ Benefits

• Health insurance
• Flexible work arrangements
• Professional development opportunities
