SRE, Site Reliability Engineering

🔥 12 minutes ago

🗣️🇪🇸 Spanish Required

Sofka Technologies

1001 - 5000 employees

Founded 2013

🤝 B2B

🏢 Enterprise

🤖 Artificial Intelligence

Sofka Technologies is a Latin American software development and digital transformation company that builds engineering teams and delivers software products and platforms for business clients. It focuses on backend and platform engineering, data & analytics, and applied AI, offering remote and hybrid roles across LATAM and emphasizing technology-driven solutions for enterprises.

📋 Description

• Adapt observability requirements to each technical solution to ensure coverage, visibility, and operational efficiency.
• Configure and maintain dashboards, metrics, alerts, and critical business controls.
• Validate solution resilience through chaos testing and scalability assessments under load.
• Implement resilient design patterns such as circuit breakers, fallbacks, and retries in distributed architectures.
• Identify and automate manual processes using infrastructure-as-code tools to reduce MTTR.
• Lead the implementation of self-remediation workflows and promote continuous improvement practices in operations.
• Collaborate with development and architecture teams to ensure technical quality across critical user journeys.

🎯 Requirements

• Minimum 3 years of experience leading technology resilience and observability in high-complexity environments.
• Proven experience automating operational tasks and managing incidents under SRE/DevOps methodologies.
• Observability: Dynatrace (primary hands-on), Grafana, Prometheus, OpenTelemetry, and the ELK Stack.
• Automation and IaC: Ansible, Terraform, Terragrunt, and Monaco (Monitoring as Code).
• Containerization: Kubernetes (AKS, EKS), OpenShift (advanced level), and Docker.
• Programming languages: Python (advanced), Bash, YAML, and PowerShell.
• Cloud & Infrastructure: Azure, AWS, or GCP (Networking, Security, and Compute).
• Reliability management: definition of SLIs, SLOs, SLAs, and Error Budget management.
• CI/CD: Git, Jenkins, Azure DevOps, and GitHub Actions.
• Resilience engineering: Chaos Engineering, circuit breaker patterns, and Canary/Blue-Green deployments.

🏖️ Benefits

• Technical and personal challenges that will keep you continuously growing.
• A connected team focused on your physical and mental wellbeing.
• A fresh, collaborative continuous-improvement culture with learning opportunities and people ready to support you.
• KaizenHub, a program designed to boost your talents, offering feedback, mentoring, and coaching through Sofka U.
• Programs such as Happy Kaizen and WeSofka that support your physical and emotional wellbeing.

Similar Jobs

🕒 June 24

DevOps Engineer focused on building and maintaining Kubernetes clusters for a health care client’s modernization journey. Engaging in designing, developing, and maintaining automated build and release pipelines.

Azure

Cloud

Flux

Kubernetes

MS SQL Server

SQL

🕒 June 10

Blue Coding

51 - 200

🤝 B2B

🛍️ eCommerce

💳 Fintech

Senior DevOps Engineer for cloud-native infrastructure modernization at Blue Coding. Focused on AWS migration and legacy Windows server decommissioning.

AWS

Cloud

Docker

Terraform

.NET

🕒 May 29

KATBOTZ®

1 - 10

🤖 Artificial Intelligence

📚 Education

Senior DevOps & Security Consultant at KATBOTZ, supporting enterprise infrastructure and security initiatives for global projects.

Ansible

AWS

Azure

Cloud

Cyber Security

Docker

Firewalls

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Prometheus

Python

Splunk

Terraform

🕒 May 25

Cloud DevOps Engineer Technical Mentor at Udacity providing support to learners. Engage in 1:1 calls, deep-dive sessions, and group Q&A to enrich the learning experience.

AWS

Cloud

Docker

DynamoDB

EC2

Kubernetes

Microservices

🕒 May 14

Arctiq

201 - 500

🏢 Enterprise

☁️ SaaS

🔐 Security

Technical leader architecting reliability strategy for large-scale government systems. Leading SRE framework implementation and mentoring mid-level engineers while interfacing with government stakeholders.

Cloud

Distributed Systems

Java

Kubernetes

Linux

Python

Go