SRE, Site Reliability Engineering

🔥 0 minutes ago

🗣️🇪🇸 Spanish Required

Sofka Technologies

1001 - 5000 employees

Founded 2013

🤝 B2B

🏢 Enterprise

🤖 Artificial Intelligence

Sofka Technologies is a Latin American software development and digital transformation company that builds engineering teams and delivers software products and platforms for business clients. It focuses on backend and platform engineering, data & analytics, and applied AI, offering remote and hybrid roles across LATAM and emphasizing technology-driven solutions for enterprises.

📋 Description

• Adapt observability requirements to each technical solution to ensure coverage, visibility, and operational efficiency.
• Configure and maintain dashboards, metrics, alerts, and critical business controls.
• Validate solution resilience through chaos testing and scalability assessments under load.
• Implement resilient design patterns such as circuit breakers, fallbacks, and retries in distributed architectures.
• Identify and automate manual processes using infrastructure-as-code tools to reduce MTTR.
• Lead the implementation of self-remediation workflows and promote continuous improvement practices in operations.
• Collaborate with development and architecture teams to ensure technical quality across critical user journeys.

🎯 Requirements

• Minimum 3 years of experience leading technology resilience and observability in high-complexity environments.
• Proven experience automating operational tasks and managing incidents under SRE/DevOps methodologies.
• Observability: Dynatrace (primary hands-on), Grafana, Prometheus, OpenTelemetry, and the ELK Stack.
• Automation and IaC: Ansible, Terraform, Terragrunt, and Monaco (Monitoring as Code).
• Containerization: Kubernetes (AKS, EKS), OpenShift (advanced level), and Docker.
• Programming languages: Python (advanced), Bash, YAML, and PowerShell.
• Cloud & Infrastructure: Azure, AWS, or GCP (Networking, Security, and Compute).
• Reliability management: definition of SLIs, SLOs, SLAs, and Error Budget management.
• CI/CD: Git, Jenkins, Azure DevOps, and GitHub Actions.
• Resilience engineering: Chaos Engineering, circuit breaker patterns, and Canary/Blue-Green deployments.

🏖️ Benefits

• Technical and personal challenges that will keep you continuously growing.
• A connected team focused on your physical and mental wellbeing.
• A fresh, collaborative continuous-improvement culture with learning opportunities and people ready to support you.
• KaizenHub, a program designed to boost your talents, offering feedback, mentoring, and coaching through Sofka U.
• Programs such as Happy Kaizen and WeSofka that support your physical and emotional wellbeing.
