Site Reliability Engineer, AWS

November 4

Truelogic Software

SaaS • B2B • Enterprise

Truelogic Software is a nearshore software development company specializing in agile staff augmentation services. They focus on providing custom outsourced software development with a team of highly skilled engineers from Latin America. Truelogic Software partners with both startups and Fortune 500 companies, offering solutions that align with their clients' time zones and ensuring high-quality outcomes through collaboration and responsiveness. With a presence in over 25 countries, Truelogic emphasizes remote work for better quality of life, and their engineers are experienced in various industries, delivering a wide range of successful projects globally.

501 - 1000 employees

Founded 2004

📋 Description

• Designs, implements, and evolves shared AWS CDK and CDK8s constructs used across multiple services and teams.
• Maintains core infrastructure components including VPC, EKS clusters and node groups, RDS, OpenSearch, and MSK.
• Operates and extends Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring/logging stacks.
• Ensures high reliability through structured alerting systems (Prometheus, CloudWatch), autoscaling strategies, and recovery mechanisms.
• Manages and publishes baseline templates, configuration schemas, and comprehensive documentation for infrastructure usage.
• Owns the CI/CD pipelines for Infrastructure as Code (IaC) codebases and platform component releases.
• Collaborates with engineering teams to troubleshoot infrastructure-related issues and deliver scalable, reliable solutions.
• Applies Site Reliability Engineering (SRE) principles, including SLIs, SLOs, observability, and fault tolerance, to all shared platform services.
• Supports IAM roles, secrets management, and tenant isolation best practices.

🎯 Requirements

• Has 5+ years of experience in infrastructure or Site Reliability Engineering (SRE), including hands-on work with AWS services such as VPC, IAM, RDS, MSK, and S3, as well as Kubernetes components like Helm, RBAC, and ServiceAccounts.
• Demonstrates fluency in Python and has practical experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks such as Pulumi.
• Possesses a strong understanding of Prometheus, Grafana, and effective alert routing practices.
• Has experience designing reusable infrastructure patterns or building internal developer platforms.
• Shows a proven track record of improving system reliability through automation, monitoring, and operational best practices.
• Has experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines.

🏖️ Benefits

• 100% Remote Work: Work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
• Highly Competitive USD Pay: Earn market-leading compensation in USD that goes beyond typical offerings.
• Paid Time Off: We value your well-being. Our paid time off policies give you the chance to unwind and recharge when needed.
• Work with Autonomy: Manage your own time as long as the work gets done. Focus on results, not the clock.
• Work with Top American Companies: Grow your expertise on innovative, high-impact projects with industry-leading U.S. companies.
