Site Reliability Engineer – AWS

November 4

🇨🇴 Colombia – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Truelogic Software

SaaS • B2B • Enterprise

Truelogic Software is a nearshore software development company specializing in agile staff augmentation. It provides custom outsourced software development through a team of highly skilled engineers from Latin America. Truelogic partners with both startups and Fortune 500 companies, works in its clients' time zones, and emphasizes collaboration and responsiveness to deliver high-quality outcomes. With a presence in over 25 countries, the company promotes remote work for a better quality of life, and its engineers have delivered projects across a wide range of industries worldwide.

501 - 1000 employees

Founded 2004

📋 Description

• Designs, implements, and evolves shared AWS CDK and CDK8s constructs used across multiple services and teams.
• Maintains core infrastructure components including VPC, EKS clusters and node groups, RDS, OpenSearch, and MSK.
• Operates and extends Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring/logging stacks.
• Ensures high reliability through structured alerting systems (Prometheus, CloudWatch), autoscaling strategies, and recovery mechanisms.
• Manages and publishes baseline templates, configuration schemas, and comprehensive documentation for infrastructure usage.
• Owns the CI/CD pipelines for Infrastructure as Code (IaC) codebases and platform component releases.
• Collaborates with engineering teams to troubleshoot infrastructure-related issues and deliver scalable, reliable solutions.
• Applies Site Reliability Engineering (SRE) principles (including SLIs, SLOs, observability, and fault tolerance) to all shared platform services.
• Supports IAM roles, secrets management, and tenant isolation best practices.

🎯 Requirements

• Has 5+ years of experience in infrastructure or Site Reliability Engineering (SRE), including hands-on work with AWS services such as VPC, IAM, RDS, MSK, and S3, as well as Kubernetes components like Helm, RBAC, and ServiceAccounts.
• Demonstrates fluency in Python and has practical experience with Infrastructure as Code using AWS CDK, CDK8s, or equivalent frameworks such as Pulumi.
• Possesses a strong understanding of Prometheus, Grafana, and effective alert routing practices.
• Has experience designing reusable infrastructure patterns or building internal developer platforms.
• Shows a proven track record of improving system reliability through automation, monitoring, and operational best practices.
• Has experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines.

🏖️ Benefits

• 100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
• Highly Competitive USD Pay: Earn excellent compensation in USD that goes beyond typical market offerings.
• Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
• Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
• Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.

Similar Jobs

October 31

Growth Acceleration Partners

501 - 1000

🤖 Artificial Intelligence

☁️ SaaS

SaaS & Cloud Operations Engineer managing cloud-based tools and infrastructure for a fintech company. Collaborating with cross-functional teams to optimize cloud operations and ensure compliance.

🇨🇴 Colombia – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

October 2

Baja Tomi Sdn Bhd

11 - 50

🤝 B2B

🎯 Recruiter

👥 HR Tech

Mid-Senior DevOps Engineer responsible for designing cloud infrastructure and improving deployment workflows at PrimeWorks. Collaborating with teams to ensure reliable and secure environments.

🇨🇴 Colombia – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

October 2

DEUNA

51 - 200

💳 Fintech

☁️ SaaS

🛍️ eCommerce

DevOps Engineer automating processes and improving system reliability for DEUNA's payments platform. Collaborating with teams to design deployment infrastructure and guide continuous improvement initiatives.

🇨🇴 Colombia – Remote

💰 $30M Series A on 2022-07

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

August 26

Wizeline

1001 - 5000

🏢 Enterprise

☁️ SaaS

🤖 Artificial Intelligence

Manage EKS and on-prem Kubernetes for Wizeline; automate CI/CD, security, and incident response.

🇨🇴 Colombia – Remote

💰 $43M Series B on 2018-03

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

July 30

Property Leads

11 - 50

🏠 Real Estate

🤝 B2B

Join Property Leads as a Technical Operations Analyst to optimize marketing processes and data integrity.

🇨🇴 Colombia – Remote

💵 $1k - $3k / month

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)
