Site Reliability Engineer, AWS

November 4

Truelogic Software

SaaS • B2B • Enterprise

Truelogic Software is a nearshore software development company specializing in agile staff augmentation. It provides custom outsourced software development through a team of highly skilled engineers from Latin America. Truelogic partners with both startups and Fortune 500 companies, working in its clients' time zones and relying on collaboration and responsiveness to deliver high-quality outcomes. With a presence in over 25 countries, Truelogic emphasizes remote work for a better quality of life. Its engineers have experience across various industries and have delivered a wide range of projects globally.

501 - 1000 employees

Founded 2004

📋 Description

• Designs, implements, and evolves shared AWS CDK and CDK8s constructs used across multiple services and teams.
• Maintains core infrastructure components including VPC, EKS clusters and node groups, RDS, OpenSearch, and MSK.
• Operates and extends Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring/logging stacks.
• Ensures high reliability through structured alerting systems (Prometheus, CloudWatch), autoscaling strategies, and recovery mechanisms.
• Manages and publishes baseline templates, configuration schemas, and comprehensive documentation for infrastructure usage.
• Owns the CI/CD pipelines for Infrastructure as Code (IaC) codebases and platform component releases.
• Collaborates with engineering teams to troubleshoot infrastructure-related issues and deliver scalable, reliable solutions.
• Applies Site Reliability Engineering (SRE) principles, including SLIs, SLOs, observability, and fault tolerance, to all shared platform services.
• Supports IAM roles, secrets management, and tenant isolation best practices.

🎯 Requirements

• Has 5+ years of experience in infrastructure or Site Reliability Engineering (SRE), including hands-on work with AWS services such as VPC, IAM, RDS, MSK, and S3, as well as Kubernetes components like Helm, RBAC, and ServiceAccounts.
• Demonstrates fluency in Python and has practical experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks such as Pulumi.
• Possesses a strong understanding of Prometheus, Grafana, and effective alert routing practices.
• Has experience designing reusable infrastructure patterns or building internal developer platforms.
• Shows a proven track record of improving system reliability through automation, monitoring, and operational best practices.
• Has experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines.

🏖️ Benefits

• 100% Remote Work: Work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
• Highly Competitive USD Pay: Earn market-leading compensation in USD that goes beyond typical market offerings.
• Paid Time Off: We value your well-being. Our paid time off policies give you the chance to unwind and recharge when needed.
• Work with Autonomy: Manage your own time as long as the work gets done. Focus on results, not the clock.
• Work with Top American Companies: Grow your expertise on innovative, high-impact projects with industry-leading U.S. companies.

