Senior Site Reliability Engineer

🕒 March 20

Latitude.sh

51 - 200 employees

Founded 2001

🎮 Gaming

💳 Fintech

Cloud • Gaming • Fintech

Latitude.sh is a global cloud platform that offers bare metal servers for high-performance computing needs. Designed to support applications for startups, enterprises, and industries such as gaming and generative AI, Latitude.sh provides users with the flexibility to easily deploy, manage, and scale their infrastructure. The platform bridges the gap between traditional cloud services and on-premise solutions, allowing users to benefit from dedicated resources while optimizing performance and cost.

📋 Description

• Continuously improve Latitude.sh’s platform reliability and performance
• Design, build, and maintain tools to automate operational tasks and incident response
• Implement and improve observability solutions, including monitoring, alerting, and tracing
• Collaborate with engineering and platform teams to design scalable and resilient systems
• Participate in on-call rotations and lead post-incident reviews with a focus on learning
• Develop and document processes and runbooks that ensure operational excellence
• Contribute to defining SLOs/SLIs and to the adoption of reliability metrics across teams

🎯 Requirements

• Strong verbal and written English communication skills
• Advanced knowledge of Linux/Unix systems in production environments
• Experience with Kubernetes and container orchestration
• Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
• Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
• Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
• Working knowledge of Git and CI/CD pipelines
• Solid understanding of incident management and root cause analysis processes
• Knowledge of cloud-native reliability and security best practices

🏖️ Benefits

• Paid Time Off
• Competitive Compensation
• Annual Bonus based on company and team performance
• Flexible work hours
• Opportunities for professional growth and development

Similar Jobs

🕒 March 20

NBCUniversal

10,000+ employees

📱 Media

Staff Software Engineer overseeing day-to-day operational support of SAP BTP applications at NBCUniversal. Collaborating with onsite teams to enhance engineering strategies and manage production deployments.

AEM

Cloud

SOAP

Go

🕒 March 20

Goldstone Partners, Inc.

1 - 10

🎯 Recruiter

👥 HR Tech

🤝 B2B

DevOps Platform Engineer managing CI/CD for SaaS products at The Regis Company. Focused on platform reliability and cloud infrastructure management in a remote work environment.

AWS

Azure

Cloud

Google Cloud Platform

Grafana

Kubernetes

Linux

Prometheus

Python

Terraform

🕒 March 20

Docusign

5001 - 10000

🛍️ eCommerce

💸 Finance

☁️ SaaS

Senior Site Reliability Engineer at Docusign managing critical systems and driving reliability initiatives. Collaborating with teams to enhance observability and incident response for high-impact services across cloud environments.

AWS

Azure

Cloud

Distributed Systems

DNS

Google Cloud Platform

Grafana

Java

Linux

Prometheus

Python

Go

🕒 March 19

Upstart

1001 - 5000

Senior Software Engineer leading technical direction and large initiatives at Upstart. Focusing on building consumer-facing systems and evolving platform architecture.

Distributed Systems

🕒 March 19

Weekday (YC W21)

11 - 50

☁️ SaaS

🎯 Recruiter

DevOps Engineer constructing and managing cloud infrastructure for Weekday's clients. Automating deployments and ensuring system reliability in a security-oriented organization.

Cloud

Terraform