Site Reliability Engineer II

October 9

Atlan Stormwater

Engineering • Environmental Services • Sustainability

Atlan Stormwater is a company dedicated to providing comprehensive stormwater management solutions, focusing on both water quantity and quality. Established in 1972, Atlan manufactures a range of products including stormwater detention and retention systems, hydrocarbon capture technologies, and gross pollutant traps. The company emphasizes sustainable practices by integrating green infrastructure into urban environments to enhance water filtration, mitigate flooding, and promote clean waterways for communities. Atlan also offers design assistance and maintenance services to ensure optimal performance of their systems over time.

51 - 200 employees

Founded 1972

📋 Description

• Own and operate end-to-end reliability for critical systems, from alert triage and incident resolution to long-term preventive improvements.
• Proactively manage incidents within defined SLAs (60 mins for Critical, 180 mins for High) and ensure smooth collaboration across teams during resolution.
• Enhance observability by improving monitoring systems, refining alerts, and reducing noise to focus on what truly matters.
• Automate operations and incident workflows to eliminate manual toil, improving speed, consistency, and reliability.
• Collaborate across teams: work with Platform, Observability, and Product Engineering teams to strengthen uptime and service stability.
• Contribute to documentation and playbooks, ensuring that every incident drives learning, process improvement, and team efficiency.

🎯 Requirements

• Proven experience managing alerts, incidents, and root cause analyses in production environments.
• Hands-on knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes, including networking, deployments, and troubleshooting.
• Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, or Datadog.
• Ability to automate repetitive operational tasks using scripting (Python, Bash, or Shell).
• Strong communication and collaboration skills, especially in distributed or remote-first teams.
• A mindset of ownership, curiosity, and calm under pressure: you thrive in incident response and turn challenges into learning opportunities.

🏖️ Benefits

• Real impact from Day 1: Your work directly shapes reliability for thousands of users across the globe.
• Modern tech stack: Work with cutting-edge tools such as Kubernetes, Terraform, Prometheus, Datadog, and more.
• Learning culture: Collaborate with world-class platform engineers and senior SREs who believe in mentorship and continuous growth.
• Autonomy & trust: Freedom to experiment, improve, and own your work end-to-end.
• Clear growth path: Grow from SRE II → Senior SRE → Senior SRE II → Staff SRE → Principal SRE as you expand your technical depth and ownership scope.
