Site Reliability Engineer II

Engineering • Environmental Services • Sustainability

Atlan Stormwater provides stormwater management solutions that address both water quantity and quality. Established in 1972, Atlan manufactures products including stormwater detention and retention systems, hydrocarbon capture technologies, and gross pollutant traps. The company integrates green infrastructure into urban environments to enhance water filtration, mitigate flooding, and promote clean waterways for communities. Atlan also offers design assistance and maintenance services to keep its systems performing well over time.

51 - 200 employees

Founded 1972

October 9

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Google Cloud Platform

Grafana

Kubernetes

Prometheus

Python

📋 Description

• Own and operate end-to-end reliability for critical systems, from alert triage and incident resolution to long-term preventive improvements.
• Proactively manage incidents within defined SLAs (60 mins for Critical, 180 mins for High) and ensure smooth collaboration across teams during resolution.
• Enhance observability by improving monitoring systems, refining alerts, and reducing noise to focus on what truly matters.
• Automate operations and incident workflows to eliminate manual toil, improving speed, consistency, and reliability.
• Collaborate across teams: work with Platform, Observability, and Product Engineering teams to strengthen uptime and service stability.
• Contribute to documentation and playbooks, ensuring that every incident drives learning, process improvement, and team efficiency.
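To make the incident SLAs above concrete, here is a minimal Python sketch of an SLA deadline check. The 60- and 180-minute targets come from this posting. Everything else is an assumption for illustration: the function names are hypothetical, the targets are treated as resolution deadlines, and the clock is assumed to start when the incident is opened. It does not describe the team's actual tooling.

```python
from datetime import datetime, timedelta, timezone

# SLA targets in minutes, as stated in the posting.
# Other severity levels are not specified there, so none are listed.
SLA_MINUTES = {"Critical": 60, "High": 180}


def sla_deadline(severity: str, opened_at: datetime) -> datetime:
    """Return the time by which an incident should be resolved.

    Assumption: the SLA clock starts when the incident is opened.
    Raises KeyError for severities without a defined SLA.
    """
    return opened_at + timedelta(minutes=SLA_MINUTES[severity])


def is_breached(severity: str, opened_at: datetime, now: datetime) -> bool:
    """Return True if an open incident is past its SLA deadline."""
    return now > sla_deadline(severity, opened_at)


if __name__ == "__main__":
    # Arbitrary example timestamp, timezone-aware to avoid naive comparisons.
    opened = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(sla_deadline("Critical", opened))
    print(is_breached("High", opened, opened + timedelta(minutes=200)))
```

A check like this could feed an alert or dashboard so that incidents approaching their deadline are escalated before the SLA is missed.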

🎯 Requirements

• Proven experience managing alerts, incidents, and root cause analyses in production environments.
• Hands-on knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes, including networking, deployments, and troubleshooting.
• Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, or Datadog.
• Ability to automate repetitive operational tasks using scripting (Python, Bash, or Shell).
• Strong communication and collaboration skills, especially in distributed or remote-first teams.
• A mindset of ownership, curiosity, and calm under pressure: you thrive in incident response and turn challenges into learning opportunities.

🏖️ Benefits

• Real impact from Day 1: Your work directly shapes reliability for thousands of users across the globe.
• Modern tech stack: Work with cutting-edge tools: Kubernetes, Terraform, Prometheus, Datadog, and more.
• Learning culture: Collaborate with world-class platform engineers and senior SREs who believe in mentorship and continuous growth.
• Autonomy & trust: Freedom to experiment, improve, and own your work end-to-end.
• Clear growth path: Grow from SRE II → Senior SRE → Senior SRE II → Staff SRE → Principal SRE as you expand your technical depth and ownership scope.
