Site Reliability Engineer II

October 9

Atlan Stormwater

Engineering • Environmental Services • Sustainability

Atlan Stormwater is a company dedicated to providing comprehensive stormwater management solutions, focusing on both water quantity and quality. Established in 1972, Atlan manufactures a range of products including stormwater detention and retention systems, hydrocarbon capture technologies, and gross pollutant traps. The company emphasizes sustainable practices by integrating green infrastructure into urban environments to enhance water filtration, mitigate flooding, and promote clean waterways for communities. Atlan also offers design assistance and maintenance services to ensure optimal performance of their systems over time.

51 - 200 employees

Founded 1972

📋 Description

• Own and operate end-to-end reliability for critical systems, from alert triage and incident resolution to long-term preventive improvements.
• Proactively manage incidents within defined SLAs (60 mins for Critical, 180 mins for High) and ensure smooth collaboration across teams during resolution.
• Enhance observability by improving monitoring systems, refining alerts, and reducing noise to focus on what truly matters.
• Automate operations and incident workflows to eliminate manual toil, improving speed, consistency, and reliability.
• Collaborate across teams: work with Platform, Observability, and Product Engineering teams to strengthen uptime and service stability.
• Contribute to documentation and playbooks, ensuring that every incident drives learning, process improvement, and team efficiency.

🎯 Requirements

• Proven experience managing alerts, incidents, and root cause analyses in production environments.
• Hands-on knowledge of cloud platforms (AWS, GCP, or Azure) and Kubernetes, including networking, deployments, and troubleshooting.
• Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, or Datadog.
• Ability to automate repetitive operational tasks using scripting (Python, Bash, or Shell).
• Strong communication and collaboration skills, especially in distributed or remote-first teams.
• A mindset of ownership, curiosity, and calm under pressure: you thrive in incident response and turn challenges into learning opportunities.

🏖️ Benefits

• Real impact from Day 1: Your work directly shapes reliability for thousands of users across the globe.
• Modern tech stack: Work with cutting-edge tools, including Kubernetes, Terraform, Prometheus, Datadog, and more.
• Learning culture: Collaborate with world-class platform engineers and senior SREs who believe in mentorship and continuous growth.
• Autonomy & trust: Freedom to experiment, improve, and own your work end-to-end.
• Clear growth path: Grow from SRE II → Senior SRE → Senior SRE II → Staff SRE → Principal SRE as you expand your technical depth and ownership scope.
