Senior DevOps Engineer

51 - 200 employees

Founded 2021

🤖 Artificial Intelligence

🤝 B2B

🏢 Enterprise

Shuru is a product, AI, and technology consulting firm that partners with businesses to deliver strategic consulting, full-cycle product and custom software development, and curated engineering team extension. Its AI-native engineering teams build scalable AI applications and deliver data engineering and analytics, cloud/DevOps, and API integration work to modernize systems and accelerate product delivery. Shuru operates globally with a remote-first model and emphasizes high ownership, design thinking, and measurable outcomes for enterprise and startup clients.

🕒 February 2

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Distributed Systems

EC2

Grafana

Jenkins

Kubernetes

Prometheus

Python

Spinnaker

Terraform

📋 Description

• Kubernetes platform engineering (EKS-first): design, build, and operate production-grade Kubernetes clusters (multi-nodegroup, autoscaling, upgrades, cluster add-ons).
• Implement intelligent autoscaling using real metrics (queue depth, consumer lag, service latency) via tools like KEDA/Karpenter.
• Own AWS environments end-to-end (VPC, IAM, EKS/ECS/EC2, ALB/ELB, S3, Route53, CloudWatch, RDS, SQS, Lambda).
• Build reproducible infrastructure using Terraform, with strong review and change management practices.
• Implement backup/DR patterns (e.g., snapshots, retention, automation) and safe rollouts.
• Design infrastructure for data-intensive workloads: high-throughput ingestion, batch processing, and real-time streaming.
• Understand and operate distributed systems at scale: consensus, partitioning, replication, and failure modes.
• Build and maintain infrastructure for data pipelines and vector databases.
• Design for horizontal scalability, ensuring systems handle growing data volumes and user traffic gracefully.
• Build and own monitoring and logging from scratch and make it actionable (Prometheus/Grafana, ELK/EFK, alerting).
• Define, or partner on defining, SLIs/SLOs and incident response practices; improve reliability with data-driven changes.
• Establish performance testing and production-like load testing environments.
• Continuously reduce AWS spend via right-sizing, Spot strategies, reserved capacity planning, and architecture improvements.
• Partner with engineering teams to diagnose bottlenecks (database queries, caching, queueing) and propose scalable solutions.
• Optimize infrastructure costs for data-heavy workloads (storage tiering, compute scheduling, GPU utilization).
• Improve cloud and cluster security posture (IAM, network policies, secrets management, least privilege).
• Support SOC2 readiness/execution (controls, evidence automation, operational hardening).
• Implement access management patterns.

🎯 Requirements

• 7+ years in DevOps / SRE / Cloud Infra roles operating production systems.
• Deep hands-on experience with Kubernetes in production.
• Strong AWS fundamentals across compute/networking/storage/identity, including VPC, IAM, EC2/EKS, ALB, S3, Route53, CloudWatch, RDS, SQS.
• Proven ability to build infrastructure using Terraform, with strong IaC practices.
• Production-grade observability experience: Prometheus + Grafana, and centralized logging (ELK/EFK or similar).
• Experience scaling product infrastructure: you've grown systems from thousands to millions of requests, and understand capacity planning, bottleneck identification, and scaling patterns.
• Solid understanding of distributed systems concepts: CAP theorem, consistency models, partitioning strategies, distributed consensus, and failure handling.
• Strong understanding of databases and performance fundamentals.
• CI/CD experience building reliable pipelines (Jenkins, Spinnaker, GitHub Actions, or equivalents), with safe deployment strategies.
• Scripting/automation ability in Python and/or Bash (Go is a plus).

🏖️ Benefits

• Competitive salary and benefits package.
• Opportunity to work with a team of experienced product and tech leaders.
• A flexible work environment with remote working options.
• Continuous learning and development opportunities.
• Chance to make a significant impact on diverse and innovative projects.
