Senior DevOps Engineer

🕒 February 2

Shuru

51 - 200 employees

Founded 2021

🤖 Artificial Intelligence

🤝 B2B

🏢 Enterprise

Shuru is a product, AI, and technology consulting firm that partners with businesses to deliver strategic consulting, full-cycle product and custom software development, and curated engineering team extension. Their AI-native engineering teams build scalable AI applications, data engineering and analytics, cloud/DevOps, and API integrations to modernize systems and accelerate product delivery. Shuru operates globally with a remote-first model and emphasizes high ownership, design thinking, and measurable outcomes for enterprise and startup clients.

📋 Description

• Kubernetes platform engineering (EKS-first): design, build, and operate production-grade Kubernetes clusters (multi-nodegroup, autoscaling, upgrades, cluster add-ons).
• Implement intelligent autoscaling using real metrics (queue depth, consumer lag, service latency) via tools like KEDA/Karpenter.
• Own AWS environments end-to-end (VPC, IAM, EKS/ECS/EC2, ALB/ELB, S3, Route53, CloudWatch, RDS, SQS, Lambda).
• Build reproducible infrastructure using Terraform, with strong review and change management practices.
• Implement backup/DR patterns (e.g., snapshots, retention, automation) and safe rollouts.
• Design infrastructure for data-intensive workloads: high-throughput ingestion, batch processing, and real-time streaming.
• Understand and operate distributed systems at scale: consensus, partitioning, replication, and failure modes.
• Build and maintain infrastructure for data pipelines and vector databases.
• Design for horizontal scalability, ensuring systems handle growing data volumes and user traffic gracefully.
• Build and own monitoring and logging from scratch and make it actionable (Prometheus/Grafana, ELK/EFK, alerting).
• Define or partner on SLIs/SLOs and incident response practices; improve reliability with data-driven changes.
• Establish performance testing and production-like load testing environments.
• Continuously reduce AWS spend via right-sizing, Spot strategies, reserved capacity planning, and architecture improvements.
• Partner with engineering teams to diagnose bottlenecks (database queries, caching, queueing) and propose scalable solutions.
• Optimize infrastructure costs for data-heavy workloads (storage tiering, compute scheduling, GPU utilization).
• Improve cloud and cluster security posture (IAM, network policies, secrets management, least privilege).
• Support SOC2 readiness/execution (controls, evidence automation, operational hardening).
• Implement access management patterns.

🎯 Requirements

• 7+ years in DevOps / SRE / Cloud Infra roles operating production systems.
• Deep hands-on experience with Kubernetes in production.
• Strong AWS fundamentals across compute/networking/storage/identity, including VPC, IAM, EC2/EKS, ALB, S3, Route53, CloudWatch, RDS, SQS.
• Proven ability to build infrastructure using Terraform, with strong IaC practices.
• Production-grade observability experience: Prometheus + Grafana, and centralized logging (ELK/EFK or similar).
• Experience scaling product infrastructure: you've grown systems from thousands to millions of requests, and understand capacity planning, bottleneck identification, and scaling patterns.
• Solid understanding of distributed systems concepts: CAP theorem, consistency models, partitioning strategies, distributed consensus, and failure handling.
• Strong understanding of databases and performance fundamentals.
• CI/CD experience building reliable pipelines (Jenkins, Spinnaker, GitHub Actions, or equivalents), with safe deployment strategies.
• Scripting/automation ability in Python and/or Bash (Go is a plus).

🏖️ Benefits

• Competitive salary and benefits package.
• Opportunity to work with a team of experienced product and tech leaders.
• A flexible work environment with remote working options.
• Continuous learning and development opportunities.
• Chance to make a significant impact on diverse and innovative projects.
