Senior Software Engineer – Site Reliability Engineering

🕒 April 29

The Home Depot

10,000+ employees

Founded 1978

🛒 Retail

👥 B2C

💰 Debt Financing on 2007-07

The Home Depot is a leading home improvement retailer, offering a wide range of building materials, home improvement products, lawn and garden products, and related services. The company operates both physical stores and an online platform, providing comprehensive solutions for DIY enthusiasts, professional contractors, and homeowners. The Home Depot is committed to diversity, equity, and inclusion, providing employment opportunities and benefits to a diverse workforce. Additionally, the company places a high emphasis on customer service and associate engagement to maintain its position as a trusted leader in the home improvement industry.

📋 Description

• Develops, tests, deploys, and maintains software for internal platforms
• Designs, develops, and maintains tools for reliability engineering teams
• Extends internal reliability tools using Kubernetes and Terraform on Google Cloud Platform
• Deploys and maintains production logging, tracing, and profiling systems
• Identifies and automates repetitive operational tasks
• Maintains and extends SLO and Critical User Journey platforms
• Participates in on-call rotation and contributes to incident response

🎯 Requirements

• 3-5 years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Infrastructure Engineering
• Hands-on experience with Google Cloud Platform (GCP), including GKE, GCS, BigQuery, Cloud Pub/Sub, Cloud Logging, IAM, and Workload Identity
• Strong Kubernetes experience: deploying and managing workloads on GKE or similar managed Kubernetes services, writing and debugging Helm charts, managing namespaces, RBAC, and service accounts, and troubleshooting issues
• Experience with infrastructure-as-code tools, particularly Terraform for cloud resource management
• Proficiency in one or more of: Go, Python, JavaScript/TypeScript, YAML
• Experience with observability platforms: deploying, configuring, or operating log aggregation, distributed tracing, metrics, dashboarding, or continuous profiling
• Practical understanding of SLOs, SLIs, and error budgets
• Experience with synthetic monitoring or performance testing frameworks (k6, Playwright, Selenium, Locust, or similar)
• Familiarity with incident management and on-call practices: blameless post-mortems, runbook development, and incident communication
• Experience with CI/CD pipelines using GitHub Actions, Spinnaker, ArgoCD, or similar
• Understanding of deployment strategies (blue/green, canary, rolling)

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options

Similar Jobs

🕒 April 29

Satsuma Technology Ltd

1 - 10

🔌 API

🤖 Artificial Intelligence

🛍️ eCommerce

Senior Site Reliability Engineer managing multi-cloud infrastructure at Satsuma. Ensuring reliability, scalability, and operational posture using AI-assisted development.

AWS

Azure

Cloud

Google Cloud Platform

Grafana

Kubernetes

Terraform

🕒 April 28

Parallel Domain

51 - 200

🤖 Artificial Intelligence

🔌 API

Senior Site Reliability Engineer managing AWS infrastructure and Kubernetes for autonomous systems testing. Collaborating across teams to ensure system reliability and security.

AWS

Cloud

DNS

Grafana

Kubernetes

Linux

Node.js

Prometheus

Python

Terraform

🕒 April 28

Nomi Health

501 - 1000

⚕️ Healthcare Insurance

💸 Finance

☁️ SaaS

Senior Manager of Cloud and DevOps Engineering managing daily operations of AWS and Kubernetes infrastructure across businesses. Leading a team and working closely with senior leadership for operational excellence.

AWS

Cloud

Docker

EC2

Kubernetes

Terraform

🕒 April 28

Sagent

201 - 500

☁️ SaaS

💳 Fintech

Cloud Infrastructure Engineer managing cloud resources for large-scale infrastructure. Supporting development teams in a microservices environment to streamline deployments and optimize performance.

Airflow

Azure

BigQuery

Cloud

DNS

Google Cloud Platform

Grafana

Kafka

Kubernetes

Matillion

Microservices

Postgres

Prometheus

Python

Redis

Spark

SQL

Terraform

Vault

Go

🕒 April 27

Veeam Software

1001 - 5000

☁️ SaaS

🔒 Cybersecurity

🏢 Enterprise

Senior Site Reliability Engineer for Veeam's Government & Sovereign Cloud environments. Building a global SRE function with an emphasis on high availability and operational excellence.

AWS

Azure

Cloud

Dagger

Distributed Systems

Grafana

Java

JavaScript

Kubernetes

Prometheus

Terraform

TypeScript

Go