Staff Site Reliability Engineer

SaaS • Transport • eCommerce

Stord is a cloud supply-chain company that combines fulfillment network operations with integrated SaaS (OMS/WMS) to power omnichannel and DTC e-commerce brands. It provides warehousing, last-mile and transportation services, order routing, inventory management, and consumer-facing delivery tools to improve conversion, speed up fulfillment, and protect brand experience. Stord serves growing B2B and direct-to-consumer brands across food & beverage, beauty, apparel and other retail categories, offering a customizable fulfillment network and software platform to optimize costs and delivery performance.

1001 - 5000 employees

Founded 2015

☁️ SaaS

🚗 Transport

🛍️ eCommerce

November 19

🇺🇸 United States – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

Chef

Cloud

Distributed Systems

Docker

Google Cloud Platform

Grafana

Java

Jenkins

Kubernetes

Prometheus

Puppet

Python

Terraform

📋 Description

• Lead architecture decisions to deliver scalable and reliable infrastructure, primarily on Google Cloud Platform (GCP)
• Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, Pulumi, or similar
• Manage containerized environments with Docker and Kubernetes
• Drive system performance tuning, capacity planning, and resource optimization
• Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
• Build robust monitoring, alerting, and observability solutions using Prometheus, Grafana, DataDog, or New Relic
• Develop and maintain disaster recovery and business continuity strategies
• Design and maintain CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, etc.)
• Automate operational workflows and infrastructure provisioning
• Implement configuration management with Ansible, Chef, Puppet, or similar tools
• Develop custom tooling and scripts to enhance operational efficiency
• Partner with engineering teams to improve deployment practices and application reliability
• Provide escalation support for production incidents and lead post-incident reviews
• Conduct technical design reviews and offer architectural guidance
• Mentor junior engineers on SRE and infrastructure best practices
• Participate in on-call rotations for critical systems

🎯 Requirements

• 8+ years of experience in site reliability, platform engineering, or infrastructure roles with leadership exposure
• Proficiency in at least one programming language (Python, Go, Java, etc.)
• Strong hands-on experience with GCP and its core services
• Expertise in containerization (Docker) and orchestration (Kubernetes)
• Deep knowledge of Infrastructure as Code (Terraform, CloudFormation, etc.)
• Skilled in monitoring/observability (Prometheus, Grafana, ELK, etc.)
• Solid understanding of networking, load balancing, and distributed systems
• Experience with Git and collaborative development workflows

🏖️ Benefits

• Remote work options

Similar Jobs

Staff DevOps Engineer

November 18

Cleerly

201 - 500

⚕️ Healthcare Insurance

🤖 Artificial Intelligence

🧬 Biotechnology

Staff Cloud DevOps Engineer for Cleerly, leading cloud infrastructure and enhancing systems for AI-powered diagnostics. Focused on continuous integration, software delivery, and mentoring junior engineers.

🇺🇸 United States – Remote

💵 $207k - $235k / year

💰 Series C on 2022-07

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

DynamoDB

EC2

JavaScript

Kubernetes

Linux

Node.js

Python

Terraform

Staff Software Engineer – SAP BTP CPI SRE

November 14

NBCUniversal

10,000+ employees

📱 Media

Staff Software Engineer overseeing operational support of SAP BTP CPI applications at NBCUniversal. Leading offshore teams and collaborating on production deployments.

🇺🇸 United States – Remote

💵 $140k - $180k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Staff Site Reliability Engineer, Platform Engineering

November 13

Paxos

201 - 500

₿ Crypto

💳 Fintech

🏦 Banking

Staff Site Reliability Engineer at Paxos enhancing cloud infrastructure reliability and scalability. Leading initiatives in Kubernetes, IaC, and cloud services architecture.

🇺🇸 United States – Remote

💵 $179.5k - $211.2k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

EC2

Kubernetes

Postgres

Python

Terraform

Release Engineer – Automation, DevOps

November 13

Brillio

1001 - 5000

🤖 Artificial Intelligence

🔒 Cybersecurity

Release Engineer for Brillio driving efficient software build and deployment processes. Collaborating with teams to ensure high-quality releases and streamline operations.

🇺🇸 United States – Remote

💵 $100k - $110k / year

💰 Private Equity Round on 2019-01

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Azure

Docker

Grafana

Jenkins

Kubernetes

Python

Subversion

Staff Site Reliability Engineer

November 13

FloSports

201 - 500

Staff SRE at FloSports improving developer enablement and migrating infrastructure to AWS. Leading technical architecture and critical tooling development with a focus on reliability and automation.

🇺🇸 United States – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Google Cloud Platform

JavaScript

Kubernetes

Node.js

Terraform