Principal Site Reliability Engineer, SRE

Education • HR Tech

InStride partners with organizations to provide career-aligned, debt-free education programs for their employees. Through strategic program design and a flexible platform, InStride supports talent development, strengthens employee engagement, and improves talent acquisition. Its solutions include tuition reimbursement and diversity and inclusion initiatives, which help build a skilled, resilient workforce in a rapidly evolving workplace. InStride's clients see benefits such as higher promotion rates, increased employee retention, and improved return on investment (ROI). Notable partnerships include collaborations with companies such as Amazon to expand education opportunities for employees across various sectors.

51 - 200 employees

📚 Education

👥 HR Tech

September 16

🌵 Arizona – Remote

🏄 California – Remote

+21 more states

💵 $165k - $185k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

Grafana

Kubernetes

Prometheus

Python

Terraform

TypeScript

📋 Description

• Serve as the go-to AWS expert, setting technical direction and raising the bar for operational excellence across the platform
• Design and operate multi-region, fault-tolerant systems to ensure InStride’s learning platform availability
• Deliver Infrastructure as Code libraries, CI/CD pipelines, and self-service capabilities to reduce operational toil
• Implement defense-in-depth strategies, policy-as-code guardrails, and proactive monitoring for security and compliance
• Define and enforce SLIs/SLOs and error-budget policies and build monitoring frameworks that inform release readiness
• Deploy and manage service mesh solutions to secure, monitor, and optimize service-to-service communication across Kubernetes workloads
• Partner with engineering and security stakeholders to shape InStride’s AWS strategy for scalability, resilience, and cost efficiency
• Mentor and uplift engineers, lead design reviews, and guide teams toward modern DevOps and SRE practices

🎯 Requirements

• 10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads
• Hands-on expertise with AWS EKS, Kubernetes networking, Helm, autoscaling frameworks (Karpenter/Cluster Autoscaler), serverless architectures, and API Gateways
• Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh)
• Proficiency with Infrastructure as Code (IaC) using AWS CDK (TypeScript preferred/Python), Terraform, or CloudFormation
• Strong programming and automation skills in Go, Python, or TypeScript, with additional proficiency in Bash
• Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines
• Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with Prometheus, Grafana, CloudWatch, Groundcover
• Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement
• Excellent communication skills with the ability to translate reliability metrics into business impact and guide incident/post-mortem discussions
• Experience mentoring engineers and influencing enterprise AWS and DevOps strategies without direct management responsibilities
• Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus
• Candidates must be located in one of the following states to be considered eligible for employment: AZ, CA, CO, CT, FL, GA, IL, IN, KS, LA, MD, MA, MI, MO, NV, NH, NJ, NY, PA, OH, OR, TX, VA, WA, WI

🏖️ Benefits

• Eligible to enroll in 2,800+ online certificate and degree programs through our Step Forward program; InStride covers your tuition upfront, eligible starting Day 1.
• 401(k) plan with company match
• Flexible vacation policy
• Paid family leave
• Best-in-class health care benefits
• And more!

