Principal Site Reliability Engineer

🔥 2 minutes ago

DraftKings Inc.

1001 - 5000 employees

Founded 2012

🎲 Gambling

🎮 Gaming

👥 B2C

DraftKings Inc. is a digital sports entertainment company operating a leading online sportsbook, daily fantasy sports, and casino platform that delivers real-money betting and gaming experiences via web and mobile apps. It combines sports data, analytics, and content to engage fans, provides marketing and VIP/loyalty programs, and maintains global teams across engineering, product, compliance, and customer experience while emphasizing responsible gaming.

📋 Description

• Define and execute the long-term strategy for our Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments, ensuring reliability, scalability, and operational consistency.
• Drive architectural decisions across critical infrastructure, including cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
• Lead large-scale platform initiatives across multiple engineering teams, establishing technical direction, engineering standards, and measurable outcomes that improve platform reliability and developer experience.
• Establish and evolve reliability practices by defining service level objectives, service level indicators, and error budget frameworks that align platform performance with business priorities.
• Build automation-first infrastructure through Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling that improve engineering velocity and reduce operational overhead.
• Champion the responsible adoption of AI-powered engineering capabilities that improve operational efficiency, accelerate incident response, and enhance developer productivity.
• Lead critical platform incidents, drive post-incident improvements, and strengthen platform resilience through automation, capacity planning, and operational excellence.
• Mentor senior engineers, influence technical strategy across the organization, and elevate engineering excellence through architecture reviews, coaching, and technical leadership.

🎯 Requirements

• A Bachelor's Degree in Computer Science or a related technical field.
• At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years operating at the Staff, Principal, or equivalent technical leadership level.
• Proven experience leading large-scale infrastructure or platform initiatives that require cross-functional alignment and long-term technical ownership.
• Deep expertise with Kubernetes, including cluster architecture, networking, storage, security, operators, lifecycle management, and large-scale production operations.
• Extensive experience building and operating production infrastructure in AWS and Google Cloud Platform using Infrastructure as Code technologies such as Terraform, Pulumi, or similar tools.
• Strong software development experience in Go, Python, or both, with expertise in GitOps, continuous integration and continuous delivery, observability, distributed systems, Linux, and reliability engineering principles.
• Experience incorporating AI-powered tools into engineering workflows while applying sound judgment around reliability, security, and operational risk.
• Exceptional communication and leadership skills with a proven ability to mentor engineers, influence technical strategy, and drive engineering excellence.
• Experience working in regulated industries, hybrid cloud environments, contributing to open-source projects, or holding cloud certifications is preferred.

đŸ–ïž Benefits

• Bonus
• Equity
• Benefits as applicable

Similar Jobs

🔥 2 hours ago

DraftKings Inc.

1001 - 5000

🎮 Gaming

⚽ Sports

👥 B2C

Principal Site Reliability Engineer shaping the strategy for Kubernetes platform and driving architectural decisions. Leading platform initiatives at DraftKings with a focus on reliability and automation.

AWS

Cloud

Distributed Systems

Google Cloud Platform

Kubernetes

Linux

Python

Terraform

Go

🔥 6 hours ago

Convoso

201 - 500

🤝 B2B

Director of DevOps leading a team of engineers at Convoso, an AI-powered contact center platform. Responsible for developing and optimizing the platform and ensuring service reliability.

Ansible

AWS

Chef

Cloud

Docker

Google Cloud Platform

Jenkins

Kubernetes

Linux

Puppet

SaltStack

SDLC

🔥 10 hours ago

FluidStack

11 - 50

🤖 Artificial Intelligence

Principal Operations Engineer overseeing critical operations in data centers for Fluidstack. Leading on-call escalation, root cause analysis, and operational excellence in real-time situations.

🕒 2 days ago

General Dynamics Information Technology

10,000+ employees

🔒 Cybersecurity

🤖 Artificial Intelligence

Site Reliability Engineer blending software engineering, automation, and operations expertise. Building scalable platforms and enabling high-velocity delivery for critical Defense systems.

Cloud

Distributed Systems

Grafana

Kubernetes

Linux

Prometheus

Python

Splunk

🕒 4 days ago

Coinbase

1001 - 5000

₿ Crypto

💸 Finance

💳 Fintech

Staff Software Engineer responsible for enhancing reliability and security in production environments. Collaborating on projects to scale systems at Coinbase.

AWS

Azure

Cloud

Google Cloud Platform

Ruby

Terraform

Go