Principal Site Reliability Engineer


🔥 2 minutes ago




DraftKings Inc.

1001 - 5000 employees

Founded 2012

🎼 Gaming

⚽ Sports

👥 B2C


DraftKings Inc. is a global company known for innovative products and experiences, primarily in the sports betting and fantasy sports sectors. The company has a strong presence across multiple countries and aims to deliver exceptional customer moments and to overcome challenges through teamwork and persistence. At DraftKings, innovation in engineering, analytics, and product development is key, with a focus on creating unforgettable customer experiences in sportsbook and casino operations. The company emphasizes a dynamic work culture, inclusion, equity, and global collaboration within its diverse teams.

📋 Description

• Define and execute the long-term strategy for our Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments, ensuring reliability, scalability, and operational consistency.
• Drive architectural decisions across critical infrastructure, including cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
• Lead large-scale platform initiatives across multiple engineering teams, establishing technical direction, engineering standards, and measurable outcomes that improve platform reliability and developer experience.
• Establish and evolve reliability practices by defining service level objectives, service level indicators, and error budget frameworks that align platform performance with business priorities.
• Build automation-first infrastructure through Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling that improve engineering velocity and reduce operational overhead.
• Champion the responsible adoption of AI-powered engineering capabilities that improve operational efficiency, accelerate incident response, and enhance developer productivity.
• Lead critical platform incidents, drive post-incident improvements, and strengthen platform resilience through automation, capacity planning, and operational excellence.
• Mentor senior engineers, influence technical strategy across the organization, and elevate engineering excellence through architecture reviews, coaching, and technical leadership.

🎯 Requirements

• A Bachelor's Degree in Computer Science or a related technical field.
• At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years operating at the Staff, Principal, or equivalent technical leadership level.
• Proven experience leading large-scale infrastructure or platform initiatives that require cross-functional alignment and long-term technical ownership.
• Deep expertise with Kubernetes, including cluster architecture, networking, storage, security, operators, lifecycle management, and large-scale production operations.
• Extensive experience building and operating production infrastructure in AWS and Google Cloud Platform using Infrastructure as Code technologies such as Terraform, Pulumi, or similar tools.
• Strong software development experience in Go, Python, or both, with expertise in GitOps, continuous integration and continuous delivery, observability, distributed systems, Linux, and reliability engineering principles.
• Experience incorporating AI-powered tools into engineering workflows while applying sound judgment around reliability, security, and operational risk.
• Exceptional communication and leadership skills with a proven ability to mentor engineers, influence technical strategy, and drive engineering excellence.
• Experience working in regulated industries, hybrid cloud environments, contributing to open-source projects, or holding cloud certifications is preferred.

đŸ–ïž Benefits

• Bonuses
• Equity
• Benefits as applicable

