Principal DevOps Engineer

10,000+ employees

Founded 2004

📱 Media

Media • Entertainment

NBCUniversal is a leading global media and entertainment company known for creating and distributing content across a variety of platforms. With over 100 years of experience, it is a part of Comcast and encompasses brands like Peacock, NBC Sports, and many others to educate, entertain, and empower audiences around the world. The company is involved in television broadcasting, film production, and theme parks, and is also recognized for its initiatives in technology and corporate social responsibility. NBCUniversal is committed to innovation and social impact, making it a vibrant workplace for media and tech professionals.

🔥 35 minutes ago

🗽 New York – Remote

💵 $180k - $230k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

DNS

Flux

Grafana

Kubernetes

Node.js

Postgres

Prometheus

Puppet

SQLite

📋 Description

• Architect a Kubernetes-native platform that models broadcast infrastructure as custom resources.
• Lead the technical strategy leveraging Crossplane compositions and custom Go functions to automate provisioning across multi-account AWS environments and on-prem control rooms.
• Design, build, and maintain production-grade Kubernetes operators, controllers, and internal platform APIs in Go.
• Actively develop custom Crossplane providers to deeply integrate external enterprise platforms (such as NRCS, Venafi, and Infoblox) into our control plane, managing resource lifecycles and approval workflows.
• Lead the design of cloud networking, DNS strategies, and cross-account connectivity across hybrid environments, automating VPC topology and dynamic network routing.
• Partner closely with broadcast systems engineers, system integrators, and external vendors to bridge the gap between broadcast hardware and automated infrastructure.
• Write RFCs, drive architectural decisions, mentor engineers, and establish high-confidence CI/CD pipelines, testing strategies, and GitHub Actions automation.
• Own the platform's authorization model, designing hierarchical RBAC systems, resource identifier schemes, and identity integrations that enforce fine-grained access control.
• Drive GitOps-based continuous delivery (Flux, Kustomize, Helm) and manage configuration-as-code for compute fleets using Puppet.
• Ensure deep operational visibility by designing comprehensive observability and alerting stacks.
• Oversee the integration of remote desktop/VDI connectivity solutions, focusing on session authentication, credential management, and gateway routing.

🎯 Requirements

• 10+ years of experience designing, building, and operating production infrastructure and cloud-native platforms at enterprise scale.
• Strong proficiency in Go (systems-level programming, API servers).
• Expert-level knowledge of the Kubernetes ecosystem, including CRD/XRD generation, operators, informers, admission webhooks, and RBAC.
• Deep production experience with Crossplane, including composite resources, composition functions, and specifically developing custom Crossplane providers in Go to integrate external enterprise platforms.
• Extensive production experience with AWS multi-account architectures, cross-account networking patterns, and identity federation.
• Production experience with GitOps tooling, specifically Flux (HelmRelease, Kustomization) or ArgoCD for continuous delivery on Kubernetes.
• Hands-on experience with Puppet, including module development, PuppetDB, Hiera, and r10k.
• Experience designing REST APIs with middleware patterns and modern authentication (OAuth/JWT).
• Keen eye for information security, including cross-account IAM trust chains, least-privilege policies, JWT token lifecycles, and secrets abstraction.
• Strong background in designing telemetry platforms using Grafana, Prometheus/Mimir, Loki, OpenTelemetry, and metrics collection agents (Alloy, Prometheus Node Exporter).
• Working knowledge of PostgreSQL, SQLite or similar relational databases, encompassing schema design, migrations, and query optimization.
• Excellent problem-solving skills with a proven ability to present architectural decisions to executives, engage with vendors, and write clear technical documentation.

🏖️ Benefits

• Health insurance
• Dental insurance
• Vision insurance
• 401(k)
• Paid leave
• Tuition reimbursement
• Variety of discounts and perks
