Site Reliability Engineer, Monitoring and Control Engineering

Media • Entertainment

NBCUniversal is a leading global media and entertainment company known for creating and distributing content across a variety of platforms. With over 100 years of experience, it is a part of Comcast and encompasses brands like Peacock, NBC Sports, and many others to educate, entertain, and empower audiences around the world. The company is involved in television broadcasting, film production, and theme parks, and is also recognized for its initiatives in technology and corporate social responsibility. NBCUniversal is committed to innovation and social impact, making it a vibrant workplace for media and tech professionals.

10,000+ employees

Founded 2004

📱 Media

2 days ago

🦌 Connecticut – Remote

💵 $110k - $145k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Ansible

AWS

Azure

Chef

Cloud

Docker

Google Cloud Platform

Grafana

Kubernetes

Linux

Node.js

Python

React

SaltStack

Splunk

Terraform

TypeScript

📋 Description

• Use scripting and automation to develop, customize, and enhance monitoring and alerting tools for “on-air” environments
• Interact with automated monitoring infrastructure to ensure healthy environments
• Create system dashboards that improve system availability and reliability
• Query data stores to quantify the scope of reported issues
• Create new metrics and identify monitoring deliverables to improve site reliability
• Act as a Level 2 resource: drive and own investigations into broadcast issues and report findings to leadership and operations in a timely manner
• Provide 24/7 on-call support on a rotating shift schedule
• Follow up with team members and third-party vendors when issues cannot be resolved, and press vendors for root cause and solutions where possible
• Create comprehensive documentation for each issue encountered, explaining the root cause and the steps for effective resolution
• Administer monitoring and control systems within the “on-air” environments
• Develop proof-of-concept deployments to evaluate products and architectures
• Use modern frameworks and scripting languages to develop products and services for NBCU's IP video distribution environment

🎯 Requirements

• Bachelor’s degree in computer science or a related field
• Experience with IP video and broadcast technologies
• 3-5+ years of experience with monitoring and alerting tools, e.g. Grafana, Splunk, ELK Stack, Dataminer
• Ability to develop end-to-end monitoring dashboards, alerts, and reports for enterprise-level environments
• 3-5 years of SRE experience in the technology sector, supporting and maintaining production-quality software or software-defined infrastructure in a high-traffic cloud environment (AWS preferred)
• Ability to collect data from various systems using COTS APIs
• Experience with scripting languages and tools, e.g. C#, Python, Bash
• Experience with modern frontend technologies such as Vite, React, Node.js, and TypeScript
• Experience with configuration management technology, e.g. Ansible, Salt, and/or Chef
• Experience with public cloud platforms such as AWS, GCP, or Azure
• Experience with networking and cloud-based network environments
• Experience with containerization (Docker and Kubernetes)
• Experience with CI/CD builds (GitHub Actions), deployment practices, and Infrastructure as Code (Terraform)
• Experience administering Linux and Windows environments
• Ability to use Agile processes for project management, development, and tracking
• Comfortable working in a fast-paced agile environment where requirements change quickly and the team must adapt to moving targets

🏖️ Benefits

• Medical, dental, and vision insurance
• 401(k)
• Paid leave
• Tuition reimbursement
• Various other discounts and perks
