Senior Network Reliability Engineer – DGX Cloud

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It develops graphics processing units (GPUs) and builds products for cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Platforms such as NVIDIA Omniverse support high-fidelity simulation and rendering. Its applications range from autonomous vehicles built on NVIDIA DRIVE to healthcare solutions built on NVIDIA Clara, as well as AI-driven analytics and workflows.

🕒 May 14

🏄 California – Remote

💵 $136k - $264.5k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Azure

Cloud

DNS

Google Cloud Platform

TCP/IP

📋 Description

• Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
• Drive operational improvements in change management and daily operations by following procedures.
• Manage and operate large scale IP network technologies and infrastructures.
• Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.
• Monitor and support the network health of on-premises and cloud infrastructures.
• Collaborate and develop workflow enhancements while documenting best practices.

🎯 Requirements

• Deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.
• 5+ years of experience in network operations.
• Skilled in network troubleshooting techniques, with demonstrated creative problem-solving abilities.
• Strong track record of alert response within defined SLAs and of incident management.
• Experience with one or more of the following CSP environments: AWS, Azure, GCP, OCI.
• Familiarity with Arista, Fortinet, and Juniper.
• Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.
• Bachelor’s degree in Computer Science, related technical field, or equivalent experience.
• Excellent verbal and written communication skills.

🏖️ Benefits

• Equity
• Benefits
