Senior Engineering Manager – Data Center Telemetry, RAS


November 18


NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulations and rendering tasks. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, and AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Lead Data Center Compute Telemetry & RAS: Own the end-to-end architecture and delivery for telemetry solutions, including fleet health monitoring, fault remediation, and data visualization at scale.
• OOB Telemetry Ownership: Own the out-of-band (OOB) telemetry solution and data validation for telemetry from each underlying device.
• Build and Mentor a World-Class Team: Recruit, develop, and motivate a high-performing engineering team focused on platform telemetry, RAS, and observability.
• Process Optimization: Continuously improve software development processes for optimal productivity and quality.
• Cross-Functional Collaboration: Work across teams to ensure seamless integration of telemetry solutions with platform firmware, server architecture, and data center management.
• Product Ownership: Drive product life cycles with QA teams, ensuring robust testing, productization, and delivery.
• Performance Management: Conduct performance reviews, foster a culture of excellence, and ensure high productivity.

🎯 Requirements

• 12+ years of overall relevant experience and 5 years of managing systems/platform software teams, ideally in server RAS, firmware, telemetry, or data center solutions.
• BS, MS, or PhD in EE/CS or related field (or equivalent experience).
• Strong knowledge of DMTF/PLDM for OOB telemetry collection, time series databases (e.g., InfluxDB, Prometheus), and REST APIs (Redfish).
• Deep understanding of server and firmware architecture and optimization for low-latency APIs.
• Proven track record of delivering scalable server products and telemetry solutions.
• Experience with SCM (Git, Perforce) and project management tools (Jira).
• Excellent written and oral communication skills, strong work ethic, and commitment to teamwork.
• Hands-on experience with x86/ARM system architecture and coding (C/C++, Python).
• Familiarity with Confidential Compute and notification systems.
• Demonstrated ability to analyze algorithms for time/space complexity and system resource requirements.

🏖️ Benefits

• Equity
• Benefits


