Senior ML Platform Engineer

🕒 9 days ago

🗣️🇺🇸🇬🇧 English required

NVIDIA

10,000+ employees

Founded in 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence (AI). NVIDIA drives advances in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on industries such as gaming, automotive, healthcare, and robotics. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling highly realistic simulation and rendering workloads. Its applications span numerous industries, ranging from autonomous vehicles with NVIDIA DRIVE and healthcare solutions with NVIDIA Clara to AI-powered analytics and workflows.

Description

• Design, build, and maintain our core ML platform infrastructure as code, primarily using Ansible and Terraform, ensuring reproducibility and scalability across large-scale, distributed GPU clusters.
• Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack, ensuring high availability and performance for critical AI workloads.
• Develop robust internal automation and tooling for ML workflow orchestration, resource scheduling, and platform operations, with a strong focus on software engineering best practices.
• Collaborate with ML researchers and applied scientists to understand infrastructure needs and build solutions that streamline their end-to-end experimentation.
• Evolve and operate our multi-cloud and hybrid (on-prem + cloud) environments, implementing monitoring, alerting, and incident response protocols.
• Participate in on-call rotation to provide support for platform services and infrastructure running critical ML jobs, driving root cause analysis and implementing preventative measures.
• Write high-quality, maintainable code (Python, Go) to contribute to the core orchestration platform and automate manual processes.
• Drive the adoption of modern GPU technologies and ensure smooth integration of next-generation hardware (e.g., GB200, NVLink) into ML pipelines.

🎯 Requirements

• BS/MS in Computer Science, Engineering, or equivalent experience.
• 5+ years in software/platform engineering or SRE roles, including 3+ years focused on ML infrastructure or distributed compute systems.
• Strong proficiency in Infrastructure-as-Code (IaC) tools, specifically Ansible and Terraform, with a proven track record of building and managing production infrastructure.
• SRE-oriented mindset with extensive experience in diagnosing system-level issues, performance tuning, and ensuring platform reliability.
• Solid understanding of ML workflows and the ML lifecycle, from data preprocessing to deployment.
• Proficiency in operating containerized workloads with Kubernetes and Docker.
• Strong software engineering skills in languages such as Python or Go, with a focus on automation, tooling, and writing production-grade code.
• Experience with Linux systems internals, networking, and performance tuning at scale.

🏖️ Benefits

• Equity
• Benefits

Similar Jobs

🕒 9 days ago

Lead Data Platform Engineer handling the technical architecture for the Enterprise Data Analytics Platform team. Driving large-scale engineering initiatives across the organization while mentoring engineers.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇸🇬🇧 English required

🕒 9 days ago

Bridgeway Benefit Technologies

201 - 500 employees

☁️ SaaS

👥 HR Tech

Senior Platform Engineer focused on architecting and maintaining Bridgeway's cloud infrastructure. Driving DevOps practices and delivering efficient platform solutions across teams.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇸🇬🇧 English required

🕒 9 days ago

NeoBIM GmbH

1 - 10 employees

🤖 Artificial Intelligence

🏠 Real Estate

Senior Platform Engineer at neoBIM transforming the construction industry with AI-powered BIM solutions. Focused on infrastructure, system reliability, and CI/CD workflows in a collaborative environment.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇸🇬🇧 English required

🕒 10 days ago

MANSCAPED

201 - 500 employees

💄 Beauty

👥 B2C

🛍️ eCommerce

Senior Systems & Platform Engineer at MANSCAPED shaping Azure-based platform architecture and enterprise application integrations. Collaborating on cloud strategy and driving critical engineering initiatives.

🇺🇸 United States – Remote

💵 $167,000 - $177,000 / year

⏰ Full-time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇸🇬🇧 English required

🕒 10 days ago

Strivacity

11 - 50 employees

🔌 API

🔒 Cybersecurity

💳 Fintech

Platform Engineer building and maintaining infrastructure for engineering teams at Strivacity. Focusing on Kubernetes, automation, and operational excellence in a remote role.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇸🇬🇧 English required