Senior HPC DevOps Engineer

🔥 17 minutes ago

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Innovations such as NVIDIA Omniverse transform traditional digital processes by enabling high-fidelity simulation and rendering. NVIDIA's applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Design, implement, and maintain large-scale HPC/AI clusters with state-of-the-art monitoring, logging, and alerting systems.
• Use and develop tools to manage infrastructure as code, ensuring scalable and repeatable deployments.
• Develop and maintain continuous integration and continuous delivery (CI/CD) pipelines to automate and streamline deployment processes.
• Develop scripts and tools to automate deployment, configuration management, and operational monitoring.
• Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.
• Serve as a technical resource, developing and sharing best practices with internal teams.
• Support R&D activities and engage in proofs of concept (POCs) and proofs of value (POVs) for future improvements.

🎯 Requirements

• B.Sc. in Computer Science, Engineering, or a related field with 5+ years of experience.
• Deep knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
• Advanced proficiency in programming and scripting languages, with a solid understanding of object-oriented programming principles.
• Familiarity with Jenkins, Ansible, and Puppet/Chef.
• Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.
• Deep understanding of networking technologies such as InfiniBand and Ethernet.
• Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes.
• Background with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS.
• Expertise with virtualization platforms (VMware, Hyper-V, KVM, Citrix).
• Familiarity with cloud platforms (AWS, Azure, Google Cloud).

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible working hours
• Paid time off
• Remote work options
