HPC System Engineer

Nebius Group

1001 - 5000 employees

🏢 Enterprise

☁️ SaaS

AI • Enterprise • SaaS

Nebius Group is building one of the world’s leading AI infrastructure companies, focusing on providing the necessary compute, storage, and tools for developers in the AI space. Based in Europe and listed on Nasdaq, Nebius has a global presence with R&D centers across Europe, North America, and Israel. The company's primary offering is an AI-centric cloud platform designed for intensive AI workloads, complemented by various other businesses involved in generative AI development, edtech, and autonomous technology.

📋 Description

• Work closely with hardware and development teams to profile and analyze GPU performance at the system and kernel level.
• Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
• Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
• Run experiments across diverse GPU system configurations to assess how different interconnect strategies and system-level optimizations affect performance and scalability.

🎯 Requirements

• Proficiency in Unix/Linux, plus Python and Bash for automation.
• Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries.
• Proven ability to troubleshoot complex system issues, including hardware, software, and networking problems.
• Familiarity with containerized environments (e.g., Docker, Kubernetes).

🏖️ Benefits

• Competitive compensation
• Career growth and learning opportunities
• Flexibility and work-life balance
• Collaborative and innovative culture
• Opportunity to work on impactful AI projects
• International environment and talented teams
