Lead Systems HPC Engineer

🕒 April 21


Nebius Group

1001 - 5000 employees

🏢 Enterprise

☁️ SaaS

AI • Enterprise • SaaS

Nebius Group is building one of the world’s leading AI infrastructure companies, focusing on providing the necessary compute, storage, and tools for developers in the AI space. Based in Europe and listed on Nasdaq, Nebius has a global presence with R&D centers across Europe, North America, and Israel. The company's primary offering is an AI-centric cloud platform designed for intensive AI workloads, complemented by various other businesses involved in generative AI development, edtech, and autonomous technology.

📋 Description

• Focus on understanding system behavior across multiple layers, identifying performance bottlenecks, and driving improvements that shape how our clusters are built, operated, tuned, and validated.
• Investigate and troubleshoot performance issues on GPU clusters under real workloads (training and inference).
• Evaluate and integrate new hardware, system configurations, and tuning approaches across the software stack.
• Support complex performance-related escalations from internal teams and customers.
• Work closely with infrastructure, software engineering, and hardware vendor teams (e.g. NVIDIA, Mellanox, Intel).
• Contribute to hardware and cluster qualification (acceptance), ensuring systems meet performance expectations.

🎯 Requirements

• 5+ years of professional experience in system-level software development (focused on performance optimization, low-level programming).
• 3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning).
• In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems.
• Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).

🏖️ Benefits

• Health insurance: 100% company-paid medical, dental and vision coverage for employees and families.
• 401(k) plan: Up to 4% company match with immediate vesting.
• Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
• Remote work reimbursement: Up to $85/month for mobile and internet.
• Disability & life insurance: Company-paid short-term, long-term and life insurance coverage.
