HPC Engineer, Machine Learning Infrastructure - EMEA Remote

March 19

Hugging Face

The AI community building the future.

machine learning • natural language processing • deep learning

51 - 200

Description

• Design, develop, deploy, and maintain reliable and scalable infrastructure that enables efficient training workloads.
• Manage large compute clusters for AI training and development.
• Create tooling and infrastructure that abstract compute and storage in ML workflows.
• Measure and optimize system performance.
• Monitor and troubleshoot infrastructure issues, ensuring high availability and performance of AI workloads.
• Recommend improvements to enhance system efficiency and performance.
• Work closely with AI software engineering teams to ensure infrastructure can handle all system requirements.

Requirements

• 7+ years of experience in a DevOps or Infrastructure Engineer role building machine learning infrastructure and working with large GPU clusters.
• Knowledge of cloud providers such as AWS and GCP, infrastructure-as-code frameworks, and observability tools.
• Familiarity with the Python scientific stack and PyTorch.
• Experience with data structures, data modeling, and database management, as well as object and file storage systems.
• Strong communication, collaboration, and documentation skills.
• Experience with Linux, Git, containers, networking, and command-line tools.
• Strong programming skills in Python, Golang, and/or Rust.

Benefits

• Flexible working hours and remote options
• Health, dental, and vision benefits for employees and dependents
• 12 weeks of parental leave (20 for birthing mothers) and unlimited paid time off
• Reimbursement for relevant conferences, training, and education
• Company equity as part of compensation package
