Staff AI/ML Infrastructure Engineer

🕒 April 14

Vultr

201 - 500 employees

Founded 2014

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funding within the last year

💰 $329M Debt Financing (June 2025)

Vultr is a global cloud infrastructure provider offering on-demand virtual machines, bare-metal servers, GPU-accelerated instances, managed databases, object and block storage, Kubernetes, and networking services. The platform emphasizes AI and HPC workloads with a broad selection of AMD and NVIDIA GPUs, fast networking, and 32+ data center regions, plus a marketplace of deployable apps and developer-friendly APIs. Vultr targets developers and businesses seeking affordable, scalable, and compliant cloud compute and storage alternatives to hyperscalers.

📋 Description

• Design and maintain GPU and bare metal infrastructure in containerized and physical environments
• Build scalable GPU clusters in partnership with networking and provisioning teams
• Ensure reliable, high-performance provisioning of GPU infrastructure
• Develop automated testing systems for GPU-based platforms
• Implement infrastructure solutions for diverse AI/ML workloads
• Benchmark, test, and troubleshoot GPU performance at scale
• Collaborate with hardware vendors on drivers, firmware, and support
• Resolve hardware, software, and performance issues across environments
• Optimize rail and cluster performance across architectures
• Lead technical direction and mentor engineers on infrastructure best practices

🎯 Requirements

• 5+ years of experience working with bare-metal infrastructure and hardware automation
• Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
• Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
• Strong Linux systems experience, including device drivers and package management
• Experience building infrastructure automation using Python and Bash
• Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
• Experience designing and delivering complex infrastructure products
• Proven ability to lead projects and mentor engineers
• Experience optimizing multi-cluster GPU environments
• Exposure to machine learning software stacks and GPU workloads

🏖️ Benefits

• 100% company-paid insurance premiums for employee medical, dental, and vision plans
• 401(k) plan with a 100% match up to 4%, with immediate vesting
• Professional development reimbursement of $2,500 each year
• 11 holidays + paid time off accrual + rollover plan
• Commitment matters to Vultr! Increased PTO at the 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + anniversary bonus each year
• $500 stipend for remote office setup in the first year + $400 each following year
• Internet reimbursement up to $75 per month
• Gym membership reimbursement up to $50 per month
• Company-paid Wellable subscription

Similar Jobs

🕒 April 3

Mechanical Orchard

11 - 50 employees

🤖 Artificial Intelligence

☁️ SaaS

🏢 Enterprise

Manager leading infrastructure engineering delivery at Mechanical Orchard, ensuring effective deployment models and team development. Collaborating across functions to influence key architectural decisions.

🕒 March 20

Voxel51

11 - 50 employees

🤖 Artificial Intelligence

Principal Infrastructure Engineer at Voxel51 designing systems for managing unstructured data. Leading architecture and strategy for AI deployment across industry verticals.

🕒 March 4

LexisNexis

10,000+ employees

📋 Compliance

🏛️ Government

☁️ SaaS

Consulting AWS Cloud Network Infrastructure Engineer at LexisNexis defining best practices and collaborating on cloud initiatives. Architecting secure and scalable cloud solutions to enable business agility.

🕒 February 26

Epistemix

11 - 50 employees

☁️ SaaS

🤖 Artificial Intelligence

🤝 B2B

Infrastructure Architect responsible for client integration and deployment of Epistemix's data-driven platform. Requires deep technical expertise in cloud environments and automation tools.

🕒 February 12

Unisys

10,000+ employees

🤖 Artificial Intelligence

🔒 Cybersecurity

Engineering Manager overseeing an engineering team that creates secure and scalable tech solutions at Unisys. Leading automation efforts and collaborating with stakeholders to ensure platform efficiency and compliance.