AI Cluster Architect

Job not on LinkedIn

🕒 3 months ago

🇺🇸 United States – Remote

💵 $165,000 - $185,000 / year

⏰ Full-time

🟠 Senior

🔴 Expert

🤖 Artificial Intelligence

🗣️🇺🇸🇬🇧 English required

Vultr

201 - 500 employees

Founded 2014

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

💰 €329,000,000 Debt Financing - Vultr in 2025-06

Vultr is a global cloud infrastructure provider offering on-demand virtual machines, bare-metal servers, GPU-accelerated instances, managed databases, object and block storage, Kubernetes, and networking solutions. The platform focuses on AI and HPC workloads, with a broad selection of AMD and NVIDIA GPUs, fast networking, more than 32 data center regions, a marketplace of deployable apps, and developer-friendly APIs. Vultr targets developers and businesses looking for cost-effective, scalable, and compliant alternatives to hyperscalers for cloud computing and storage.

Description

• Architect large-scale GPU clusters within fixed site power budgets, optimizing for maximum GPU density while reserving the necessary headroom for compute services, storage, and networking.
• Model and validate power consumption across the full cluster bill of materials (GPUs, CPUs, NICs, switches, fabric components, storage, and facility limits).
• Evaluate tradeoffs across multiple fabric networking architectures (InfiniBand, RoCE, SpectrumX) as well as multi-plane, 2-tier/3-tier, and rail-optimized topologies.
• Determine network scale limits based on switch radix, link speed, topology, and blocking requirements.
• Gather, interpret, and maintain detailed SKU-level power and thermal specifications for GPUs, NICs, switches, DPUs, storage, and server platforms.
• Develop power-aware cluster configuration templates and capacity-planning models that can scale across sites with varying constraints and allow for quick iteration and ideation.
• Document architecture, design choices, tradeoff analyses, and operational considerations for deployment and lifecycle management.
• Provide guidance on future-proofing, including the ability to incorporate next-gen GPUs, NICs, or fabrics.
• Collaborate with vendors on novel fabric architectures that enable large-scale cluster deployments (100k+ GPUs).

🎯 Requirements

• 7+ years designing or building large-scale HPC, AI, or hyperscale GPU clusters.
• Expert understanding of GPU and accelerator system design, including node topology, PCIe/NVLink/NVSwitch/ROCm, and NIC-to-GPU affinity considerations.
• Strong familiarity with InfiniBand, RoCE, and SpectrumX networking, including multi-tier, multi-plane, Clos/dragonfly variants, and large-radix switch design.
• Demonstrated experience modeling power draw and thermal characteristics of servers, GPUs, NICs, switches, optics, and storage systems.
• Ability to design networks that maintain full non-blocking performance or intentionally introduce over/under-subscription while understanding impacts on workload performance.
• Proven ability to gather and analyze vendor SKU-level specifications and incorporate them into scalable cluster architectures.
• Experience balancing customer-driven requirements for compute, storage, and service density in combination with overall GPU count.
• Strong documentation, communication, and cross-functional collaboration skills.

🏖️ Benefits

• Excellent Medical Benefits w/ 100% company-paid premiums for employee only plan + 100% company-paid dental & vision premiums
• 401(k) plan that matches 100% up to 4% with immediate vesting
• Professional Development Reimbursement of $2,500 each year
• 11 Holidays + Paid Time Off Accrual + Rollover Plan + take your birthday off
• Commitment matters to Vultr! Increased PTO at 3 year & 10 year anniversary + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
• $500 first year remote office setup + $400 each following year for new equipment
• Internet reimbursement up to $75 per month
• Gym membership reimbursement up to $50 per month
• Company-paid Wellable subscription

Similar Jobs

🕒 3 months ago

SandboxAQ

51 - 200

🤖 Artificial Intelligence

🔒 Cybersecurity

💊 Pharmaceuticals

Staff Forward Deployed Engineer in AI Simulation developing solutions and ensuring client success at SandboxAQ. Join a global team tackling challenges in drug discovery and chemical simulation.

🇺🇸 United States – Remote

💵 $168,300 - $276,000 / year

⏰ Full-time

🔴 Expert

🤖 Artificial Intelligence

🗣️🇺🇸🇬🇧 English required

🕒 3 months ago

Game Plan Tech

51 - 200

🤖 Artificial Intelligence

🏛️ Government

🔒 Cybersecurity

AI Subject Matter Expert at Game Plan Tech advising on deployment and design of ML models. Focused on machine learning methodologies and generative AI techniques for innovative solutions.

🇺🇸 United States – Remote

💰 €550,000 Series B - GamePlan Technologies in 2013-10

⏰ Full-time

🟡 Mid-level

🟠 Senior

🤖 Artificial Intelligence

🗣️🇺🇸🇬🇧 English required

🕒 3 months ago

Jump - Advisor AI

51 - 200

🤖 Artificial Intelligence

💳 Fintech

☁️ SaaS

Applied AI Evaluation Scientist optimizing AI systems at Jump, a fintech startup leveraging LLMs. Focus on evaluation frameworks for AI/ML quality and trustworthiness.

🇺🇸 United States – Remote

💵 $180,000 - $270,000 / year

💰 €24,574,985 Series A - Jump in 2025-02

⏰ Full-time

🟡 Mid-level

🟠 Senior

🤖 Artificial Intelligence

🗣️🇺🇸🇬🇧 English required

🕒 3 months ago

HMH

1001 - 5000

📚 Education

🛍️ eCommerce

AI Delivery Lead coordinating AI integration across content operations at NWEA. Focusing on enhancing quality, speed, and efficiency in educational solutions.

🗣️🇺🇸🇬🇧 English required

🕒 3 months ago

Prolific

51 - 200

🤝 B2B

AI Trainer evaluating and improving cutting-edge AI models. Joining Prolific to assist in training AI with flexible hours and competitive pay.

🗣️🇺🇸🇬🇧 English required