Senior Solutions Architect – AI Factory Deployment

🕒 1 month ago

🏄 California, North Carolina, +1 more state – Remote

💵 $184,000 - $287,500 / year

⏰ Full-Time

🟠 Senior

💻 Solutions Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

NVIDIA

10,000+ employees

Founded in 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence (AI). It is at the forefront of advances in GPUs (graphics processing units), cloud computing, data centers, and virtual reality, with a particular focus on the gaming, automotive, healthcare, and robotics sectors. Its innovations, such as NVIDIA Omniverse, transform traditional digital workflows by enabling high-fidelity simulation and state-of-the-art rendering. Its applications span many industries, from autonomous vehicles with NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

Description

• Set up, adjust, and verify AI factory environments across multi-GPU and multi-node Linux clusters.
• Ensure configurations align with guidelines for NCCL, collectives, and distributed training frameworks.
• Own the execution of key AI/LLM benchmarks, including setup, orchestration, result collection, and analysis.
• Investigate and resolve issues when training jobs or benchmarks fail, hang, or underperform.
• Build and improve observability for AI factories (metrics, logs, traces, dashboards) to understand workload behavior and system health.
• Develop automation (Python, Shell) for running benchmarks, collecting results, and performing regression checks.
• Examine communication patterns and NCCL usage for AI/LLM workloads, concentrating on collectives such as AllReduce and AllToAll.
• Recommend changes to job configuration, parallelism strategies, and cluster settings to improve throughput, latency, and scaling efficiency.
• Work closely with hardware, software, networking, datacenter, and product teams to prepare AI factories for customer use.
• Contribute to documentation, guidelines, and readiness collateral that support internal collaborators and customer-facing teams.

🎯 Requirements

• Bachelor’s degree or equivalent experience in Computer Science, Mathematics, Engineering, Physics, or a related field.
• 6+ years of experience managing Linux-based systems in HPC, distributed systems, or large-scale AI/ML environments.
• Hands-on experience running AI/ML workloads on multi-GPU and/or multi-node clusters, with practical knowledge of NCCL.
• Solid grasp of collective communication patterns, particularly AllReduce and AllToAll, and how they are applied in contemporary ML/LLM training.
• Familiarity with LLM training and/or inference workflows using frameworks such as PyTorch or TensorFlow.
• Proficiency with Python and Shell/Bash for scripting, automation, and tooling.
• Experience with benchmarking (crafting, executing, and interpreting performance benchmarks).
• Comfortable working with observability data (metrics, logs, dashboards) to troubleshoot and optimize complex distributed workloads.
• Strong communication skills and the ability to work effectively with cross-functional teams.

🏖️ Benefits

• Eligible for equity and benefits
