Senior Solutions Architect – AI Factory Deployment

🕒 April 29

🗣️🇺🇸🇬🇧 English required



NVIDIA

10,000+ employees

Founded in 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. The company pioneers advances in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics sectors. Its innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering tasks. Its applications span many industries, from autonomous vehicles with NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

Description

• Set up, adjust, and verify AI factory environments across multi-GPU and multi-node Linux clusters.
• Ensure configurations align with guidelines for NCCL, collectives, and distributed training frameworks.
• Own the execution of key AI/LLM benchmarks, including setup, orchestration, result collection, and analysis.
• Investigate and resolve issues when training jobs or benchmarks fail, hang, or underperform.
• Build and improve observability for AI factories (metrics, logs, traces, dashboards) to understand workload behavior and system health.
• Develop automation (Python, Shell) for running benchmarks, collecting results, and performing regression checks.
• Examine communication patterns and NCCL usage for AI/LLM workloads, concentrating on collectives such as AllReduce and AllToAll.
• Recommend changes to job configuration, parallelism strategies, and cluster settings to improve throughput, latency, and scaling efficiency.
• Work closely with hardware, software, networking, datacenter, and product teams to prepare AI factories for customer use.
• Contribute to documentation, guidelines, and readiness collateral that support internal collaborators and customer-facing teams.

🎯 Requirements

• Bachelor’s degree or equivalent experience in Computer Science, Mathematics, Engineering, Physics, or a related field.
• 6+ years of experience managing Linux-based systems in HPC, distributed systems, or large-scale AI/ML settings.
• Hands-on experience running AI/ML workloads on multi-GPU and/or multi-node clusters, with practical knowledge of NCCL.
• Solid grasp of collective communication patterns, particularly AllReduce and AllToAll, and how they are applied in contemporary ML/LLM training.
• Familiarity with LLM training and/or inference workflows using frameworks such as PyTorch or TensorFlow.
• Proficiency with Python and Shell/Bash for scripting, automation, and tooling.
• Experience with benchmarking (designing, executing, and interpreting performance benchmarks).
• Comfort working with observability data (metrics, logs, dashboards) to troubleshoot and optimize complex distributed workloads.
• Strong communication skills and the ability to work effectively with cross-functional teams.

🏖️ Benefits

• Eligible for equity and benefits

