Research Engineer – Distributed Training

November 20

CloudWalk, Inc.

Fintech • Blockchain • Artificial Intelligence

CloudWalk, Inc. is focused on revolutionizing the payments industry by democratizing financial services and empowering entrepreneurs through innovative solutions. Their mission is to create a fair and technologically advanced payment network on Earth and beyond. Utilizing AI and blockchain, CloudWalk offers products such as InfinitePay, a financial platform serving businesses in Brazil, and JIM, an instant payment system in the US. Stratus, their high-performance blockchain, supports global payment networks. CloudWalk emphasizes customer engagement and disruptive economic models to transform how merchants sell and profit.

201 - 500 employees

💳 Fintech

🤖 Artificial Intelligence

💰 $150M Series C in November 2021

📋 Description

• Design, implement, and maintain CloudWalk’s distributed LLM training pipeline.
• Orchestrate multi-node, multi-GPU runs across Kubernetes and internal clusters.
• Optimize performance, memory, and cost across large training workloads.
• Integrate cutting-edge frameworks (Unsloth, TorchTitan, Axolotl) into production workflows.
• Build internal tools and templates that accelerate research-to-production transitions.
• Collaborate with infra, research, and MLOps teams to ensure reliability and reproducibility.

🎯 Requirements

• Strong background in **PyTorch** and **distributed training** (DeepSpeed, FSDP, Accelerate).
• Hands-on experience with large-scale multi-GPU or multi-node training.
• Familiarity with **Transformers, Datasets, and mixed-precision techniques**.
• Understanding of **GPUs, containers, and schedulers** (Kubernetes, Slurm).
• Mindset for reliability, performance, and clean engineering.

🏖️ Benefits

• Competitive salary
• Equity
• Opportunity to shape future AI infrastructure
