Member of Technical Staff, Training Engineer – Large Scale Foundation Models

October 11

FirstPrinciples Holding Company

B2B • Enterprise • Finance

FirstPrinciples Holding Company focuses on building and scaling a portfolio of successful commercial businesses. The company leverages strategic insight and operational expertise to maximize value and growth within its portfolio companies. FirstPrinciples aims to provide sustainable solutions that drive long-term success for its partners and stakeholders.

51 - 200 employees

📋 Description

• Develop and lead end-to-end pre-training of large language models on GPU clusters.
• Combine deep engineering expertise with research intuition.
• Build data pipelines and perform distributed training at scale.
• Make informed decisions about microbatch and global batch configurations.
• Provide strategic insights to the executive team on financial implications.
• Design capital allocation frameworks for sustainability.
• Operate distributed training infrastructure using modern techniques.
• Write production-grade PyTorch and Triton/CUDA kernels when required.
• Lead cross-functional efforts and mentor engineers.

🎯 Requirements

• Bachelor's or Master's degree in Computer Science, Engineering, or related field.
• 7-12+ years of total experience, including 2+ years training large Transformers at scale.
• Hands-on experience with at least one frontier-style training run.
• Expert-level proficiency in PyTorch (including compiled mode/torch.compile).
• Deep facility with distributed frameworks (PyTorch FSDP or DeepSpeed ZeRO).
• Proven success operating multi-node GPU jobs.
• Demonstrated impact from data quality work.
• Strong applied mathematics background.

🏖️ Benefits

• Health insurance
• Innovative research environment
• Collaboration with top experts
• Opportunity to work on groundbreaking technology
• Flexible remote work
