Member of Engineering – Pre-training, Synthetic Data

🕒 4 months ago

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

🖥 Software Engineer

🗣️🇺🇸🇬🇧 English required

poolside

51 - 200 employees

Founded 2023

🤖 Artificial Intelligence

🏢 Enterprise

Poolside is an accelerator designed specifically for Web3 founders and builders. The program supports projects in decentralized finance (DeFi), gaming, governance, infrastructure, and NFTs. With a strong ecosystem of 20,000 members, including mentors, investors, and Web3 builders, Poolside has helped launch and support more than 110 projects. The accelerator offers exclusive access to mentoring and technical expertise to scale Web3 projects and enable successful market launches. Poolside also works with leading companies and protocols to drive growth and innovation in the Web3 space.

Description

• You’ll work on our data team, which focuses on the quality of the datasets delivered for training our models.
• This is a hands-on role where your #1 mission is to improve the quality of the pretraining datasets by drawing on your previous experience, intuition, and training experiments.
• This role focuses in particular on generating synthetic data at scale and determining the best strategies for using such data to train large models.
• You’ll collaborate closely with other teams such as Pretraining, Post-training, Evals, and Product to define high-quality data needs that map to missing model capabilities and downstream use cases.
• Staying in sync with the latest research in synthetic data generation and pretraining is key to success in this role.
• You will continually lead original research initiatives through short, time-bounded experiments while deploying highly technical engineering solutions into production.
• Because the volumes of data to process are massive, you'll have a performant distributed data pipeline and a large GPU cluster at your disposal.
• Your goal is to deliver large, high-quality, and diverse synthetic datasets that mix natural language and code modalities to train best-in-class coding agents.

🎯 Requirements

• Strong machine learning and engineering background
• Experience with Large Language Models (LLMs)
• Understanding of how LLMs learn
• Data ablations and scaling laws
• Post-training techniques
• Training reasoning and agentic models
• Experience implementing cost-efficient, complex pipelines that generate synthetic datasets at scale while optimizing for data quality, correctness, diversity, etc.
• Experience with evals that track model capabilities (general knowledge, reasoning, math, coding, long-context, etc.)
• Experience building trillion-scale pretraining datasets, and familiarity with concepts such as data curation, deduplication, data mixing, tokenization, curriculum, and the impact of data repetition
• Excellent programming skills in Python
• Strong prompt engineering skills
• Experience working with large-scale GPU clusters and distributed data pipelines
• Strong obsession with data quality
• Research experience (nice to have): author of scientific papers on topics such as applied deep learning, LLMs, or source code generation
• Can freely discuss the latest papers and go into fine detail
• Is reasonably opinionated

🏖️ Benefits

• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
