Member of Engineering – Pre-training, Synthetic Data

51 - 200 employees

Founded 2023

🤖 Artificial Intelligence

🏢 Enterprise

poolside is a frontier AI lab and enterprise platform that builds and deploys foundation models, multi-agent systems, and developer-facing tools focused on automating complex software work. The company specializes in on-prem and VPC deployments, security-first integrations, governance, and connectors to enterprise data sources so organizations can run agents and models inside their own boundaries. Poolside embeds research and engineering with customers to deliver outcome ownership, risk controls, and measurable business impact while advancing toward AGI by starting in high-consequence software environments.

🕒 January 29

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🖥 Software Engineer

Python

📋 Description

• You’ll work on our data team, which focuses on the quality of the datasets delivered for training our models.
• This is a hands-on role where your #1 mission is to improve the quality of the pretraining datasets by drawing on your previous experience, intuition, and training experiments.
• The role focuses particularly on generating synthetic data at scale and determining the best strategies for using such data to train large models.
• You’ll collaborate closely with other teams, such as Pretraining, Post-training, Evals, and Product, to define high-quality data needs that map to missing model capabilities and downstream use cases.
• Staying in sync with the latest research in synthetic data generation and pretraining is key to success in this role.
• You’ll regularly lead original research initiatives through short, time-bounded experiments while deploying highly technical engineering solutions into production.
• Because the volumes of data to process are massive, you’ll have a performant distributed data pipeline and a large GPU cluster at your disposal.
• The goal is to deliver large, high-quality, and diverse synthetic datasets that mix natural language and code modalities to train best-in-class coding agents.

🎯 Requirements

• Strong machine learning and engineering background
• Experience with Large Language Models (LLMs)
• Understanding of how LLMs learn
• Data ablations and scaling laws
• Post-training techniques
• Training reasoning and agentic models
• Experience implementing cost-efficient, complex pipelines that generate synthetic datasets at scale while optimizing for data quality, correctness, diversity, etc.
• Experience with evals that track model capabilities (general knowledge, reasoning, math, coding, long-context, etc.)
• Experience building trillion-scale pretraining datasets, and familiarity with concepts like data curation, deduplication, data mixing, tokenization, curriculum, impact of data repetition, etc.
• Excellent programming skills in Python
• Strong prompt engineering skills
• Experience working with large-scale GPU clusters and distributed data pipelines
• Strong obsession with data quality
• Research experience (authoring scientific papers on topics such as applied deep learning, LLMs, source code generation, etc.) is a nice-to-have
• Ability to discuss the latest papers freely and go into fine detail
• Reasonably opinionated

🏖️ Benefits

• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
