Staff Research Scientist – Reinforcement Learning

🔥 2 hours ago

🏄 California – Remote

💵 $200k - $250k / year

⏰ Full Time

🔴 Lead

🧬 Research Scientist

🦅 H1B Visa Sponsor

Thermo Fisher Scientific

10,000+ employees

⚕️ Healthcare Insurance

🧬 Biotechnology

💊 Pharmaceuticals

Thermo Fisher Scientific is a leading global supplier of scientific instrumentation, reagents and consumables, and software services. It supports the life sciences, healthcare, and analytical chemistry sectors with solutions for laboratory research and production processes. Its products and services cover applications including diagnostics, lab workflow automation, and drug discovery.

📋 Description

• Design simulation environments and digital twins for enterprise workflows
• Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
• Build pipelines that convert human-labeled traces and verifiable signals into training data
• Architect multi-turn, tool-using agents with closed learning loops
• Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
• Set the technical bar across the team: architecture, code review, engineering standards
• Mentor researchers and engineers; drive technical direction through influence
• Translate research into production; contribute to publications

🎯 Requirements

• 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
• MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
• 5+ years hands-on RL (environment design, reward engineering, policy optimization), with at least one production deployment

LLM Post-Training

• 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
• Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
• Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
• Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
• Strong Python and software engineering skills; comfortable building production pipelines, not just notebooks
• Deep expertise in MDPs, policy gradient methods (PPO, SAC), and temporal difference learning
• Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)

🏖️ Benefits

• N/A
