Applied Data Scientist, LLM Evaluation

🕒 April 24

🗣️🇺🇸🇬🇧 English required

Driver

11 - 50 employees

Founded in 2023

☁️ SaaS

🔌 API

⚡ Productivity

Driver is an automated documentation platform that intelligently analyzes codebases and generates contextual technical documentation. It streamlines the documentation process, letting teams save time and focus on building innovative solutions. With tools for keeping documentation up to date as things change, Driver improves onboarding and customer engagement by ensuring high-quality, easy-to-consume content. Its enterprise-grade solutions prioritize security and compliance, making it a trusted choice for companies looking to modernize their documentation workflows.

Description

• Own the LLM evaluation strategy at Driver, from first principles to production infrastructure.
• Define quality metrics and build evaluation datasets.
• Establish what 'good' looks like for each content type across the pipeline.
• Build and curate gold-standard evaluation datasets across languages and repo archetypes (monorepos, microservices, libraries, applications).
• Design rubrics that capture accuracy, completeness, usefulness, and readability.
• Build benchmarking and experimentation infrastructure.
• Create automated evaluation pipelines that score output against reference datasets.
• Instrument the content generation pipeline to support A/B comparisons: run the same codebase through two strategies and compare results.
• Build tooling for LLM-as-judge evaluation and regression detection.
• Integrate evaluation into CI so pipeline changes come with quality evidence.
• Develop automated quality signals at scale.
• Build quality checks that flag degraded output without requiring human review of every document.
• Monitor content quality trends over time.
• Design sampling strategies for human review that maximize signal with minimal annotation effort.
• Quantify tradeoffs and inform decisions.
• Run experiments on model selection, context strategies, and pipeline architecture changes.
• Quantify cost/quality/latency tradeoffs.
• Partner with the engineering team to turn evaluation insights into shipped improvements.

🎯 Requirements

• Bachelor's, Master's, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative field.
• Minimum 3-5 years in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI. 7+ years of experience preferred.
• Strong statistical foundations: experimental design, hypothesis testing, confidence intervals, effect sizes, power analysis.
• Experience designing and running evaluations for LLM or NLP systems; you've thought carefully about what 'better' means when outputs are open-ended text.
• Proficient in Python and the scientific/data stack (pandas, NumPy, scipy, sklearn).
• Comfortable working in Jupyter notebooks for exploration and prototyping, and turning that work into automated pipelines.
• Experience with LLM-as-judge approaches, inter-annotator agreement, and rubric design for subjective quality assessment.
• Familiarity with the practical challenges of non-deterministic systems: variance decomposition, multi-run methodology, distinguishing signal from noise at scale.
• Strong data storytelling: you can turn experiment results into clear recommendations that drive engineering and product decisions.

🏖️ Benefits

• Competitive Compensation Packages - Cash & Equity
• Flexible Work Culture
• Unlimited Time Off + 12 Paid Company Holidays
• Insurance - Health, Dental, & Vision
• Life Insurance & FSA Accounts
• 401(k) Retirement Accounts - Traditional, Roth, or Both
• Quarterly Team Offsites
