Data Scientist – AI & Agentic Applications & Benchmarking

August 12

CloudBees

Software • B2B • DevOps

CloudBees is a leading provider of DevOps solutions, specializing in optimizing and managing the software development lifecycle through its advanced Continuous Integration and Continuous Delivery (CI/CD) capabilities. By leveraging Jenkins, CloudBees enables developers to streamline workflows, automate processes, and enhance software delivery across hybrid and multi-cloud environments. CloudBees empowers organizations to optimize developer experiences through features like security and compliance management, feature management, and smart testing, ensuring seamless integration of tools within the DevOps ecosystem.

501 - 1000 employees

Founded 2010

🤝 B2B

💰 $95M Debt Financing on 2021-12

📋 Description

• CloudBees is the leading software delivery platform for enterprise DevOps teams.
• As a high-growth startup, we empower developers to build, deploy, and manage software more efficiently.
• Now we’re bringing agentic intelligence into our platform to supercharge developer workflows, and we need a data scientist who can both drive insights and tell the story behind the metrics.

The Role:
• CloudBees is seeking a startup-savvy Data Scientist to help define, measure, and evangelize the impact of Agentic Applications across our platform.
• You’ll work closely with engineers and product teams to prototype and measure AI and Agentic experiences, using evals, telemetry, and AI benchmarks to help the company drive the conversation in the market and with customers.
• You’ll translate performance into clear, compelling narratives for our customers and internal teams.
• As a founding member of the team, you will lead the charge as equal parts builder, evaluator, and communicator, with the technical depth to prototype in Python notebooks, Claude Code, and other tools, and the clarity to write about what matters.

Key Responsibilities:
• Partner with our platform team to develop and prototype telemetry, eval frameworks, and benchmarks for emerging agentic systems.
• Partner with product and engineering teams to measure AI outcomes and usage across customers and teams.
• Help define KPIs and success metrics for AI and LLM-driven features and workflows.
• Use Python notebooks to explore data, visualize insights, test hypotheses rapidly, and share findings.
• Tell the story behind the numbers: write internal documentation, performance summaries, and thought leadership around outcomes.
• Enable engineering teams to instrument, log, and evaluate agent performance effectively.
• Stay up to date with evolving metrics and evaluation techniques in the LLM and agentic AI ecosystem.

🎯 Requirements

• 3+ years of experience in data science or ML analytics roles, ideally in startup or high-growth environments.
• Proficiency in Python, including building and sharing analysis via Jupyter notebooks.
• Experience working with evals, telemetry, A/B testing, and evaluating user-facing ML systems.
• Experience with AI/ML tools such as MLflow, Hugging Face, or other model / LLM tools.
• Ability to partner with technical teams to define meaningful metrics and benchmarks.
• Clear communication skills: capable of writing about outcomes, sharing learnings, and influencing stakeholders.
• Comfort working in fast-paced, ambiguous environments where speed and clarity matter.

🏖️ Benefits

• Competitive salary
• Startup equity
• Excellent benefits
• Remote-first culture built on trust and innovation
