QA Engineer, AI Products

🕒 May 19


MDCalc

11 - 50 employees

Founded 2011

⚕ Healthcare Insurance

☁ SaaS

📚 Education


MDCalc is a widely used tool that provides a comprehensive suite of medical calculators for clinicians. It helps healthcare professionals make informed decisions by offering calculations for a variety of health assessments and treatment strategies, covering areas such as cardiac risk, pulmonary embolism, and liver fibrosis. Millions of clinicians across the globe use MDCalc to aid in treating hundreds of millions of patients; calculations are meant to be re-checked and not used as the sole guide for patient care. The platform also offers educational resources and integrates with electronic health records.

📋 Description

‱ Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
‱ Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
‱ Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
‱ Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
‱ Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what "good" looks like for AI responses
‱ Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
‱ Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
‱ Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows

🎯 Requirements

‱ 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
‱ Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
‱ Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
‱ Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
‱ Proficiency with test automation tools, with a focus on Playwright
‱ Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
‱ Familiarity with token usage, latency profiling, and cost monitoring as quality signals
‱ Eagerness to learn quickly and a positive, solutions-oriented attitude
‱ Clear and concise communicator, able to surface issues, blockers, and risks effectively when communicating ambiguous or probabilistic failures
‱ Self-motivated, proactive, and able to manage time and priorities independently

đŸ–ïž Benefits

‱ Ability to make a true difference in medicine: MDCalc is the most broadly used medical reference by physicians, used by over 65% of US attending doctors weekly
‱ Medical, dental, and vision coverage, with the option to extend to your dependents
‱ Company-sponsored short-term insurance
‱ Fully paid 8-week parental leave, after 6 months of employment
‱ Company-sponsored 401k, after 3 months of employment
‱ Unlimited vacation for salaried roles: we trust you to take the time you need
‱ Bi-annual company offsites to connect, reflect, and plan together
‱ Monthly work-from-home stipend
‱ A culture of fun and motivated team members who believe in a greater mission here at MDCalc

