Senior AI Test Engineer

🔥 0 minutes ago

Vialto

5001 - 10000 employees

🤝 B2B

👥 HR Tech

☁️ SaaS

💰 $225M Private Equity Round - Vialto Partners on 2024-11

Vialto is a global mobility services and technology company that helps organizations and individuals manage cross-border work, immigration, tax, payroll, compensation, and related compliance. It combines expert advisory and managed services (immigration, tax, social security, rewards, remote work and business travel support) with a unified data and AI platform and open API integrations to deliver end-to-end, role-based solutions for enterprise clients. Vialto focuses on enabling employers and mobile workers to operate across jurisdictions with speed, control, and reduced risk.

📋 Description

• Translate AI testing strategy into executable test scenarios across LLM outputs, document classification, extraction accuracy, agent workflows, and edge cases
• Design adversarial and boundary test inputs to expose hallucination, misclassification, and failure modes
• Validate AI outputs for structure, consistency, accuracy, and production readiness against defined performance thresholds
• Build reusable Python-based evaluation frameworks, including output validation, hallucination detection, and scoring mechanisms
• Develop parameterized test scripts reusable across features, models, and releases
• Implement AI-as-Judge frameworks, including prompt design, scoring logic, and calibration of evaluation reliability
• Embed evaluation frameworks into CI/CD pipelines to support continuous testing and deployment
• Design and operate drift detection frameworks using fixed baseline datasets and scheduled re-evaluation
• Establish thresholds to distinguish acceptable variation from performance degradation
• Enable release gating by identifying regressions prior to production deployment
• Build and maintain ground truth datasets in partnership with subject matter experts
• Define standards for classification, extraction accuracy, and acceptable output characteristics
• Continually update datasets to reflect evolving business requirements and use cases
• Test end-to-end agentic workflows, validating data integrity, error propagation, and fallback behavior
• Perform API-level testing of AI pipeline endpoints using Python and Postman/Newman
• Validate data persistence and integrity across system layers using SQL
• Partner with engineering teams to ensure testability, observability, and system reliability
• Define and scale standardized AI evaluation patterns and reusable quality frameworks across VLabs
• Contribute to enterprise AI quality standards and reference architectures
• Ensure adherence to Responsible AI, data privacy, and governance requirements
• Support auditability, traceability, and transparency of AI outputs and evaluation processes
• Translate evaluation results into actionable insights for engineering, product, and business stakeholders
• Support decision-making on model readiness, release risk, and performance trade-offs
• Proactively identify risks, patterns, and systemic issues and escalate appropriately

🎯 Requirements

• 7+ years in software testing, including 2–3 years focused on AI/ML-enabled systems in production environments
• Proven experience designing and executing AI evaluation frameworks and quality strategies
• Strong track record building ground truth datasets, drift detection systems, and scalable evaluation pipelines
• Experience testing multi-step agentic workflows and AI-driven automation systems
• Experience operating in fast-paced, iterative delivery environments
• Background in regulated or compliance-driven environments preferred
• Advanced Python programming for evaluation frameworks, batch processing, and data analysis
• Experience with LLM evaluation tools such as deepeval, RAGAS, promptfoo, or similar
• Strong capabilities in:
  • AI output validation, hallucination detection, and grounding checks
  • Drift detection frameworks and statistical evaluation methods
  • OCR, VLM, and document AI testing (classification, extraction, edge cases)
  • API testing using Python (requests/httpx) and Postman/Newman
  • SQL for data validation and pipeline integrity checks
• Familiarity with LangChain, LlamaIndex, or similar frameworks
• Experience with cloud AI platforms such as Azure AI Foundry or AWS Bedrock preferred

🏖️ Benefits

• Health insurance
• Retirement plans
• Flexible work arrangements
• Professional development opportunities
