Truelogic Software

SaaS • B2B • Enterprise

Truelogic Software is a nearshore software development company specializing in agile staff augmentation services. They focus on providing custom outsourced software development with a team of highly skilled engineers from Latin America. Truelogic Software partners with both startups and Fortune 500 companies, offering solutions that align with their clients' time zones and ensuring high-quality outcomes through collaboration and responsiveness. With a presence in over 25 countries, Truelogic emphasizes remote work for better quality of life, and their engineers are experienced in various industries, delivering a wide range of successful projects globally.

501 - 1000 employees

Founded 2004

AI/ML Evaluation Engineer

November 20

🇨🇴 Colombia – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 Artificial Intelligence

📋 Description

• Write Python and SQL scripts to evaluate outputs from large language models (LLMs).
• Design and implement LLM-as-Judge evaluations with clear scoring rubrics (faithfulness, relevance, completeness, correctness).
• Define and calculate metrics such as exact match, token-level F1, ROUGE, cosine similarity, and subjective rubric scores.
• Build and maintain ground-truth datasets for benchmarking and regression testing.
• Automate evaluation workflows and integrate them into CI/CD pipelines.
• Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
• Diagnose failure modes such as hallucinations, irrelevant answers, and formatting issues.
• Produce clear reports summarizing evaluation findings and quality trends.
• Collaborate with AI engineers, QA, data scientists, and product managers to define quality standards and release criteria.
• Document all processes, evaluation setups, specifications, and architecture diagrams.
• Maintain reproducibility and traceability for all evaluation runs and datasets.

🎯 Requirements

• Advanced Python skills, including writing, debugging, and automating scripts.
• Strong SQL proficiency and experience manipulating large datasets.
• Hands-on experience with Python libraries such as Pandas and NumPy.
• Ability to clean, standardize, and analyze structured and unstructured data.
• Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
• Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
• Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
• Awareness of vector similarity concepts such as cosine similarity and dot product.
• Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
• Ability to design or implement custom LLM-as-Judge evaluation systems.
• Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
• Ability to translate ambiguous quality expectations into measurable metrics.
• Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
• Ability to learn new infrastructure tools quickly.
• Strong analytical and problem-solving abilities for open-ended technical challenges.
• Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.

🏖️ Benefits

• 100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
• Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
• Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
• Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
• Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.
