Research Engineer – Evaluations


October 31

Canva

SaaS • Media • Education

Canva is a versatile online design platform that empowers users to create a wide range of professional designs with ease. From social media posts and presentations to business cards and posters, Canva provides thousands of templates and design tools to help users bring their creative ideas to life. The platform also offers a suite of AI-powered features to enhance creativity and productivity, including tools like Magic Write for copy generation and Magic Edit for photo transformations. Canva caters to individuals, teams, and enterprises, making it an ideal solution for collaborative design and workflow management. It is also committed to sustainability and social impact, offering free educational and nonprofit access to its premium features.

1001 - 5000 employees

Founded 2013

☁️ SaaS

📱 Media

📚 Education

💰 $200M Venture Round in 2021-09

📋 Description

• You will engineer sophisticated AI agents that can automatically assess the quality and human alignment of our generative design models.
• This high-impact role focuses on building the practical systems that make cutting-edge research effective, to provide a rapid feedback loop that guides the future of design generation at Canva, ultimately empowering millions of users to create.
• Design, build, and optimize the infrastructure for an "MLLM-as-a-Judge" evaluation system for scalable, automated feedback.
• Implement and experiment with inference-time alignment techniques (Prompt Engineering, RAG, ICL) to directly improve model output quality.
• Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks.
• Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team.
• Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle.
• Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions.

🎯 Requirements

• You have a strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies.
• You excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights.
• You’ve successfully managed or optimized large-scale distributed model training across hundreds of GPUs.
• You have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize PyTorch code for speed.
• You have disciplined coding practices and are experienced with code reviews and pull requests.
• You have experience working in cloud environments, ideally AWS.

🏖️ Benefits

• Equity packages - we want our success to be yours too
• Inclusive parental leave policy that supports all parents & carers
• An annual Vibe & Thrive allowance to support your wellbeing, social connection, home office setup & more
• Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally
