Senior Machine Learning Systems Engineer – Training Optimization

November 21

Canva

SaaS • Media • Education

Canva is a versatile online design platform that empowers users to create a wide range of professional designs with ease. From social media posts and presentations to business cards and posters, Canva provides thousands of templates and design tools to help users bring their creative ideas to life. The platform also offers a suite of AI-powered features to enhance creativity and productivity, including tools like Magic Write for copy generation and Magic Edit for photo transformations. Canva caters to individuals, teams, and enterprises, making it an ideal solution for collaborative design and workflow management. It is also committed to sustainability and social impact, offering free educational and nonprofit access to its premium features.

1001 - 5000 employees

Founded 2013

💰 $200M Venture Round on 2021-09

📋 Description

• Design, implement, and optimize large-scale machine learning systems for training and inference.
• Improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
• Partner with research and modeling teams to align systems with algorithmic needs.
• Evaluate and apply best practices for distributed training using industry-leading frameworks.
• Dive deep into low-level optimization, including custom CUDA or Triton kernels.
• Debug, profile, and fine-tune training workflows to unlock new levels of scalability.

🎯 Requirements

• Strong background in LLMs, multimodal AI, or diffusion models.
• Proficiency in Python.
• Familiarity with a systems programming language (e.g. C++ or Rust) is a plus.
• Deep knowledge of PyTorch or JAX, as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
• Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
• Hands-on experience writing custom GPU kernels in CUDA or Triton.
• Excellent communication and problem-solving skills, including full proficiency in English.

🏖️ Benefits

• Remote work options
