
AI ⢠Enterprise ⢠SaaS
Nebius Group is building one of the worldâs leading AI infrastructure companies, focusing on providing the necessary compute, storage, and tools for developers in the AI space. Based in Europe and listed on Nasdaq, Nebius has a global presence with R&D centers across Europe, North America, and Israel. The company's primary offering is an AI-centric cloud platform designed for intensive AI workloads, complemented by various other businesses involved in generative AI development, edtech, and autonomous technology.
April 1

AI ⢠Enterprise ⢠SaaS
Nebius Group is building one of the worldâs leading AI infrastructure companies, focusing on providing the necessary compute, storage, and tools for developers in the AI space. Based in Europe and listed on Nasdaq, Nebius has a global presence with R&D centers across Europe, North America, and Israel. The company's primary offering is an AI-centric cloud platform designed for intensive AI workloads, complemented by various other businesses involved in generative AI development, edtech, and autonomous technology.
⢠We are currently in search of senior and staff-level ML engineers to work on optimizing training and inference performance in a large-scale multi-GPU multi-node setups. ⢠This role will require expertise in distributed systems and high-performance computing to build, optimize, and maintain robust pipelines for training and inference. ⢠Your responsibilities will include: ⢠Architect and implement distributed training and inference pipelines leveraging techniques such as data, tensor, context, expert (MoE) and pipeline parallelism. ⢠Implement various inference optimization techniques - speculative decoding and its extensions (Medusa, EAGLE, etc.), CUDA-graphs, compile-based optimization. ⢠Implement custom CUDA/Triton kernels for performance-critical layers.
⢠A profound understanding of theoretical foundations of machine learning ⢠Deep understanding of performance aspects of large neural networks training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimizations, dynamic batching etc.) ⢠Expertise in at least one of those fields: ⢠Implementing custom efficient GPU kernels in CUDA and/or Triton ⢠Training large models on multiple nodes and implementing various parallelism techniques ⢠Inference optimization techniques - disaggregated prefill/decode, paged attention, continuous batching, speculative decoding, etc. ⢠Strong software engineering skills (we mostly use python) ⢠Deep experience with modern deep learning frameworks (we use JAX & PyTorch) ⢠Proficiency in contemporary software engineering approaches, including CI/CD, version control and unit testing ⢠Strong communication and ability to work independently
⢠Competitive salary and comprehensive benefits package. ⢠Opportunities for professional growth within Nebius. ⢠Hybrid working arrangements. ⢠A dynamic and collaborative work environment that values initiative and innovation.