
Artificial Intelligence • Enterprise • SaaS
Cohere is a leading enterprise AI platform optimized for generative AI, search and discovery, and advanced retrieval. The company offers AI-powered applications designed to augment and elevate the global workforce, helping businesses thrive in the AI era. Cohere provides solutions such as embedding and reranking models, allowing enterprises to efficiently retrieve information and build powerful applications. The company offers flexible deployment options for enterprise-grade AI, on any cloud or on-premises, and provides extensive developer resources and support. Cohere is committed to scaling intelligence to serve humanity, making intelligence abundant, affordable, and accessible.
11 - 50 employees
🤖 Artificial Intelligence
🏢 Enterprise
☁️ SaaS
3 days ago

• Build and own the training framework responsible for large-scale LLM training.
• Design distributed training abstractions (data/tensor/pipeline parallelism, FSDP/ZeRO strategies, memory management, checkpointing).
• Improve training throughput and stability on multi-node clusters (e.g., GB200/300, AMD, H200/100).
• Develop and maintain tooling for monitoring, logging, debugging, and developer ergonomics.
• Collaborate closely with infra teams to ensure Slurm setups, container environments, and hardware configurations support high-performance training.
• Investigate and resolve performance bottlenecks across the ML systems stack.
• Build robust systems that ensure reproducible, debuggable, large-scale runs.

• Strong engineering experience in large-scale distributed training or HPC systems.
• Deep familiarity with JAX internals, distributed training libraries, or custom kernels/fused ops.
• Experience with multi-node cluster orchestration (Slurm, Ray, Kubernetes, or similar).
• Comfort debugging performance issues across CUDA/NCCL, networking, IO, and data pipelines.
• Experience working with containerized environments (Docker, Singularity/Apptainer).
• A track record of building tools that increase developer velocity for ML teams.
• Excellent judgment around trade-offs: performance vs complexity, research velocity vs maintainability.
• Strong collaboration skills: you’ll work closely with infra, research, and deployment teams.

• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches & snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% Parental Leave top-up for up to 6 months
• Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)
November 26
10,000+ employees
Embedded Engineer at eInfochips developing real time embedded software and firmware for clients. Responsibilities include software testing, documentation, and analysis of technical requirements.
November 20
IT Systems Engineer at Perforce managing flexible infrastructure components for public and private solutions. Collaborating within the IT infrastructure team in the Bracknell office.
November 18
Systems Engineer for Public Safety Solutions deploying and maintaining SAFE operating environments. Providing technical support for mission-critical control room operations in the United Kingdom.
🇬🇧 United Kingdom – Remote
💰 $71k Grant on 2014-09
⏰ Full Time
🟡 Mid-level
🟠 Senior
⚙️ Systems Engineer
🇬🇧 UK Skilled Worker Visa Sponsor
November 9
Senior IT Systems Engineer managing ransomware restoration events for global clients affected by cyber threats. Leading technical teams and providing oversight for successful recovery operations across computing infrastructures.
November 9
Technical Engineer providing cyber disaster recovery and incident response support. Working with clients to restore services and maintain cybersecurity infrastructure.
🗣️🇫🇷 French Required