Staff Software Engineer, Inference Infrastructure

11 - 50 employees

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Cohere is a leading enterprise AI platform optimized for generative AI, search and discovery, and advanced retrieval. The company offers AI-powered applications designed to augment and elevate the global workforce, helping businesses thrive in the AI era. Cohere provides solutions such as embedding and reranking models, allowing enterprises to efficiently retrieve information and build powerful applications. The company offers flexible deployment options for enterprise-grade AI, on any cloud or on-premises, and provides extensive developer resources and support. Cohere is committed to scaling intelligence to serve humanity, making intelligence abundant, affordable, and accessible.

🕒 January 13

🏢🏡 San Francisco – Hybrid

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🦅 H1B Visa Sponsor

AWS

Azure

Cloud

Distributed Systems

Google Cloud Platform

Kubernetes

Linux

📋 Description

• Join a mission to scale intelligence and serve humanity by building AI systems
• Work closely with technical teams to deploy optimized NLP models to production
• Interface with customers to create customized deployments

🎯 Requirements

• 5+ years of engineering experience running production infrastructure at large scale
• Experience designing large, highly available distributed systems with Kubernetes, and running GPU workloads on those clusters
• Experience with Kubernetes development, production coding, and support
• Experience with GCP, Azure, AWS, and OCI, including multi-cloud, on-prem, and hybrid serving
• Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
• Experience managing compute, storage, and network resources and costs
• Excellent collaboration and troubleshooting skills to build mission-critical systems and ensure smooth operations and efficient teamwork
• The grit and adaptability to solve complex technical challenges that evolve day to day
• Familiarity with the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence inference latency and throughput
• Strong understanding of, or working experience with, distributed systems
• Experience in Golang, C++, or other languages designed for high-performance, scalable servers

🏖️ Benefits

• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches, and snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% parental leave top-up for up to 6 months
• Personal enrichment benefits toward arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)

Similar Jobs

Staff Software Engineer – Platform

🕒 January 1

EvenUp

51 - 200

🤖 Artificial Intelligence

☁️ SaaS

Staff Software Engineer developing and scaling backend systems that support LLM applications at EvenUp. Collaborating cross-functionally with product and infrastructure teams to drive AI initiatives.

🏢🏡 San Francisco – Hybrid

💵 $114.8k - $317.2k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

Distributed Systems

Java

Node.js

Python

Staff Engineer

🕒 December 25, 2025

Pear VC

11 - 50

🤖 Artificial Intelligence

🧬 Biotechnology

Staff Engineer working across the stack in a hybrid role at Tanagram. Involved in building tools that accelerate agentic coding and enhance software development reliability.

🏢🏡 San Francisco – Hybrid

💵 $180k - $250k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

TypeScript

Staff Software Engineer – AI/ML

🕒 December 18, 2025

AlphaMeld Corporation

11 - 50

🧬 Biotechnology

🤖 Artificial Intelligence

AI/ML Engineer at Onos Health developing models to optimize healthcare administration. Collaborating in a hybrid work environment based in San Francisco while contributing to impactful AI projects.

🏢🏡 San Francisco – Hybrid

💵 $180k - $250k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

Python

Staff Software Engineer, Product Engineer

🕒 December 16, 2025

Gusto

1001 - 5000

👥 HR Tech

💳 Fintech

☁️ SaaS

Staff Software Engineer developing customer-facing products at Gusto. Overseeing projects end-to-end by influencing feature specs and building backend APIs.

🏢🏡 San Francisco – Hybrid

💵 $163k - $204k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

JavaScript

React

Ruby

Ruby on Rails