Staff Software Engineer, Inference Infrastructure

🕒 January 13

🏢🏡 San Francisco – Hybrid

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🦅 H1B Visa Sponsor

Cohere

11 - 50 employees

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Cohere is a leading enterprise AI platform optimized for generative AI, search and discovery, and advanced retrieval. The company offers AI-powered applications designed to augment and elevate the global workforce, helping businesses thrive in the AI era. Cohere provides solutions such as embedding and reranking models, allowing enterprises to efficiently retrieve information and build powerful applications. The company offers flexible deployment options for enterprise-grade AI, on any cloud or on-premises, and provides extensive developer resources and support. Cohere is committed to scaling intelligence to serve humanity, making intelligence abundant, affordable, and accessible.

📋 Description

• Join a mission to scale intelligence and serve humanity by building AI systems
• Work closely with technical teams to deploy optimized NLP models to production
• Interface with customers to create customized deployments

🎯 Requirements

• 5+ years of engineering experience running production infrastructure at a large scale
• Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
• Experience with Kubernetes dev and production coding and support
• Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
• Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
• Experience in compute/storage/network resource and cost management
• Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
• The grit and adaptability to solve complex technical challenges that evolve day to day
• Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
• Strong understanding of or working experience with distributed systems
• Experience in Golang, C++, or other languages designed for high-performance, scalable servers

🏖️ Benefits

• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches & snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% Parental Leave top-up for up to 6 months
• Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)

Similar Jobs

🕒 January 1

EvenUp

51 - 200

🤖 Artificial Intelligence

☁️ SaaS

Staff Software Engineer developing and scaling backend systems that support LLM applications at EvenUp. Collaborating cross-functionally with product and infrastructure teams to drive AI initiatives.

🏢🏡 San Francisco – Hybrid

💵 $114.8k - $317.2k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🕒 December 25, 2025

Pear VC

11 - 50

🤖 Artificial Intelligence

🧬 Biotechnology

Staff Engineer working across the stack in a hybrid role at Tanagram. Involved in building tools that accelerate agentic coding and enhance software development reliability.

🏢🏡 San Francisco – Hybrid

💵 $180k - $250k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🕒 December 18, 2025

AlphaMeld Corporation

11 - 50

🧬 Biotechnology

🤖 Artificial Intelligence

AI/ML Engineer at Onos Health developing models to optimize healthcare administration. Collaborating in a hybrid work environment based in San Francisco while contributing to impactful AI projects.

🏢🏡 San Francisco – Hybrid

💵 $180k - $250k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🕒 December 16, 2025

Gusto

1001 - 5000

👥 HR Tech

💳 Fintech

☁️ SaaS

Staff Software Engineer developing customer-facing products at Gusto. Overseeing projects end-to-end by influencing feature specs and building backend APIs.

🏢🏡 San Francisco – Hybrid

💵 $163k - $204k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🕒 December 13, 2025

Saris AI

51 - 200

🤖 Artificial Intelligence

💳 Fintech

🤝 B2B

Software Developer designing AI-powered solutions in fintech, owning project timelines and defining technical foundations.

🏢🏡 San Francisco – Hybrid

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer