Lead Software Engineer, Model Serving Platform

🕒 April 12

🏢🏡 San Francisco – Hybrid

💵 $230k - $300k / year

⏰ Full Time

🟠 Senior

🧑‍💻 Full-stack Engineer

Sciforium

11 - 50 employees

Founded 2024

🤖 Artificial Intelligence

🔌 API

🔧 Hardware

🔥 Funding within the last year

💰 $12M Seed Round - Sciforium on 2025-10

Artificial Intelligence • API • Hardware

Sciforium is a serverless AI infrastructure platform that provides production-ready, multimodal AI services via a unified, OpenAI-compatible API. The company runs vertically integrated AMD GPU hardware and offers model hosting, a model library, real-time evaluation pipelines, and managed agent deployments to help teams build, evaluate, and ship text, image, video, and audio AI applications with lower cost, stronger privacy, and predictable performance.

📋 Description

• Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
• Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
• Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
• Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
• Build Python APIs and services that expose model capabilities to downstream applications.
• Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
• Drive performance profiling, benchmarking, and observability across the inference stack.
• Ensure high reliability and maintainability through testing, monitoring, and engineering best practices.
• Troubleshoot and resolve complex issues across GPU, runtime, and service layers.

🎯 Requirements

• Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
• 5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure
• Strong understanding of LLM inference mechanics (prefill vs decode, batching, KV cache)
• Experience with Kubernetes/Ray, Containerization
• Strong proficiency in C++, Python
• Strong debugging, profiling, and performance optimization skills at the system level
• Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems
• Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality
• Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.

🏖️ Benefits

• Medical, dental, and vision insurance
• 401k plan
• Daily lunch, snacks, and beverages
• Flexible time off
• Competitive salary and equity
