
Artificial Intelligence • B2B • SaaS
BentoML is a flexible platform designed to deploy and manage AI/ML models and custom inference pipelines in production. It offers a unified interface for seamless deployment, scaling, and optimization of various models, including large language models (LLMs). The platform empowers users to maintain full control over their AI models by allowing deployments in any environment, whether cloud or on-premise, while ensuring security and compliance without the data ever leaving the user's infrastructure.
51 - 200 employees
Founded 2019
🤖 Artificial Intelligence
🤝 B2B
☁️ SaaS
August 8
• Optimize inference in single-GPU, multi-GPU, and multi-node serving setups.
• Build repeatable tests that model production traffic; track and report vLLM, SGLang, TRT-LLM, and future runtimes.
• Reduce memory use and compute cost with mixed precision, better KV-cache handling, quantization, and speculative decoding.
• Improve batching, caching, load balancing, and model-parallel execution.
• Write technical posts, contribute code, and present findings to the open-source community.
• Deep understanding of transformer architecture and inference engine internals.
• Hands-on experience speeding up model serving through batching, caching, and load balancing.
• Experienced with inference engines such as vLLM, SGLang, or TRT-LLM (upstream contributions are a plus).
• Experienced with inference optimization techniques: quantization, distillation, speculative decoding, or similar.
• Proficiency in CUDA and use of profiling tools like Nsight, nvprof, or CUPTI. Proficiency in Triton and ROCm is a bonus.
• Track record of blog posts, conference talks, or open-source projects in ML systems is a bonus.
• Competitive salary
• Equity
• Learning budget
• Paid conference travel
August 8
Join Commonware to build applications and cloud-based solutions in a dynamic team.
August 8
The Engineer will design systems and develop software for solar energy projects. Terabase Energy focuses on automation to enhance renewable energy efficiency.
August 8
Drive collaboration in high-speed interconnects at a global industrial tech leader.
🇺🇸 United States – Remote
💵 $184k - $230.5k / year
💰 Post-IPO Debt on 2023-01
⏰ Full Time
🟠 Senior
👷🏻‍♀️ Engineer
🦅 H1B Visa Sponsor
August 8
Seeking skilled OKTA Engineer to implement IAM solutions. Experience with OKTA's Identity Cloud platform required.
August 8
Seeking a Neutronics Engineer to perform reactor design calculations and analyses for Oklo's radioisotope production.
🇺🇸 United States – Remote
💵 $102k - $190k / year
💰 Venture Round on 2021-11
⏰ Full Time
🟢 Junior
🟡 Mid-level
👷🏻‍♀️ Engineer