BentoML

Artificial Intelligence • B2B • SaaS

BentoML is a flexible platform designed to deploy and manage AI/ML models and custom inference pipelines in production. It offers a unified interface for seamless deployment, scaling, and optimization of various models, including large language models (LLMs). The platform empowers users to maintain full control over their AI models by allowing deployments in any environment, whether cloud or on-premise, while ensuring security and compliance without the data ever leaving the user's infrastructure.

51 - 200 employees

Founded 2019

🤖 Artificial Intelligence

🤝 B2B

☁️ SaaS

Inference Optimization Engineer

August 8

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

👷🏻‍♀️ Engineer

🦅 H1B Visa Sponsor

Kubernetes

Node.js

Open Source

📋 Description

• Optimize inference in single-GPU, multi-GPU, and multi-node serving setups.
• Build repeatable tests that model production traffic; track and report vLLM, SGLang, TRT-LLM, and future runtimes.
• Reduce memory use and compute cost with mixed precision, better KV-cache handling, quantization, and speculative decoding.
• Improve batching, caching, load balancing, and model-parallel execution.
• Write technical posts, contribute code, and present findings to the open-source community.

🎯 Requirements

• Deep understanding of transformer architecture and inference engine internals.
• Hands-on experience speeding up model serving through batching, caching, and load balancing.
• Experience with inference engines such as vLLM, SGLang, or TRT-LLM (upstream contributions are a plus).
• Experience with inference optimization techniques such as quantization, distillation, or speculative decoding.
• Proficiency in CUDA and in profiling tools such as Nsight, nvprof, or CUPTI. Proficiency in Triton and ROCm is a bonus.
• A track record of blog posts, conference talks, or open-source projects in ML systems is a bonus.

🏖️ Benefits

• Competitive salary
• Equity
• Learning budget
• Paid conference travel
