Inference Optimization Engineer

August 8

BentoML

Artificial Intelligence • B2B • SaaS

BentoML is a flexible platform for deploying and managing AI/ML models and custom inference pipelines in production. It provides a unified interface for deploying, scaling, and optimizing a wide range of models, including large language models (LLMs). Users keep full control of their models by deploying in any environment, whether cloud or on-premises, and their data never leaves their own infrastructure, which supports security and compliance requirements.

51 - 200 employees

Founded 2019

🤖 Artificial Intelligence

🤝 B2B

☁️ SaaS

📋 Description

• Optimize inference across single-GPU, multi-GPU, and multi-node serving setups.
• Build repeatable tests that simulate production traffic; track and report the performance of vLLM, SGLang, TRT-LLM, and future runtimes.
• Reduce memory use and compute cost with mixed precision, better KV-cache management, quantization, and speculative decoding.
• Improve batching, caching, load balancing, and model-parallel execution.
• Write technical posts, contribute code, and present findings to the open-source community.

🎯 Requirements

• Deep understanding of transformer architecture and inference engine internals.
• Hands-on experience speeding up model serving through batching, caching, and load balancing.
• Experience with inference engines such as vLLM, SGLang, or TRT-LLM (upstream contributions are a plus).
• Experience with inference optimization techniques such as quantization, distillation, or speculative decoding.
• Proficiency in CUDA and profiling tools such as Nsight, nvprof, or CUPTI; proficiency in Triton and ROCm is a bonus.
• A track record of blog posts, conference talks, or open-source projects in ML systems is a bonus.

🏖️ Benefits

• Competitive salary
• Equity
• Learning budget
• Paid conference travel

