LLM Inference Deployment Engineer

11 - 50 employees

Founded 2022

🤖 Artificial Intelligence

🔧 Hardware

🤝 B2B

💰 $100M Series B - EnCharge AI on 2025-02

EnCharge AI develops analog in-memory computing hardware and complementary software to accelerate on-device and edge-to-cloud AI workloads. Its technology includes the EN100 analog AI accelerator and other form factors (chiplets, ASICs, PCIe cards) designed to deliver higher energy efficiency and compute density, and lower total cost of ownership, for inference than conventional GPUs and digital accelerators. EnCharge emphasizes sustainability, data privacy through local processing, and deployment for enterprise and developer customers seeking efficient, scalable AI computation outside traditional cloud infrastructure.

🕒 May 21

🇺🇸 United States – Remote

💵 $180k - $240k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Docker

Kubernetes

Python

PyTorch

TensorFlow

📋 Description

• Deploy and optimize post-training LLMs (GPT, LLaMA, Mistral, Falcon, etc.) from libraries like HuggingFace.
• Use inference runtimes such as ONNX Runtime and vLLM for efficient execution.
• Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
• Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
• Experience in LLM inference deployment, model optimization, and runtime engineering.
• Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
• In-depth knowledge of Python for model integration and performance tuning.
• Strong understanding of high-level model representations and experience implementing framework-level optimizations for Generative AI use cases.
• Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
• Strong knowledge of LLM memory optimization strategies for long-context applications.
• Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).
