LLM Inference Deployment Engineer

11 - 50 employees

Founded in 2022

🤖 Artificial Intelligence

🔧 Hardware

🤝 B2B

💰 €100,000,000 Series B - EnCharge AI in 2025-02

Artificial Intelligence • Hardware • B2B

EnCharge AI is a company that develops analog in-memory computing hardware and complementary software to accelerate AI workloads on local devices and from the edge to cloud infrastructure. Its technology includes the EN100 analog AI accelerator and other form factors (chiplets, ASICs, PCIe cards) designed to deliver much greater energy efficiency, higher compute density, and a lower total cost of ownership for inference compared with conventional GPUs and digital accelerators. EnCharge emphasizes sustainability, data privacy through local processing, and deployment for enterprises and developers seeking efficient, scalable AI compute outside traditional cloud infrastructure.

Job not on LinkedIn

🕒 28 days ago

🇺🇸 United States – Remote

💵 $180,000 - $240,000 / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

Docker

Kubernetes

Python

PyTorch

TensorFlow

Description

• Deploy and optimize LLMs (GPT, LLaMA, Mistral, Falcon, etc.) post-training from libraries like HuggingFace.
• Utilize inference runtimes such as ONNX Runtime, vLLM for efficient execution.
• Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
• Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and other inference servers.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
• Experience in LLM inference deployment, model optimization, and runtime engineering.
• Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
• In-depth knowledge of the Python programming language for model integration and performance tuning.
• Strong understanding of high-level model representations and experience implementing framework-level optimizations for Generative AI use cases.
• Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
• Strong knowledge of LLM memory optimization strategies for long-context applications.
• Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).

Similar Jobs

Site Reliability Engineer

🕒 28 days ago

SS&C Technologies

10,000+ employees

🏦 Banking

💳 Fintech

Site Reliability Engineer optimizing infrastructure environments at SS&C Technologies. Collaborate with teams to enhance application reliability and drive technology improvements.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Kubernetes

OpenShift

OpenStack

Prometheus

Splunk

VMware

Senior DevOps Engineer – Infrastructure

🕒 29 days ago

Button

51 - 200

☁️ SaaS

🛍️ eCommerce

🤝 B2B

Senior DevOps Engineer responsible for platform infrastructure management in a commerce-powered internet company. Collaborating with teams on scalable, stable, and operable solutions for business-critical systems.

🇺🇸 United States – Remote

💵 $133,000 - $172,000 / year

⏰ Full Time

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

AWS

Docker

DynamoDB

EC2

Google Cloud Platform

Grafana

JavaScript

Node.js

Prometheus

Python

Terraform

DevOps Engineer – ML & Data Infrastructure

🕒 29 days ago

High 5 Games

51 - 200

🎮 Video Games

🎲 Gambling

🤝 B2B

DevOps Engineer responsible for building and optimizing cloud infrastructure for machine learning operations in gaming. Collaborating with data scientists and ML engineers to ensure reliability and performance.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

Ansible

BigQuery

Cloud

Docker

Google Cloud Platform

Groovy

Jenkins

Kubernetes

Python

Terraform

DevSecOps and API Management Platform Leader

🕒 29 days ago

Copper Q8

11 - 50

📋 Compliance

🤝 B2B

DevSecOps and API Management Platform Leader shaping secure platforms for digital innovation. Leading the development of automated and secure CI/CD pipelines in a global role.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

AWS

Azure

Cloud

Docker

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Microservices

Prometheus

Terraform

Senior Systems Reliability Engineer

🕒 29 days ago

IEX

51 - 200