Senior Principal Software Engineer

🔥 12 hours ago

Cerence Inc.

1001 - 5000 employees

Founded 2019

🤖 Artificial Intelligence

🚗 Transport

💰 Grant on 2020-12

Artificial Intelligence • Transport • Automotive

Cerence Inc. is a global company focused on AI-powered solutions, particularly for the automotive industry. It specializes in conversational and generative AI technologies that create intelligent, natural, and personalized interactions between humans and vehicles. With innovations such as its proprietary automotive large language models, Cerence enhances user experiences across various forms of transport, including cars, two-wheelers, and trucks. More than 500 million vehicles have shipped with its AI technology, and the company serves more than 80 OEMs and Tier 1 customers worldwide. Cerence is dedicated to continuous advancement in AI, aiming to revolutionize in-car user experiences through fast delivery and seamless integration of its solutions.

📋 Description

• Optimize and deploy high-performance LLM inference pipelines
• Own inference runtimes across data center, edge, and embedded platforms
• Push model performance through quantization, kernel fusion, and cache optimization
• Drive latency and throughput improvements that directly impact production products
• Enable efficient, reliable deployment without external vendor dependency
• Build deep expertise and ownership of: vLLM, TensorRT-LLM, llama.cpp, QAIRT
• Extend and tune inference engines using custom CUDA kernels
• Adapt runtimes for constrained and embedded deployment environments
• Implement and evaluate quantization strategies: INT8, INT4, FP4, FP8, mixed precision, AWQ, GPTQ
• Balance accuracy, latency, memory footprint, and throughput
• Optimize key-value cache performance through: paging, prefix caching, cache-aware memory layout design
• Design and tune: batching strategies, continuous batching, speculative decoding

🎯 Requirements

• Proven experience optimizing ML inference performance in production
• Deep understanding of GPU architecture and memory hierarchies
• Hands-on experience with CUDA and low-level performance tuning
• Experience deploying models beyond research environments

Critical Technical Skills

• Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT
• CUDA kernel development and profiling
• Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
• KV cache optimization and memory layout design
• Latency optimization: batching, speculative decoding, continuous batching

🏖️ Benefits

• Annual bonus opportunity
• Insurance coverage (medical, dental, vision, life, and disability)
• Paid time off
• Paid holidays
• Company contribution to the RRSP (Registered Retirement Savings Plan)
• Equity awards for certain positions and levels
• Remote and/or hybrid work available depending on the position
