Senior Principal Software Engineer

1001 - 5000 employees

Founded 2019

🤖 Artificial Intelligence

🚗 Transport

💰 Grant on 2020-12

Artificial Intelligence • Transport • Automotive

Cerence Inc. is a global company that provides AI-powered solutions, primarily for the automotive industry. It specializes in conversational and generative AI technologies that create intelligent, natural, and personalized interactions between people and vehicles. With innovations such as its proprietary automotive large language models, Cerence enhances user experiences across cars, two-wheelers, and trucks. More than 500 million vehicles have shipped with its AI technology, and the company serves more than 80 OEMs and Tier 1 customers worldwide. Cerence is dedicated to continuous advancement in AI and aims to transform in-car user experiences through fast delivery and seamless integration of its solutions.

Senior Principal Software Engineer

🔥 12 hours ago

🍂 Massachusetts – Remote

💵 $141.4k - $226.3k / year

⏰ Full Time

🟠 Senior

🧑‍💻 Full-stack Engineer

C++

Cerence Inc.

📋 Description

• Optimize and deploy high-performance LLM inference pipelines
• Own inference runtimes across data center, edge, and embedded platforms
• Push model performance through quantization, kernel fusion, and cache optimization
• Drive latency and throughput improvements that directly impact production products
• Enable efficient, reliable deployment without external vendor dependency
• Build deep expertise and ownership of: vLLM, TensorRT-LLM, llama.cpp, QAIRT
• Extend and tune inference engines using custom CUDA kernels
• Adapt runtimes for constrained and embedded deployment environments
• Implement and evaluate quantization strategies: INT8, INT4, FP4, FP8, mixed precision, AWQ, GPTQ
• Balance accuracy, latency, memory footprint, and throughput
• Optimize key–value cache performance through: paging, prefix caching, cache-aware memory layout design
• Design and tune: batching strategies, continuous batching, speculative decoding

🎯 Requirements

• Proven experience optimizing ML inference performance in production
• Deep understanding of GPU architecture and memory hierarchies
• Hands-on experience with CUDA and low-level performance tuning
• Experience deploying models beyond research environments

Critical Technical Skills

• Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT
• CUDA kernel development and profiling
• Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
• KV cache optimisation and memory layout design
• Latency optimisation: batching, speculative decoding, continuous batching

🏖️ Benefits

• Annual bonus opportunity
• Insurance coverage (medical, dental, vision, life, and disability)
• Paid time off
• Paid holidays
• Company contribution to the RRSP (Registered Retirement Savings Plan)
• Equity awards for certain positions and levels
• Remote and/or hybrid work available depending on the position
