Machine Learning Engineer – Document Intelligence, Applied GenAI

2 days ago

PandaDoc

SaaS • B2B • Productivity

PandaDoc is a comprehensive document management solution that helps businesses streamline their document workflows. It offers a range of features including custom agreement generation, eSignatures, CPQ (configure, price, quote) capabilities, and real-time collaboration tools. PandaDoc is designed for ease of use, enabling teams to automate document creation and management processes, thus improving efficiency and reducing errors. The platform integrates with popular CRM systems, payment gateways, and other tools to facilitate seamless business operations. Focused on security and compliance, PandaDoc supports legal and secure electronic transactions, making it ideal for businesses looking to optimize their agreement management processes.

501 - 1000 employees

Founded 2011

💰 Series C on 2021-09

📋 Description

• Build and maintain evaluation frameworks for document models, LLMs, OCR, and structured extraction.
• Define metrics, benchmarks, and validation strategies for real-world document workloads.
• Design and curate high-quality datasets for supervised training, fine-tuning, and validation.
• Create scalable preprocessing pipelines for PDFs, scans, images, forms, and semi-structured documents.
• Train and fine-tune transformer-based OCR, VLMs, layout models, and open-source LLMs for document understanding tasks.
• Optimize models for reliability, accuracy, and cost efficiency in production environments.
• Deploy ML models with modern inference runtimes (vLLM, TGI, TensorRT, ONNX Runtime).
• Build guardrails, monitoring, and fallback mechanisms to ensure safe and predictable model behavior.
• Develop retrieval and chunking strategies tailored to document structures (tables, forms, multi-page PDFs).
• Optimize end-to-end RAG pipelines for semantic search, Q&A, and workflow automation.
• Partner with PMs, backend engineers, and product designers to define AI opportunities and translate requirements into technical solutions.

🎯 Requirements

• 5+ years of Python experience
• Experience training, fine-tuning, and deploying traditional computer vision models for document intelligence tasks (layout detection, table extraction, OCR, information extraction)
• Hands-on experience with document understanding frameworks and models:
    • Traditional document AI models (LayoutLM, Donut, DocFormer)
    • Modern vision-language models with OCR capabilities (DeepSeek-OCR, LightOnOCR-1B, etc.)
• Experience deploying and optimizing models using inference frameworks such as vLLM (preferred), TGI, TensorRT, or ONNX Runtime
• Experience applying LLMs to document intelligence workflows, including both frontier models and open-source alternatives
• Strong understanding of coordinate systems and spatial reasoning for absolute positioning and field detection in forms/documents

🏖️ Benefits

• An honest, open culture that emphasizes feedback and promotes professional and personal development
• An opportunity to work from anywhere: our team is distributed worldwide, from Lisbon to Manila, from Florida to California
• 6 self-care days
• A competitive salary
• And much more!
