Machine Learning Engineer – Document Intelligence, Applied GenAI

2 days ago


PandaDoc

SaaS • B2B • Productivity

PandaDoc is a comprehensive document management solution that helps businesses streamline their document workflows. It offers a range of features including custom agreement generation, eSignatures, CPQ (configure, price, quote) capabilities, and real-time collaboration tools. PandaDoc is designed for ease of use, enabling teams to automate document creation and management processes, thus improving efficiency and reducing errors. The platform integrates with popular CRM systems, payment gateways, and other tools to facilitate seamless business operations. Focused on security and compliance, PandaDoc supports legal and secure electronic transactions, making it ideal for businesses looking to optimize their agreement management processes.

501 - 1000 employees

Founded 2011

☁️ SaaS

🤝 B2B

⚡ Productivity

💰 Series C on 2021-09

📋 Description

• Build and maintain evaluation frameworks for document models, LLMs, OCR, and structured extraction.
• Define metrics, benchmarks, and validation strategies for real-world document workloads.
• Design and curate high-quality datasets for supervised training, fine-tuning, and validation.
• Create scalable preprocessing pipelines for PDFs, scans, images, forms, and semi-structured documents.
• Train and fine-tune transformer-based OCR, VLMs, layout models, and open-source LLMs for document understanding tasks.
• Optimize models for reliability, accuracy, and cost efficiency in production environments.
• Deploy ML models with modern inference runtimes (vLLM, TGI, TensorRT, ONNX Runtime).
• Build guardrails, monitoring, and fallback mechanisms to ensure safe and predictable model behavior.
• Develop retrieval and chunking strategies tailored to document structures (tables, forms, multi-page PDFs).
• Optimize end-to-end RAG pipelines for semantic search, Q&A, and workflow automation.
• Partner with PMs, backend engineers, and product designers to define AI opportunities and translate requirements into technical solutions.

🎯 Requirements

• 5+ years of Python experience.
• Experience training, fine-tuning, and deploying traditional computer vision models for document intelligence tasks (layout detection, table extraction, OCR, information extraction).
• Hands-on experience with document understanding frameworks and models:
  • Traditional document AI models (LayoutLM, Donut, DocFormer)
  • Modern vision-language models with OCR capabilities (DeepSeek-OCR, LightOnOCR-1B, etc.)
• Experience deploying and optimizing models using inference frameworks such as vLLM (preferred), TGI, TensorRT, or ONNX Runtime.
• Experience applying LLMs to document intelligence workflows, including both frontier models and open-source alternatives.
• Strong understanding of coordinate systems and spatial reasoning for absolute positioning and field detection in forms and documents.

🏖️ Benefits

• An honest, open culture that emphasizes feedback and promotes professional and personal development
• An opportunity to work from anywhere: our team is distributed worldwide, from Lisbon to Manila and from Florida to California
• 6 self-care days
• A competitive salary
• And much more!

