Senior Research Data Engineer

1001 - 5000 employees

⚕️ Healthcare Insurance

☁️ SaaS

🏢 Enterprise

💰 Secondary Market on 2022-03

Healthcare Insurance • SaaS • Enterprise

PointClickCare is a cloud-based software provider focused on improving care collaboration and financial health in the healthcare industry. The company offers a comprehensive platform that connects care teams with important data, allowing for seamless care transitions and better patient outcomes. By streamlining operations for skilled nursing facilities, senior living communities, and other healthcare providers, PointClickCare helps reduce inefficiencies, manage medication orders, ensure compliance, and improve financial performance. PointClickCare also fosters innovation and quality improvements in healthcare through integrated care coordination and actionable insights across data silos.

🇨🇦 Canada – Remote

💵 C$159.1k - C$176.7k / year

⏰ Full Time

🟠 Senior

🚰 Data Engineer

Airflow

AWS

Azure

PySpark

Python

Spark

SQL

Unity

📋 Description

• Own the gold data layer. Transform messy silver tables into curated, semantically rich, clean, and documented gold datasets suitable for AI model development, including datasets and features that can be reused across AI projects.
• Maintain the data as products and needs evolve.

To do this, you will:

• Reverse-engineer data semantics. Talk with product engineers and clinical and workflow experts to learn how the products are used and how data are created in the field.
• Read SQL queries, stored procedures, technical data definitions, and other code to understand how products represent and transform data.
• Learn how data are ingested into the data lake, what silver tables and columns actually represent, and how they behave.
• Capture provenance, semantics, clinical event sequencing, cross-module record linkage, and known quirks.
• Bridge semantics with AI needs. Understand researchers' data needs in order to design and build the gold data product, with documentation that evolves alongside it, so that it serves applied AI research as an efficient, AI-first foundation for model R&D.
• Curate datasets across modalities. For AI uses such as generative AI, RAG, and predictive modeling, support researchers with chunked and tagged unstructured content carrying rich metadata, point-in-time-correct features, and clean labels. For classical ML and statistical work, deliver model-ready tables.
• Build pipelines for reuse. Develop silver-to-gold transformations in Databricks/Spark as scheduled, observable workloads. Design them so researchers can iterate on new features and data mixes without rebuilding from scratch.
• Automate quality, filtering, and synthesis. Support research needs for programmatic labeling, weak supervision, near-duplicate detection, boilerplate and noise removal, and LLM-API-driven synthetic data generation where ground truth is scarce.
• Version and hand off. Maintain reproducible dataset snapshots. Define clean lineage and semantic definitions so the downstream team can use and reuse gold datasets in AI R&D.

🎯 Requirements

• 5+ years building production data systems, with at least 2 of those years supporting ML or AI workloads.
• Track record of learning complex new data domains quickly by reading source code, interviewing experts, and building durable artifacts that others rely on.
• Advanced Python, SQL, and PySpark/Databricks skills for working with large, messy data.
• Expert SQL: comfortable reading complex stored procedures and reverse-engineering business logic from queries.
• Databricks ecosystem depth: Delta Lake, Unity Catalog, Spark/PySpark tuning, MLflow.
• AI domain literacy: working understanding of embeddings, tokenization, feature engineering, point-in-time correctness, train/validation/test splits, data drift, and the differences between what classical ML and generative models need from data.
• Data wrangling across modalities: transforming unstructured content (text, PDFs, transcripts, logs) and structured tabular data into clean, model-ready forms.
• AI-friendly data formats (Parquet, Hugging Face datasets) and storage layout decisions (partitioning, sharding, caching) that keep researcher workflows responsive in Azure, AWS, or other environments.
• Data quality, filtering, and synthesis pipelines: programmatic labeling and weak supervision (e.g., Snorkel or equivalent), near-duplicate detection (MinHash/LSH), content and quality filters, and LLM-API-driven synthetic data generation.
• Pipeline orchestration (e.g., Airflow, Databricks Workflows, Dagster, or Prefect) and dataset versioning, including Unity Catalog and feature-store support.
• Experience handling regulated or sensitive data under controlled access (HIPAA or equivalent), and familiarity with general de-identification concepts.
• Git-based version control and CI/CD for data and code.
• Strong written documentation skills, and skill in eliciting requirements and tacit knowledge from technical and non-technical experts.
• Bachelor's degree in computer science, data science, engineering, statistics, or a related field. Equivalent practical experience will be considered.

🏖️ Benefits

• Benefits starting from Day 1!
• Retirement Plan Matching
• Flexible Paid Time Off
• Wellness Support Programs and Resources
• Parental & Caregiver Leaves
• Fertility & Adoption Support
• Continuous Development Support Program
• Employee Assistance Program
• Allyship and Inclusion Communities
• Employee Recognition
… and more!

Similar Jobs

Senior Data Architect

🕒 Yesterday

Element Fleet Management

1001 - 5000

🚗 Transport

Senior Data Architect optimizing the value of data architecture and supporting the data platform transformation at Element Fleet Management. Leading data lifecycle activities and collaborating with business partners on effective data governance.

🇨🇦 Canada – Remote

💵 $111.1k - $152.8k / year

💰 $125M Post-IPO Equity on 2014-03

⏰ Full Time

🟠 Senior

🚰 Data Engineer

Azure

ETL

Postgres

SSIS

Senior Data Engineer

🕒 Yesterday

Samsara

1001 - 5000

🏢 Enterprise

🚗 Transport

🔐 Security

Senior Data Engineer at Samsara helping build data platforms and workflows in an AI-first world. Focused on data architecture and transformation to drive insights and automation.

🇨🇦 Canada – Remote

💵 $119k - $154k / year

💰 Seed Round on 2014-08

⏰ Full Time

🟠 Senior

🚰 Data Engineer

Apache

AWS

Azure

BigQuery

Cloud

ERP

ETL

Google Cloud Platform

MS SQL Server

MySQL

Oracle

Postgres

PySpark

Python

RDBMS

Spark

SQL

Senior Data Engineer

🕒 6 days ago

Flinks

51 - 200

💳 Fintech

🏦 Banking

💸 Finance

Senior Data Engineer developing and managing Flinks' data platform to power their financial intelligence products. Collaborating across teams to establish data governance, observability, and reliability.

🇨🇦 Canada – Remote

💵 $120k - $160k / year

⏰ Full Time

🟠 Senior

🚰 Data Engineer

Airflow

BigQuery

Cloud

ETL

Python

SQL

Software Engineer II – Data Platform

🕒 6 days ago

Pantheon Platform

501 - 1000

Software Engineer II developing scalable data systems and analytics solutions for Pantheon. Contributing to technical strategy and collaborating across teams to enhance data infrastructure.

🇨🇦 Canada – Remote

💵 $103.2k - $129k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Airflow

BigQuery

Docker

Kubernetes

MySQL

Postgres

Python

SQL

Terraform

Senior Data Engineer

🕒 6 days ago

Versapay

201 - 500

🤝 B2B

💳 Fintech

☁️ SaaS

Senior Data Engineer at Versapay optimizing Snowflake architecture for accounts receivable automation. Involved in AI enablement and in delivering high-margin commercial data products.

🇨🇦 Canada – Remote

💵 $130k - $150k / year

💰 $4M Post-IPO Debt on 2019-11

⏰ Full Time

🟠 Senior

🚰 Data Engineer

AWS

Python

SQL