Senior Platform Data Engineer

10,000+ employees

Founded 1915

💊 Pharmaceuticals

🧘 Wellness

Healthcare • Pharmaceuticals • Wellness

Geisinger is a healthcare organization that has been providing accessible medical services for over a century in Pennsylvania. It focuses on meeting the healthcare needs of its communities and is dedicated to innovative patient care. With career opportunities in various fields including nursing, allied health, and administration, Geisinger promotes professional development and a supportive workplace for its employees, emphasizing diversity, equity, and inclusion.

🕒 April 16

🔔 Pennsylvania – Remote

⏰ Full Time

🟠 Senior

🚰 Data Engineer

🦅 H1B Visa Sponsor

Kafka

Pandas

PySpark

Python

Spark

SQL

Unity Catalog

📋 Description

• Owns the roadmap, priorities, platform standards, and architecture reviews, and provides formal input on performance reviews.
• Makes clinical data ready for AI at scale by owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on.
• Owns real-time data feeds, reusable clinical data models and feature pipelines, and RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines).
• Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
• Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
• Designs and operates document ingestion pipelines that normalize clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
• Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
• Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.
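
For candidates unfamiliar with the terms, two of the responsibilities above (section-aware chunking and a completeness check before content enters the vector store) can be illustrated with a small sketch. This is only an assumption-laden example, not Geisinger's implementation: the section headers, the `Chunk` type, the function names, and the thresholds are all hypothetical, and a real pipeline would likely run on Databricks rather than in plain Python.

```python
import re
from dataclasses import dataclass

# Hypothetical section headers; real clinical notes vary by source system.
SECTION_HEADERS = ("HISTORY OF PRESENT ILLNESS", "MEDICATIONS", "ASSESSMENT", "PLAN")


@dataclass
class Chunk:
    section: str  # header the text was found under
    text: str     # chunk body, without the header


def chunk_by_section(note: str, max_chars: int = 500) -> list[Chunk]:
    """Split a note at known section headers, then pack sentences into chunks.

    Text that appears before the first recognized header is ignored here.
    A single sentence longer than max_chars is kept whole rather than cut.
    """
    alternatives = "|".join(re.escape(h) for h in SECTION_HEADERS)
    pattern = re.compile(rf"^({alternatives}):\s*", re.MULTILINE)
    matches = list(pattern.finditer(note))
    chunks: list[Chunk] = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        body = note[match.end():end].strip()
        current = ""
        # Split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(Chunk(match.group(1), current))
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append(Chunk(match.group(1), current))
    return chunks


def passes_quality_gate(chunk: Chunk, min_chars: int = 20) -> bool:
    """Completeness check: reject chunks that are very short or have no letters."""
    text = chunk.text.strip()
    return len(text) >= min_chars and any(c.isalpha() for c in text)


if __name__ == "__main__":
    sample_note = (
        "HISTORY OF PRESENT ILLNESS: Patient reports three days of cough. No fever.\n"
        "MEDICATIONS: None.\n"
        "PLAN: Chest x-ray today. Follow up in one week."
    )
    for chunk in chunk_by_section(sample_note):
        status = "keep" if passes_quality_gate(chunk) else "reject"
        print(status, chunk.section, "|", chunk.text)
```

Keeping the section name on each chunk is one way to preserve clinical note structure for retrieval. The gate here only checks length and alphabetic content; the profiling and accuracy scoring named in the listing would need additional checks.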

🎯 Requirements

• 5+ years in data engineering, with strong experience building both batch and streaming data pipelines
• Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store
• Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)
• Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering
• Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring
• Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly preferred
• Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus
• Understanding of data governance principles: lineage, quality monitoring, access controls

🏖️ Benefits

• We offer healthcare benefits for full-time and part-time positions from day one, including vision, dental, and domestic partner coverage.
• We encourage an atmosphere of collaboration, cooperation, and collegiality.
• We know that a diverse workforce with unique experiences and backgrounds makes our team stronger.
