Senior Platform Data Engineer

🕒 April 16

🔔 Pennsylvania – Remote

⏰ Full Time

🟠 Senior

🚰 Data Engineer

🦅 H1B Visa Sponsor

Geisinger

10,000+ employees

Founded 1915

💊 Pharmaceuticals

🧘 Wellness

Healthcare • Pharmaceuticals • Wellness

Geisinger is a healthcare organization that has been providing accessible medical services for over a century in Pennsylvania. It focuses on meeting the healthcare needs of its communities and is dedicated to innovative patient care. With career opportunities in various fields including nursing, allied health, and administration, Geisinger promotes professional development and a supportive workplace for its employees, emphasizing diversity, equity, and inclusion.

📋 Description

• The Senior Platform Data Engineer owns the roadmap, priorities, platform standards, and architecture reviews, and provides formal input on performance reviews.
• This position makes clinical data ready for AI at scale by owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on.
• Owns real-time data feeds, reusable clinical data models and feature pipelines, and RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines).
• Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
• Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
• Designs and operates document ingestion pipelines that normalize clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
• Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
• Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.

🎯 Requirements

• 5+ years in data engineering, with strong experience building both batch and streaming data pipelines
• Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store
• Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)
• Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering
• Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring
• Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly preferred
• Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus
• Understanding of data governance principles: lineage, quality monitoring, access controls

🏖️ Benefits

• We offer healthcare benefits for full time and part time positions from day one, including vision, dental and domestic partners.
• We encourage an atmosphere of collaboration, cooperation and collegiality.
• We know that a diverse workforce with unique experiences and backgrounds makes our team stronger.
