Lead Data Engineer, AI

1001 - 5000 employees

💼 Consulting

🏥 Healthcare

🛡️ Insurance

💰 Private Equity Round on 2021-10


3Pillar Global is a modern application strategy, design, and engineering firm that specializes in delivering strategic software development initiatives for various industries. They offer a range of services, including application technology strategy, digital product engineering, data and analytics, and artificial intelligence development. 3Pillar Global focuses on helping organizations transform their bold ideas into breakthrough solutions by leveraging cutting-edge technologies such as generative and multimodal AI. They work with partners and clients across multiple sectors, including healthcare, financial services, insurance, media, and information services, to solve complex technology challenges and deliver high-performing results.



🕒 June 12

🇮🇳 India – Remote

⏰ Full Time

🟠 Senior

🚰 Data Engineer

AWS

Azure

Cloud

ETL

Kafka

Neo4j

PySpark

Python

Spark

SQL

Terraform



📋 Description

• Build, test, and maintain production pipelines (batch and real-time) on Snowflake, PySpark, Delta Lake, and Kafka.
• Implement data quality checks, schema validation, and alerting at every pipeline stage.
• Migrate legacy ETL/DWH to cloud-native AWS/Azure architectures with measurable latency and cost improvements.
• Maintain CI/CD pipelines: automated testing, deployment, rollback, and IaC (Terraform, GitHub Actions).
• Build end-to-end retrieval infrastructure: document ingestion, embedding pipelines, vector store management (Pinecone, FAISS, ChromaDB, OpenSearch), and hybrid retrieval layers.
• Implement chunking, metadata filtering, and re-ranking, tuning for precision, recall, and latency.
• Maintain data freshness and index consistency; instrument with context relevance and faithfulness metrics.
• Implement and maintain business entity mappings, ontologies, and knowledge graphs (Neo4j) per Architect design.
• Build and version the feature store and semantic data contracts serving both ML models and LLM applications.
• Manage metadata, data lineage, and audit trail instrumentation across the platform.
• Build ML data infrastructure: training curation, feature engineering, MLflow experiment tracking, dataset versioning.
• Support LLM fine-tuning workflows: corpus curation, quality filtering, dataset formatting.
• Implement automated evaluation pipelines: factual accuracy, hallucination detection, regression tracking.
• Maintain production monitoring dashboards for pipeline health, model metrics, and alerting.
• Build and maintain data APIs, tool schemas, and memory/state stores that autonomous agents depend on.
• Implement agent observability: capture inputs, retrieved context, tool calls, reasoning traces, and outputs.
• Maintain text-to-SQL layers, semantic query interfaces, and context APIs for conversational AI consumers.
• Implement RBAC, attribute-based access, PII detection/masking, data classification, and audit logging.
• Enforce data contracts and schema governance with automated breaking-change detection and versioned migrations.
• Build data quality monitoring (completeness, freshness, consistency) with automated alerting and root-cause tooling.
• Support compliance readiness: audit trails, data provenance, and regulatory documentation.

🎯 Requirements

• 7+ years of data engineering using cloud services.
• 2+ years of production AI/ML or LLM-era data infrastructure. Proven experience building production pipelines at scale: batch and streaming, Snowflake, AWS/Azure.
• Deep expertise: Python, PySpark, Snowflake, Delta Lake, Kafka, Spark Structured Streaming.
• Hands-on with vector stores, embedding pipelines, and retrieval infrastructure in production RAG environments.
• Working knowledge of MLOps: MLflow, CI/CD for AI, automated evaluation, and production monitoring.
• Strong grounding in data governance, quality frameworks, and compliance-aligned engineering.

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options


Similar Jobs

Data Engineer – Azure

🕒 June 11

Sikich

1001 - 5000

🛡️ Insurance

📦 Logistics

📣 Marketing

Data Engineer at Sikich optimizing data solutions using Microsoft platforms. Responsible for building robust data pipelines and delivering insight-driven analytics.

🇮🇳 India – Remote

💰 Private Equity Round on 2024-05

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Azure

Kafka

Python

Scala

Spark

SQL

Data Engineer II

🕒 June 11

NPS Prism

201 - 500

💼 Consulting

📣 Marketing

📦 Logistics

Data Engineer II responsible for developing ETL/ELT workflows and managing data lakes for NPS Prism. Collaborating with teams to design data solutions on cloud platforms like Azure and AWS.

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

AWS

Azure

Cloud

ETL

PySpark

Python

SQL

Tableau

Data Engineer – L2

🕒 June 6

Forbes Advisor

201 - 500

🛡️ Insurance

💼 Consulting

✈️ Travel

Data Engineer building and maintaining data pipelines for marketing analytics and support across business teams. Contributing to data ingestion and modelling from various platforms with a focus on Meta Ads.

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Airflow

BigQuery

Cloud

ETL

Microservices

Python

SQL

Data Engineer, L3

🕒 June 6

Forbes Advisor

201 - 500

🛡️ Insurance

💼 Consulting

✈️ Travel

Data Engineer (L3) designing scalable data architecture and robust data pipelines for social marketing. Collaborate in a fintech initiative providing insights on personal finance and growth marketing.

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Airflow

BigQuery

Cloud

ETL

Microservices

Python

SQL

D365 Customer Insight Data Engineer

🕒 June 5

DysrupIT

51 - 200

🏢 Enterprise

☁️ SaaS

🔒 Cybersecurity

Data Engineer managing D365 Customer Insights and data pipelines. Triage issues, validate data fitness, and collaborate on monitoring and observations.

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Azure

Python

SQL

.NET