Lead Data Engineer, AI

3Pillar Global

1001 - 5000 employees

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

💰 Private Equity Round on 2021-10

3Pillar Global is a modern application strategy, design, and engineering firm that specializes in delivering strategic software development initiatives for various industries. They offer a range of services, including application technology strategy, digital product engineering, data and analytics, and artificial intelligence development. 3Pillar Global focuses on helping organizations transform their bold ideas into breakthrough solutions by leveraging cutting-edge technologies such as generative and multimodal AI. They work with partners and clients across multiple sectors, including healthcare, financial services, insurance, media, and information services, to solve complex technology challenges and deliver high-performing results.

📋 Description

• Build, test, and maintain production pipelines (batch and real-time) on Snowflake, PySpark, Delta Lake, and Kafka.
• Implement data quality checks, schema validation, and alerting at every pipeline stage.
• Migrate legacy ETL/DWH to cloud-native AWS/Azure architectures with measurable latency and cost improvements.
• Maintain CI/CD pipelines: automated testing, deployment, rollback, and IaC (Terraform, GitHub Actions).
• Build end-to-end retrieval infrastructure: document ingestion, embedding pipelines, vector store management (Pinecone, FAISS, ChromaDB, OpenSearch), and hybrid retrieval layers.
• Implement chunking, metadata filtering, and reranking, tuning for precision, recall, and latency.
• Maintain data freshness and index consistency; instrument with context relevance and faithfulness metrics.
• Implement and maintain business entity mappings, ontologies, and knowledge graphs (Neo4j) per Architect design.
• Build and version the feature store and semantic data contracts serving both ML models and LLM applications.
• Manage metadata, data lineage, and audit trail instrumentation across the platform.
• Build ML data infrastructure: training curation, feature engineering, MLflow experiment tracking, dataset versioning.
• Support LLM fine-tuning workflows: corpus curation, quality filtering, dataset formatting.
• Implement automated evaluation pipelines: factual accuracy, hallucination detection, regression tracking.
• Maintain production monitoring dashboards for pipeline health, model metrics, and alerting.
• Build and maintain data APIs, tool schemas, and memory/state stores that autonomous agents depend on.
• Implement agent observability: capture inputs, retrieved context, tool calls, reasoning traces, and outputs.
• Maintain text-to-SQL layers, semantic query interfaces, and context APIs for conversational AI consumers.
• Implement RBAC, attribute-based access, PII detection/masking, data classification, and audit logging.
• Enforce data contracts and schema governance with automated breaking-change detection and versioned migrations.
• Build data quality monitoring (completeness, freshness, consistency) with automated alerting and root-cause tooling.
• Support compliance readiness: audit trails, data provenance, and regulatory documentation.

🎯 Requirements

• 7+ years of data engineering using cloud services.
• 2+ years of production AI/ML or LLM-era data infrastructure. Proven experience building production pipelines at scale: batch and streaming, Snowflake, AWS/Azure.
• Deep expertise: Python, PySpark, Snowflake, Delta Lake, Kafka, Spark Structured Streaming.
• Hands-on with vector stores, embedding pipelines, and retrieval infrastructure in production RAG environments.
• Working knowledge of MLOps: MLflow, CI/CD for AI, automated evaluation, and production monitoring.
• Strong grounding in data governance, quality frameworks, and compliance-aligned engineering.

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options
