Senior Data Engineer – AI Ingestion Platform

Software Mind

1001 - 5000 employees

Founded 1999

🤖 Artificial Intelligence

☁️ SaaS

📡 Telecommunications

💰 Private Equity Round on 2020-12

Software Mind is a technology company that specializes in software development and digital transformation services. With a focus on AI and cloud solutions, the company offers a wide range of services including custom software development, mobile app development, and cloud consulting. Software Mind serves various industries such as financial services, telecom, biotech, and media, providing tailored solutions to accelerate digital transformations and business growth globally.

📋 Description

• Build and own the historical email ingestion pipeline via Microsoft Graph API
• Implement SharePoint / OneDrive document ingestion pipeline with scoped folder access
• Design and implement the PII minimisation pre-processing layer
• Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation
• Define and implement the data processing schema; produce and maintain schema documentation
• Build the OCR routing orchestrator and integrate OCR service for scanned documents
• Implement the raw text / content extraction layer for all supported document types
• Define and prototype push vs. pull ingestion strategy, from one-time PoC through to incremental nightly pipeline
• Ensure data lineage and audit traceability are built into pipeline outputs from the outset

🎯 Requirements

• 6+ years in data engineering; strong pipeline and ETL/ELT experience required
• Proficiency in Python for data pipeline development
• Experience with Microsoft Graph API or similar enterprise email/document APIs (M365, Exchange Online)
• AWS data services: S3, DynamoDB, Glue, and/or Lambda-based event-driven processing
• Familiarity with PII detection and data minimisation techniques (regex-based, NER-based, or purpose-built libraries)
• Experience with vector store indexing or semantic search pipeline construction

🏖️ Benefits

• Remote work options

Similar Jobs

🔥 23 hours ago

Blend360

501 - 1000

🤖 Artificial Intelligence

🏢 Enterprise

Data Engineer designing and building scalable data pipelines using Azure and Databricks at an AI services provider. Collaborating with analytics and engineering teams to improve data processes and architecture.

Apache

Azure

PySpark

Spark

SQL

🕒 Yesterday

Stefanini LATAM

10,000+ employees

🤖 Artificial Intelligence

🔒 Cybersecurity

☁️ SaaS

Data Engineer at Stefanini responsible for designing and optimizing data pipelines. Collaborating with Data Analytics and Business Intelligence teams for robust data integration and solutions.

🗣️🇪🇸 Spanish Required

Apache

AWS

Azure

Cloud

ETL

Google Cloud Platform

PySpark

Python

Spark

SQL

🕒 2 days ago

MUTT DATA

51 - 200

🤖 Artificial Intelligence

📡 Telecommunications

Data Engineer Senior at Muttdata, a remote startup leveraging Big Data and Machine Learning technologies. Collaborating on innovative systems and enhancing data infrastructures remotely.

Airflow

AWS

Cloud

Docker

Numpy

Pandas

Python

SQL

Tableau

🕒 2 days ago

Blend360

501 - 1000

🤖 Artificial Intelligence

🏢 Enterprise

Data Engineer Manager leading a team for Journey Analytics initiatives at Blend, an AI services provider. Focus on scalable data solutions and data engineering best practices.

🇦🇷 Argentina – Remote

💰 $100M Private Equity Round on 2022-08

⏰ Full Time

🟠 Senior

🔴 Lead

🚰 Data Engineer