Data Engineer

Neurons Lab

51 - 200 employees

💰 Corporate Round on 2022-10

Neurons Lab is a globally distributed AI R&D company that helps deep-tech innovators accelerate the development and launch of data-driven products. Our team has expertise in fundamental sciences, full-stack AI/ML engineering, and product design. This rare combination, together with access to scarce talent, allows Neurons Lab to build disruptive solutions for clients in the HealthTech and EnergyTech industries. Neurons Lab operates within a proprietary delivery framework tailored to the innovation environment: fierce competition, tight timelines, little to no data, and the need to generate novel solutions.

📋 Description

• Reproduce a descriptive-statistics report end-to-end so any figure traces back to raw source, closing the gap the client admitted (numbers they can't currently defend).
• Profile and reconcile differing source schemas across acquired entities: map differing field names, types, encodings and business definitions for the same concept into one conformed model.
• Build dbt staging → intermediate → mart models with tests; codify the harmonized definitions the Data Science Lead specifies.
• Write Great Expectations suites (null / range / uniqueness / referential checks) and wire them into the pipeline so bad data fails loudly rather than silently corrupting analysis.
• Implement entity / identity resolution (deterministic + fuzzy matching) where there is no clean shared key for the same customer or account across sources.
• Implement and verify anonymization / pseudonymization (hashing / tokenization / k-anonymity) and provide evidence to the client's IT / compliance team that re-identification risk is controlled.
• Optimize Spark / Glue jobs over tens of millions of rows: partitioning, file formats (Parquet), incremental loads, cost control.
• Orchestrate with Airflow / Step Functions; build repeatable, scheduled pipelines rather than one-off scripts.
• Prepare clean, documented, feature-ready datasets for the PD / delinquency models.
• Document runbooks so the offshore team can operate the pipelines and handover takes days, not weeks; help scope onboarding of the remaining (Ireland + additional) sources.
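For candidates gauging the entity / identity resolution responsibility above, here is a minimal sketch of a two-pass match (deterministic first, fuzzy as a fallback) using only the Python standard library. The field names `email` and `name`, the helper names, and the 0.85 threshold are illustrative assumptions, not Neurons Lab's actual schema or method.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not block a match."""
    return " ".join(text.lower().split())


def match_record(record: dict, candidates: list, threshold: float = 0.85):
    """Return (candidate, method); method is 'deterministic', 'fuzzy' or 'unmatched'."""
    # Pass 1: deterministic match on a normalized email (hypothetical shared attribute).
    email = normalize(record.get("email", ""))
    if email:
        for cand in candidates:
            if normalize(cand.get("email", "")) == email:
                return cand, "deterministic"

    # Pass 2: fuzzy match on the normalized name, keeping the highest-scoring candidate.
    name = normalize(record.get("name", ""))
    best, best_score = None, 0.0
    if name:  # skip fuzzy matching when there is no name to compare
        for cand in candidates:
            score = SequenceMatcher(None, name, normalize(cand.get("name", ""))).ratio()
            if score > best_score:
                best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy"
    return None, "unmatched"


# Hypothetical records from two source systems with no shared key.
candidates = [
    {"id": 1, "name": "John Smith", "email": "john@example.com"},
    {"id": 2, "name": "Maria Lopez", "email": "m.lopez@example.com"},
]
by_email = match_record({"name": "M. Lopez", "email": "M.Lopez@Example.com "}, candidates)
by_name = match_record({"name": "Jon Smith", "email": ""}, candidates)
no_match = match_record({"name": "Zed Quark", "email": ""}, candidates)
```

In practice the threshold would be tuned against labeled pairs, and records matched only by the fuzzy pass would typically be routed for review rather than merged automatically.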

🎯 Requirements

• 4+ years in data engineering, with strong AWS + Spark / SQL at scale
• Demonstrated experience harmonizing / integrating data across multiple source systems
• Experience building validated, reproducible pipelines in a regulated environment (BFSI, healthcare, government) is a strong plus
• Comfortable stepping into a messy, partly-built data estate and bringing it up to standard
• Comfortable as the sole or lead data engineer on a small (3–4 person) delivery pod

🏖️ Benefits

• Full-time engagement preferred.

