Senior Data Engineer

🔥 1 minute ago

🗣️🇧🇷🇵🇹 Portuguese Required

Leega

201 - 500 employees

Founded 2010

API • Artificial Intelligence • Cloud Solutions

Leega is a leading technology solutions provider in Latin America, specializing in data analytics and cloud solutions. As the first company in the region certified by Google Cloud for Data Analytics, Leega offers a range of services including application development, machine learning, and risk management analytics. The firm partners with major cloud services such as AWS and Microsoft Azure to help businesses enhance their data management and transition effectively to the cloud, ultimately driving digital transformation and innovation.

📋 Description

• You will architect and evolve the data lake that serves as the company's data nervous system: the foundation that feeds, in real time, the dynamic pricing engine, ML models, and the group's business intelligence.
• This is an ownership role: you define the multi-tenant Lakehouse architecture, from streaming to the semantic layer, and are responsible for its reliability, governance, and cost.
• Design and evolve the data lake on Apache Iceberg over S3, with well-defined layers, partitioning and compaction, time travel, and DELETE/UPDATE support for LGPD (the Brazilian data protection law).
• Build real-time ingestion (Kafka, Flink, CDC with Debezium) with controlled schema evolution (Schema Registry) and delivery guarantees.
• Model the transformation layer in dbt and orchestrate batch and quality flows in Airflow, from crawler to backfill.
• Maintain metric definitions in Cube.js, the single source that feeds BI and AI agents and ensures consistency across the company.
• Operate federated, low-latency OLAP queries over the lake, with per-tenant cost and access isolation.
• Ensure data testing, lineage, and cost efficiency, keeping the platform reliable as it scales.

🎯 Requirements

• Strong command of SQL and query optimization in distributed environments (minimum 5 years).
• Python with solid experience in PySpark or distributed processing.
• Orchestration (Airflow), ELT, and dbt applied at scale (minimum 4 years).
• Streaming (Kafka, Flink) and Lakehouse architectures with Apache Iceberg (minimum 3 years).
• Strong understanding of data governance, quality, and modeling.
• Comfortable with AI-assisted development (e.g., Claude Code).
• CDC (Debezium) and low-latency OLAP (ClickHouse, Pinot, Trino/Athena).
• Semantic layers (Cube.js, dbt) and Data Mesh architectures.
• Governance and catalog tools (OpenMetadata, Lake Formation).
• Vector databases (Qdrant) and data pipelines for ML.

🏖️ Benefits

• Remote work
• Project duration: 6 months, with possibility of extension or conversion to permanent employment.

Similar Jobs

🔥 13 hours ago

Experian

10,000+ employees

🤖 Artificial Intelligence

🤝 B2B

☁️ SaaS

Junior Data Engineering Analyst at Experian supporting AI solution development and automation in various sectors. Collaborating with experienced professionals to build scalable platforms.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Cloud

Docker

NoSQL

Pandas

Python

PyTorch

Scikit-Learn

Spark

SQL

TensorFlow

🔥 16 hours ago

INDT - Instituto de Desenvolvimento Tecnológico

201 - 500 employees

🧬 Biotechnology

🔒 Cybersecurity

📡 Telecommunications

Data Engineer supporting customer analytics team with data ingestion and pipeline maintenance. Involves integration of legacy systems and development using Databricks.

🗣️🇧🇷🇵🇹 Portuguese Required

ETL

PySpark

Spark

SQL

🔥 16 hours ago

Reply

10,000+ employees

Data Engineer at Reply specializing in modeling and maintaining Palantir data solutions. Collaborating on AI-driven projects and ensuring data governance and quality.

🗣️🇧🇷🇵🇹 Portuguese Required

PySpark

Python

SQL

🔥 21 hours ago

Avra

1 - 10 employees

Senior Software Engineer developing data products for Avra’s AI infrastructure in a remote-first environment. Collaborating with cross-functional teams to build and maintain data systems and services.

🗣️🇧🇷🇵🇹 Portuguese Required

AWS

Cloud

Distributed Systems

Google Cloud Platform

Python

Rust

Go

🕒 Yesterday

Experian

10,000+ employees

🤖 Artificial Intelligence

🤝 B2B

☁️ SaaS

Data Engineer II at Experian designing and implementing Data Lake architectures. Collaborating on AI and ML solutions for innovative data-driven insights in various industries.

🗣️🇧🇷🇵🇹 Portuguese Required

Airflow

Apache

PySpark

Python

Scala

Spark

SQL

Terraform