Mid-level Data Engineer – GCP, DBT

🗣️🇧🇷🇵🇹 Portuguese Required

Leega

201 - 500 employees

Founded 2010

API • Artificial Intelligence • Cloud Solutions

Leega is a leading technology solutions provider in Latin America, specializing in data analytics and cloud solutions. As the first company in the region certified by Google Cloud for Data Analytics, Leega offers a range of services including application development, machine learning, and risk management analytics. The firm partners with major cloud services such as AWS and Microsoft Azure to help businesses enhance their data management and transition effectively to the cloud, ultimately driving digital transformation and innovation.

📋 Description

• Analysis and Planning of Loads/Pipelines:
  • Assess the data warehouse (DW) architecture and requirements.
  • Map data, transformations and processes across GCP services (Cloud Storage, BigQuery, Dataproc).
  • Define data migration strategies (full load, incremental, CDC).
  • Develop a data architecture plan on GCP.
• Design and Data Modeling on GCP:
  • Design table schemas in BigQuery, taking performance, cost and scalability into account.
  • Define partitioning and clustering strategies for BigQuery.
  • Model data zones in Cloud Storage (Bronze, Silver and Gold).
• ELT/ETL Pipeline Development:
  • Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
  • Translate business logic and existing transformations into GCP.
  • Implement data validation and data quality mechanisms.
• Performance and Cost Optimization:
  • Optimize BigQuery queries to reduce costs and improve performance.
  • Tune and optimize Spark jobs on Dataproc.
  • Monitor and optimize GCP resource usage to control costs.
• Data Security and Governance:
  • Implement and ensure data security in transit and at rest.
  • Define and enforce IAM policies to control access to data and resources.
  • Ensure compliance with data governance policies.
• Monitoring and Support:
  • Troubleshoot performance and functional issues in data pipelines and GCP resources.
• Documentation:
  • Document the architecture, data pipelines, data models and operational procedures.
• Communication:
  • Communicate effectively with team members, stakeholders and other business areas.
  • Ensure clear communication between architectural definitions and software components, and support the evolution and quality of the team's developments.
• Agile Methodologies / Jira:
  • Familiarity with agile methodologies, their ceremonies and proficiency with the Jira tool.

🎯 Requirements

• Proven experience with dbt (minimum 3 years);
• Strong knowledge of:
  • models (staging, intermediate, marts)
  • ref() and source()
  • macros (Jinja)
  • seeds and snapshots
  • tests (not null, unique, custom)
• Layered organization:
  • Staging → Transform → Mart (Data Warehouse)
• Google Cloud Platform (GCP):
  • BigQuery: Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance.
  • Cloud Storage: Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
  • Dataproc: Skill in provisioning, configuring and managing Spark/Hadoop clusters, job optimization, and integration with other GCP services.
  • Dataflow/Composer/DBT: Knowledge of orchestration and data processing tools for ELT/ETL pipelines.
  • Cloud IAM (Identity and Access Management): Implementation of security policies and granular access control.
  • VPC, Networking and Security: Understanding of networks, subnets, firewall rules and cloud security best practices.
• Programming Languages:
  • Python and PySpark: Essential for automation scripts, data pipeline development and integration with GCP APIs.
  • SQL (advanced): For BigQuery, dbt and data transformations.
  • Shell Scripting: For task automation.
• Version Control:
  • Git/GitHub/Bitbucket.

🏖️ Benefits

• Health insurance (Porto Seguro)
• Dental insurance (Porto Seguro)
• Profit Sharing (PLR)
• Childcare assistance
• Meal and food allowance (Alelo)
• Home office allowance
• Partnerships with educational institutions
• Support for certifications, including cloud certifications
• Livelo points
• TotalPass
• Mindself
