Senior Data Engineer

🔥 46 minutes ago

Spokeo

51 - 200 employees

👥 B2C

☁️ SaaS

🔌 API

Spokeo is a people-search and data aggregation service that lets users search by name, phone, email, or address to find contact information, location history, social profiles, property records, court records, and other public records. It combines billions of records from consumer and industry sources into concise, easy-to-read reports and offers account-based report updates, enterprise search capabilities, and an API for integration. Spokeo is positioned for individual consumers seeking to reconnect, identify callers, or verify sellers, and it also offers paid services and tools for businesses. It states that it is not an FCRA consumer reporting agency.

📋 Description

• Develop, optimize, and improve data systems, including ETL pipelines, storage, and entity resolution
• Build infrastructure and data automation pipelines to ingest, process, and load data from various sources
• Automate and integrate new components into the data pipeline
• Collaborate with stakeholders and data science teams to develop data products
• Create unit and stress-test components to monitor technical performance
• Develop data analysis tools to provide data insights and capture key metrics
• Follow best practices for data governance, quality, cleansing, and other ETL-related activities

🎯 Requirements

• 7+ years of development experience in data engineering within a production environment
• Proven experience working with large datasets exceeding 100M records or multiple terabytes
• 2+ years of development experience in highly scalable, distributed systems and cluster architectures using AWS and EMR
• 5+ years of hands-on programming experience with Python
• 5+ years of professional experience working in big data ecosystems; Spark is required, and PySpark is preferred
• 3+ years of experience with SQL, schema design, and dimensional data modeling
• 2+ years of professional experience working with dataflow orchestration tools, such as Airflow
• 2+ years of experience with non-relational databases (e.g., DynamoDB, Elasticsearch)
• A bachelor’s degree in Computer Science, Information Systems, Mathematics, or a related field is required

🏖️ Benefits

• Health insurance
• 401(k)
• Unlimited employee PTO
• Bonus program
• Equity plans

Similar Jobs

🔥 14 hours ago

Spotify

5001 - 10000

📱 Media

👥 B2C

🛍️ eCommerce

Data Engineer developing scalable advertising systems for Spotify's Ads Product & Technology team. Building backend services and APIs to enhance the advertising experience.

Airflow

Apache

AWS

Azure

Cassandra

Distributed Systems

DynamoDB

ETL

Google Cloud Platform

gRPC

Hadoop

Java

MySQL

Pandas

Postgres

Python

Scala

Spring

Spring Boot

SQL

🔥 14 hours ago

Newsela

201 - 500

Data Engineer responsible for building and maintaining data integrations for K-12 platforms. Collaborating with stakeholders to ensure data accuracy and support school operations.

ETL

Python

SQL

🔥 15 hours ago

Dutch Bros Coffee

10,000+ employees

🛒 Retail

👥 B2C

Lead Engineer, Data at Dutch Bros Coffee optimizing foundational data ecosystem and building data platforms for analytics, machine learning, and AI. Collaborating on scalable and resilient data infrastructure.

Airflow

AWS

ETL

Python

SQL

🔥 17 hours ago

CVS Health

10,000+ employees

⚕️ Healthcare Insurance

🛒 Retail

🧘 Wellness

Senior Data Engineer responsible for creating ETL data pipelines at CVS Health. Collaborate with teams to implement data solutions for analytical capabilities.

AWS

Azure

Cloud

ETL

Google Cloud Platform

PySpark

Python

SQL

🔥 18 hours ago

Blue River Technology

201 - 500

🌾 Agriculture

🤖 Artificial Intelligence

🔧 Hardware

Software Engineer optimizing image-processing systems for agriculture. Collaborating with cross-functional teams to deliver insights and improve systems using data from agricultural robotics.

Airflow

Cloud

Python

TypeScript

Go