Data Engineer

Job not on LinkedIn

September 10

Apply Now

Cummins Inc.

Energy • Transport • Hardware

Cummins Inc. is a global power technology leader that designs, manufactures, and distributes engines and power system solutions. Its products range from diesel and natural gas engines to hybrid and electric power systems, along with components such as turbochargers, fuel systems, and emissions solutions. With a strong emphasis on innovation, Cummins aims to reduce emissions and improve fuel efficiency, and it helps industries navigate the transition to cleaner energy through integrated power solutions for applications such as on-highway, marine, mining, and construction. Cummins also provides services including remote monitoring, diagnostics, and aftermarket support, reinforcing its commitment to sustainability and customer service excellence.

10,000+ employees

Founded 1919

⚡ Energy

🚗 Transport

🔧 Hardware

💰 $75M Grant in July 2024

📋 Description

• Support, develop, and maintain a data and analytics platform
• Efficiently process, store, and make data available to analysts and other consumers
• Work with business and IT teams to understand requirements and enable agile data delivery at scale
• Implement and automate deployment of distributed systems for ingesting and transforming data from relational, event-based, and unstructured sources
• Implement methods to continuously monitor and troubleshoot data quality and data integrity issues
• Implement data governance processes and methods for managing metadata, access, and retention
• Develop reliable, efficient, scalable, high-quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages (a minimal sketch follows this list)
• Develop physical data models and implement data storage architectures per design guidelines
• Analyze complex data elements, system data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models
• Participate in testing and troubleshooting of data pipelines
• Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (data lakes, Hadoop, NoSQL databases)
• Use agile development practices (DevOps, Scrum, Kanban) and continuous improvement for data-driven applications
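
To make the pipeline bullet concrete, here is a minimal PySpark sketch of an ETL job with a simple data-quality gate that a monitoring and alerting mechanism could hook into. It is illustrative only; the paths, field names, and rejection threshold are hypothetical placeholders, not details from the posting.

```python
# Minimal sketch of the kind of pipeline described above: extract raw
# records, transform and validate them, and load a curated table, with
# a data-quality gate that can feed monitoring/alerting.
# All paths, fields, and thresholds below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Extract: read raw events (source path is a placeholder)
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: normalize types and drop records failing basic checks
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("order_id").isNotNull())
)

# Data-quality gate: alert if too many rows were rejected
total, kept = raw.count(), clean.count()
reject_rate = 0.0 if total == 0 else (total - kept) / total
if reject_rate > 0.05:  # hypothetical threshold
    # A real deployment would emit a metric or page an on-call channel;
    # here the job simply fails loudly.
    raise RuntimeError(f"DQ check failed: {reject_rate:.1%} rows rejected")

# Load: write the curated table (destination path is a placeholder)
clean.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```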

🎯 Requirements

• College, university, or equivalent degree in a relevant technical discipline, or equivalent relevant experience, required
• Relevant experience preferred (temporary student employment, internship, co-op, or extracurricular team activities)
• Exposure to open-source Big Data technologies
• Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka (or equivalents); see the streaming-ingestion sketch after this list
• Proficiency with the SQL query language
• Experience implementing clustered compute in a cloud-based environment
• Familiarity with developing applications that require large file movement in a cloud-based environment
• Experience with ETL/ELT tools or scripting languages for data ingestion and transformation
• Experience developing physical data models and implementing data storage architectures
• Experience with data governance, metadata management, and access and retention processes
• Experience operating large-scale data storage and processing solutions (data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, or equivalents)
• Exposure to agile software development (DevOps, Scrum, Kanban)
• Strong programming, testing, and build-automation skills
• Experience with data quality monitoring and troubleshooting
• Problem-solving, communication, collaboration, and customer-focus competencies
• May require licensing for compliance with export controls or sanctions regulations
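
The requirements call out Spark and Kafka specifically; below is a hedged sketch of the standard Spark Structured Streaming pattern for ingesting events from a Kafka topic and landing them as Parquet. The broker address, topic name, schema, and paths are all hypothetical, and running it requires the spark-sql-kafka connector package on the classpath.

```python
# Hedged sketch of Spark Structured Streaming reading from Kafka, the
# ingestion pattern the requirements above describe. Broker, topic,
# schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("reading", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
         .option("subscribe", "telemetry")                  # placeholder topic
         .load()
         # Kafka delivers raw bytes; decode and parse the JSON payload
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Land parsed events as Parquet; checkpointing lets the stream recover
query = (
    events.writeStream.format("parquet")
          .option("path", "/tmp/telemetry/")           # placeholder
          .option("checkpointLocation", "/tmp/ckpt/")  # placeholder
          .start()
)
query.awaitTermination()  # block while the stream runs
```

The checkpoint location is what lets the stream restart without reprocessing or dropping records, which is the kind of pipeline reliability the description above asks for.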


Similar Jobs

September 10

Alt

51 - 200 employees

Data Engineer owning Alt's data pipelines that ingest marketplace transactions, driving pricing accuracy and analytics for the trading-card marketplace.

Airflow

AWS

Cloud

Pandas

Puppeteer

PySpark

Python

Selenium

SQL

Vault

September 10

Build and optimize data pipelines and warehouses powering internal analytics and customer-facing data products at Recharge.

Airflow

Amazon Redshift

AWS

Cloud

Docker

ETL

Google Cloud Platform

Kubernetes

MySQL

Oracle

Postgres

Python

SQL

September 10

Senior Data Engineer at Curotec, ingesting API data into Microsoft Fabric, implementing a Medallion architecture, and syncing datasets to Azure Synapse for analytics.

Azure

Cloud

PySpark

Python

September 10

Senior Data Engineer building large-scale AI-driven data infrastructure at Altimate AI: designing PB-scale pipelines, SQL intelligence, and cloud-native systems, with working hours overlapping US Pacific Time.

Airflow

AWS

Cloud

Kubernetes

Open Source

Python

SQL

September 9

Lead Data Engineer building and scaling Hopscotch Primary Care's data platform, pipelines, and governance to support care delivery and analytics.

AWS

Azure

Cloud

Google Cloud Platform

PySpark

Python
