Spark Data Engineer

Job not on LinkedIn

November 18

Apply Now

Mactores

Enterprise • Data

Mactores is a company that provides end-to-end data platform solutions aimed at accelerating business value through automation. Since 2008, Mactores has been helping businesses with digital transformation, offering services like Enterprise Data Lakes, Scalable Databases, Modern Data Warehouses, Automated DataOps, MLOps, and Generative AI solutions. They focus on enabling faster, cost-effective migrations and modernizations in data analytics, partnering with leading platforms to drive innovation and success. Mactores works alongside tech teams to strategize and implement the right data solutions in a timely and efficient manner.

51 - 200 employees

Founded 2008

🏢 Enterprise

📋 Description

• Architect, design, and build scalable data pipelines and distributed applications using Apache Spark (Spark SQL, DataFrames, RDDs).
• Develop and manage ETL/ELT pipelines to process structured and unstructured data at scale.
• Write high-performance code in Scala or PySpark for distributed data processing workloads.
• Optimize Spark jobs by tuning shuffle, caching, partitioning, memory, executor cores, and cluster resource allocation.
• Monitor and troubleshoot Spark job failures, cluster performance, bottlenecks, and degraded workloads.
• Debug production issues using logs, metrics, and execution plans to maintain SLA-driven pipeline reliability.
• Deploy and manage Spark applications on on-prem or cloud platforms (AWS, Azure, or GCP).
• Collaborate with data scientists, analysts, and engineers to design data models and enable self-serve analytics.
• Implement best practices around data quality, data reliability, security, and observability.
• Support cluster provisioning, configuration, and workload optimization on platforms like Kubernetes, YARN, or EMR/Databricks.
• Maintain version-controlled codebases, CI/CD pipelines, and deployment automation.
• Document architecture, data flows, pipelines, and runbooks for operational excellence.
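For context on the tuning responsibilities above (shuffle, memory, executor cores, cluster resource allocation): these typically surface as job-submission settings. A minimal sketch of a `spark-submit` invocation — the resource values and the entry-point name `etl_pipeline.py` are illustrative assumptions, not part of this posting:

```shell
# Hypothetical submission of a PySpark ETL job on YARN.
# All resource values below are placeholders a candidate would tune
# per workload; none come from the job description itself.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.sql.adaptive.enabled=true \
  etl_pipeline.py
```

Caching and partitioning, the other levers named in the bullet, are usually applied inside the application code itself (e.g. `DataFrame.cache()` and `repartition()` in PySpark) rather than at submission time.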

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or a related field.
• 4+ years of experience building distributed data processing pipelines, with deep expertise in Apache Spark.
• Strong understanding of Spark internals (Catalyst optimizer, DAG scheduling, shuffle, partitioning, caching).
• Proficiency in Scala and/or PySpark with strong software engineering fundamentals.
• Solid expertise in ETL/ELT, distributed computing, and large-scale data processing.
• Experience with cluster and job orchestration frameworks.
• Strong ability to identify and resolve performance bottlenecks and production issues.
• Familiarity with data security, governance, and data quality frameworks.
• Excellent communication and collaboration skills to work with distributed engineering teams.
• Ability to work independently and deliver scalable solutions in a fast-paced environment.

🏖️ Benefits

• Equal opportunities in all of our employment practices.
• Non-discrimination principles apply to recruitment, compensation, promotions, transfers, disciplinary action, layoffs, training, and social programs.


Similar Jobs

November 18

Netomi

51 - 200

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Senior Data Engineer building scalable data solutions for enterprise customer experience at Netomi.

🇮🇳 India – Remote

💰 $30M Series B on 2021-11

⏰ Full Time

🟠 Senior

🚰 Data Engineer

November 14

NextHire

11 - 50

Data Engineer specializing in data mapping, transformations, and integrations at IT consulting firm. Working with Python, SQL, and data-driven solutions.

🇮🇳 India – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

November 13

Ollion

501 - 1000

🤖 Artificial Intelligence

Lead Data Engineer responsible for designing and implementing scalable data platforms in India. Collaborating with cross-functional teams and mentoring junior engineers.

🇮🇳 India – Remote

⏰ Full Time

🟠 Senior

🚰 Data Engineer

November 11

Vodafone

178672 - 178672

📡 Telecommunications

👥 B2C

🤝 B2B

Cloud Data Engineer developing scalable applications on GCP while collaborating with data engineering teams. Requires experience in SQL, PL/SQL, and cloud technologies for large datasets.

🇮🇳 India – Remote

💰 Post-IPO Equity on 2022-09

⏰ Full Time

🟠 Senior

🔴 Lead

🚰 Data Engineer

November 11

Astreya

1001 - 5000

🔒 Cybersecurity

🏢 Enterprise

☁️ SaaS

Data Architect designing, developing, and analyzing big data solutions for actionable insights. Leading projects on statistical modeling and database management with a focus on diverse data sources.

🇮🇳 India – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

🚰 Data Engineer

Developed by Lior Neu-ner. I'd love to hear your feedback — Get in touch via DM or support@remoterocketship.com