Spark Engineer

Job not on LinkedIn

September 28


CAI

B2B • Recruitment • Enterprise

CAI prides itself on valuing people and leveraging modern technologies to drive progress. The company offers diverse career opportunities and places a strong emphasis on diversity, equity, and inclusion as well as corporate social responsibility. CAI aims to empower its employees to unlock their potential and find their fit within the organization. It offers tailored job recommendations and attracts talent through its inclusive culture and talent community initiatives, with a specific focus on supporting military veterans and neurodiverse individuals. By fostering an environment of growth and collaboration, CAI seeks to be a catalyst for innovation and empowerment.

5001 - 10000 employees

Founded 1983

🤝 B2B

🎯 Recruiter

🏢 Enterprise

📋 Description

• Design, build, and optimize large-scale data processing systems using Apache Spark (batch and streaming)
• Collaborate with data scientists, analysts, and engineers to ensure scalable, reliable, and efficient data solutions
• Design, develop, and maintain big data solutions using Apache Spark
• Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources
• Optimize Spark jobs for performance and scalability across large datasets
• Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.)
• Implement data quality checks, monitoring, and alerting for Spark-based workflows
• Ensure security and compliance of data processing systems
• Troubleshoot and resolve data pipeline and Spark job issues in production environments
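To illustrate the kind of batch pipeline work listed above, here is a minimal PySpark sketch (not part of the posting): it reads Parquet from object storage, applies a few transforms, runs a simple data-quality gate, and writes a partitioned output. The bucket paths, column names, and business rules are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-batch")  # hypothetical job name
    .getOrCreate()
)

# Hypothetical source: order events landed as Parquet in an S3 data lake.
raw = spark.read.parquet("s3a://example-datalake/raw/orders/")

cleaned = (
    raw
    .dropDuplicates(["order_id"])                      # example dedup key
    .withColumn("order_date", F.to_date("order_ts"))   # derive a partition column
    .filter(F.col("amount") > 0)                       # example business rule
)

# Simple data-quality gate: fail fast if required fields contain nulls.
bad_rows = cleaned.filter(
    F.col("order_id").isNull() | F.col("order_date").isNull()
).count()
if bad_rows > 0:
    raise ValueError(f"Data quality check failed: {bad_rows} rows with null keys")

# Write the curated output partitioned by date for efficient downstream reads.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-datalake/curated/orders/")
)

spark.stop()
```

A production pipeline would typically replace the inline null check with a dedicated data-quality framework and wire failures into the monitoring and alerting the responsibilities above describe.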

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred)
• 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming)
• Strong programming skills in Scala, Java, or Python (PySpark)
• Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS)
• Experience with data serialization formats (Parquet, ORC, Avro)
• Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP Dataproc, or Azure Synapse)
• Knowledge of SQL; experience with data warehouses (Snowflake, Redshift, BigQuery) is a plus
• Strong background in performance tuning and Spark job optimization
• Experience with CI/CD pipelines and version control (Git)
• Familiarity with containerization (Docker, Kubernetes) is an advantage
• Preferred: Experience with stream processing frameworks (Kafka, Flink)
• Preferred: Exposure to machine learning workflows with Spark MLlib
• Preferred: Knowledge of workflow orchestration tools (Airflow, Luigi)
• Ability to safely and successfully perform the essential job functions (sedentary work)
• Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor
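As a rough sketch of the streaming experience referenced above (Spark Structured Streaming with Kafka), the example below shows one common job shape: consume JSON events from a Kafka topic and land them as Parquet with checkpointing. The broker address, topic name, schema, and paths are assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Hypothetical schema for the JSON payload carried in the Kafka value field.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "orders")                       # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append micro-batches to the data lake; the checkpoint records committed
# offsets so the job can resume cleanly after a restart.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-datalake/streaming/orders/")
    .option("checkpointLocation", "s3a://example-datalake/checkpoints/orders/")
    .trigger(processingTime="1 minute")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```

Checkpointing is what makes the stream restartable without reprocessing or losing events, which is central to the reliability expectations in this role.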

🏖️ Benefits

• Remote work
• Reasonable accommodation for applicants (application.accommodations@cai.io)

Apply Now

Similar Jobs

September 26

Network Support Technician ensuring reliable and high-performing connectivity for clients. Supporting and troubleshooting complex network environments for IT companies with a focus on managed services.

Firewalls • ITSM

September 14

Senior network engineer managing and troubleshooting client networks remotely. Diagnose Level 3 issues and implement secure technical solutions.

AWS • Azure • Cloud • Cyber Security • DNS • Firewalls • TCP/IP

September 4

Remote MEPF Building Services Engineer designing and coordinating mechanical, electrical, plumbing, and fire systems for UK property development projects. Collaborates with UK and Philippines teams.

September 4

Senior SRE at Series A payments fintech; build IaC, CI/CD, observability, and lead incident response to ensure platform performance and scalability.

AWS • Cloud • Docker • EC2 • Grafana • Java • Jenkins • Kubernetes • MySQL • Postgres • Prometheus • Python • Terraform • Go

September 4

Senior Site Reliability Engineer for payments orchestration platform. Build and automate cloud infrastructure, observability, and incident management to maintain high availability.

AWS • Cloud • Docker • EC2 • Grafana • Java • Jenkins • Kubernetes • MySQL • Postgres • Prometheus • Python • Terraform • Go
