Lead Data Platform Engineer

🔥 0 minutes ago

🗽 New York – Remote

💵 $125k - $174.3k / year

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

🦅 H1B Visa Sponsor

Coupa Software

1001 - 5000 employees

Founded 2006

☁️ SaaS

💸 Finance

🛍️ eCommerce

Coupa Software is a leading provider of business spend management solutions. Their platform focuses on optimizing and transforming direct and indirect spend across procurement, finance, supply chain, and IT. Coupa leverages AI and extensive data insights to drive cost efficiencies, manage supplier relationships, and mitigate risks. With products covering areas such as invoicing, payments, expense management, and supply chain collaboration, Coupa serves a wide range of industries including automotive, healthcare, retail, and more. Their comprehensive community and partner ecosystem enable organizations to unlock hidden savings and improve compliance, promoting growth and resilience in a changing economic climate.

📋 Description

• Manage end-to-end **data pipelines** (ETL jobs) within agreed SLAs.
• Manage AWS core and **big data services** (S3, IAM, EMR, Redshift, etc.).
• Run applications in containers (ECS, Docker).
• Lead the Day 2 operational lifecycle for ML and GenAI infrastructure. This includes designing, deploying, and maintaining high-availability production LLM serving platforms, and implementing automated scaling, self-healing, and infrastructure-as-code patterns. Focus on proactive reliability, model performance observability, and continuous cost optimization for high-compute AI workloads.
• Collaborate closely with our product development and engineering teams to create AI-driven features.
• Drive cloud operations consistency by automating platform maintenance, standardizing infrastructure configurations (IaC), and implementing robust release management processes to minimize drift across multi-cloud environments.
• Manage AWS infrastructure as code (Terraform, Chef, etc.).
• Administer applications running on Linux.
• Enable application and system monitoring for better observability.
• Provide application and infrastructure support for ETL jobs and data pipelines, including participating in an on-call rotation for after-hours emergencies.
• Collaborate with platform and Dev teams to plan and deploy product releases and patch Linux/ECS clusters.
• Participate in design reviews, code reviews, and incident troubleshooting.
• Operate in a high-pressure environment and troubleshoot complex issues quickly while successfully handling multiple priorities.
• Record, write, and review RCAs.

🎯 Requirements

• Bachelor's degree and 8+ years of experience managing Big Data technologies and data pipelines.
• Sound knowledge of and experience in Linux administration and troubleshooting.
• 5+ years of experience managing cloud infrastructure and platforms such as AWS and Azure.
• Familiarity with the current engineering landscape in generative AI and a strong interest in AI and related technologies.
• Strong expertise in MLOps and production-grade LLM operations. Proven track record in managing high-availability model inference clusters, automating model lifecycle management, and implementing advanced observability (latency, throughput, and error rate monitoring) specifically for AI workloads.
• Bash or Python scripting experience.
• Experience with containerization (Amazon ECS, EKS, Azure AKS).
• Experience with tools like Chef, Ansible, Jenkins, Rundeck, or equivalent.
• Experience with source control systems such as Git, including working with complex branching strategies.
• Experience with Infrastructure as Code tools like Terraform and Helm charts.
• Good understanding of DNS and load balancer setup and troubleshooting.
• Experience with Big Data platforms and data lakes, and with managing Business Intelligence tools (such as Looker).
• Knowledge of Apache Spark architecture and of troubleshooting Java applications.
• Basic understanding of MySQL Server and general database knowledge.
• Excellent written and verbal communication, with a passion for problem solving.
• Confidence in your ability to independently own projects and issues through to resolution, and to think and act globally.
• Deep experience in Day 2 cloud operations, including automated incident remediation, capacity planning, and managing large-scale production cloud environments with a focus on performance and reliability.
