Senior Site Reliability Engineer

October 28

Dev.Pro

B2B • Fintech • SaaS

Dev.Pro is a software development partner that provides custom outsourced software development services to technology companies. With over 13 years of experience, a team of more than 900 experts, and operations in over 50 countries, Dev.Pro offers cloud development, DevOps, software testing and QA, system integration, and application security. The company serves industries including digital commerce, fintech, hospitality, and healthcare. Dev.Pro emphasizes quality, innovation, and a transparent collaboration process, working with both ambitious startups and Fortune 500 enterprises.

501 - 1000 employees

Founded 2011

🤝 B2B

💳 Fintech

☁️ SaaS

📋 Description

• Automate deployment, scaling, and lifecycle management of GPU clusters
• Optimize HPC scheduling and AI workload orchestration, including job preemption and GPU affinity
• Implement observability and monitoring across GPU, NVLink, InfiniBand, and storage layers
• Ensure reliability and uptime through SLOs, error budgets, chaos testing, and automated remediation
• Collaborate with teams to optimize performance, resources, and fault recovery at petascale

🎯 Requirements

• 5+ years as an SRE, DevOps, or HPC engineer in large-scale compute environments
• Expertise in HPC workload managers (Slurm, PBS Pro, LSF)
• Strong Python or Go skills for automation and observability
• Infrastructure-as-code experience (Terraform, Ansible, Helm)
• Kubernetes experience for AI workloads (vLLM, Ray, Triton Inference Server)
• GPU resource management knowledge (MIG, NCCL, CUDA, containers)
• Experience with storage systems (VAST, WEKA, DDN) and parallel filesystems (GPFS, Lustre)
• Linux systems engineering, CI/CD, and configuration management skills
• Strategic thinking with strong technical and business communication
• Organization, autonomy, adaptability
• Advanced English level

**Desirable:**

• Exposure to BlueField DPU, NVSwitch, or Slurm-on-Kubernetes hybrid orchestration

Similar Jobs

October 1

AccelOne

51 - 200

🤝 B2B

Senior DevOps Engineer/Lead responsible for CI/CD and securing cloud environments while collaborating with engineers on a transformative project. Join a high-performing team at AccelOne to modernize mission-critical applications.

🇦🇷 Argentina – Remote

💰 $100k Seed Round on 2021-11

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

October 1

InnovativeDev

11 - 50

🛍️ eCommerce

🤝 B2B

☁️ SaaS

Python Backend & DevOps role designing APIs and orchestrating distributed systems at Interinnova. Seeking a candidate with strong DevOps skills and 4 years of experience.

🇦🇷 Argentina – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇪🇸 Spanish Required

September 29

Creative Chaos

201 - 500

🤝 B2B

☁️ SaaS

⚡ Productivity

Lead DevOps Architect building automated cloud CI/CD environments and infrastructure. Ensure security, reliability, and deployment automation while collaborating with engineering teams.

🇦🇷 Argentina – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

August 1

Particle41

51 - 200

☁️ SaaS

🤖 Artificial Intelligence

🏢 Enterprise

As a DevOps Engineer at Particle41, streamline software delivery and automate IT operations processes.

🇦🇷 Argentina – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

July 31

Fever

1001 - 5000

👥 B2C

Join FeverUp as an SRE / Performance Engineer leveraging Kubernetes to solve performance issues in cloud environments.

🇦🇷 Argentina – Remote

💰 $110M Venture Round on 2023-01

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)
