
11 - 50 employees
🤖 Artificial Intelligence
🤝 B2B
🔧 Hardware
🔥 Funding within the last year
💰 $15.1M Series A - Andromeda Robotics on 2025-09
Artificial Intelligence • B2B • Hardware
Andromeda is a GPU compute service and marketplace offering instant access to large clusters of H100, H200, and B200 accelerators for experiments, full-scale training, and inference. It supports orchestration with Slurm, Kubernetes, or direct SSH, provides flexible, no-minimum-duration usage and competitive pricing, and includes DevOps expertise, local NAS or streamed storage with no ingress/egress fees, and 24/7 support with industry SLAs. The company also operates a third-party GPU marketplace at gpulist.ai.
🕒 April 9
🏄 California – Remote
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
• Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training
• Serve as the primary technical point of contact for customers running large-scale training workloads
• Define SLOs and error budgets that account for the unique failure modes of GPU infrastructure
• Ensure the health and performance of high-speed interconnects
• Build deep visibility into GPU utilization, memory pressure, and interconnect throughput
• Build production-grade automation for cluster provisioning, GPU health checks, and job scheduling
• Lead incident response for complex failures spanning hardware, networking, and orchestration
• Deep, hands-on experience operating large-scale GPU clusters (NVIDIA A100/H100/B200 or equivalent)
• Production experience with InfiniBand, RoCE, or NVLink fabrics in the context of distributed training
• Working knowledge of NCCL, CUDA, PyTorch distributed, DeepSpeed, Megatron, FSDP, or similar
• Expert-level Linux knowledge
• Strong experience running Kubernetes in production with GPU workloads
• Strong engineering skills in Python, Go, or Bash
• Hands-on experience building monitoring and alerting for GPU infrastructure
• Proven track record leading incident response for complex distributed systems
• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development
🕒 April 9
SRE role focusing on turning fast-growing systems into predictable, reliable platforms. Join PostHog to build and automate infrastructure.
🕒 April 9
Senior Infrastructure Engineer/SRE responsible for building core infrastructure at an AI-driven contact center company, designing tools for developers and ensuring reliability across cloud platforms.
🇺🇸 United States – Remote
💵 $205k - $270k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
🕒 April 9
Senior Software Engineer focusing on Mobile DevOps at Toast, creating innovative solutions for restaurant technology with a strong emphasis on AI tools and developer experience.
🇺🇸 United States – Remote
💵 $159k - $254k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
🕒 April 9
Lead Site Reliability Engineer guiding reliability strategy and execution for a modern multi-region SaaS platform, focused on system design, incident management, and cross-team collaboration.
🇺🇸 United States – Remote
💵 $136k - $177k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
🕒 April 8
Staff Software Engineer, Tech Lead focused on mobile DevOps at Toast, specializing in Android development and CI/CD processes for restaurant technology.
🇺🇸 United States – Remote
💵 $193k - $309k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor