Vultr

Cloud Computing • Artificial Intelligence

Vultr is a cloud infrastructure provider offering a wide range of services including compute instances, storage, managed databases, and GPU clusters. The company focuses on providing high-performance and accessible cloud solutions, leveraging both AMD and NVIDIA technologies to power applications in artificial intelligence, high-performance computing, and general workloads. Vultr offers services that are designed to be simpler and more cost-effective than major competitors like AWS, GCP, and Azure, with global data center locations to support diverse deployment needs.

51 - 200 employees

Founded 2014

🤖 Artificial Intelligence

Senior Site Reliability Engineer, Core Cloud Engineering

November 5

🇺🇸 United States – Remote

💵 $120k - $130k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Distributed Systems

Grafana

Linux

MySQL

PHP

Puppet

📋 Description

• Operate and scale Vultr's control plane, ensuring availability, correctness, and performance across global datacenters.
• Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
• Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
• Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
• Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
• Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
• Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
• Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
• Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

🎯 Requirements

• Proficiency in PHP with strong scripting and automation skills.
• Experience running large-scale distributed systems and control plane infrastructure in production.
• Strong background in hypervisor technologies (libvirt, QEMU, KVM) and Linux systems administration.
• Expertise in networking protocols and tools, particularly BGP and Open vSwitch (OVS), with automation experience.
• Deep knowledge of observability and monitoring frameworks (Grafana, Sentry, SumoLogic) and incident management.
• Advanced troubleshooting skills across compute, networking, and storage subsystems.
• Experience building and maintaining CI/CD pipelines (GitLab) and configuration management (Puppet).
• Familiarity with MySQL or similar databases, with an understanding of operational considerations for reliability and scale.
• Strong problem-solving abilities and the drive to tackle complex, low-level reliability challenges.
• Effective cross-team communication and collaboration skills.
• A commitment to continuous improvement and fostering a culture of operational excellence.

🏖️ Benefits

• 100% company-paid insurance premiums for employee medical, dental, and vision plans
• 401(k) plan that matches 100% up to 4%, with immediate vesting
• Professional Development Reimbursement of $2,500 each year
• 11 Holidays + Paid Time Off Accrual + Rollover Plan
• Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
• $500 stipend for remote office setup in first year + $400 each following year
• Internet reimbursement up to $75 per month
• Gym membership reimbursement up to $50 per month
• Company-paid Wellable subscription

Similar Jobs

Site Reliability Engineer

November 5

Leidos

10,000+ employees

🔒 Cybersecurity

🔬 Science

Site Reliability Engineer at Leidos ensuring systems meet reliability standards for the US Space Force. Developing test plans and risk management strategies in a hybrid Microsoft Azure environment.

🇺🇸 United States – Remote

💵 $85.2k - $153.9k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Azure

Cloud

Senior / Staff Site Reliability Engineer

November 5

Kindred

1001 - 5000

🤝 B2B

Site Reliability Engineer developing AWS infrastructure for a community-driven home swapping network. Leading cloud architecture and enhancing developer productivity with internal tools.

🇺🇸 United States – Remote

💵 $170k - $220k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

Docker

EC2

JavaScript

Kubernetes

Python

Terraform

TypeScript

Senior/Staff Cloud Operations Engineer

November 5

Kindred

1001 - 5000

🤝 B2B

Cloud Operations Engineer specializing in AWS infrastructure for a members-only home swapping network. Leading infrastructure decisions and ensuring scalable and robust cloud architecture.

🇺🇸 United States – Remote

💵 $170k - $220k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

Docker

EC2

JavaScript

Kubernetes

Python

Terraform

TypeScript

Senior Site Reliability Engineer, SRE

November 4

Cribl

501 - 1000

☁️ SaaS

Senior Site Reliability Engineer unlocking the value of observability data for Cribl. Engaging with teams to improve service delivery and reliability in cloud environments.

🇺🇸 United States – Remote

💵 $180k - $240k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Ansible

AWS

Chef

Cloud

Grafana

JavaScript