Senior Site Reliability Engineer, Core Cloud Engineering

201 - 500 employees

Founded 2014

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funding within the last year

💰 $329M Debt Financing - Vultr on 2025-06

Vultr is a global cloud infrastructure provider offering on-demand virtual machines, bare-metal servers, GPU-accelerated instances, managed databases, object and block storage, Kubernetes, and networking services. The platform emphasizes AI and HPC workloads with a broad selection of AMD and NVIDIA GPUs, fast networking, and 32+ data center regions, plus a marketplace of deployable apps and developer-friendly APIs. Vultr targets developers and businesses seeking affordable, scalable, and compliant cloud compute and storage alternatives to hyperscalers.

🕒 February 25

🇺🇸 United States – Remote

💵 $120k - $130k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Distributed Systems

Grafana

Linux

MySQL

PHP

Puppet

📋 Description

• Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
• Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
• Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
• Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
• Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
• Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
• Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
• Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
• Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

🎯 Requirements

• Proficiency in PHP with strong scripting and automation skills.
• Experience running large-scale distributed systems and control plane infrastructure in production.
• Strong background in hypervisor technologies (libvirt, QEMU, KVM) and Linux systems administration.
• Expertise in networking protocols and tools, particularly BGP and Open vSwitch (OVS), with automation experience.
• Deep knowledge of observability and monitoring frameworks (Grafana, Sentry, SumoLogic) and incident management.
• Advanced troubleshooting skills across compute, networking, and storage subsystems.
• Experience building and maintaining CI/CD pipelines (GitLab) and configuration management (Puppet).
• Familiarity with MySQL or similar databases, with an understanding of operational considerations for reliability and scale.
• Strong problem-solving abilities and the drive to tackle complex, low-level reliability challenges.
• Effective cross-team communication and collaboration skills.
• A commitment to continuous improvement and fostering a culture of operational excellence.

🏖️ Benefits

• Excellent medical benefits with 100% company-paid premiums for the employee-only plan, plus 100% company-paid dental and vision premiums
• 401(k) plan that matches 100% up to 4% with immediate vesting
• Professional development reimbursement of $2,500 each year
• 11 holidays + paid time off accrual + rollover plan
• Increased PTO at the 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + anniversary bonus each year
• $500 first-year remote office setup + $400 each following year for new equipment
• Internet reimbursement up to $75 per month
• Gym membership reimbursement up to $50 per month
• Company-paid Wellable subscription
