Senior Site Reliability Engineer, Infrastructure

🕒 May 29

🇺🇸 United States – Remote

💵 $125k - $135k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)


Vultr

201 - 500 employees

Founded 2014

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funding within the last year

💰 $329M Debt Financing - Vultr on 2025-06

Vultr is a global cloud infrastructure provider offering on-demand virtual machines, bare-metal servers, GPU-accelerated instances, managed databases, object and block storage, Kubernetes, and networking services. The platform emphasizes AI and HPC workloads with a broad selection of AMD and NVIDIA GPUs, fast networking, and 32+ data center regions, plus a marketplace of deployable apps and developer-friendly APIs. Vultr targets developers and businesses seeking affordable, scalable, and compliant cloud compute and storage alternatives to hyperscalers.

📋 Description

• Design and build the observability pipeline for datacenter infrastructure including CDUs, PDUs, bare metal servers, and provisioning workflows, collecting telemetry via Redfish, IPMI, SNMP, and OpenTelemetry.
• Own the full stack from data collection through to visualization and alerting in Grafana, Loki, and Mimir.
• Build dashboards and alerting that are actionable and meaningful for stakeholder teams including Datacenter Ops, SysAdmin, Network, and Provisioning.
• Establish standards and patterns for how datacenter infrastructure telemetry is collected, stored, and visualized across Vultr's global footprint.
• Partner closely with stakeholder teams to understand their operational needs and translate them into observable, measurable signals.
• Drive infrastructure-as-code practices across the observability pipeline to ensure consistency, repeatability, and maintainability.

🎯 Requirements

• 5+ years of experience in site reliability, platform, or infrastructure engineering in a production environment.
• Hands-on experience building and operating observability pipelines including metrics, logs, and alerting using Grafana, Loki, Mimir, or equivalent tooling.
• Working knowledge of datacenter hardware telemetry protocols including Redfish, IPMI, and/or SNMP.
• Strong Linux fundamentals and operational experience in production infrastructure environments.
• Demonstrated experience with infrastructure-as-code and configuration management tooling (Terraform, Ansible, Chef or similar).
• Strong cross-functional communication skills and experience delivering tooling for operational stakeholder teams.

🏖️ Benefits

• 100% company-paid insurance premiums for employee medical, dental and vision plans
• 401(k) plan that matches 100% up to 4%, with immediate vesting
• Professional Development Reimbursement of $2,500 each year
• 11 Holidays + Paid Time Off Accrual + Rollover Plan
• Commitment matters to Vultr! Increased PTO at the 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + Anniversary Bonus each year
• $500 stipend for remote office setup in first year + $400 each following year
• Internet reimbursement up to $75 per month
• Gym membership reimbursement up to $50 per month
• Company-paid Wellable subscription
