
Cloud Computing • Artificial Intelligence
Vultr is a cloud infrastructure provider offering a wide range of services including compute instances, storage, managed databases, and GPU clusters. The company focuses on providing high-performance and accessible cloud solutions, leveraging both AMD and NVIDIA technologies to power applications in artificial intelligence, high-performance computing, and general workloads. Vultr offers services that are designed to be simpler and more cost-effective than major competitors like AWS, GCP, and Azure, with global data center locations to support diverse deployment needs.
Yesterday
🇺🇸 United States – Remote
💵 $140k - $150k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)

• Own the NetDevOps roadmap, spanning automation, observability, configuration validation, telemetry ingestion, and operational tooling for the global network.
• Manage and grow a high-performing team of NetDevOps Engineers, providing technical guidance, career development, and hands-on mentorship.
• Drive automation for complex environments, including EVPN-VXLAN data center fabrics, RoCEv2 lossless Ethernet, and global WAN/edge infrastructure.
• Build and evolve operator tooling for Network Operations (Tier 1/2), including event correlation, intent validation, playbooks, and automated remediation workflows.
• Ensure operational excellence across fleet-wide updates, config management, CI/CD pipelines, and reliability metrics for automation systems.
• Partner closely with Cloud Networking (who own front-end networking, VPC automation, dataplane behavior) to unify automation interfaces and ensure clean separation of responsibilities.
• Collaborate with Architecture, Platform, and GPU/AI Engineering on next-generation fabric design, automation hooks, observability, and provisioning flows.
• Standardize telemetry ingestion and correlation pipelines (gNMI, Kafka, Prometheus, custom collectors) to generate actionable, real-time insights into network behavior.
• Lead complex investigations across routing, switching, RDMA transport behavior, congestion, ECMP, and overlay/underlay interactions, especially where tooling or automation must evolve.
• Define engineering standards, SLIs/SLOs for automation services, and operational maturity goals (testing, documentation, failure modes).
• Strong experience building and leading high-performing engineering teams (NetDevOps, SRE, automation, or network engineering groups).
• Deep understanding of modern data center networking: EVPN-VXLAN, BGP, QoS, telemetry, and config automation.
• Familiarity with RoCEv2/RDMA fabrics, PFC/ECN tuning, congestion management, or GPU/AI fabric operations.
• Hands-on experience with automation ecosystems: Ansible, Python, Go, Rust, CI/CD pipelines, config linting, and intent validation frameworks.
• Experience integrating automation with a Source-of-Truth (NetBox, Nautobot, OpsMill, homegrown systems).
• Strong understanding of telemetry and monitoring stacks (Prometheus/Grafana, Kafka, OpenTelemetry, custom collectors).
• Ability to dive deep into Linux networking internals, namespaces, netlink, and distributed systems behavior.
• Proven experience delivering reliable automation services at scale, with strong fundamentals in testing, versioning, rollback, and change management.
• 100% company-paid insurance premiums for employee medical, dental, and vision plans.
• 401(k) plan that matches 100% up to 4%, with immediate vesting.
• Professional Development Reimbursement of $2,500 each year.
• 11 Holidays + Paid Time Off Accrual + Rollover Plan.
• Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year.
• $500 stipend for remote office setup in first year + $400 each following year.
• Internet reimbursement up to $75 per month.
• Gym membership reimbursement up to $50 per month.
• Company-paid Wellable subscription.
Yesterday
PREEvision Release Engineer & Technical Writer at requisimus managing document exports and quality assurance processes for IT consulting projects. Collaborating on various projects in an open multicultural team.
🗣️🇩🇪 German Required
Python
VBA
Yesterday
Senior DevOps Engineer optimizing cloud infrastructure operations for leading cybersecurity firm. Collaborating with scrum teams, managing CI/CD processes, and maintaining cloud infrastructure across providers.
🇺🇸 United States – Remote
💵 $130k - $150k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
AWS
Azure
Cloud
Google Cloud Platform
Jenkins
Kubernetes
Prometheus
Terraform
Yesterday
Product Owner role maximizing agile team value at Velera, a fintech solutions provider for credit unions. Overseeing product vision, backlogs, and ensuring high-quality delivery.
🇺🇸 United States – Remote
💵 $95.8k - $124.5k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
Azure
Yesterday
Responsible for ensuring the optimal performance and uptime of Akamai's critical security products. Analyzing system performance and developing tools for monitoring and alerting.
🇺🇸 United States – Remote
💵 $106.6k - $221.4k / year
💰 Post-IPO Equity on 2001-07
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Azure
Cloud
Distributed Systems
Jenkins
Kubernetes
Python
Terraform
Go
2 days ago
Site Reliability Engineer responsible for NBCU's Distribution Engineering monitoring and control systems. Utilizing automation and on-call support to ensure high availability.
🇺🇸 United States – Remote
💵 $110k - $145k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Ansible
AWS
Azure
Chef
Cloud
Docker
Google Cloud Platform
Grafana
Kubernetes
Linux
Node.js
Python
React
SaltStack
Splunk
Terraform
TypeScript