Network DevOps Engineer, RDMA Fabric Automation

🕒 February 25

🇺🇸 United States – Remote

💵 $90k - $130k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Vultr

201 - 500 employees

Founded 2014

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funding within the last year

💰 $329M Debt Financing - Vultr on 2025-06

Vultr is a global cloud infrastructure provider offering on-demand virtual machines, bare-metal servers, GPU-accelerated instances, managed databases, object and block storage, Kubernetes, and networking services. The platform emphasizes AI and HPC workloads with a broad selection of AMD and NVIDIA GPUs, fast networking, and 32+ data center regions, plus a marketplace of deployable apps and developer-friendly APIs. Vultr targets developers and businesses seeking affordable, scalable, and compliant cloud compute and storage alternatives to hyperscalers.

📋 Description

• Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
• Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks.
• Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation.
• Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics.
• Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics.
• Implement CI/CD workflows for network configuration changes: validation, pre-checks, and rollbacks.
• Investigate complex network behaviors across layers: flow hashing, congestion domains, ECMP, and overlay interactions.
• Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture.

🎯 Requirements

• Solid understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering.
• Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design.
• Strong experience with automation frameworks like Ansible and languages like Python, Golang, Rust, or PHP.
• Comfort working with telemetry and monitoring stacks such as Prometheus, Grafana, Loki, ELK, or similar.
• Previous experience integrating with NetBox, Nautobot, OpsMill, or similar for topology and configuration source-of-truth.
• Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation.
• Strong Linux networking background, including namespaces, netlink, and system-level debugging.

🏖️ Benefits

• 100% company-paid insurance premiums for employee medical, dental, and vision plans.
• 401(k) plan that matches 100% up to 4%, with immediate vesting.
• Professional Development Reimbursement of $2,500 each year.
• 11 Holidays + Paid Time Off Accrual + Rollover Plan.
• Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year.
• $500 stipend for remote office setup in first year + $400 each following year.
• Internet reimbursement up to $75 per month.
• Gym membership reimbursement up to $50 per month.
• Company-paid Wellable subscription.

Similar Jobs

🕒 February 25

Tactibit Technologies

11 - 50

🔒 Cybersecurity

🏛️ Government

DevOps Engineer working at Tactibit Technologies to modernize legacy architectures for mission-critical systems. Collaborate with teams on cloud migrations and automating business processes.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 February 25

Outlive

11 - 50

🧘 Wellness

🛍️ eCommerce

👥 B2C

DevOps/Infrastructure Engineer managing Google Cloud Platform infrastructure for a health optimization product. Focused on building secure, scalable environments with strong industry standards.

🇺🇸 United States – Remote

💵 $125k - $200k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 February 25

DroneUp

51 - 200

🚀 Aerospace

☁️ SaaS

🤝 B2B

SRE - Platform Engineer at DroneUp focusing on IT infrastructure reliability and scalability. Driving SRE best practices within the team and collaborating on cloud engineering solutions.

🇺🇸 United States – Remote

💵 $125k - $150k / year

💰 $241.2k Seed Round - DroneUp on 2022-07

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

🕒 February 23

Filevine

201 - 500

☁️ SaaS

🤖 Artificial Intelligence

Senior DBRE managing performance and scalability of data platform at Filevine, a legal AI company. Focus on AI-driven automation, optimizing SQL Server and Postgres environments.

🇺🇸 United States – Remote

💵 $145k - $180k / year

💰 $108M Series D on 2022-04

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

🕒 February 17

Agile Defense

501 - 1000

🏛️ Government

🔒 Cybersecurity

DevSecOps Engineer building secure software delivery systems for national security missions. Seeking a builder with 3–5 years of relevant experience and a proactive approach to integration challenges.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)