Infrastructure Engineer, Observability

August 23


Voltage Park

Artificial Intelligence • Enterprise

Voltage Park provides high-performance AI compute infrastructure. It offers bare-metal access, transparent pricing, and customer support for demanding workloads that rely on hardware such as NVIDIA HGX H100 GPUs. The company focuses on fast, flexible, and scalable compute for AI training, model fine-tuning, and real-time inference. Security and compliance are priorities, backed by firewalls and rigorous security protocols. The infrastructure is designed for reliability, using high-speed networks and modern data centers.

2 - 10 employees

🤖 Artificial Intelligence

🏢 Enterprise

📋 Description

• Design and operate systems managing thousands of bare-metal servers, GPUs, and high-performance networks across multiple data centers
• Design, build, and maintain observability platforms spanning metrics, logs, traces, and events
• Create dashboards and alerting for internal stakeholders and scoped visibility for external customers
• Ingest and correlate telemetry from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish
• Implement noise-resistant alerting pipelines that improve detection and reduce operational load
• Collaborate with infrastructure, platform, and customer-facing teams to embed observability into workflows
• Contribute to broader infrastructure engineering projects beyond observability
• Fully remote position requiring candidates to be based in the continental United States; no visa sponsorship

🎯 Requirements

• 8+ years in infrastructure engineering, SRE, or observability roles
• Strong experience with monitoring systems (Prometheus, Grafana, ELK, VictoriaMetrics, or similar)
• Proficiency in Python, Go, or Bash for automation and data integration
• Familiarity with container/Kubernetes observability
• Understanding of streaming telemetry pipelines (Kafka, OTEL, Promtail, or equivalent)
• Strong written and verbal communication skills
• Experience with GPU observability, particularly NVIDIA DCGM (ideal)
• Designing multi-tenant observability solutions with RBAC and scoped queries (ideal)
• Prior work with correlation engines for RCA, forecasting, or predictive alerting (ideal)
• Broader exposure to infrastructure domains (networking, storage, provisioning) (ideal)

🏖️ Benefits

• Offers Equity
• Offers Bonus
• Full benefits
• 5% 401k match
• Comprehensive health insurance with 100% of premiums covered by Voltage Park

Similar Jobs

August 20

Priority

501 - 1000

🛒 Retail

🛍️ eCommerce

👥 B2C

AWS Infrastructure Developer implementing and automating AWS infrastructure for Priority IDC. Responsible for IaC, CI/CD, monitoring, cost optimization, and security.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

👷 Infrastructure Engineer

🦅 H1B Visa Sponsor

August 19

Business Wire

501 - 1000

📱 Media

Builds and automates IT infrastructure at Business Wire; supports 24x7 operations and on-call rotations.

🇺🇸 United States – Remote

💵 $135k - $145k / year

⏰ Full Time

🟠 Senior

👷 Infrastructure Engineer

August 14

Unit 410

11 - 50

₿ Crypto

🔒 Cybersecurity

💸 Finance

Crypto infra engineer at Unit 410. Launch secure networks; build scalable infra.

🇺🇸 United States – Remote

💵 $150k - $200k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

👷 Infrastructure Engineer

August 9

Liquid AI

51 - 200

🤖 Artificial Intelligence

🤝 B2B

🏢 Enterprise

Help create a high-performance training infrastructure for AI models, enabling breakthroughs in capabilities.

🇺🇸 United States – Remote

⏰ Full Time

🔴 Lead

👷 Infrastructure Engineer

August 8

Roboflow

11 - 50

🤖 Artificial Intelligence

Join Roboflow as an Infrastructure Engineer, ensuring robust cloud security and reliability.

🇺🇸 United States – Remote

💵 $180k - $200k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

👷 Infrastructure Engineer
