
11 - 50 employees
Founded 2017
🤖 Artificial Intelligence
☁️ SaaS
💰 $100M Debt Financing on 2022-12
Artificial Intelligence • Cloud Computing • SaaS
CoreWeave is a cloud service provider that specializes in purpose-built infrastructure designed for AI workloads. Known as the AI Hyperscaler™, CoreWeave offers a range of products including GPU and CPU compute services, storage solutions, and networking services optimized for deep learning, AI model training, and rendering applications. With a robust cloud platform, CoreWeave simplifies complex infrastructure management, ensuring reliability, scalability, and high-performance computing suitable for leading AI labs and enterprises.

• We’re seeking a talented and experienced Senior Engineer for Network Observability to join our Network Observability team. In this role, you will be a key player in designing, developing, and maintaining the monitoring, telemetry, and observability systems that keep CoreWeave’s GPU cloud network operating reliably and at scale.
• You’ll focus on building solutions that provide real-time insights into network performance, ensuring that issues are detected proactively and resolved quickly.
• Develop, optimize, and maintain network observability platforms. Use your skills in Python and Golang to create and automate collectors, exporters, and dashboards that provide deep visibility into network health and performance.
• Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from a variety of platforms (Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, etc.) into a single observability pipeline.
• Design and implement scalable telemetry solutions using protocols like gNMI, SNMP, and streaming analytics. Ensure advanced alerting and anomaly detection with tools such as Prometheus, Grafana, and Alertmanager.
• Work closely with network developers, site reliability engineers, and security teams to integrate observability solutions across the broader infrastructure.
• Participate in design discussions, RFCs, and architectural decisions.
• Join a rotating on-call schedule to troubleshoot and resolve observability-related issues. Provide timely support to operations teams, quickly isolating and fixing problems when they arise.
• Guide junior team members, share best practices, and foster a culture of continuous learning and improvement within the observability domain.
• Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP. Experience writing or extending custom metric collectors/exporters is a plus.
• Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large-scale environments. A track record of building and operating robust telemetry and monitoring solutions is a plus.
• Passion for automating tasks and processes. You find satisfaction in creating workflows that handle repetitive tasks and reduce human error to near zero.
• Comfortable containerizing solutions and designing, building, and deploying container-based workloads efficiently in Kubernetes.
• Proficient with Python, Go, and Bash, plus familiarity with configuration management and templating tools (e.g., Ansible, Jinja2).
• Strong knowledge of Linux systems and IP networking concepts, with hands-on experience in routing, switching, and network troubleshooting.
• Practical knowledge of a variety of platforms, including Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and SR Linux.
• Collaborative, humble, and always ready to help others while staying open to learning from more senior colleagues.
• Family-level Medical Insurance
• Family-level Dental Insurance
• Generous Pension Contribution
• Life Assurance at 4x Salary
• Critical Illness Cover
• Employee Assistance Programme
• Tuition Reimbursement
• Work culture focused on innovative disruption