
201 - 500 employees
Founded 2024
🤖 Artificial Intelligence
🤝 B2B
☁️ SaaS
🔥 Funding within the last year
💰 $433M Series C - Nscale on 2025-09
Nscale is a full‑stack AI infrastructure company that provides GPU-powered cloud and on-premise capacity, sovereign and sustainable data centers, and managed platform services for large-scale model training, fine-tuning, and inference. It offers bare-metal and VM instances, managed Kubernetes and Slurm orchestration, high-throughput storage and low-latency networking, plus fleet operations, observability, and APIs for real-time GPU resource governance. Nscale serves enterprise AI teams and labs with modular, multi-megawatt data center deployments and turnkey AI-native services to accelerate R&D and production at scale.
🕒 February 25

• You’ll join the Support duty rotation and, as a Senior, will collaborate with Engineering on incidents and changes.
• Proactively improve dashboards, alerts, and runbooks to prevent repeat incidents.
• Contribute to knowledge sharing across Operations and Engineering, including training content, workshops, and PR reviews. Drive to upskill: better the team and yourself.
• Accurately record, update, manage, and resolve tickets using the call tracking system whilst keeping all parties (internal or external) informed of each ticket's progress via phone and email.
• Demonstrate a solid understanding of the underlying Platform to our customers and help them leverage the service and products.
• Respond to incoming monitoring alerts, resolving or escalating as required in accordance with priorities and agreed service levels.
• Take decisive actions, and calculated risks, on technically complex incidents and tasks to ensure business speed and efficiency.
• Lead by earning trust, speaking candidly, and benchmarking against the best to identify where we can improve.
• Disagree when appropriate and challenge the status quo. Commit wholly to decisions and plans once in motion. Be a technical expert, and drive the team to make the best decisions.
• Deliver project tasks, improvements, and technical assessments to the right quality and in a timely fashion.
• Handle escalated customer support issues, providing solutions aligned with business SLA requirements.
• Design and implement automation scripts and tools to optimize processes.
• Conduct root cause analysis for major incidents and recommend long-term fixes.
• Collaborate with cross-functional teams on service improvements.
• Respond to critical incidents outside business hours, and be on call as required.

• Ability to adapt to customer-driven demands, such as providing specialist support after core business hours, with availability to travel to Nscale or Customer locations to provide onsite technical expertise and guidance.
• Disciplined, organised and self-motivated. Able to motivate, support and mentor other team members.
• Strong leadership principles, with a bias for taking decisive action, working independently, and driving the team and wider organisation to improve.
• Understanding of how datacentres operate and the core datacentre technologies: Servers, Networks, Storage and Virtualisation, ideally gained through an operational support background.
• Good organisational and time management skills, with strong interpersonal skills, able to deal effectively with people at all levels whilst also having good written and verbal communication skills.
• Linux systems engineering at scale. Strong command of modern Linux distributions, kernel modules, systemd, networking stack, and filesystem tooling. Proven troubleshooting across compute, storage and network layers in production.
• Kubernetes. Operate and troubleshoot K8s clusters, and understand how physical resources are abstracted up the stack to K8s.
• GPU platforms (NVIDIA and AMD). Practical experience with GPU drivers and GPU log investigation tools, e.g. nvidia-smi. Performance diagnostics using NCCL on large-scale clusters.
• Observability and incident response. Build and use alerting stacks and dashboards, interpret metrics and alerts, and drive runbooks to resolution; contribute to SLOs and post‑incident reviews.
• Strong networking fundamentals. Solid grasp of L2/L3, routing, BGP, VLANs, VXLAN, firewalls, load balancing. Understanding of high‑performance fabrics (RDMA/NVLink basics) for cluster‑to‑cluster traffic.
• SRE‑style operations. Write and maintain runbooks, automate diagnostics, and reduce human intervention using scripts or small tools.
• Automation and Git. Scripting or software skills in Bash, Python, or JavaScript (or equivalent) for operational tooling and integrations, and experience with Infrastructure Automation tools (Ansible, Puppet, Terraform, Chef).
• Cloud Infrastructure Administration and Troubleshooting. Strong familiarity with virtualisation technologies and with investigating the issues that arise, including deep-dive investigation for root cause analysis. OpenStack operations experience preferred.

• Highly competitive package (base + equity) with reviews every 12 months. 🚀
• Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI. ✨
• Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
• Human-First Flexibility: We treat you as humans first. 🫶🏽 Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.