Senior Solutions Architect, Cloud Infrastructure and DevOps

September 6

🇦🇪 United Arab Emirates (UAE) – Remote

⏰ Full Time

🟠 Senior

💻 Solutions Engineer

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulations and rendering tasks. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Maintain large scale HPC/AI clusters with monitoring, logging and alerting
• Manage Linux job/workload schedulers and orchestration tools
• Develop and maintain continuous integration and delivery pipelines
• Develop tooling to automate deployment and management of large-scale infrastructure environments, and enable self-service consumption of resources
• Deploy monitoring solutions for servers, network and storage
• Perform troubleshooting bottom up from bare metal, operating system, software stack and application level
• Develop, re-define and document standard methodologies to share with internal teams
• Support Research & Development activities and engage in POCs/POVs for future improvements
• Interact with customers, partners and internal teams to analyze, define and implement large scale Networking projects
• Act as a technical resource and customer-facing representative

🎯 Requirements

• BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields
• At least 8 years of professional experience in networking fundamentals, TCP/IP stack, and data center architecture
• Knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software
• Extensive knowledge and hands-on experience with Kubernetes, including container orchestration for AI/ML workloads, resource scheduling, scaling, and integration with HPC environments
• Experience in managing and installing HPC clusters, including deployment, optimization, and troubleshooting
• Excellent knowledge of Linux systems (Redhat/CentOS and Ubuntu), including internals, ACLs, OS-level security protections, and common protocols like TCP, DHCP, DNS
• Experience with multiple storage solutions, including Lustre, GPFS, ZFS, and XFS
• Proficiency in Python programming and bash scripting
• Comfortable with automation and configuration management tools, including Jenkins, Ansible, Puppet/Chef
• Excellent interpersonal skills and customer-facing experience
• Familiarity with RDMA (InfiniBand or RoCE) fabrics (way to stand out)
• Knowledge of CI/CD pipelines for software deployment and automation (way to stand out)
• Experience with GPU-focused hardware/software (DGX, CUDA) (way to stand out)

🏖️ Benefits

• Highly competitive salaries
• An extensive benefits package
• Work environment that promotes diversity, inclusion, and flexibility
