AI Platform Engineer – OneAI

🕒 April 17

OpenNebula

11 - 50 employees

Founded 2010

🏢 Enterprise

Cloud Computing • Software • Enterprise

OpenNebula Systems is an open source cloud and edge computing platform that specializes in simplifying the deployment and management of enterprise private, hybrid, or edge cloud infrastructures. By unifying public cloud agility with private cloud security and performance, OpenNebula provides flexibility, scalability, and vendor independence, enabling seamless orchestration of compute, storage, and network resources across various cloud environments. It supports developers and DevOps practices, making it easy for enterprises to automate operations and manage applications across multiple infrastructures.

📋 Description

• Design, implement, and deploy advanced AI capabilities within the OneAI platform.
• Shape the end-user experience by designing intuitive workflows for model management, deployment configuration, and job operation.
• Streamline the model lifecycle by integrating public repositories (e.g., Hugging Face) for seamless discovery, import, versioning, and deployment.
• Bridge the gap between systems engineering and product design to ensure a seamless transition from backend infrastructure to user features.
• Integrate cutting-edge AI frameworks and engines, such as vLLM, NVIDIA Dynamo and Unsloth, into a secure and scalable environment.
• Leverage OpenNebula to orchestrate high-performance inference and training workloads across diverse cloud and edge environments.
• Develop and maintain reliable APIs for compute provisioning and workload scheduling.
• Implement GPU-aware operations to ensure optimal resource allocation and hardware utilization.
• Build comprehensive observability suites to monitor and track critical metrics, including latency, throughput, utilization, and failure rates.
• Establish and refine deployment and workflow strategies to ensure AI workloads remain efficient and stable at scale.
• Optimize system architecture to balance high performance with cost efficiency.
• Research and integrate emerging AI tools and engines to keep the OneAI platform at the forefront of the industry.
• Analyze performance bottlenecks to iterate on the efficiency of both training and inference processes.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science, Information Technology, or Engineering.
• 3+ years of experience in applied AI, machine learning, or software engineering, with hands-on delivery of AI/ML solutions in production environments.
• Demonstrated experience designing and deploying high-performance AI infrastructure, specifically focusing on the scalability and reliability of inference and training workloads.
• Proven track record of deploying Large Language Models (LLMs) at scale, with deep knowledge of serving engines (e.g., vLLM) and fine-tuning tools (e.g., Unsloth).
• Experience building AI-centric platforms or toolchains that manage the model lifecycle (versioning, deployment, and discovery).
• Experience with GPU orchestration, optimizing workloads for cloud, distributed, or large-scale environments, and collaborating with platform or infrastructure teams.
• Hands-on experience with high-throughput inference engines (e.g., vLLM) and fine-tuning tools (e.g., Unsloth).
• Proficiency in integrating with the Hugging Face ecosystem (Transformers, Hub, Datasets) for model and data management.
• Experience implementing monitoring tools to track system-level AI metrics such as token throughput, latency, GPU utilization, and failure rates.
• Experience designing and implementing scalable, reliable APIs for compute provisioning and workload scheduling.
• Experience working with cloud platforms and containerized environments (e.g., OpenNebula, Kubernetes).
• Advanced English level (B2 or higher) is required.

🏖️ Benefits

• Competitive compensation package and flexible remuneration: meals, transport, nursery/childcare.
• Customized workstation (macOS, Windows, Linux).
• Private health insurance.
• Paid time off: holidays, personal time, sick time, parental leave.
• Afternoons off every Friday and during summer.
• Remote company with a bright HQ centrally located in Madrid; offices in Boston (USA), Brussels (Belgium), and Brno (Czech Republic); and access to office space near your location when needed.
• Healthy work-life balance: we support the right to digital disconnection and promote harmony between employees' personal and professional lives.
• Flexible hiring options: full time/part time, employee (Spain/USA) or contractor (other locations).
