Principal SRE – AI Agents Platform


5 hours ago

🗣️🇧🇷🇵🇹 Portuguese Required


iFood

eCommerce • Food Delivery • Technology

iFood is a leading food delivery platform based in Brazil, connecting customers with a wide array of restaurants and food options. The company leverages cutting-edge technology and artificial intelligence to improve customer experience, streamline operations, and enhance delivery logistics. iFood fosters a diverse and inclusive work environment, focusing on sustainability and the positive impact of its services on the community.

5001 - 10000 employees


📋 Description

• Lead the technical development and maintenance of the AI agents platform, ensuring agent reliability, scalability, and governance.
• Design execution, deployment, telemetry, and monitoring patterns, contributing to critical architectural decisions and the platform's long-term vision.
• Build automations and MLOps-oriented pipelines focused on observability, performance metrics, resilience, and efficient resource usage.
• Define and monitor SLOs, SLIs, and protection mechanisms, providing clear communication of risk and impact to technical partners.
• Evaluate vendors and technologies in the AI ecosystem, objectively communicating trade-offs and supporting strategic decisions.
• Collaborate with Engineering and AI teams to integrate models, embeddings, and external components, establishing strong, reliable partnerships.
• Identify opportunities for standardization (documentation, templates, automations, and workflows), bringing systemic vision and problem-solving capability in complex scenarios.

🎯 Requirements

• Proven experience as an SRE in high-scale environments, with strong analytical ability and mature decision-making.
• Proficiency with Kubernetes, observability (Prometheus, OpenTelemetry, Grafana), infrastructure as code (Terraform), and SRE practices (SLO/SLI, incident response, advanced troubleshooting).
• Experience with MLOps, inference pipelines, or platforms running AI models in production.
• Experience with AWS (EKS, IAM, messaging, monitoring), distributed architecture, security, and multi-tenancy.
• Clear and collaborative communication skills, ability to navigate ambiguity, and a partnership mindset across multiple teams.
• Ability to prioritize, manage risk, and deliver consistently in complex environments.

🏖️ Benefits

