Staff Platform Engineer, AI/ML Infrastructure

Pfizer

10,000+ employees

Founded 1849

💰 Post-IPO Debt on 2023-05

We’re celebrating 175 years of daring scientific innovation, and we’re not done yet. Every day, we’re channeling our passion and resources into delivering innovative therapies that change the face of healthcare. Let’s outdo yesterday.

📋 Description

‱ Provide technical leadership for the cloud platforms, deployment systems, and operational foundations that power enterprise-scale generative AI applications.
‱ Define and evolve the infrastructure architecture for AI/ML platforms running across AWS, Kubernetes, serverless, and containerized environments.
‱ Lead platform standards for reliability, scalability, observability, CI/CD, security, and developer enablement, while partnering closely with software engineering, AI engineering, security, and operations teams.
‱ Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services.
‱ Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA.
‱ Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments.
‱ Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals.
‱ Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics.
‱ Build platform capabilities for GenAI workloads, including model availability monitoring.
‱ Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance.
‱ Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls.
‱ Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments.
‱ Mentor engineers, review architecture and infrastructure designs, and influence platform engineering practices across teams.

🎯 Requirements

‱ Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience.
‱ 7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles.
‱ Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services.
‱ Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms.
‱ Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling.
‱ Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication.
‱ Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools.
‱ Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements.
‱ Experience supporting distributed systems, microservices, APIs, asynchronous workloads, and multi-environment deployments.
‱ Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams.

đŸ–ïž Benefits

‱ Health care coverage
‱ Retirement savings plans
‱ Insurance benefits
‱ Employee Assistance Program
‱ Wellness benefits
