AI Infrastructure Operations Engineer

The Health Management Academy

51 - 200 employees

Founded 1998

⚕️ Healthcare Insurance

📚 Education

The Health Management Academy is a premier membership-based community founded in 1998 that serves healthcare's most influential CXOs from top U.S. health systems and decision-makers from innovative industry companies. The organization focuses on cultivating exceptional peer groups, offering original market insights, world-class leadership development programs, and novel member alliances. Its industry-leading solutions help member companies facilitate meaningful relationships, navigate strategic transformations, and address critical industry issues with disruptive solutions. The Academy is committed to fostering connections through executive peer learning, supporting professional growth, and delivering actionable data and insights on key healthcare challenges.

📋 Description

• Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
• Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
• Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
• Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.
• Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
• Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
• Support release operations and help ensure deployments remain stable, observable, and recoverable.

🎯 Requirements

• Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
• Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
• Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
• SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
• Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
• Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.

🏖️ Benefits

• Health, dental, and vision benefits
• Annual cash incentive program
• 401k with match
• Flexible PTO
• PHM for PHM: our services for you and your dependents
