Cloud Operations Engineer

🕒 May 6

O'Reilly

201 - 500 employees

Founded 1978

📚 Education

☁️ SaaS

🤖 Artificial Intelligence

O'Reilly is a prominent technology and business learning platform that offers a wide range of educational resources, including courses, live events, and certifications. It caters to individuals and teams across various sectors, ensuring they stay updated with the latest tools, technologies, and skills necessary for business success. Known for its robust learning infrastructure, O'Reilly integrates AI-powered tools to enhance learning experiences and offers interactive learning paths and certifications across multiple disciplines. With a rich history of providing technical and business insights through books, conferences, and now its online platform, O'Reilly serves over 5,000 organizations worldwide.

📋 Description

Platform & Infrastructure:
• Design, build, and maintain cloud infrastructure using infrastructure-as-code (Terraform) on GCP
• Manage and evolve our Kubernetes platform, including cluster operations, workload configuration, and service mesh (Istio)
• Develop and improve internal tooling that abstracts cloud complexity and improves the developer experience
• Collaborate with product engineering teams to understand service deployment needs and deliver infrastructure solutions

Reliability & Observability:
• Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues
• Participate in on-call rotation and incident response; drive blameless post-mortems and eliminate recurring issues at their root cause
• Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components
• Implement and refine alerting, dashboards, and runbooks that reduce mean time to resolution

Security & Compliance:
• Embed security best practices into infrastructure workflows (DevSecOps), not as an afterthought but as a design principle
• Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment
• Stay current with cloud security developments and proactively surface risks to the team
• Execute and maintain our automated disaster recovery processes

Collaboration & Growth:
• Work closely with product engineering teams to understand their needs and remove infrastructure friction
• Document systems, processes, and architectural decisions clearly so knowledge is shared, not siloed
• Recommend improvements to tooling, architecture, and processes, and help drive them to completion
• Keep current with the evolving cloud-native ecosystem and bring relevant knowledge back to the team

🎯 Requirements

• Bachelor's degree in Computer Science or a related field
• 5+ years of experience working in cloud infrastructure, platform engineering, or a related discipline
• Hands-on experience with Kubernetes in production environments (cluster management, workloads, networking)
• Proficiency with infrastructure-as-code tools, particularly Terraform
• Experience with at least one major cloud provider (GCP, AWS, or Azure)
• Solid scripting and automation skills in Python, Bash, or a comparable language
• Experience with modern observability platforms (Datadog, Grafana, or similar)
• Strong understanding of Linux systems administration
• Working knowledge of CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar)
• Excellent communication skills: you write clearly, ask good questions, and explain complex systems accessibly
• AI-Augmented Development: Can demonstrate the use of AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Professional development opportunities
