Senior Site Reliability Engineer (SRE) – Technical Leader, Kubernetes Platform

Cisco

10,000+ employees

Founded 1984

🔧 Hardware

🔐 Security

🏢 Enterprise

Cisco is a multinational technology company that provides networking hardware, software, and services to enterprises, service providers, and governments. It builds routers, switches, optical transceivers, programmable silicon, and edge computing platforms, and offers security, collaboration (Webex), observability, and AI-enabled software and support services to help organizations design, operate, and secure large-scale networks and data centers. Cisco also delivers professional services, training, and cloud-managed solutions to support digital transformation and AI-ready infrastructure.

📋 Description

• Design, build, and operate production-grade Kubernetes platforms in regulated and non-regulated environments.
• Improve system reliability through automation, thoughtful design, and continuous iteration.
• Define and drive SLOs, SLIs, and error budgets to guide reliability decisions.
• Build and evolve CI/CD pipelines that are secure, scalable, and easy to use.
• Implement robust observability (metrics, logs, traces) to make systems understandable and actionable.
• Reduce operational toil by automating repetitive processes and improving workflows.
• Partner with security and compliance teams to meet compliance requirements without sacrificing developer velocity.
• Support audit processes, including documentation, controls implementation, and audit readiness.
• Participate in on-call rotations supporting customer requests and paging alerts.
• Participate in incident response, blameless postmortems, and continuous improvement efforts.
• Help shape a platform that engineers enjoy using.
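As a rough illustration of how SLOs, SLIs, and error budgets relate, the sketch below computes an availability SLI and the remaining error budget for a request-based SLO. It is a minimal sketch, assuming a simple success-ratio SLI measured over a fixed window; the class name `SLOReport`, the method `releases_allowed`, and the "freeze releases once the budget is spent" policy are hypothetical illustrations, not details taken from the posting.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    """Hypothetical request-based SLO summary for one measurement window."""
    slo_target: float      # e.g. 0.999 means "99.9% of requests succeed"
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """Observed availability SLI: fraction of successful requests."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left (negative once overspent)."""
        if self.error_budget == 0:
            # A 100% target leaves no budget: any failure exhausts it.
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.error_budget

    def releases_allowed(self) -> bool:
        """Assumed policy: keep shipping while some budget remains."""
        return self.budget_remaining > 0


if __name__ == "__main__":
    report = SLOReport(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI={report.sli:.4%} budget={report.error_budget:.0f} "
          f"remaining={report.budget_remaining:.0%} ship={report.releases_allowed()}")
```

With a 99.9% target over one million requests, the window tolerates about 1,000 failures; 400 failures leave roughly 60% of the budget, so under this assumed policy releases continue. Real error-budget policies are usually agreed per service and may use burn-rate alerts rather than a single threshold.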

🎯 Requirements

• 10+ years of experience in SRE, DevOps, or infrastructure engineering
• Strong experience running Kubernetes in production (EKS, AKS, GKE, or upstream)
• Solid understanding of cloud infrastructure, Linux systems, and networking fundamentals
• Experience with Infrastructure as Code (Terraform preferred)
• Familiarity with CI/CD systems (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
• Proficiency in scripting or programming (Python, Go)
• Experience building or operating observability platforms (Prometheus, Grafana, OpenTelemetry, ELK)
• Working knowledge of compliance frameworks (e.g., PCI, ISO)

🏖️ Benefits

• Health insurance
• Professional development opportunities
• Flexible work arrangements

Similar Jobs

Sphera

1001 - 5000

☁️ SaaS

🏢 Enterprise

📋 Compliance

Senior DevOps Engineer at Sphera supporting multiple SaaS environments and building cloud infrastructure. Collaborating with engineers to optimize performance, reliability, and security in a fast-paced setting.

Azure

Cloud

Firewalls

NoSQL

SDLC

SQL

Terraform

Astreya

1001 - 5000

🔒 Cybersecurity

🏢 Enterprise

☁️ SaaS

Senior Network Deployment Engineer designing and coordinating complex network technologies for a global tech firm. Leading projects and providing mentorship in a multi-vendor environment.

Ansible

Docker

iOS

Kubernetes

Linux

Python

Terraform

VMware

Astreya

1001 - 5000

🔒 Cybersecurity

🏢 Enterprise

☁️ SaaS

Senior Network Deployment Engineer responsible for designing, planning, and implementing complex network technologies. Collaborating with teams globally for large-scale network projects.

Docker

iOS

Kubernetes

Linux

VMware

Akamai Technologies

5001 - 10000

🔒 Cybersecurity

Senior Site Reliability Engineer for Akamai's Compute products, leading MySQL database operations and improving stability. Collaborating with development teams to provide database expertise and support initiatives.

MySQL

CrowdStrike

5001 - 10000

🔒 Cybersecurity

☁️ SaaS

🤖 Artificial Intelligence

Engineering Supervisor leading a Site Reliability Engineering team at CrowdStrike. Focusing on automation, reliability, and observability for internal developer platforms.

Ansible

Chef

Cloud

Cyber Security

Grafana

Jenkins

Kafka

Kubernetes

NFS

Postgres

Prometheus

Puppet

Python

Redis

Splunk

Terraform

Go