Senior Cloud - Kubernetes SRE

201 - 500 employees

We’re a leading recruiter with a team of over 250 consultants working in London, Guildford, Milton Keynes, St Albans, Birmingham, New York, Philadelphia, and San Diego.

🕒 3 days ago

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🇬🇧 UK Skilled Worker Visa Sponsor

Ansible

Cloud

Flux

Grafana

Kubernetes

Linux

OpenShift

Prometheus

Python

Shell Scripting

Terraform

VMware

Investigo

📋 Description

• Operate, harden and extend production OKD / Kubernetes clusters across on-premises and hybrid environments.
• Support the migration from VMware to KVM, helping modernise the underlying compute and storage layer.
• Own and improve CI/CD processes across the full lifecycle of platform and application components.
• Work with platform and application engineers to support cloud-native delivery using tools such as Helm and Kustomize.
• Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
• Maintain and improve core platform services including identity, ingress, observability, certificate management, service mesh and container registry capabilities.
• Build and operate observability across logs, metrics, traces, alerting, SLOs and error budgets.
• Improve platform hardening in line with secure and regulated environment requirements, including network policy, SELinux, image provenance, secret management and audit.
• Automate repeatable operational tasks using tools such as Ansible, Terraform, Helm, Kustomize, Go, Python or equivalent technologies.
• Lead incident response activity, support blameless post-mortems and drive systemic fixes.
• Partner with networking and security teams on platform integration, segmentation, load balancing and accreditation evidence.
• Create and maintain clear technical documentation, runbooks, design notes and operational guidance.
• Mentor other engineers and act as a senior technical authority across cloud and Kubernetes operations.
• Participate in an on-call rota, with appropriate compensation.

🎯 Requirements

• Strong experience running production Kubernetes environments, not just consuming or deploying into them.
• Strong Linux fundamentals, including systemd, networking, storage and performance troubleshooting.
• Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS or GKE.
• Experience with infrastructure as code and automation, such as Ansible, Terraform, Helm or Kustomize.
• Experience using GitOps tooling such as Argo CD or Flux in production environments.
• Experience building or operating CI/CD pipelines for platform or application components.
• Strong observability experience across logs, metrics and traces, using tools such as Prometheus, Grafana, Elastic Stack, OpenTelemetry or similar.
• Experience working with identity and access technologies such as OIDC, SAML, SCIM or Keycloak.
• Experience with virtualisation or infrastructure platforms such as KVM, libvirt or VMware.
• Scripting or tooling experience using Go, Python, shell scripting or similar.
• Strong troubleshooting, problem-solving and analytical skills.
• Experience working in secure, regulated or enterprise-scale environments.
• Strong communication skills, with the ability to produce clear documentation, runbooks, post-mortems and technical guidance.
• Eligible to hold UK SC clearance.
• Specific OpenShift or OKD experience, including operators, MachineConfig or SCCs. (Desirable)
• Service mesh experience such as Istio or Linkerd. (Desirable)
• Policy engine experience such as OPA, Gatekeeper or Kyverno. (Desirable)
• Cloud-native application deployment experience using Helm, Terraform, Kustomize or similar. (Desirable)
• GitOps and CI/CD experience managing full application and component lifecycles. (Desirable)
• Storage experience such as Ceph, Longhorn, OpenShift Data Foundation or equivalent. (Desirable)
• Networking experience including BGP, VXLAN, Palo Alto or Juniper technologies. (Desirable)
• Software supply chain security experience, including SBOMs, image signing, admission control or tools such as Sigstore. (Desirable)
• Experience operating AI, ML or GPU-enabled platforms. (Desirable)
• CKA, CKAD, CKS, Red Hat certifications or equivalent. (Desirable)
• Active or recent UK SC clearance. (Desirable)
• Recognised open-source contributions to the Kubernetes ecosystem. (Desirable)

🏖️ Benefits

• Private Medical
• Health Cash Plan
• 4x Life Assurance
• Inclusive Culture: Enjoy an inclusive culture and environment.
• Holiday: Generous holiday allowance.
• Learning: Access to continuous learning and development opportunities.
• Bonus Potential: Bonus potential based on performance and business-related factors.
• Discounts: Discounts on a wide range of products and services.
• Pension: Pension scheme contributions.
• EV Car Scheme
• Regular Pay Reviews
• More Benefits: Explore additional benefits on our career site.
