Site Reliability Engineer

5001 - 10000 employees

Founded 1991

☁️ SaaS

🤖 Artificial Intelligence

📡 Telecommunications

SaaS • Artificial Intelligence • Telecommunications

NICE is a leading provider of AI-powered customer service automation solutions, transforming contact centers into world-class customer experience centers. Their CXone Mpower platform offers end-to-end automation of customer service workflows, integrating human and AI agents to deliver efficient and personalized customer interactions. NICE's offerings include AI for customer experience, digital and self-service solutions, workforce engagement and management, and complete cloud-based contact center platforms. They are recognized as a leader in the Contact Center as a Service (CCaaS) industry, providing tools for increased operational efficiency, employee engagement, and enhanced customer satisfaction.

🕒 April 22

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🇬🇧 UK Skilled Worker Visa Sponsor

Ansible

AWS

Cloud

DNS

Docker

Grafana

Kubernetes

Linux

Prometheus

Python

Splunk

TCP/IP

Terraform

📋 Description

• Act as a primary or escalation responder in a 24x7 on-call rotation
• Lead or support Major Incident (MI) response, including triage, mitigation, and resolution
• Coordinate across Engineering, Infrastructure, Security, and Product teams
• Execute and improve runbooks, playbooks, and escalation paths
• Drive blameless post-incident reviews (PIRs) and track corrective actions
• Own service health monitoring across infrastructure, applications, and dependencies
• Design and maintain alerting strategies that align with SLIs/SLOs
• Reduce alert fatigue by improving the signal-to-noise ratio of alerts
• Build dashboards using tools such as Grafana, Prometheus, Datadog, Splunk, and CloudWatch
• Automate repetitive operational tasks to reduce manual toil
• Improve mean time to detect (MTTD) and mean time to resolve (MTTR)
• Develop scripts and tools (Python, Bash, Go, etc.) to support NOC/SRE workflows
• Implement self-healing and auto-remediation where possible
• Partner with engineering teams to improve system design for reliability
• Support and troubleshoot Linux-based systems, cloud platforms, and Kubernetes/containerized environments
• Assist with capacity planning and availability reviews
• Ensure operational readiness for production releases

🎯 Requirements

• Strong Linux systems administration skills
• Experience with incident management and production support
• Familiarity with cloud infrastructure (AWS preferred)
• Experience with containers and orchestration (Docker, Kubernetes)
• Experience with monitoring/alerting platforms
• Scripting or programming experience in Python, Bash, Go, or similar
• Understanding of networking fundamentals (DNS, TCP/IP, load balancing)
• Experience working in 24x7 NOC or production operations environments
• Ability to handle high-pressure incidents calmly and effectively
• Strong written and verbal communication for incident coordination
• Comfort working from runbooks, and improving them when they fall short
• Experience defining or operating to SLOs/SLIs
• Prior experience migrating from a traditional NOC model to an SRE model
• Infrastructure as Code experience (Terraform, Ansible, etc.)
• Exposure to security, compliance, or regulated environments

🏖️ Benefits

• Professional development opportunities
• Flexible working hours
• Work from home

Similar Jobs

DevOps Engineer

🕒 April 21

Ripjar

51 - 200

💸 Finance

📋 Compliance

🤖 Artificial Intelligence

DevOps Engineer ensuring reliability and security of infrastructure for software combating financial crime at Ripjar. Focus on continuous improvement and automation within a remote-first team.

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🇬🇧 UK Skilled Worker Visa Sponsor

Ansible

AWS

Azure

Cloud

Docker

JavaScript

Kubernetes

Linux

Prometheus

Python

Terraform

Lead DevOps Engineer

🕒 April 17

Recruiting.com

11 - 50

🎯 Recruiter

☁️ SaaS

🤝 B2B

Lead DevOps Engineer overseeing Azure infrastructure and CI/CD pipeline improvements at Cencora. Mentor engineers and align initiatives with business goals in the pharmaceutical consulting sector.

🇬🇧 United Kingdom – Remote

💰 Private Equity Round on 2006-06

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Azure

Cloud

Kubernetes

Python

Terraform

Database Reliability Engineer – Core Team

🕒 April 2

ClickHouse

51 - 200

☁️ SaaS

🏢 Enterprise

🤖 Artificial Intelligence

Database Reliability Engineer at ClickHouse ensuring reliability and performance of ClickHouse core, improving customer service through backend optimization.

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Google Cloud Platform

Python

SQL

DevOps Engineer

🕒 April 1

Prima

1001 - 5000

💸 Finance

👥 B2C

DevOps Engineer in the Infrastructure team, leveraging data and technology for innovative motor insurance solutions. Join over 300 engineers building impactful, scalable systems.

🇬🇧 United Kingdom – Remote

💰 $115.8M Series A on 2018-11

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Distributed Systems

DNS

Kubernetes

Microservices

Python

Terraform

Senior Site Reliability Engineer

🕒 April 1

Prima

1001 - 5000

💸 Finance

👥 B2C