Site Reliability Engineer

🕒 April 22

NICE

5001 - 10000 employees

Founded 1991

☁️ SaaS

🤖 Artificial Intelligence

📡 Telecommunications

NICE is a leading provider of AI-powered customer service automation solutions, transforming contact centers into world-class customer experience centers. Their CXone Mpower platform offers end-to-end automation of customer service workflows, integrating human and AI agents to deliver efficient and personalized customer interactions. NICE's offerings include AI for customer experience, digital and self-service solutions, workforce engagement and management, and complete cloud-based contact center platforms. They are recognized as a leader in the Contact Center as a Service (CCaaS) industry, providing tools for increased operational efficiency, employee engagement, and enhanced customer satisfaction.

📋 Description

• Act as a primary or escalation responder in a 24x7 on-call rotation
• Lead or support Major Incident (MI) response, including triage, mitigation, and resolution
• Coordinate across Engineering, Infrastructure, Security, and Product teams
• Execute and improve runbooks, playbooks, and escalation paths
• Drive blameless post-incident reviews (PIRs) and track corrective actions
• Own service health monitoring across infrastructure, applications, and dependencies
• Design and maintain alerting strategies that align with SLIs/SLOs
• Reduce alert fatigue through signal-to-noise improvements
• Build dashboards using tools such as Grafana, Prometheus, Datadog, Splunk, CloudWatch
• Automate repetitive operational tasks to reduce manual toil
• Improve mean time to detect (MTTD) and mean time to resolve (MTTR)
• Develop scripts and tools (Python, Bash, Go, etc.) to support NOC/SRE workflows
• Implement self-healing and auto-remediation where possible
• Partner with engineering teams to improve system design for reliability
• Support and troubleshoot Linux-based systems, cloud platforms, Kubernetes/containerized environments
• Assist with capacity planning and availability reviews
• Ensure operational readiness for production releases

🎯 Requirements

• Strong Linux systems administration
• Experience with incident management and production support
• Familiarity with cloud infrastructure (AWS preferred)
• Containers & orchestration (Docker, Kubernetes)
• Monitoring/alerting platforms
• Scripting or programming experience in Python, Bash, Go, or similar
• Understanding of networking fundamentals (DNS, TCP/IP, load balancing)
• Experience working in 24x7 NOC or production operations environments
• Ability to handle high-pressure incidents calmly and effectively
• Strong written and verbal communication for incident coordination
• Comfort working from runbooks, but improving them when they fall short
• Experience defining or operating to SLOs / SLIs
• Prior migration from traditional NOC → SRE model
• Infrastructure as Code experience (Terraform, Ansible, etc.)
• Exposure to security, compliance, or regulated environments

🏖️ Benefits

• Professional development opportunities
• Flexible working hours
• Work from home

Similar Jobs

🕒 April 21

Ripjar

51 - 200

💸 Finance

📋 Compliance

🤖 Artificial Intelligence

DevOps Engineer ensuring reliability and security of infrastructure for software combating financial crime at Ripjar. Focus on continuous improvement and automation within a remote-first team.

Ansible

AWS

Azure

Cloud

Docker

JavaScript

Kubernetes

Linux

Prometheus

Python

Terraform

🕒 April 21

KOPE

11 - 50

☁️ SaaS

DevOps Engineer focusing on secure, scalable cloud infrastructure for KOPE's offsite construction platform. Responsible for CI/CD capabilities and security best practices throughout development.

Azure

Cloud

Docker

Jenkins

Kubernetes

Terraform

🕒 April 17

Recruiting.com

11 - 50

🎯 Recruiter

☁️ SaaS

🤝 B2B

Lead DevOps Engineer overseeing Azure infrastructure and CI/CD pipelines improvements at Cencora. Mentor engineers and align initiatives with business goals in the pharmaceutical consulting sector.

Azure

Cloud

Kubernetes

Python

Terraform

Go

🕒 April 16

Kerv

501 - 1000

🔒 Cybersecurity

AWS DevOps Engineer focusing on managing Kerv Group's AWS infrastructure. Ensuring security, stability, compliance, and cost optimization in production environments.

Ansible

AWS

Azure

Cloud

Jenkins

Python

Terraform

🕒 April 15

Livestock Information Ltd

51 - 200

🌾 Agriculture

🔬 Science

Azure DevOps Engineer responsible for designing and managing secure Azure pipelines for live services. Working closely with teams to deliver automated and secure services in a remote environment.

Azure

Cloud

JavaScript

Terraform