Senior Cloud Operations Engineer

🔥 16 hours ago

🇬🇧 United Kingdom – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🇬🇧 UK Skilled Worker Visa Sponsor

NICE

5001 - 10000 employees

Founded 1991

☁️ SaaS

🤖 Artificial Intelligence

📡 Telecommunications

NICE is a leading provider of AI-powered customer service automation solutions, transforming contact centers into world-class customer experience centers. Their CXone Mpower platform offers end-to-end automation of customer service workflows, integrating human and AI agents to deliver efficient and personalized customer interactions. NICE's offerings include AI for customer experience, digital and self-service solutions, workforce engagement and management, and complete cloud-based contact center platforms. They are recognized as a leader in the Contact Center as a Service (CCaaS) industry, providing tools for increased operational efficiency, employee engagement, and enhanced customer satisfaction.

📋 Description

• Design, implement, and operate scalable, secure, and highly available AWS cloud infrastructure leveraging services such as EC2, EKS, ECS, RDS, S3, VPC, Lambda, and IAM.
• Drive the reliability and performance of containerized applications by managing Amazon EKS and ECS environments, including cluster operations, networking, scaling, and troubleshooting.
• Ensure the stability, security, and efficiency of production Linux environments through system administration, performance tuning, storage management, networking, and incident resolution.
• Maintain and optimize relational databases (PostgreSQL, MySQL, Aurora) and NoSQL platforms (DynamoDB, Redis), ensuring high availability, performance, and disaster recovery readiness.
• Strengthen the organization's cloud security posture through effective management of IAM, network security controls, secrets management, and compliance best practices.
• Enhance platform observability and operational excellence by implementing and improving monitoring, logging, alerting, and performance analytics using CloudWatch, Prometheus, and Grafana.
• Take ownership of production incidents by participating in on-call rotations, leading troubleshooting efforts, performing root cause analysis, and driving continuous improvement initiatives.
• Partner closely with software engineering, DevOps, and platform teams to improve deployment processes, application reliability, and operational efficiency.
• Identify and implement cloud cost optimization opportunities through resource right-sizing, capacity planning, automation, and governance best practices.

🎯 Requirements

• 4–5 years in a cloud operations, infrastructure engineering, or SRE role with a strong hands-on technical focus
• Deep hands-on experience with core AWS services: EC2, EKS, ECS, RDS/Aurora, S3, VPC, IAM, Lambda, CloudWatch, Route 53, and ALB/NLB
• Proven ability to design and troubleshoot complex AWS networking topologies (VPCs, subnets, transit gateways, security groups)
• Solid understanding of AWS IAM: roles, policies, permission boundaries, and cross-account access
• Hands-on production experience managing workloads on Amazon EKS and ECS: cluster lifecycle, node group management, networking (CNI, service mesh basics), and autoscaling
• Strong Docker fundamentals: image builds, registries (ECR), multi-stage builds, and container security
• Strong Linux administration skills: Bash/Python scripting, process and memory management, filesystem and storage operations, kernel parameters, and network diagnostics
• Experience managing and hardening Linux servers in production environments (RHEL, Ubuntu, or Amazon Linux)
• Proficient in Terraform: module design, state management, remote backends, and workspace strategies
• Hands-on experience with Puppet for configuration management, node classification, and enforcing system state at scale
• Hands-on experience with relational databases (PostgreSQL, MySQL, or AWS RDS/Aurora): schema management, query optimisation, replication, backups, and failover
• Familiarity with NoSQL databases (DynamoDB, Redis, or MongoDB): data modelling, performance tuning, and operational monitoring
• Familiarity with CI/CD pipelines (GitHub Actions, Jenkins, or AWS CodePipeline)
• Experience with observability tooling: CloudWatch, Datadog, Prometheus, or Grafana

🏖️ Benefits

• Flexible working arrangements
• Professional development opportunities
