Senior Cloud SRE

🕒 April 8

Hazelcast

51 - 200 employees

Founded 2008

🏢 Enterprise

🤖 Artificial Intelligence

☁️ SaaS

Hazelcast is a leading real-time data platform uniquely combining a fast data store and distributed compute engine into one system. It provides solutions for stream processing, event-driven architectures, and real-time AI/ML automation. The platform is well-suited for enterprise architectures and cloud-agnostic deployments, offering high performance, resilience, and scale. Hazelcast serves various industries such as financial services, e-commerce, and healthcare, helping organizations modernize their data architecture, improve payment processing, and enhance fraud detection. The company also integrates with Apache Kafka and Redis for improved data processing capabilities.

📋 Description

• Keep Hazelcast cloud-based production systems running smoothly 24/7/365
• Design and Development:
  • Design, develop, and maintain our cloud infrastructure to support both our end-user management center and our microservice-based platform
  • Implement new solutions using AWS and Terraform, improving scalability, throughput, and reliability
  • Support and manage our Keycloak IDP, ensuring it provides appropriate security while meeting the needs of the development team
• Security and Integration:
  • Implement security measures to protect data integrity and confidentiality, including encryption, access control, and compliance with relevant regulations
  • Work with our operations team to maintain our SOC2 and ISO27001 compliance and keep our environment secure
• Monitoring and Maintenance:
  • Monitor the system for performance issues, errors, and potential failures, and implement maintenance procedures such as backups, data recovery, and disaster recovery plans
  • Troubleshoot issues related to data storage, including performance bottlenecks, data corruption, and compatibility issues with other software components
• Collaboration:
  • Collaborate with cross-functional teams, including software developers, architects, and product managers, to ensure the effective integration and operation of the components within the overall software infrastructure
  • Document design decisions, implementation details, and operational procedures to facilitate collaboration among team members and ensure the maintainability of the system
• Continuous Learning:
  • Stay up to date with the latest developments in storage technologies, the Java programming language, and software engineering best practices, and apply this knowledge to improve existing storage systems and develop new solutions
• On-call participation:
  • Be part of our on-call rotation to respond to availability incidents and work with support and engineers on customer incidents

🎯 Requirements

• Experience with distributed systems, Kubernetes, and microservices
• Infrastructure as Code (Terraform)
• Modern DevOps stack (K8s, Prometheus, Grafana, OpenTelemetry, ArgoCD, Helm)
• Experience with at least one programming language, preferably Golang or Python
• Experience with CI and building CD pipelines (Jenkins, GitHub Actions)
• A passion for automation and keeping our software delivery fast and efficient
• Knowledge of the following is desirable:
• Multi-cloud (AWS, GCP, and/or Azure)
• Experience working with software engineers in designing cloud-native applications or troubleshooting them
• Experience as part of an on-call rota
• Bachelor's degree in a relevant field of study (Computer Science or a related discipline), or equivalent experience

🏖️ Benefits

• 25 days annual leave
• Group Company Pension Plan
• Private Medical Insurance
• Private Dental Insurance
• Life Insurance
• EAP (Employee Assistance Program)
