Site Reliability Engineer

September 6

66degrees

Artificial Intelligence • Cloud Services • Consulting

66degrees is a consulting and technology company that specializes in transforming customer experiences and business operations with innovative cloud, data, and AI solutions. Its services span AI and machine learning, data platform modernization, cloud engineering, and managed solutions. By helping businesses modernize their infrastructure and applications, 66degrees enables data-driven decision-making and greater productivity. The company focuses on building AI-powered enterprises and improving stakeholder collaboration and customer engagement across industries such as retail, financial services, and healthcare.

501 - 1000 employees

🤖 Artificial Intelligence

📋 Description

• Ensure near-zero downtime through monitoring and alerting, self-healing automation, and continuous improvement
• Build highly automated, available, and scalable systems by applying software and infrastructure engineering principles
• Apply DevOps and SRE principles and practices, and advise clients on them, covering deployment pipelines, high availability (HA), service reliability, technical debt, and operational toil for live services running at scale
• Take a proactive approach to our clients’ workloads: anticipate failures, automate tasks, ensure availability, and deliver a great customer experience
• Work closely with clients, your team, and Google engineers to investigate and resolve infrastructure issues
• Contribute to ad-hoc initiatives such as writing documentation, open-sourcing, and improving operations, making a significant impact at a rapidly growing Google Premier Partner

🎯 Requirements

• 3+ years of cloud and infrastructure experience, including demonstrated expertise with Linux, Windows, Kubernetes (k8s), databases, and networking services
• 2+ years of Google Cloud experience and related certifications strongly preferred but not required
• Proficiency with Python required; experience with other programming languages is a plus
• Strong provisioning and configuration skills using Terraform
• Experience with 24x7x365 monitoring, incident response, and on-call support
• Experience troubleshooting issues that span systems, networks, and code
• Experience determining and negotiating error budgets, SLIs, SLOs, and SLAs with product owners
• Demonstrated ability to work independently and as a member of a larger team, including in cross-team activities
• Experience working with Agile methodologies such as Scrum and Kanban across the SDLC
• Proven experience balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale
• Strong communication skills, as this is a heavily customer-facing role
• Bachelor’s degree in computer science, electrical engineering, or equivalent required
