Site Reliability Engineer – Level 3

🕒 May 16

Granicus

501 - 1000 employees

Founded 1999

🏛️ Government

☁️ SaaS

📋 Compliance

Granicus is a company focused on transforming the way governments interact with their constituents through digital services and technology solutions. It provides the Government Experience Cloud to improve service delivery, community engagement, and operational efficiency across local, state, and federal governments. Granicus offers tools for agenda and meeting management, digital communication and engagement, public records management, and more, all designed to enhance customer experience and foster transparent and equitable interactions between governments and the people they serve.

📋 Description

• Provide production support in shifts, following the team's on-call roster
• Work on tickets raised by customers and by internal engineering/implementation teams when not on call for production support
• Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure
• Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
• Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
• System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
• Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
• Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
• Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
• Security: Implement and adhere to security best practices to protect our systems and data

🎯 Requirements

• 5+ years of experience in site reliability engineering, system administration, or a similar role
• Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
• Experience with scripting languages such as Python, Bash, or Ruby
• Bachelor's or postgraduate degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
• Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning
• Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
• Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
• Advanced knowledge of monitoring and logging tools (Elastic, Prometheus, Grafana, Splunk), configuration management (Ansible, Chef, Puppet), and CI/CD pipelines
• Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
• Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
• Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
• Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar are a plus

🏖️ Benefits

• Health insurance
• 401(k) matching
• Flexible work hours
• Paid time off
• Remote work options
