Site Reliability Engineer

🕒 2 days ago

Harbor IT

51 - 200 employees

Founded 1995

🔒 Cybersecurity

☁️ SaaS

🏢 Enterprise

Harbor IT is a managed IT service provider that specializes in delivering tailored network infrastructure, cybersecurity solutions, and unified communications to organizations of various sizes. With over 30 years of experience, the company takes a consultative approach, developing customized solutions that maximize operational efficiency with minimal disruption. Its offerings include cybersecurity as a service, network infrastructure management, and lifecycle management, supported by a dedicated team available 24/7 to keep operations running smoothly across clients' technology environments.

📋 Description

• Design and execute a comprehensive infrastructure strategy that proactively supports evolving business requirements and operational excellence.
• Own the predictable delivery of high-complexity technical solutions through extensive automation with Kubernetes and robust CI/CD pipelines.
• Maintain high portal availability and system health by implementing advanced observability and distributed tracing.
• Lead high-severity incident response and drive systemic improvements through blameless postmortem analysis.
• Architect fault-tolerant, self-healing infrastructure to ensure continuous operational stability and zero data loss.
• Serve as the internal subject matter expert, steering software architecture decisions toward scalability and performance.
• Facilitate regular knowledge-sharing and training sessions to raise technical standards and process predictability across the technology department.
• Direct security initiatives and design secure networking strategies that protect all client data and assets.

🎯 Requirements

• 4–7 years of professional experience building and managing resilient, modern infrastructure within a fast-paced environment.
• Expert-level proficiency in managing and troubleshooting Linux-based servers across multiple distributions.
• Advanced capability in developing modular, reusable infrastructure templates using tools such as Terraform and Ansible.
• Proven success in managing containerized workloads at scale using Kubernetes and Helm.
• Extensive experience configuring and optimizing high-performance database environments, specifically MySQL.
• Demonstrated ability to build robust, secure CI/CD deployment pipelines that include automated rollback and quality gates.
• Strong technical documentation skills, including the creation of architectural diagrams, detailed specifications, and operational playbooks.
• Ability to lead cross-functional projects independently while mentoring junior engineers and driving team-wide initiatives.

🏖️ Benefits

• Health benefits
• Flexible paid time off
• Parental leave
• Fertility and adoption assistance
• 401(k)
• Educational reimbursement

Similar Jobs

🕒 2 days ago

Coinbase

1001 - 5000 employees

₿ Crypto

💸 Finance

💳 Fintech

Senior Site Reliability Engineer at Coinbase building and scaling identity and access management systems. Owns reliability and DevOps practices for IAM systems.

AWS • Azure • Cloud • Google Cloud Platform • Java • Python • Ruby • Terraform • Go

🕒 2 days ago

Coinbase

1001 - 5000 employees

₿ Crypto

💸 Finance

💳 Fintech

Senior Site Reliability Engineer managing AI infrastructure at Coinbase. Driving automation, reliability, and observability in critical AI operations.

AWS • Cloud • Docker • Kubernetes • Python • Ruby • Go

🕒 2 days ago

Aya Healthcare

5001 - 10000 employees

⚕️ Healthcare Insurance

🎯 Recruiter

Lead the SRE team at Aya Healthcare to enhance product reliability and operational efficiency. Manage incident response and AI-native operations for a top healthcare workforce solutions provider.

AWS • Azure • Google Cloud Platform

🕒 2 days ago

Offchain Labs

11 - 50 employees

₿ Crypto

🌐 Web 3

Site Reliability Engineer at Offchain Labs, which is leading a movement in blockchain scalability and security. Tackling real-world challenges and transforming interactions with decentralized applications.

AWS • Azure • Cloud • Google Cloud Platform • Linux • Python • Shell Scripting • Go

🕒 3 days ago

BeyondTrust

1001 - 5000 employees

🔒 Cybersecurity

Cloud Operations Engineer monitoring, maintaining, and responding to incidents for BeyondTrust Cloud Service. Collaborating across teams to ensure service health and managing cloud environments.

AWS • Azure • Cloud • Distributed Systems • Docker • JavaScript • Kubernetes • Linux • Python • Terraform