Principal Site Reliability Engineer, SRE

SoluStaff

51 - 200 employees

🎯 Recruiter

👥 HR Tech

Recruitment • HR Tech • Consulting

Symmetrio is a full-service recruiting, staffing, and consulting company with decades of experience across various high-growth industries. They specialize in permanent placement, contract-to-hire, and staff augmentation, focusing on aligning talent with client goals and cultures. Symmetrio offers tailored recruitment solutions and advisory services in sectors such as life sciences, information technology, engineering, medical devices, logistics solutions, manufacturing, and building automation. Their team of talent acquisition experts is committed to understanding clients’ organizational needs and delivering specialized professionals to meet these challenges. With a strong emphasis on trust, understanding, and collaboration, Symmetrio aims to optimize operations and drive innovation and excellence for their clients.

📋 Description

• Serve as the primary technical owner for production reliability across U.S. customer environments.
• Investigate and resolve complex issues spanning web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.
• Lead production incident response efforts, coordinating cross-functional teams to restore service and minimize customer impact.
• Perform root cause analysis and drive corrective actions that improve long-term system stability and resilience.
• Partner with software engineering and platform teams to identify recurring reliability risks and implement sustainable solutions.
• Design, configure, and validate secure customer connectivity solutions including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.
• Support customer onboarding initiatives by troubleshooting connectivity challenges and ensuring consistent implementation processes.
• Enhance platform observability through improvements in monitoring, logging, alerting, tracing, and operational dashboards.
• Contribute to CI/CD, infrastructure automation, and deployment processes that improve release safety and operational consistency.
• Develop operational tooling that supports incident response, troubleshooting, onboarding, and system monitoring activities.
• Collaborate with engineering leadership to improve cloud architecture, scalability, security, and operational readiness.
• Partner with customer-facing teams to communicate technical issues, remediation plans, and reliability improvements in a clear and effective manner.
• Support compliance, security, and risk management initiatives within highly regulated healthcare environments.

🎯 Requirements

• 6+ years of hands-on experience supporting and managing AWS-based production environments.
• 4+ years of experience supporting web applications and backend services (Python/Django experience strongly preferred).
• Experience with AWS networking technologies including VPCs, Site-to-Site VPNs, Transit Gateways, routing, NAT gateways, and security groups.
• Strong experience with Terraform and infrastructure-as-code deployment practices.
• Experience with containerized environments including ECS, Fargate, Kubernetes, or similar technologies.
• Experience building and supporting CI/CD pipelines and release automation processes.
• Familiarity with monitoring and observability platforms such as Datadog, CloudWatch, Sentry, Grafana, or similar tools.
• Experience leading production incidents, outage management, and root cause analysis initiatives.
• Exposure to Windows Server environments, Active Directory, Kerberos, and enterprise infrastructure concepts is preferred.
• Healthcare technology, healthcare SaaS, clinical software, or other regulated industry experience is highly preferred.
• Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field preferred.

🏖️ Benefits

• Health Care Plan (Medical, Dental & Vision)
• Retirement Plan (401k, IRA)
• Paid Time Off (Vacation, Sick & Public Holidays)

