Staff/Principal Site Reliability Engineer

Cybersecurity • SaaS • Enterprise

Veza is a leading identity security company specializing in access management and cybersecurity. The company has developed advanced solutions like the Access Graph to visualize and control data access across all enterprise systems, enhancing productivity and security. Veza's offerings include privileged access monitoring, SaaS access security, and cloud access management. These services are designed to secure identities, manage non-human identity access, and automate identity governance. By leveraging its unique GenAI-based capabilities, Veza helps organizations reduce the risk of data breaches and ensure compliance. The platform integrates seamlessly with major cloud providers and security services to provide a comprehensive view of user and machine identities.

51 - 200 employees

Founded 2020

🔒 Cybersecurity

☁️ SaaS

🏢 Enterprise

October 28

🇺🇸 United States – Remote

💵 $184k - $240k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

Distributed Systems

EC2

Grafana

Kubernetes

Linux

Microservices

Prometheus

Python

Terraform

📋 Description

• Lead enterprise-wide reliability and infrastructure projects across multiple teams with high autonomy
• Navigate ambiguous problem spaces and deliver innovative solutions under tight deadlines
• Architect and deploy solutions for Cloud Prem and SaaS customers at scale
• Drive technical innovation and establish SRE best practices across the organization
• Respond to critical incidents, lead root cause analysis, and implement long-term resolutions
• Develop automation solutions to streamline operations and reduce manual workload
• Participate in on-call rotation and ensure effective incident handoff and documentation
• Partner with Engineering, Product, and Customer Success teams to align reliability goals with business objectives
• Communicate complex technical concepts effectively to technical and non-technical audiences, including executives
• Influence technical decisions across teams through thought leadership and demonstrated expertise
• Build consensus and drive adoption of new tools, processes, and architectural patterns
• Provide tier 2/3 technical support to enterprise customers for complex troubleshooting
• Work directly with customer technical teams to resolve deployment, configuration, and integration challenges
• Conduct technical onboarding and provide expert guidance on platform architecture and best practices
• Create customer-facing documentation, troubleshooting guides, and run-books
• Lead customer calls and technical discussions as a trusted advisor
• Mentor SRE and engineering team members, elevating technical capabilities
• Foster a culture of reliability, operational excellence, and continuous improvement

🎯 Requirements

• BS degree in Computer Science or related field (or equivalent practical experience)
• 7+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering
• Proven track record leading large-scale, cross-team infrastructure projects from conception to production
• Demonstrated ability to work autonomously on ambiguous projects with tight deadlines
• 5+ years with AWS (VPC, EC2, RDS, EKS, CloudFormation) and cloud automation
• Expert-level experience with Kubernetes, Helm, Linux, and Terraform
• Strong experience with the GitOps model, distributed version control, and CI/CD pipelines
• Proficiency with monitoring tools (Prometheus, Grafana, Datadog)
• Strong programming/scripting skills (Python, Go, Bash) for automation
• Deep understanding of distributed systems, microservices, and reliability patterns
• Experience with Bazel and CueLang a plus

🏖️ Benefits

• Competitive salary
• Equity and a competitive benefits package
