Senior Site Reliability Engineer

🕒 February 10

Barti

11 - 50 employees

☁️ SaaS

🤝 B2B

🤖 Artificial Intelligence

Barti is an AI-powered electronic health record (EHR) and practice management platform built specifically for eye care practices, including optometrists, ophthalmologists, and opticians. It consolidates scheduling, AI-assisted charting (scribe), messaging, billing, optical and contact lens ordering, and patient intake into a single cloud-based SaaS to reduce administrative clicks, improve workflow efficiency, and increase face time with patients. The product emphasizes HIPAA-compliant security, high uptime, and integration to replace multiple disparate systems.

📋 Description

• Lead and participate in the design, implementation, and maintenance of highly available and scalable infrastructure.
• Monitor system health, performance metrics, and capacity planning to ensure optimal performance.
• Establish and track SLIs, SLOs, and error budgets to measure and improve system reliability.
• Design and implement Infrastructure as Code (IaC) solutions using tools like Terraform, Pulumi, or CloudFormation.
• Build and maintain CI/CD pipelines to enable rapid, safe deployments.
• Automate operational tasks and eliminate toil through scripting and tooling.
• Lead incident response efforts, including on-call rotation, post-mortem analysis, and remediation.
• Debug and resolve complex production issues across the entire stack.
• Implement monitoring, alerting, and observability solutions to detect and prevent issues proactively.
• Provide technical leadership and mentorship to engineers on reliability and infrastructure best practices.
• Collaborate with cross-functional teams, including Engineering and Product, to ensure reliable product delivery.
• Lead the technical design of infrastructure solutions, ensuring alignment with architectural principles and business goals.
• Stay updated with emerging technologies and industry trends in SRE, DevOps, and cloud infrastructure.
• Propose and drive the adoption of best practices, tools, and processes to enhance system reliability and developer productivity.
• Conduct chaos engineering experiments and disaster recovery drills to validate system resilience.
• Implement and maintain security best practices across infrastructure and applications.
• Manage secrets, access controls, and security monitoring systems.
• Foster a collaborative environment within the engineering team and across departments.
• Clearly communicate technical concepts and system health to both technical and non-technical stakeholders.
• Work closely with engineering teams to define reliability requirements and ensure operational excellence.

🎯 Requirements

• 5+ years (ideally 7+) of relevant work experience in Site Reliability Engineering, DevOps, or Infrastructure roles
• 1+ years of hands-on experience with Python, Go, or Bash scripting
• Experience with cloud platforms (ideally GCP) and container orchestration (Kubernetes, Docker)
• Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or similar)
• Strong understanding of Linux systems, networking, and distributed systems
• Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
• Excellent problem-solving and communication skills
• Able to work independently and as part of a team

🏖️ Benefits

• Be part of a mission-driven, rapidly scaling company changing the future of eye care
• Work remotely from anywhere in the U.S.
• Collaborate with a passionate, fun, and supportive team
• Competitive salary: $150,000 to $200,000
• Equity in a fast-growing startup
• Health, vision, and dental benefits
• Unlimited PTO
• Annual professional development stipend
• A high-impact role with plenty of room for growth, ownership, and creativity
