Senior Site Reliability Engineer

🕒 March 20

Docusign

5001 - 10000 employees

Founded 2003

🛍️ eCommerce

💸 Finance

☁️ SaaS

Docusign is a leading provider of electronic signature technology and Intelligent Agreement Management (IAM), enabling organizations to create, manage, and secure agreements digitally. It simplifies contract lifecycle management, automates document processes, and facilitates customer experiences by transforming agreement data into actionable insights. With a trusted platform used by millions worldwide, Docusign helps businesses reduce risk, save time, and improve efficiency in various sectors, including financial services, insurance, real estate, and government.

📋 Description

• Design, implement, and operate highly available, scalable services in cloud environments (primarily Azure, with some multi‑cloud scenarios)
• Define and evolve SLOs/SLIs, error budgets, and capacity strategies for owned services; use them to guide engineering trade‑offs and release decisions
• Analyze patterns in incidents and outages; own long‑term reliability improvements for your domain and contribute to reliability strategy across services
• Write high-quality code that is easy to maintain and test
• Ensure design and architecture are extensible across projects, and participate in technical design and code reviews
• Identify operational toil and lead automation efforts to eliminate it: deployment, runbook, and remediation workflows that make incidents rarer and faster to resolve
• Develop robust, well‑tested tooling and shared libraries that are adopted across multiple teams
• Improve CI/CD pipelines and guardrails to reduce change failure rate while increasing deployment velocity
• Design and implement logging, metrics, tracing, and alerting for complex distributed systems; ensure signals are actionable and aligned to business impact
• Build and automate tools and solutions for incident impact analysis and effective mitigation
• Participate in and often lead incident response for Sev0–Sev2 events: triage, mitigation, coordination, and clear communication
• Perform and contribute to blameless post‑incident reviews, root‑cause analysis, and follow‑through on corrective actions
• Work with Operations and Incident Command teams during and after incidents to drive excellence in the incident management process
• Compose and analyze dashboards to highlight areas of the business that need attention and help drive organizational KPIs
• Create and respond to system-generated alerts to maintain system health
• Work with Operations and Engineers to fill any gaps in alerting and telemetry
• Act as the primary SRE partner for one or more engineering teams: shaping architecture, reviewing designs, and embedding reliability best practices
• Mentor and coach other SREs and software engineers on topics such as debugging, observability, incident management, and performance optimization
• Contribute to and help standardize SRE practices, runbooks, and production readiness criteria across CPE and product teams
• Work with Product Management, collaborators and other developers to understand design requirements and provide estimates for development
• Learn and grow in all key technologies at Docusign and be a partner to Engineering and Operations teams

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with ownership of production systems at scale (or equivalent experience)
• Experience coding in at least one modern language (e.g., Go, Python, C#, Java), with the ability to design, implement, test, and debug production‑grade automation and services
• Practical experience operating large‑scale services in public cloud (Azure preferred; AWS/GCP acceptable with willingness to learn Azure)
• Experience with Linux, networking fundamentals, and common infrastructure components (load balancers, DNS, certificates, queues, caches, databases)
• Experience with Observability stacks (e.g., Prometheus/Grafana, OpenTelemetry/Chronicle, centralized logging)
• Experience with CI/CD systems and deployment strategies (blue/green, canary, rolling updates)
• Experience with incident management and on‑call operations for 24x7 services
• Experience in building dashboards and metrics analysis

🏖️ Benefits

• Paid Time Off: earned time off, as well as paid company holidays based on region
• Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
• Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
• Retirement Plans: select retirement and pension programs with potential for employer contributions
• Learning and Development: options for coaching, online courses and education reimbursements
• Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events

Similar Jobs

🕒 March 19

Upstart

1001 - 5000

Senior Software Engineer leading technical direction and large initiatives at Upstart. Focusing on building consumer-facing systems and evolving platform architecture.

Distributed Systems

🕒 March 19

Weekday (YC W21)

11 - 50

☁️ SaaS

🎯 Recruiter

DevOps Engineer responsible for building, managing, and scaling cloud infrastructure at Weekday's clients. Focus on automating processes and ensuring system reliability and security.

AWS

Cloud

Terraform

🕒 March 19

Weekday (YC W21)

11 - 50

☁️ SaaS

🎯 Recruiter

DevOps Engineer constructing and managing cloud infrastructure for Weekday's clients. Automating deployments and ensuring system reliability in a security-oriented organization.

Cloud

Terraform

🕒 March 19

Nimble.LA

51 - 200

☁️ SaaS

DevOps Engineer working with a team of CTOs and Engineering professionals at Nimble. Overcoming challenges in product design and development.

🕒 March 19

Nimble.LA

51 - 200

☁️ SaaS

Sr DevOps Engineer improving the DevOps chain for various projects at Nimble.la. Collaborating with CTOs and Engineering professionals in a remote environment.

🗣️🇪🇸 Spanish Required

Ansible

AWS

Docker

Grafana

Linux

Node.js

Python

Terraform

Go