
201 - 500 employees
Founded 2007
🛍️ eCommerce
🏢 Enterprise
💰 €5,000,000 Series A in 2012-07
Cloud Storage • eCommerce • Enterprise
Backblaze is a cloud storage company that provides scalable, secure data backup solutions for both businesses and individuals. Its B2 Cloud Storage service offers S3-compatible object storage that lets users easily protect and manage their data with transparent pricing. Backblaze specializes in automatic, unlimited backup for computer systems. This gives users data security and recovery options, while the service also supports integration with applications for extended functionality.
🕒 2 months ago
🇺🇸 United States – Remote
💵 $150,000 - $200,000 / year
⏰ Full-time
🟠 Senior
⛑ DevOps and Site Reliability Engineer (SRE)
🦅 H1B visa sponsor
🗣️🇺🇸🇬🇧 English required
Ansible
Distributed Systems
Docker
Grafana
Jenkins
Kubernetes
Linux
Microservices
Prometheus
Python
Terraform
Vault
Go

• Own and drive the availability, durability, and performance of critical services across all production environments.
• Lead and champion complex projects from problem discovery through complete, cross-functional resolution, demonstrating high-level technical ownership.
• Define, establish, and enforce service health standards, including working with engineering leadership to implement SLIs, SLOs, and error budget policies for multiple services.
• Lead critical incident response and post-incident reviews, translating findings into strategic, long-term service improvements and architectural changes.
• Mentor others and act as a subject matter expert in following and evolving established ITIL/OSS processes (incident, change, problem, and capacity management).
• Design and architect scalable automation solutions to eliminate toil and improve the efficiency of operational tasks across the entire platform.
• Drive the strategic direction of monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK), and integrate them for comprehensive observability.
• Build, maintain, and secure advanced CI/CD pipelines, configuration management, and complex infrastructure-as-code solutions (Terraform, Ansible, Jenkins).
• Write production-grade code (Bash, Python, Go, etc.) to develop new reliability tools and enhance existing systems.
• Act as a principal partner to engineering, product, and operations teams, consulting on resilient system design, architecture, and operation.
• Lead and formalize the Production Readiness Review (PRR) process, ensuring robust operational handoff for all new services and features.
• Lead capacity planning and disaster recovery strategy across critical infrastructure components.
• Manage relationships with vendors and service providers to troubleshoot systemic issues and ensure they strictly meet their SLAs.
• Drive the creation of high-quality documentation, proactively share advanced learnings, and cultivate a reliability-first engineering culture across teams.
• Own the creation, maintenance, and dissemination of operational playbooks, runbooks, and detailed system documentation.
• Proactively identify systemic, recurring issues, then architect and drive long-term improvements and strategic design action plans.
• Be a leading voice in promoting and embedding reliability-focused practices within development and operations teams.
• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
• 8+ years of progressive experience in site reliability, systems engineering, or operations.
• Extensive experience designing, scaling, and operating large-scale, production-grade distributed systems.
• Expert-level Linux systems administration and advanced troubleshooting skills.
• Experience leading security-minded operations, focusing on system-wide patching, hardening, and proactive vulnerability identification.
• Deep mastery of service reliability concepts, including advanced monitoring, complex alerting strategy, leading incident response, and in-depth root cause analysis.
• Advanced proficiency in at least one modern scripting/programming language (Python or Go strongly preferred).
• Expert knowledge of incident response methodologies and operational best practices.
• Proven experience designing and operating container orchestration (Kubernetes, Docker) and microservice architectures (required).
• Expert experience with HashiCorp products (Nomad, Vault, Terraform) in a production environment.
• Healthcare for family, including dental and vision
• Competitive compensation and 401(k)
• RSU grants for full-time employees
• ESPP program
• Flexible vacation policy
• Maternity & paternity leave
• MacBook Pro to use for work, plus a generous stipend to personalize your workstation
• Childcare bonus (human children only)
• Fertility treatment and support
• Learning & development program
• Commuter benefits
• Culture that supports a healthy work-life balance
🕒 2 months ago
Site Reliability Engineer at Arista managing CloudVision-as-a-Service platform, ensuring global service reliability, scalability, and stability with a focus on automation and operational excellence.
🇺🇸 United States – Remote
💵 $101,000 - $161,000 / year
💰 €2,600,000 Post-IPO Debt in 2015-05
⏰ Full-time
🟠 Senior
⛑ DevOps and Site Reliability Engineer (SRE)
🦅 H1B visa sponsor
🗣️🇺🇸🇬🇧 English required
🕒 2 months ago
Senior Site Reliability Engineer designing infrastructure primitives for decentralized networks. Collaborating on Kubernetes-based control planes and improving operational efficiency.
🗣️🇺🇸🇬🇧 English required
🕒 2 months ago
Senior Site Reliability Engineer designing and implementing tools for reliable cloud infrastructure. Collaborating with teams to enhance system observability and incident response.
🗣️🇺🇸🇬🇧 English required
🕒 2 months ago
Staff Software Engineer overseeing day-to-day operational support of SAP BTP applications at NBCUniversal. Collaborating with onsite teams to enhance engineering strategies and manage production deployments.
🇺🇸 United States – Remote
💵 $130,000 - $170,000 / year
⏰ Full-time
🟠 Senior
⛑ DevOps and Site Reliability Engineer (SRE)
🦅 H1B visa sponsor
🗣️🇺🇸🇬🇧 English required
🕒 2 months ago
Senior Site Reliability Engineer at Docusign managing critical systems and driving reliability initiatives. Collaborating with teams to enhance observability and incident response for high-impact services across cloud environments.
🇺🇸 United States – Remote
💵 $157,500 - $254,350 / year
⏰ Full-time
🟠 Senior
⛑ DevOps and Site Reliability Engineer (SRE)
🦅 H1B visa sponsor
🗣️🇺🇸🇬🇧 English required