Software Architect, Reliability Engineering

🕒 3 months ago

🗣️🇺🇸🇬🇧 English required


Twilio

5001 - 10000 employees

Millions of developers around the world have used Twilio to unlock the magic of communications and improve any human experience. Twilio has democratized communications channels such as voice, text, chat, video, and email by virtualizing the world's communications infrastructure through APIs that are simple enough for any developer to use, yet robust enough to power the world's most demanding applications. By making communications part of every software developer's toolkit, Twilio enables innovators in every industry, from emerging leaders to the world's largest organizations, to redefine how companies interact with their customers. Founded in 2008, Twilio has over 5,000 employees in 26 offices in 17 countries, with headquarters in San Francisco and further offices in Atlanta, Bangalore, Berlin, Bogotá, Denver, Dublin, Paris, Prague, Hong Kong, Irvine, London, Madrid, Munich, Malmö, Mountain View, Redwood City, New York City, São Paulo, Sydney, Melbourne, Singapore, Tallinn, and Tokyo.

Description

• Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
• Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
• Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
• Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
• Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
• Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
• Establish and champion reliability practices and drive systemic improvements.
• Mentor and grow engineers and technical leaders.
• Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

🎯 Requirements

• 15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including as a principal/architect.
• Strong experience in driving strategic technical decisions and defining long-term technical vision.
• In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organization.
• Experience driving cross-org technical architecture outcomes.
• Knowledge of cloud architecture, DevOps practices, and large-scale systems design with microservices.
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
• Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.
• Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS.
• Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.
• Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.
• Proficiency in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.
• Experience designing incident response processes, SLOs/SLIs, and runbooks, and participating in on-call rotations.
• Experience running cross-functional post-incident reviews and driving improvements.
• Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
• Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.
• Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.
• Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.
• Ability to influence and build effective working relationships with all levels of the organization.

🏖️ Benefits

• Health care insurance
• 401(k) retirement account
• Paid sick time
• Paid personal time off
• Paid parental leave

