Principal Site Reliability Engineer, SRE

Not listed on LinkedIn

🕒 3 months ago

🗣️🇺🇸🇬🇧 English required

InfiniteChoice

11 - 50 employees

Founded in 2015

🛍️ eCommerce

🤖 Artificial Intelligence

🤝 B2B

InfiniteChoice is a platform company that builds and grows startups and high-growth consumer businesses by combining capital, operational expertise, and intellectual property. The company emphasizes automation-first execution and an AI-led ecosystem to shorten the time to scale for businesses with clear product-market fit, focusing on launching and optimizing e-commerce brands and customer platforms. Backed by private equity and led by experienced operators, InfiniteChoice deploys strategic capital, technology, and operational talent to drive profitable, high-margin growth across its portfolio.

Description

• Build SRE practices from scratch - define SLIs, SLOs, error budgets, and reliability metrics
• Establish incident response procedures, on-call rotations, and post-mortem processes
• Create reliability engineering standards and best practices across all engineering teams
• Develop disaster recovery and business continuity strategies
• Design and implement capacity planning and performance optimization frameworks
• Drive architecture decisions for comprehensive application and infrastructure monitoring solutions
• Design and develop custom SRE tools for automated monitoring, alerting, and remediation
• Build observability platforms that provide deep insights into system performance and user experience
• Create automation frameworks for deployment, scaling, and incident response
• Architect logging, metrics, and tracing systems for distributed microservices environments
• Leverage Google Cloud Platform services to build resilient, scalable infrastructure
• Implement cloud-native monitoring using Stackdriver, Cloud Monitoring, and Cloud Logging
• Design auto-scaling and self-healing systems using GKE, Cloud Functions, and managed services

🎯 Requirements

• 12+ years of experience in Site Reliability Engineering or Infrastructure Engineering
• 5+ years in lead SRE roles building and scaling SRE teams and processes
• Proven track record designing and implementing monitoring and observability solutions at scale
• Deep understanding of distributed systems, microservices architectures, and cloud-native patterns
• Experience with infrastructure as code, configuration management, and deployment automation
• Hands-on experience with Google Cloud Platform is required
• Expertise with GCP monitoring and observability stack (Cloud Monitoring, Cloud Logging, Cloud Trace)
• Experience with GKE, Compute Engine, Cloud Functions, and other core GCP services
• Bachelor's degree in Computer Science, Engineering, or equivalent professional experience
• Industry certifications (Google Cloud Professional, SRE or related certifications preferred)

🏖️ Benefits

• Ground-floor opportunity to build SRE practices and culture from scratch
• Full autonomy to define processes, select technologies, and establish best practices
• Direct impact on platform reliability serving millions of users
• Opportunity to create lasting engineering culture and operational excellence
• Remote-first culture with in-person meetings in Dallas, TX on an as-needed basis
• Collaborative environment with smart, passionate engineers and cross-functional teams
• Access to cutting-edge technologies and AI-driven development tools
• Competitive compensation, equity participation, and comprehensive benefits

Similar Jobs

🕒 5 months ago

PathAI

501 - 1000

🤖 Artificial Intelligence

⚕️ Health Insurance

🧬 Biotechnology

Staff Site Reliability Engineer designing and operating a hybrid cloud environment at PathAI. Focused on implementing SRE best practices and enhancing infrastructure reliability.

🇺🇸 United States – Remote

💵 $165,750 - $224,450 / year

💰 €165,000,000 Series C in 2021-05

⏰ Full Time

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

🕒 5 months ago

Upshop

51 - 200

☁️ SaaS

🛒 Retail

🛍️ eCommerce

SRE / DevOps Manager at Upshop leading reliability and operations engineering team. Responsible for scalability, security, and performance of infrastructure.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

FloSports

201 - 500

Staff SRE at FloSports improving developer enablement and migrating infrastructure to AWS. Leading technical architecture and critical tooling development with a focus on reliability and automation.

🇺🇸 United States – Remote

⏰ Full Time

🔴 Expert

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

CloudScouts

11 - 50

🤝 B2B

🏢 Enterprise

💸 Finance

AWS DevOps Engineer designing cloud-native applications for SAP S/4HANA processes. Optimizing AWS cost/performance in fully remote work environment.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

🕒 8 months ago

Veeva Systems

1001 - 5000

☁️ SaaS

⚕️ Health Insurance

💊 Pharmaceuticals

Lead migration and build scalable AWS infrastructure; own CI/CD and DevOps tooling at Veeva, a life sciences cloud company.

🗣️🇺🇸🇬🇧 English required