Senior Incident Manager

Job not listed on LinkedIn

🕒 5 days ago

🏄 California – Remote

💵 $125,000 - $195,000 / year

⏰ Full-time

🟠 Senior

👔 Manager

🦅 Sponsors H1B visa

🗣️🇺🇸🇬🇧 English required

Lambda

51 - 200 employees

🤖 Artificial Intelligence

☁️ SaaS

🔧 Hardware

💰 $39,700,000 Venture Round in 2022-11

Lambda is a cloud computing company that provides on-demand GPU instances and clusters tailored for AI training and inference. The company offers a range of GPU products, including on-demand cloud GPU instances billed by the minute, large-scale private GPU clusters, and customizable PCIe servers with NVIDIA Tensor Core GPUs. Lambda is known for its cloud for AI developers, which lets them launch GPU instances built on NVIDIA's latest hardware. The company also offers workstation products configured with NVIDIA GPUs, designed for deep learning and other AI applications.

Description

• Lead the response to critical (SEV-1 / SEV-2) incidents impacting AI infrastructure, GPU clusters, networking, storage, and data center operations.
• Serve as the Incident Commander during major outages, coordinating engineering, networking, facilities, and vendor teams.
• Act as the liaison between leadership and external teams during incidents/post-incidents to provide updates and status summaries.
• Own the incident response lifecycle including:
  - Assisting Technical Triage
  - Escalation
  - Coordination
  - Resolution
• Ensure timely and accurate communication with internal stakeholders and leadership.
• Maintain incident response documentation and operational playbooks.
• Conduct analysis on incidents and identify patterns/trends for improvement in response and systems reliability.
• Work in an On-Call Rotation to respond to, lead, and coordinate incidents.
• Drive alignment during outages involving multiple infrastructure layers.
• Lead post-incident reviews (PIRs) and root cause analysis. Identify systemic reliability gaps and implement corrective actions.

🎯 Requirements

• 8+ years experience in incident management, site reliability engineering, or infrastructure operations
• Experience managing incidents in large-scale distributed infrastructure environments
• Strong understanding of:
  - Data center operations
  - GPU compute clusters
  - Networking and storage infrastructure
  - Cloud or hybrid infrastructure platforms
• Proven ability to lead high-pressure incident response situations
• Experience with incident management frameworks (ITIL, SRE, or equivalent)
• Excellent communication and stakeholder management skills
• Experience with incident tracking and monitoring tools such as:
  - PagerDuty
  - ServiceNow
  - Jira
  - Datadog
  - Prometheus / Grafana

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401k Plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use
