Senior Incident Manager

Not listed on LinkedIn

🕒 3 days ago

🏄 California – Remote

💵 $125,000 - $195,000 / year

⏰ Full-time

🟠 Senior

👔 Manager

🦅 H1B visa sponsor

🗣️🇺🇸🇬🇧 English required

Lambda

51 - 200 employees

🤖 Artificial Intelligence

☁️ SaaS

🔧 Hardware

💰 €39,700,000 venture round in 2022-11

Lambda is a cloud computing company that provides on-demand GPU instances and clusters for AI training and inference. Its GPU products include on-demand cloud GPU instances billed by the minute, private large-scale GPU clusters, and PCIe servers with configurable NVIDIA Tensor Core GPUs. Lambda is known for its AI developer cloud, which gives AI developers access to GPU instances with a focus on the latest NVIDIA hardware. The company also offers workstations configured with NVIDIA GPUs for deep learning and other AI applications.

Description

• Lead the response to critical (SEV-1 / SEV-2) incidents impacting AI infrastructure, GPU clusters, networking, storage, and data center operations.
• Serve as the Incident Commander during major outages, coordinating engineering, networking, facilities, and vendor teams.
• Act as the liaison between leadership and external teams during incidents/post-incidents to provide updates and status summaries.
• Own the incident response lifecycle including:
  - Assisting Technical Triage
  - Escalation
  - Coordination
  - Resolution
• Ensure timely and accurate communication with internal stakeholders and leadership.
• Maintain incident response documentation and operational playbooks.
• Conduct analysis on incidents and identify patterns/trends for improvement in response and systems reliability.
• Work in an On-Call Rotation to respond to, lead, and coordinate incidents.
• Drive alignment during outages involving multiple infrastructure layers.
• Lead post-incident reviews (PIRs) and root cause analysis. Identify systemic reliability gaps and implement corrective actions.

🎯 Requirements

• 8+ years experience in incident management, site reliability engineering, or infrastructure operations
• Experience managing incidents in large-scale distributed infrastructure environments
• Strong understanding of:
  - Data center operations
  - GPU compute clusters
  - Networking and storage infrastructure
  - Cloud or hybrid infrastructure platforms
• Proven ability to lead high-pressure incident response situations
• Experience with incident management frameworks (ITIL, SRE, or equivalent)
• Excellent communication and stakeholder management skills
• Experience with incident tracking and monitoring tools such as:
  - PagerDuty
  - ServiceNow
  - Jira
  - Datadog
  - Prometheus / Grafana

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401k Plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use

Similar jobs

🕒 3 days ago

Gray

1001 - 5000

🤝 B2B

🛍️ eCommerce

📱 Media

Site Quality Manager overseeing quality control for construction projects across multiple US locations. Responsibilities include managing quality programs and conducting inspections/audits for compliance.

🗣️🇺🇸🇬🇧 English required

🕒 3 days ago

Airbnb

5001 - 10000

👥 B2C

🛍️ eCommerce

Platform Product Manager leading product vision and strategy for equity products at Airbnb. Collaborating with cross-functional teams to enhance safety and integrity for Airbnb users.

🇺🇸 United States – Remote

💵 $179,000 - $207,000 / year

💰 Post-IPO equity in 2020-12

⏰ Full-time

🟠 Senior

👔 Manager

🦅 H1B visa sponsor

🗣️🇺🇸🇬🇧 English required

🕒 3 days ago

LoanCare

1001 - 5000

💸 Finance

👥 B2C

🏠 Real Estate

Escrow Manager leading escrow operations including staff management at LoanCare. Responsible for vendor oversight and regulatory compliance in mortgage servicing.

🇺🇸 United States – Remote

💵 $64,800 - $121,500 / year

⏰ Full-time

🟠 Senior

🔴 Expert

👔 Manager

🗣️🇺🇸🇬🇧 English required

🕒 3 days ago

Theoria Medical

1001 - 5000

⚕️ Health Insurance

☁️ SaaS

📡 Telecommunications

Manager of Revenue Cycle Management overseeing billing and collections at Theoria Medical. Responsible for optimizing revenue cycle operations and communication with stakeholders.

🗣️🇺🇸🇬🇧 English required

🕒 3 days ago

Össur

1001 - 5000

🧬 Biotechnology

⚕️ Health Insurance

🔬 Science

Area Manager responsible for sales growth of Össur's bracing & support products. Develops strategies and relationships to achieve sales goals in Arizona, Utah, and New Mexico.

🇺🇸 United States – Remote

💵 $68,518 - $84,630 / year

💰 Grant in 2022-06

⏰ Full-time

🟡 Mid-level

🟠 Senior

👔 Manager

🗣️🇺🇸🇬🇧 English required