Staff Infrastructure Engineer – Observability

🔥 17 minutes ago

🇺🇸 United States – Remote

💵 $132k - $215k / year

⏰ Full Time

🔴 Lead

👷 Infrastructure Engineer

🦅 H1B Visa Sponsor

SentinelOne

1001 - 5000 employees

Founded 2013

🔒 Cybersecurity

🤖 Artificial Intelligence

☁️ SaaS

SentinelOne is a leader in autonomous cybersecurity, known for its use of AI across endpoint, cloud, and identity protection solutions. Gartner has recognized it as a leader in the Magic Quadrant for Endpoint Protection Platforms for four consecutive years. SentinelOne's Singularity platform integrates enterprise security, offering features like AI-powered threat detection, endpoint and cloud security, vulnerability management, and threat intelligence. The company supports various industries by delivering real-time protection and operational efficiency while using AI for advanced threat hunting and log analytics. With a strong focus on reducing risk and enhancing security performance, SentinelOne serves enterprises worldwide with secure, scalable solutions.

📋 Description

• Architect and implement robust, scalable telemetry platforms that empower SentinelOne engineers to deploy and monitor features with speed, safety, and reliability.
• Act as the primary Subject Matter Expert (SME) and administrator for our core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
• Partner strategically with diverse engineering teams across the organization to define platform requirements, ensuring the observability ecosystem evolves ahead of stakeholder needs.
• Take complete ownership of critical features, from initial architectural design and requirements refinement through to production deployment and operational maturity.
• Drive exemplary operational efficiency for critical observability services across AWS and GCP, meticulously balancing unwavering system reliability with smart cloud cost-optimization.
• Build robust automation and self-service tooling to drastically reduce operational toil, optimize resource utilization, and minimize pager fatigue.
• Drive the deployment, maintenance, and compliance of observability systems in critical, high-security environments, including FedRAMP and air-gapped deployments.
• Cultivate platform transparency and reliability by rigorously implementing IaC (Terraform/Ansible) and standardizing industry best practices.
• Elevate engineering quality by mentoring team members, leading comprehensive technical design and code reviews, and providing constructive feedback that fosters growth.
• Lead the swift resolution of highly complex production incidents, perform thorough root-cause analyses, and participate in on-call rotations to ensure peak system integrity.

🎯 Requirements

• 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
• 8+ years of experience architecting, scaling, and managing enterprise-grade observability stacks using Prometheus, Grafana, Thanos (or Mimir/Cortex), and OpenTelemetry (OTEL).
• Experience designing and engineering cloud-native infrastructure within major cloud providers (AWS or GCP) and managing production Kubernetes environments (EKS, GKE).
• Advanced proficiency with IaC and automation tools, specifically Terraform and Ansible, to manage immutable infrastructure.
• Experience maintaining and optimizing high-throughput, large-scale distributed systems with a focus on cost-efficiency, scalability, and disaster recovery.
• Demonstrated ability to lead complex technical designs, mentor other engineers, and collaborate cross-functionally with product and application teams.
• US Citizenship and the ability to work in a government-regulated environment.

🏖️ Benefits

• Restricted Stock Units (RSUs)
• Employee Stock Purchase Plan (ESPP)
• Flexible time off
• Paid company holidays and paid sick time
• Gender-neutral parental leave
• Grandparent leave
• Medical, dental, and vision coverage
• 401(k) retirement plan with company match
• Life and disability insurance
• Health and dependent care FSA
• Voluntary benefits (hospital, accident, critical illness)
• Employee Assistance Program (EAP)
• ARAG pre-paid legal
• Nationwide pet insurance
• Cancer Care program
• Global business travel medical insurance
• Home office allowance
• Mobile phone reimbursement
• Wellness coach
• Wellness/gym reimbursement
• Fertility coverage
• Adoption & surrogacy reimbursement
