Senior Engineer – Site Reliability

October 7

CrowdStrike

Cybersecurity • SaaS • Artificial Intelligence

CrowdStrike is a cybersecurity company that provides cloud-based security services to stop breaches. It is recognized as a leader in endpoint protection, identity and cloud security, and managed detection and response. CrowdStrike's platform, Falcon, integrates artificial intelligence to offer real-time visibility, detection, and protection against sophisticated cyber threats. The company is lauded for its effectiveness in securing networks and data, making it a trusted partner for businesses worldwide.

📋 Description

• Expertise with Linux engineering and administration for thousands of bare metal servers and virtual machines
• Responsible for troubleshooting server hardware issues
• Responsible for all operational aspects of our platform: Availability, Latency, Throughput, Monitoring, Issue Response (analysis, remediation, deployment), and Capacity Planning with respect to Latency and Throughput
• Work in a team of highly motivated engineers distributed across the globe
• Use your passion for technology, automation, and tooling to ensure our platform operates 24x7
• Obsess about learning, and champion the newest technologies and tricks with others, raising the technical IQ of the team
• Have broad exposure to our entire architecture and become one of our experts in our overall process flow
• Have an intrinsic drive to make things better
• Have experience with modern monitoring and telemetry stacks (ELK, Prometheus, Grafana)
• Gather and analyze metrics from both operating systems and applications to assist in performance tuning
• Ability to lead incident analysis, champion incident response practices, help correlate incidents to systemic problems, and drive toward resolution

🎯 Requirements

• Bachelor's degree and/or equivalent experience in Computer Science
• A minimum of 7 years of experience in software engineering
• A minimum of 7 years of experience in one or more of: C++, Java, Python, Go
• Experience with storage technologies (Examples: SAN, NAS, NFS, Object Storage, FreeNAS, iSCSI)
• Experience with infrastructure technologies (Examples: Linux, Windows, VMware, Docker, Kubernetes, etc.)
• Experience writing technical documentation
• Configuration management experience with one or more tools such as Puppet, Chef, Ansible
• Solid understanding of application design, including operational trade-offs of various designs
• Analytical skills coupled with a strong sense of urgency, ownership, and drive
• Ability to work well in a team-focused environment with other SREs and Engineers
• Ability to communicate and present broadly the conventions recommended by the reliability team

🏖️ Benefits

• Remote-friendly and flexible work culture
• Market leader in compensation and equity awards
• Comprehensive physical and mental wellness programs
• Competitive vacation and holidays for recharge
• Paid parental and adoption leaves
• Professional development opportunities for all employees regardless of level or role
• Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
• Vibrant office culture with world class amenities
• Great Place to Work Certified™ across the globe
