Systems Reliability Engineer


🕒 April 6



Arkenstone Defense

51 - 200 employees

🔒 Cybersecurity

📋 Compliance

🏛️ Government


Arkenstone Defense builds foundational operating infrastructure that enables technology companies to operate inside the U.S. national security and regulated federal markets. It embeds operational discipline, continuous compliance, and accountable execution (aligned to CMMC Level II, NIST 800-171, and DFARS 7012) into customer environments so that defense, dual-use, and venture-backed teams can move at mission speed without adding prime-level overhead. Arkenstone focuses on secure operations, audit-ready evidence, identity and access, monitoring, and encryption to make regulated execution scalable and repeatable.

📋 Description

• Design, implement, and own the infrastructure reliability strategy across AWS, Azure, and GCP
• Champion observability by developing and maintaining effective logging, monitoring, and alerting systems
• Lead efforts in performance tuning, system hardening, capacity planning, and disaster recovery
• Own the incident management lifecycle: from detection to postmortem and root cause analysis
• Automate deployment, scaling, and recovery workflows to reduce manual toil
• Contribute to infrastructure as code (Terraform, ARM templates, CloudFormation, etc.)
• Act as a mentor and technical leader to junior engineers and cross-functional partners

🎯 Requirements

• 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
• Proven track record of operating large-scale systems in multi-cloud environments
• Strong knowledge of cloud-native architecture, container orchestration (e.g., Kubernetes), and CI/CD pipelines
• Proficient in scripting (Python, Bash, etc.) and infrastructure automation tools
• Experience with monitoring/observability platforms (e.g., Prometheus, Grafana, Datadog, ELK)
• Excellent problem-solving skills and a bias toward ownership and action
• Comfortable making decisions under pressure and leading through incidents
• Working knowledge of FedRAMP or NIST 800-53 controls preferred
• Comfortable participating in customer discussions
• Clear communicator who can translate technical concepts for mixed audiences

🏖️ Benefits

• Competitive Salary: Recognizing your hard work with attractive compensation and rewarding excellence.
• Health and Wellness Programs: Including medical, dental, and vision insurance options, along with mental health support and wellness initiatives.
• Retirement Planning: Secure your future with our flexible 401(k) plan and matching company contributions.
• Paid Time Off & Holidays: Generous PTO, sick leave, and holiday pay to help you recharge and enjoy life outside of work.
• Employee Assistance Program: Confidential resources for personal and professional support.
• Professional Development: Access to training, certifications, and continuing education to foster your career growth.

