Principal DevOps Engineer

12 hours ago

Center for Internet Security

Cybersecurity • Compliance • Artificial Intelligence

Center for Internet Security is an organization dedicated to enhancing the cybersecurity posture of businesses and individuals. It provides resources and tools focused on improving baseline cybersecurity measures, encouraging adherence to best practices, and ensuring compliance with privacy regulations. Its mission is to promote improved security and resilience through collaboration and the sharing of knowledge and resources.

📋 Description

• Architect and implement secure, production-grade EKS clusters using infrastructure-as-code (IaC) and GitOps principles
• Integrate and configure open-source tools including ArgoCD (GitOps), Kyverno (policy enforcement), Karpenter (autoscaling), and the Grafana stack (monitoring and observability)
• Ensure security best practices are applied across all infrastructure components, including IAM, network policies, secrets management, and container runtime configurations
• Design and enforce Kubernetes security policies, RBAC, and network segmentation using tools like Kyverno and AWS-native controls
• Collaborate with Product and Platform teams to ensure infrastructure meets performance, reliability, and compliance requirements
• Build and maintain CI/CD pipelines with embedded security checks, vulnerability scanning, and policy validation
• Develop reusable Terraform modules and Helm charts that enforce secure defaults and compliance standards
• Monitor and troubleshoot production workloads, ensuring high availability, performance, and security posture
• Participate in an on-call rotation to support production systems and respond to incidents
• Advocate for DevSecOps principles and mentor engineers on secure cloud-native tooling and automation
• Evaluate emerging technologies and make strategic recommendations to leadership, with a focus on security and operational excellence
• Document architecture decisions, operational runbooks, and incident response procedures with a security-first mindset
• Other tasks and responsibilities as assigned

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or related field*
• 8+ years of experience in DevOps, site reliability engineering, or cloud infrastructure roles
• Deep expertise with Kubernetes (preferably EKS) in production environments
• Hands-on experience with ArgoCD, Karpenter, Prometheus, Grafana, Loki, and Tempo
• Proficiency in Terraform and Helm for infrastructure and application deployment
• Strong understanding of GitOps workflows and CI/CD pipeline design
• Experience with AWS services including IAM, VPC, EC2, S3, and CloudWatch
• Solid grasp of container security, Kubernetes RBAC, and policy-as-code (PaC)
• Excellent troubleshooting skills across infrastructure, networking, and application layers
• Strong communication skills and ability to work effectively with remote teams
• Must be authorized to work in the United States

* Additional years of relevant experience or a combination of an Associate’s degree or equivalent and relevant experience may be substituted for the Bachelor’s degree.

🏖️ Benefits

• Health (PPO, EPO, HSA), Dental & Vision Insurance eligibility starting from the first day of hire
• $500 wellness card for Health Coverage Participants
• 401(k) with 4% Company Match, vested from the first day of hire
• Flexible Spending Account (FSA) & Dependent Care Account (DCA)
• Life Insurance
• Bonding Leave
• Paid Volunteering Program
• Bonus eligibility
• Paid Time Off (PTO) inclusive of vacation, personal and sick time
• Paid Holidays
• Wellness Program
• Employee Engagement Activities
• Professional Development Opportunities
• Tuition Reimbursement
• Student Loan PayDown Program
• Employee Referral program
• Employee Assistance Program
