Platform Infrastructure Engineer – SRE Core

🕒 May 12

Menlo Security Inc.

201 - 500 employees

🔒 Cybersecurity

🏢 Enterprise

💰 $100M Series E in November 2020

Menlo Security Inc. is a cybersecurity company focused on providing advanced internet security solutions for enterprises. They specialize in securing browsers across hybrid enterprise environments to prevent phishing and malware attacks. Menlo Security offers a cloud-based browser security solution that transforms any browser into a secure enterprise browser, helping organizations protect against highly evasive adaptive threats, zero-hour phishing, and ransomware. Their zero-trust access architecture enables safe internet use and secure application access, supporting over 800 customers globally, including financial institutions and government agencies.

📋 Description

• Design, deploy, and maintain VM and Kubernetes infrastructure on GCP and AWS across dozens of clusters spanning development, staging, and production environments in multiple regions.
• Coordinate with peers on your own team and across teams to ensure that the tasks you're working on solve the problems we need solved.
• Build and maintain Infrastructure as Code (IaC) using Terraform modules, managing resources through Spacelift or equivalent Terraform Automation and Collaboration Software (TACOS). Provision cloud infrastructure including networking, compute, storage, and security components primarily on GCP, with secondary AWS support.
• Implement and manage workflows with sophisticated multi-layer configuration management.
• Build and maintain comprehensive observability solutions using Grafana Cloud, Prometheus/Mimir, and OTel collectors. Design Grafana dashboards, configure alerting rules, and ensure visibility across all platform components.
• Manage certificate lifecycle, DNS automation, ingress controllers, and service mesh networking with Cilium.
• Partner with Engineering, Product, Compliance, and Security teams to design resilient, scalable systems. Consult on capacity planning, disaster recovery, and architectural decisions for cloud-native applications.
• Identify and eliminate toil through automation. Write scripts, develop tools, and build CI/CD pipelines to improve operational efficiency and reduce manual work.
• Participate in a 24x7 on-call rotation as part of a globally distributed team, responding to incidents and driving post-incident reviews.

🎯 Requirements

• Bachelor's degree in Computer Science or a similar technical field, or equivalent practical experience.
• Proficiency in common programming and scripting languages. We use a lot of Python, Bash, and Go.
• Understanding of network topologies, communication protocols (e.g., TCP/IP, HTTP/S, UDP, TLS), and enterprise-grade connectivity solutions.
• Kubernetes expertise including cluster administration, RBAC, networking, workload management, and troubleshooting across production environments.
• Proven experience with Terraform for infrastructure provisioning and management.
• Knowledge of Google Cloud Platform services including GKE, VPC networking, Cloud DNS, Artifact Registry, Secret Manager, IAM, Gemini Code Assist, and Workload Identity.
• Experience with GitOps methodologies and tools.

🏖️ Benefits

• Collaborative, inclusive, and fun culture
• Opportunities to take initiative
• Support for new ideas
• Open communication
