Senior DevOps Engineer

November 6

Hercules

Transport

Hercules is an award-winning, asset-based motor carrier and customs brokerage specializing in US-to-Canada and intra-US shipments. The company combines the flexibility of a regional carrier with national coverage, positioning it as one of the fastest-growing less-than-truckload (LTL) carriers in the market. Hercules operates over 1,000 pieces of equipment through more than 30 terminals, with an innovative 'no breakbulk' structure that reduces transit time and minimizes the risk of damage and misrouting. Additionally, Hercules employs the latest green fleet technology in partnership with both the Canada Border Services Agency and US Customs and Border Protection, demonstrating a commitment to responsible and efficient transportation solutions.

201 - 500 employees

📋 Description

• Deploy, scale, and manage Kubernetes clusters in both cloud and on-premises environments
• Build and maintain CI/CD pipelines using modern automation tools and Infrastructure as Code practices
• Manage hybrid infrastructure ensuring scalability, resilience, and disaster recovery readiness
• Strengthen security and compliance through identity management, network policies, and encryption strategies
• Implement observability solutions (metrics, logging, tracing) to ensure system reliability and performance
• Optimize infrastructure performance and cloud costs while maintaining high availability
• Own Kubernetes operations across cloud and on-prem: provision clusters, manage upgrades, enforce policies, and standardize app delivery (Helm/Kustomize) with progressive rollouts (blue/green, canary)
• Design, build, and maintain CI/CD pipelines (GitHub Actions/Azure DevOps/Argo CD) using IaC (Terraform) and GitOps; enforce quality/security gates and artifact promotion
• Architect and operate hybrid infrastructure (Azure, AWS, GCP, on-prem): networking, identity, storage, backup/DR, and capacity planning with clear RTO/RPO objectives; run DR tests regularly
• Implement zero-trust and compliance controls: IAM/least-privilege, secrets management (Vault/KMS), mTLS, network policies, container image scanning/signing/attestation (Trivy/COSIGN), SBOMs, and policy-as-code (OPA/Gatekeeper/Kyverno)
• Establish observability end-to-end: metrics, logs, traces (Prometheus/Grafana/OpenTelemetry/ELK), SLOs/SLIs, alerting, runbooks, and on-call rotation hygiene
• Optimize performance and cost: right-size workloads, set requests/limits, enable autoscaling, implement spot/reserved strategies, and produce FinOps reporting
• Partner with Engineering, Product, AI, and Security to support AI/LLM workloads (GPU scheduling, device plugins, quotas), model artifact storage, data pipelines; drive post-release verification and incident retrospectives
• Create paved roads and reusable templates: environment blueprints, bootstrap scripts, golden images, and self-service tooling for developers
• Lead incident response: triage, rollback, root-cause analysis, corrective actions, and knowledge base updates

🎯 Requirements

• 6+ years in DevOps/SRE/Platform Engineering with hands-on ownership of Kubernetes, CI/CD, and hybrid cloud operations at scale
• Required: Strong with Terraform (IaC), Helm/Kustomize, container registries, and GitOps workflows (e.g., Argo CD/Flux)
• Proficient with at least one major CI system (GitHub Actions, Azure DevOps) and artifact management; fluent in scripting (Bash) and one programming language (Python or Go preferred)
• Deep knowledge of cloud primitives (Azure/AWS/GCP) and on-prem virtualization; networking (VPC/VNet, ingress, service mesh), storage, and security controls
• Observability stack experience (Prometheus, Grafana, OpenTelemetry, ELK), SLI/SLO design, and actionable alerting
• Security by default: IAM, secrets management (Vault/KMS), image scanning/signing (Trivy/COSIGN), SBOMs, and policy as code (OPA/Gatekeeper/Kyverno)
• Proven track record with DR/BCP, backup/restore testing, and capacity planning; comfort with incident command and postmortems
• Bonus: Support for AI workloads (GPU nodes, quotas), performance profiling, and cost modeling/FinOps

🏖️ Benefits

• Opportunities to support AI-powered workloads
• Hands-on engineering position
