Senior DevOps Engineer

November 6


Hercules

Transport

Hercules is an award-winning, asset-based motor carrier and customs brokerage specializing in US-to-Canada and intra-US shipments. The company combines the flexibility of a regional carrier with national coverage, positioning it as one of the fastest-growing less-than-truckload (LTL) carriers in the market. Hercules operates over 1,000 pieces of equipment through more than 30 terminals and uses an innovative 'no breakbulk' structure that reduces transit time and minimizes the risk of damage and misrouting. Additionally, Hercules employs the latest green fleet technology in partnership with both the Canada Border Services Agency and US Customs and Border Protection, demonstrating a commitment to responsible and efficient transportation solutions.

201 - 500 employees

🚗 Transport

📋 Description

• Deploy, scale, and manage Kubernetes clusters in both cloud and on-premises environments
• Build and maintain CI/CD pipelines using modern automation tools and Infrastructure as Code practices
• Manage hybrid infrastructure ensuring scalability, resilience, and disaster recovery readiness
• Strengthen security and compliance through identity management, network policies, and encryption strategies
• Implement observability solutions (metrics, logging, tracing) to ensure system reliability and performance
• Optimize infrastructure performance and cloud costs while maintaining high availability
• Own Kubernetes operations across cloud and on-prem: provision clusters, manage upgrades, enforce policies, and standardize app delivery (Helm/Kustomize) with progressive rollouts (blue/green, canary)
• Design, build, and maintain CI/CD pipelines (GitHub Actions/Azure DevOps/Argo CD) using IaC (Terraform) and GitOps; enforce quality/security gates and artifact promotion
• Architect and operate hybrid infrastructure (Azure, AWS, GCP, on-prem): networking, identity, storage, backup/DR, and capacity planning with clear RTO/RPO objectives; run DR tests regularly
• Implement zero-trust and compliance controls: IAM/least-privilege, secrets management (Vault/KMS), mTLS, network policies, container image scanning/signing/attestation (Trivy/COSIGN), SBOMs, and policy-as-code (OPA/Gatekeeper/Kyverno)
• Establish observability end-to-end: metrics, logs, traces (Prometheus/Grafana/OpenTelemetry/ELK), SLOs/SLIs, alerting, runbooks, and on-call rotation hygiene
• Optimize performance and cost: right-size workloads, set requests/limits, enable autoscaling, implement spot/reserved strategies, and produce FinOps reporting
• Partner with Engineering, Product, AI, and Security to support AI/LLM workloads (GPU scheduling, device plugins, quotas), model artifact storage, data pipelines; drive post-release verification and incident retrospectives
• Create paved roads and reusable templates: environment blueprints, bootstrap scripts, golden images, and self-service tooling for developers
• Lead incident response: triage, rollback, root-cause analysis, corrective actions, and knowledge base updates

🎯 Requirements

• 6+ years in DevOps/SRE/Platform Engineering with hands-on ownership of Kubernetes, CI/CD, and hybrid cloud operations at scale
• Required: Strong with Terraform (IaC), Helm/Kustomize, container registries, and GitOps workflows (e.g., Argo CD/Flux)
• Proficient with at least one major CI system (GitHub Actions, Azure DevOps) and artifact management; fluent in scripting (Bash) and one programming language (Python or Go preferred)
• Deep knowledge of cloud primitives (Azure/AWS/GCP) and on-prem virtualization; networking (VPC/VNet, ingress, service mesh), storage, and security controls
• Observability stack experience (Prometheus, Grafana, OpenTelemetry, ELK), SLI/SLO design, and actionable alerting
• Security by default: IAM, secrets management (Vault/KMS), image scanning/signing (Trivy/COSIGN), SBOMs, and policy as code (OPA/Gatekeeper/Kyverno)
• Proven track record with DR/BCP, backup/restore testing, and capacity planning; comfort with incident command and postmortems
• Bonus: Support for AI workloads (GPU nodes, quotas), performance profiling, and cost modeling/FinOps.

🏖️ Benefits

• Opportunities to support AI-powered workloads
• Hands-on engineering position
