
11 - 50 employees
Founded 2024
Seed Round on 2024-09
Artificial Intelligence • Security • Hardware
HavocAI is a developer of collaborative autonomy for maritime operations, offering a modular software and vehicle stack that enables fleets of autonomous maritime systems to perform contested logistics, sensor fusion and tracking, domain awareness, and escort-and-engage missions. Their product suite includes onboard autonomy (HAVOC OS), scalable communications (HAVOC CLOUD), and a handheld operator interface (HAVOC CONTROL), marketed as a single solution for theater-scaled security and rapid deployment. HavocAI emphasizes real-time, team-led autonomous solutions that run across diverse environments and supports both hardware (autonomous vessels) and software deployment.
0 minutes ago
United States – Remote
$150k - $185k / year
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
• Design and evolve reliability architecture for distributed and cloud-hosted systems
• Define and implement SRE best practices, including SLIs, SLOs, error budgets, and capacity planning
• Partner with platform and application teams to design systems for reliability, scalability, and operability
• Identify and mitigate systemic reliability risks across infrastructure, applications, services, and data pipelines
• Establish reliability patterns that support autonomy, simulation, and mission-critical cloud workloads
• Lead incident response processes, including on-call rotations, escalation paths, and post-incident reviews
• Conduct root cause analysis for complex production incidents and drive long-term corrective actions
• Improve operational readiness through runbooks, automation, resilience testing, and production-readiness reviews
• Reduce operational toil through tooling, automation, and process improvements
• Help build a culture of ownership, accountability, and continuous improvement across production systems
• Design, implement, and maintain observability systems for metrics, logging, tracing, alerting, and service health
• Ensure services and data pipelines are observable, debuggable, and performant in production
• Drive performance analysis and tuning across infrastructure, application, and service layers
• Improve alert quality, reduce noise, and ensure operational signals are actionable
• Partner with engineering teams to define meaningful reliability and performance metrics
• Build automation to improve system reliability, deployment safety, and recovery processes
• Partner with DevOps and Cloud Platform teams on CI/CD reliability, rollout strategies, and safe deployment patterns
• Support and improve Kubernetes-based environments and containerized workloads
• Contribute to infrastructure-as-code practices and platform automation
• Help define operational standards for cloud infrastructure, deployment workflows, and production services
• Collaborate with security teams to ensure secure and resilient system design
• Participate in disaster recovery planning, backup strategy, and resilience testing
• Maintain strong operational practices around access control, secrets management, change management, and production access
• Support secure operations for systems that may serve defense, autonomy, or mission-sensitive use cases
• 7+ years of experience in SRE, infrastructure engineering, systems engineering, or related roles
• Strong experience operating large-scale distributed production systems
• Deep understanding of Linux systems, networking, cloud infrastructure, and distributed systems fundamentals
• Hands-on experience with Kubernetes and container orchestration
• Programming or scripting experience in Go, Python, or similar languages
• Experience designing and operating observability systems for production environments
• Proven ability to lead incident response and drive reliability improvements
• Strong communication skills and ability to collaborate across engineering teams
• Ability to operate calmly and effectively under pressure
• Must be a U.S. Citizen and eligible to obtain a U.S. Government security clearance if required
• 100% employer-paid Health, Dental, and Vision Insurance for you and your family
• Life Insurance (employer paid)
• Ability to participate in the company's 401(k) program (with matching)
• Unlimited PTO policy with an enforced two-week minimum
• Equity Package
• Work / Home Office Stipend
• Global Entry
• 16-Week Paid Parental Leave
• Monthly Health and Wellness Stipend
6 hours ago
Senior DevOps Engineer at Ad Hoc creating scalable digital services and improving software engineering processes. Collaborating with federal agencies to enhance service delivery through technology.
United States – Remote
$125k - $140k / year
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
11 hours ago
Senior DevSecOps Engineer at Generac managing cloud services and ensuring security and compliance in data handling. Leading efforts in secure cloud infrastructure design and integrating security in development processes.
United States – Remote
$145k - $185k / year
$200M Grant on 2024-07
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
13 hours ago
DevOps Engineer designing and managing cloud environments and automation tools for RethinkFirst. Delivering CI/CD pipelines, quality code, and incident management in a fast-paced environment.
14 hours ago
Lead Site Reliability Engineer enhancing observability and telemetry platform for athenahealth's cloud infrastructure. Collaborating with engineering teams to improve reliability and operational efficiency.
United States – Remote
$143k - $243k / year
Post-IPO Equity on 2017-05
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
15 hours ago
Senior DevOps Engineer focusing on cloud architecture and CI/CD at TrueML, enhancing infrastructure scalability and reliability. Engaging in hands-on technical execution and team collaboration.
United States – Remote
$120k - $155k / year
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)