
API • Artificial Intelligence • SaaS
Aerospike is a distributed NoSQL database known for its high-speed, low-latency read and write operations, delivering millisecond response times. It is designed to handle massive amounts of streaming data from a variety of sources, making it suitable for applications that require real-time analytics, caching, session management, and AI-driven solutions. Aerospike supports various data models, including key-value pairs and JSON documents, enabling developers to efficiently manage complex datasets while maintaining high performance and scalability.
51 - 200 employees
Founded 2012
August 1

• As a Staff Site Reliability Engineer at Aerospike, you’ll be a technical leader within our global SRE organization, helping drive reliability, performance, and scalability across our hybrid and multi-cloud environments. You’ll bring deep operational experience and lead by example: mentoring others, designing resilient systems, and championing modern SRE practices across new and legacy platforms.
• You’ll play a key role in shaping the direction of our infrastructure initiatives, from Kubernetes-based platforms like AKS and the Aerospike Kubernetes Operator to existing services in AWS and GCP. Your impact will span teams and systems as you solve complex problems, influence architecture, and foster a culture of ownership, resilience, and continuous improvement.
• Provide technical leadership across multiple systems and environments, proactively identifying risks, shaping architecture decisions, and improving reliability and performance at scale.
• Lead key infrastructure efforts including Kubernetes platform expansion (AKS, AKO), and application of SRE principles to legacy systems and new cloud offerings.
• Define, measure, and enforce reliability standards through SLIs/SLOs, observability tooling, and incident response frameworks.
• Mentor and guide other SREs by leading design sessions, conducting technical deep dives, and reviewing code, configurations, and infrastructure decisions.
• Partner with product, engineering, and cloud teams to align reliability goals with delivery objectives.
• Lead root cause analyses and implement systemic fixes for issues spanning multiple platforms or services.
• Drive automation-first approaches using IaC, CI/CD pipelines, and scripting to reduce toil and increase deployment confidence.
• Influence cross-functional roadmaps, identifying areas for innovation, technical debt reduction, and long-term scalability.
• Participate in the global on-call rotation, bringing senior-level calm and clarity during incidents and escalations.
• 8+ years of experience in SRE, DevOps, or infrastructure engineering, including significant time operating production systems at scale.
• Deep hands-on experience with at least one major public cloud (AWS, GCP, Azure), and working knowledge of the others; Azure experience is a plus.
• Production experience with Kubernetes, including operating clusters, Helm, operators, and supporting microservices in real-world environments.
• Strong proficiency in infrastructure-as-code tools such as Terraform and CI/CD automation platforms.
• Expertise in observability tools and practices (Datadog, Prometheus, Grafana, ELK, etc.) and using them to define SLIs and SLOs; Datadog experience is a plus.
• Programming and scripting ability in one or more languages (Python, Go, Bash, etc.).
• Experience with large-scale incident response and post-incident review practices.
• Proven ability to mentor other engineers and influence technical strategy across multiple teams.
• Strong communication skills to articulate complex concepts to technical and non-technical stakeholders.