Technical Operations Engineer

July 23

QuickNode ⚡

Crypto • Web 3 • SaaS

QuickNode is a company that provides advanced blockchain infrastructure to enable secure, decentralized innovation. It offers tools and resources such as API services, real-time data streams, managed blockchain solutions, and enhanced security measures, all backed by reliable global infrastructure. QuickNode's services support the development of decentralized applications by providing multi-chain capabilities, low-latency API processing, and comprehensive customer support. This allows businesses and developers to build and scale their projects across various blockchain networks with ease.

51 - 200 employees

Founded 2017

📋 Description

• QuickNode is a cloud-based infrastructure company that powers the blockchain ecosystem.
• The QuickNode team has over 120 people maintaining high-performance global data infrastructure for amazing customers serving billions of requests daily.
• We’re seeking a dedicated Technical Operations Engineer specializing in Solana infrastructure to ensure the reliability, scalability, and exceptional performance of our Solana-based services.
• Your contributions will directly impact QuickNode's operational excellence and customer trust by proactively addressing issues, refining processes, and collaborating closely with foundational teams and the broader Solana ecosystem.
• Lead end-to-end deployment and optimization projects for Solana infrastructure, including validator nodes, RPC endpoints, and indexing services. Drive design reviews, canary rollouts, and continuous improvements to performance and reliability.
• Own SEV 0/1 response: coordinate mitigation across teams, run postmortems, and ensure root-cause resolution with follow-through on corrective actions.
• Define and manage service-level objectives (SLOs) and service-level agreements (SLAs). Build and maintain cost models and capacity planning tools to forecast infrastructure needs and control spend.
• Develop dashboards and alerting solutions using tools like Grafana and Datadog. Identify anomalies and trends to prevent outages before they occur.
• Implement and maintain automation via Ansible, Terraform, and Kubernetes. Reduce toil, accelerate deployment timelines, and ensure consistent environments across staging and production.
• Mentor engineers on deployment, observability, and Solana-specific ops. Review infrastructure code and monitoring configs. Raise the bar through shared knowledge.
• Act as a technical representative in Solana forums and community calls. Collaborate directly with the Solana Foundation and ecosystem contributors to troubleshoot and evolve protocol-level operations.
• Partner with internal infrastructure, platform, and support teams to solve customer-impacting issues. Contribute insights to architectural and product-level discussions.
• Participate in an on-call rotation, ensuring 24/7 availability for critical systems and supporting rapid incident resolution.

🎯 Requirements

• 5+ years in Technical Operations, Site Reliability Engineering (SRE), or related roles, with proven Linux/Unix system administration and advanced troubleshooting capabilities. An RHCE-level Linux certification or similar would be beneficial.
• Hands-on experience operating and optimizing Solana validator nodes, RPC endpoints, and associated infrastructure at scale. Must be familiar with the Solana protocol at a high level and with its core components. Proficient in analyzing validator logs, debugging RPC issues, and addressing Solana-specific operational issues. Contributions to open-source Solana projects are an asset.
• Solid hands-on experience with configuration management and infrastructure automation tools (Helm, Terraform, Ansible, Consul), including containerization expertise (Docker, Kubernetes) and experience managing and scaling services in cloud environments.
• Competency in scripting/programming languages (Rust, Go, JavaScript).
• Advanced proficiency in monitoring and analytics platforms (Grafana, Datadog), enabling proactive and data-driven operational decision-making.
• Demonstrated ability to identify performance patterns, forecast potential issues, and implement preventive solutions.
• Strong track record of defining, measuring, and maintaining SLAs/SLOs, and experience with incident response tooling and processes (PagerDuty), ensuring quick resolution and systematic root-cause analyses.
• Exceptional interpersonal and communication skills, with a proven ability to collaborate effectively across multiple teams and stakeholders.
• Self-motivated, solution-oriented, and consistently striving for operational improvements, quality enhancements, and reduced technical debt.
• Solid professional attributes: committed to transparency, accountability, and ethical behavior. Capable of managing complexity and staying adaptable under pressure, with a demonstrated commitment to continuous learning and comfort in a rapidly changing technical landscape.
• Self-starter driven by curiosity and initiative, proactively identifying opportunities, addressing gaps, and implementing solutions autonomously.
• Thrives in dynamic environments and is committed to maintaining industry leadership through close collaboration with the most innovative and talented minds in Web3.
