Senior Site Reliability Engineer – SRE

🕒 March 18

Moonlite

1 - 10 employees

📚 Education

🏪 Marketplace

👥 B2C

Moonlite is a community-driven web platform that helps people discover, compare, and trust proven ways to make money. It curates thousands of business ideas, resources, creators, and courses, all validated and rated by real users so seekers can avoid hype and focus on what works. Moonlite offers community discussions, reviews, side-by-side comparisons, and a quick survey to match users with suitable income paths, aimed at helping individuals build financial freedom with confidence.

📋 Description

• Design, build, and operate production Kubernetes clusters on bare-metal infrastructure.
• Implement and operate custom Kubernetes networking solutions.
• Develop and maintain custom Kubernetes operators and controllers.
• Deploy and optimize NVIDIA GPU operators and custom scheduling logic for GPU workloads.
• Build deep integrations between Kubernetes and underlying infrastructure.
• Design and implement automation using Terraform, Ansible, Helm, and custom operators.
• Manage production bare-metal infrastructure across multiple regions ensuring high availability, fault tolerance, and graceful degradation.
• Build comprehensive monitoring, logging, and alerting using Prometheus, Grafana, and ELK stack.
• Identify and resolve performance bottlenecks across infrastructure domains.

🎯 Requirements

• 5+ years in SRE, DevOps, or infrastructure engineering roles with proven experience operating production infrastructure at scale.
• Deep hands-on experience building and operating production Kubernetes clusters on bare-metal infrastructure.
• Strong understanding of Kubernetes internals including custom resource definitions (CRDs), operators, controllers, admission webhooks, and scheduling.
• Strong fundamentals in Linux systems administration, performance tuning, troubleshooting, and automation in production environments.
• Proficiency with infrastructure-as-code tools (Terraform, Ansible, Helm) and building automation to reduce operational overhead.
• Solid understanding of networking concepts including IPAM, DNS, DHCP, VLAN/VXLAN, routing, load balancing, and experience troubleshooting network issues in production.
• Experience building and maintaining comprehensive monitoring solutions using tools like Prometheus, Grafana, and centralized logging systems.
• Understanding of SRE principles including SLIs/SLOs/SLAs, error budgets, incident management, and blameless postmortems.
• Strong scripting skills in Go, Python, or Bash for automation, tooling development, and operational efficiency.
• Demonstrated ability to troubleshoot complex issues under pressure, manage incidents effectively, and communicate clearly during outages.
• Excellent communication skills and ability to work across teams including systems engineers, network engineers, and software developers.

🏖️ Benefits

• 6% 401(k) match
• Fully covered health insurance premiums
• Other comprehensive offerings to support your well-being and success as we grow together.

Similar Jobs

🕒 March 18

Owner.com

201 - 500

☁️ SaaS

🤝 B2B

🏪 Marketplace

Senior DevOps Engineer evolving and operating Owner’s cloud platform. Design systems for reliability, security, and developer productivity as we scale.

🇺🇸 United States – Remote

💵 $190k - $240k / year

💰 $120M Series C - Owner on 2025-05

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 March 18

Vytwo Technologies Inc

201 - 500

🤝 B2B

🏢 Enterprise

🎯 Recruiter

MEAN stack Architect with DevOps expertise for TCoE, designing scalable applications and leading technical teams in a fully remote environment.

🇺🇸 United States – Remote

💵 $45 - $50 / hour

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 March 17

Truv

51 - 200

Senior DevOps Engineer architecting and scaling AWS infrastructure and building observability platforms. Leading compliance projects and optimizing CI/CD pipelines in a remote setup.

🕒 March 14

Icmarc

-

💸 Finance

🤝 B2B

DevSecOps Engineer at MissionSquare integrating security into software development lifecycle. Collaborating with teams to deliver secure applications and improve security practices across platforms.

🇺🇸 United States – Remote

💵 $128.5k - $205.6k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🕒 March 14

Panopto

51 - 200

☁️ SaaS

📚 Education

🏢 Enterprise

Mid-Level DevOps Engineer at Panopto transforming outdated build processes into automated pipelines. Elevate the engineering experience by enhancing delivery lifecycle and collaboration.

🇺🇸 United States – Remote

💵 $155k - $175k / year

💰 Private Equity Round on 2021-04

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor