Senior Site Reliability Engineer

🕒 May 11

PlayOn! Sports

201 - 500 employees

📱 Media

⚽ Sports

📚 Education

💰 $26M Series D in 2013-07

PlayOn! Sports provides digital solutions for high school athletics and activities. Its platform integrates athletic management, digital ticketing, broadcasting and streaming, concessions, sponsorships, fundraisers, and athletic websites. Its brands include GoFan, the NFHS Network, and rSchoolToday, which collectively serve thousands of schools with over 500,000 events annually. The company aims to strengthen community engagement and ease administrative work for schools with tools such as digital payment options, event streaming, and activity management, connecting fans and communities with local school events.

📋 Description

• Contribute to system observability by implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.
• Develop automation, tooling, and monitoring solutions to support high service availability.
• Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.
• Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
• Participate in on-call rotations to support critical services and ensure rapid response to incidents.

🎯 Requirements

• Solid experience in Python, especially for automation, tooling, and data-driven operational tasks.
• Proficiency in at least one of Java, C++, or Go.
• Strong understanding of Linux systems, cloud infrastructure (AWS, GCP, or Azure), and modern deployment practices (Docker, Kubernetes, Terraform).
• Experience with CI/CD pipelines, version control, and automated testing frameworks.
• Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog) and log/metric analysis for diagnosing issues.
• Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.
• Demonstrated ability to collaborate with cross-functional teams and communicate clearly in high-impact situations.
• A problem-solver who approaches reliability as a shared responsibility across engineering.
• Familiarity with AI-augmented development tools (Claude, Codex) as part of a modern engineering workflow.

**Nice to Have**

• Experience writing or maintaining end-to-end or integration tests for distributed systems.
• Background in performance testing, capacity planning, or chaos engineering.
• Contributions to internal developer tooling or reliability-focused frameworks.
• Exposure to security, compliance, or change management processes in production environments.
• Relevant certifications.

🏖️ Benefits

• Multiple medical insurance plans to choose from
• Dental, vision, life, and disability insurance
• Employee Emergency Fund
• Company equity (stock options)
• Open PTO policy
• 401(k) plan with company match
• Hybrid/flexible work environment

Similar Jobs

🕒 May 11

CyberSheath

51 - 200

🔒 Cybersecurity

📋 Compliance

💳 Fintech

Cloud Operations Engineer responsible for deploying and managing CyberSheath solutions for clients. Engaging with new clients and supporting their IT systems in a remote capacity.

🕒 May 9

Ford Motor Company

10,000+ employees

🚗 Transport

Site Reliability Engineer at Ford, developing and enhancing global monitoring systems. Join the team redefining transportation using advanced technology.

🕒 May 9

Visionary Integration Professionals (VIP)

501 - 1000

🤝 B2B

🏛️ Government

Forward Deployment Engineer working on AI-enabled solutions for clients at Visionary Integration Professionals. Collaborating with customers to design, prototype, and support implementations across various sectors.

🕒 May 9

Datmos

51 - 200

🛍️ eCommerce

🤝 B2B

🏢 Enterprise

Full-Stack Developer / DevOps at Datmos transforming AI models into production-grade products. Creating secure interfaces and robust cloud infrastructures within the AI Task Force.

🕒 May 9

TechInsights

201 - 500

Senior Site Reliability Engineer building reliability and AI operations for semiconductor workflows. Owning strategic initiatives, collaborating across teams, and driving reliability standards and practices.