
Artificial Intelligence • SaaS • Cloud Engineering
InfraCloud Technologies specializes in cloud native technologies and services. The company builds, modernizes, and manages cloud infrastructure using Kubernetes and open-source technologies. Its services include AI infrastructure consulting, platform engineering, and application modernization. InfraCloud is recognized for its capabilities in DevSecOps, observability, and containerization, and is a trusted partner for deploying and managing Kubernetes-based solutions. It also contributes to open-source projects, which strengthens its offerings in areas like site reliability engineering and cloud native product development.
October 13

• Own Tier 3 technical escalations from Technical Support and ensure rapid resolution.
• Investigate, triage, and mitigate incidents, ensuring accountability and timely communication.
• Conduct trend and root-cause analysis to identify recurring issues, bug patterns, and product gaps.
• Read and interpret application code to isolate, reproduce, and diagnose complex technical problems.
• Collaborate with Support and Product Engineering to drive systemic improvements and long-term fixes.
• Contribute to the creation and maintenance of runbooks, escalation workflows, and troubleshooting guides.
• Partner with cross-functional teams to improve monitoring, logging, and alerting for production systems.
• Automate repetitive tasks and build tools to improve team efficiency.
• Participate in on-call rotations as part of a 24×7 follow-the-sun model.
• 3–5 years of experience in Production Engineering, Technical Support (Tier 3), SRE, or similar roles in a SaaS or enterprise software environment.
• Strong understanding of incident management, troubleshooting, and root-cause analysis.
• Ability to read and understand code (Go preferred) to debug issues, analyze stack traces, and collaborate effectively with developers.
• Proficiency with ServiceNow, Jira, Azure DevOps, or equivalent tools.
• Familiarity with monitoring and observability platforms (Grafana, Prometheus, Splunk, etc.).
• Hands-on experience with cloud platforms such as Azure, AWS, or GCP.
• Basic scripting or automation skills (e.g., Python, PowerShell, or Bash).
• Strong communication and cross-functional collaboration skills.
• Data-driven mindset with a focus on efficiency, metrics, and continuous improvement.

Nice to Have
• Experience working in globally distributed, follow-the-sun teams.
• Exposure to AI or automation for incident triage or resolution.
• Experience contributing to DevOps or SRE practices.
• Prior experience in backup, recovery, or data management products.
• Fully remote