Cloud Operations Engineer


September 3


ARCOS LLC

Enterprise • Utilities • Transportation

ARCOS LLC is a workforce management solution provider focused on improving operational efficiency and communication within critical industries. Its products support daily operations, emergency response, and operational analytics, enabling organizations such as utilities and airlines to manage their crews effectively. By leveraging real-time data and streamlined workflows, ARCOS helps teams respond quickly to challenges, thereby enhancing productivity and customer satisfaction.

51 - 200 employees

🏢 Enterprise

📋 Description

• Support solutions hosted on AWS, including Linux/Windows servers running on EC2; responsible for production lifecycle, maintenance, and administration of the ARCOS platform.
• Design, develop and maintain scalable AWS solutions and infrastructure (EC2, RDS, S3, DynamoDB, Elasticache, Route53, etc.).
• Develop tooling and processes to automate deployment of SaaS applications and underlying OS and infrastructure.
• Perform PostgreSQL and Oracle database administration including maintenance, troubleshooting, tuning, optimization, upgrades, backup/recovery and data migration.
• Partner with Engineering, Development, Quality Assurance, Professional Services, and Technical Support to ensure product success and schedules.
• Engage in Agile team practices such as daily standups, backlog refinement, release planning and sprint planning.
• Coordinate configuration changes, installs, and upgrades following company change control procedures.
• Participate in 24x7 on-call responsibilities to maintain availability and performance of customer-facing production services.
• Triage and resolve complex problems spanning multiple tiers of application/infrastructure, including network connectivity issues.
• Actively monitor supported systems and respond promptly to security or usability concerns.
• Review application logs and analyze events using cloud-native services (CloudWatch, CloudTrail) or third-party SIEM tools (Splunk).
• Upgrade systems and processes for enhanced functionality and security compliance.
• Accurately document all processes and procedures for routine and non-routine tasks.
• Perform all other duties and responsibilities as assigned.

🎯 Requirements

• Bachelor’s degree in Computer Science or related field, or equivalent work experience.
• 4-5 years of system administration experience, ideally in global management and operations of highly trafficked production applications. Experience working in a 24x7 SaaS environment is preferred.
• 4-5 years of experience designing solutions for and managing AWS services, including but not limited to: EC2, RDS, S3, DynamoDB, Elasticache, WAF/Shield, Route53, IAM and Directory Service, ECS, EKS, ECR, DNS, Parameter Store, ALB.
• 2 years of experience with CI/CD technologies and best practices using AWS CodePipeline, CodeBuild, GitHub Actions or Bitbucket Pipelines.
• 2 years of experience with PostgreSQL, Oracle, SQL Server.
• Experience with hosting and supporting ESRI ArcGIS Server and FME Data Integration tools is a plus.
• Experience with Linux and Windows system administration, automation and performance tuning.
• Experience with configuration management and infrastructure as code tools such as Ansible and Terraform.
• Experience with Apache, Nginx, Tomcat, NodeJS/PM2.
• Experience with scripting languages, including Bash, Python and PowerShell.
• Knowledge of Kubernetes, Docker, Jira, Confluence.
• Advanced knowledge of system vulnerability management and security best practices.
• Solid understanding of observability, networking concepts and troubleshooting.
• Proven ability to work effectively with highly reliable, highly available, mission-critical technologies, showing attention to detail and delivering results while meeting deadlines.
• Ability to operate deployment automation, SaaS operations, internal and external SaaS infrastructure, security and cost management.
• Solid understanding of technical issues and opportunities related to modern cloud infrastructure and operations.
• Excellent written and verbal communication skills.
• Participating in a rotational on-call schedule to handle significant production issues.
• Rapidly diagnosing and resolving technical challenges that arise in production.
• Collaborating with customer support and engineering teams for seamless issue resolution.
• Maintaining clear communication and documentation during and after incidents.
• Leveraging these experiences to contribute to continuous process improvement.
