Site Reliability Engineer – SRE

51 - 200 employees

Founded 2018

🤖 Artificial Intelligence

☁️ SaaS

🏛️ Government

Attus Procuradoria Digital is an AI-powered SaaS platform for public procuradorias and legal departments. It uses generative AI and automation to read and summarize legal documents (petitions, court rulings, agreements), generate drafts of legal texts, monitor deadlines, distribute and manage cases, and support tax debt recovery through data cleansing and enrichment from government sources. The solution is trained and customized for each procuradoria's context, so that prosecutors, managers and advisors can focus on strategic tasks while routine processing runs in the background.

🔥 5 minutes ago

🇧🇷 Brazil – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇧🇷🇵🇹 Portuguese Required

Ansible

DNS

Docker

ElasticSearch

Grafana

Kafka

Kubernetes

Linux

Prometheus

Python

Redis

Terraform

Attus Procuradoria Digital

📋 Description

• Define and track reliability indicators (SLI, SLO, SLA) and operate based on Error Budget
• Establish high availability, resilience and disaster recovery strategies (RTO/RPO)
• Conduct capacity planning and service performance analysis
• Work on the reliability and performance of applications running on Kubernetes
• Design and evolve system observability (logs, metrics, traces and alerts)
• Create dashboards and alerts focused on visibility and action, reducing noise and false positives
• Detect issues before customers do by instrumenting services
• Establish and run the incident response process (classification, severity, on-call)
• Lead or support troubleshooting of applications and distributed environments
• Perform root cause analysis (RCA) and post-mortems, proposing preventive measures
• Develop and maintain operational runbooks
• Automate operational tasks and incident responses (self-healing), eliminating repetitive manual work
• Use AI for log analysis, anomaly detection, troubleshooting and optimization (AIOps)
• Continuously pursue the principle "automate before repeating," advancing operational maturity
• Collaborate with development and platform teams to continuously improve reliability
• Promote a culture of reliability and best practices across teams
• Apply security best practices in production environments (secrets, access control, segregation)
• Ensure traceability (logs, auditing and events)
• Support compliance with standards such as ISO 27001 and DevSecOps practices
• Integrate reliability and security (Security by Design)

🎯 Requirements

• Experience or familiarity with observability tools (Grafana, Prometheus, Elastic, Dynatrace or similar)
• Experience or familiarity with Kubernetes and containers (Docker)
• Knowledge of Linux and networking (HTTP, DNS, TLS/SSL)
• Knowledge of scripting and automation (Shell, Python or similar)
• Analytical skills and a strong problem-solving focus
• Regular use of AI in daily work and an automation mindset ("automate before repeating")
• Organized, autonomous profile with strong technical communication (comparable to a mid/senior Full Stack Developer with production projects)
• Quick learner
• Continuous desire to learn
• Empathy for customer logic
• Focus on delivering the best customer experience
• Collaborative mindset; able to offer and ask for help
• Strong communication skills to interact with different areas
• Proactive and well-organized
• Alignment with our values: Honesty and Ethics; Excellence and Care in Deliverables; Recognition; Respect and Courtesy
• Experience with SLI, SLO and Error Budget
• Experience troubleshooting distributed systems
• Experience with critical, high-availability environments
• Experience with APM tools (Dynatrace, Datadog)
• Knowledge of OpenTelemetry and instrumentation
• Knowledge of Kafka, Elasticsearch or Redis
• Experience with incident automation (self-healing) and IaC (Terraform, Ansible)
• Knowledge of Chaos Engineering and service mesh
• Experience applying AI to operations (AIOps, technical copilots)
• Experience in regulated environments (government, legal or financial)

🏖️ Benefits

• Health plan: Comprehensive care for your health.
• Life insurance: Security and peace of mind for you and your family.
• Partner discounts: Access to pharmacies, nutritionists and psychologists with special conditions.
• Well-being app (Clude): Encouragement for physical activities and well-being.
• Total Pass: Access to a wide network of nearby gyms.
• Workplace exercise: Active breaks to care for your body during work.
• Meal allowance: For CLT employment contracts.
• Caju Card: A special gift for your birthday month.
• Home office allowance: Support to set up a comfortable and productive workspace.
• Education assistance: Support for your academic and professional development.
• Book allowance: Encouragement to expand your knowledge.
• Continuous development: Programs and initiatives to boost your career.
• Innovation program: A space for you to bring ideas and make a difference.
• Dual screen: Proper tools for improved productivity.
• 100% remote position: Work from where you feel best.
• FreeDay
• Moment Off: We encourage breaks for disconnection and rest.
• Time off for your graduation: We celebrate your achievements with you.
• Gift for new children of employees: A token to celebrate the arrival of a new family member.
• Welcome-back gift after paternity leave: Support upon returning from this important phase.
• Supportive and collaborative environment: A team that helps and grows together.
• Eco-friendly welcome kit: Start your journey with us sustainably.
• Sustainable culture: Practical actions such as promoting composting.
• Virtual social gatherings: Moments to celebrate and connect with the team.
• Ongoing engagement campaigns throughout the year.
