Senior AI Engineer – Voice Agent Platform

November 24

🇨🇷 Costa Rica – Remote

⏰ Full Time

🟠 Senior

🤖 AI Engineer


Gorilla Logic

SaaS • Enterprise • Artificial Intelligence

Gorilla Logic is a company renowned for its expertise in modern software and data engineering. Serving as a strategic partner rather than just a vendor, Gorilla Logic specializes in digital product design, cloud engineering, data and AI delivery, DevOps, quality assurance, and legacy modernization. With a team of skilled digital product designers, solutions architects, and Agile nearshore teams, Gorilla Logic has been instrumental in developing business-critical software applications for Fortune 500 and SMB companies for over 20 years. Their services include creating SaaS platforms, enhancing digital experiences, and providing flexible, security-focused solutions. Gorilla Logic operates with teams located in Costa Rica, Colombia, Mexico, and the United States, emphasizing collaborative partnerships to deliver cutting-edge digital engineering solutions.

501 - 1000 employees


📋 Description

• Design and implement LangGraph-based agent architectures with multi-turn memory, real-time decision-making, and complex state management.
• Build autonomous voice agents that handle interruptions, context switching, and live customer interactions.
• Develop specialized agent types (customer service, sales, routing) with intelligent tool and function calling capabilities.
• Implement agent evaluation systems using LLM-as-Judge methodologies to assess accuracy, detect hallucinations, and measure goal achievement.
• Create configurable templates for rapid, multi-tenant deployment and scalability.
• Integrate and optimize LLM providers (OpenAI GPT-4o/GPT-5, Groq Llama 4, Anthropic Claude) with dynamic model routing and fallback strategies.
• Apply advanced prompt engineering techniques for voice-first applications, including templating, few-shot learning, and context management.
• Build streaming LLM pipelines that coordinate sentence-level text generation with real-time text-to-speech synthesis.
• Develop function calling frameworks for tools like call transfer, conferencing, recording, and external integrations.
• Build real-time speech-to-text pipelines using Deepgram Nova-3 with voice activity detection and interruption handling.
• Implement multi-provider text-to-speech orchestration (ElevenLabs, Deepgram, Cartesia) with voice cloning and tone control.
• Develop low-latency audio streaming over WebSockets with buffering, codec handling, and error recovery.
• Create dual-channel recording systems with speaker separation for QA and data collection.
• Optimize end-to-end latency in the STT → LLM → TTS pipeline to achieve natural conversational flow.
• Extend agents to handle text, voice, and vision inputs using GPT-4o multimodal capabilities.
• Build cross-modal reasoning systems that combine transcription, context, and visual data.
• Implement document and image understanding features for real-time reference during conversations.
• Design evaluation frameworks to assess multimodal performance and interaction quality.
• Architect event-driven microservices using NATS JetStream for reliable message delivery.
• Build multi-tenant RPC frameworks with access controls, secrets management, and isolation.
• Deploy to Kubernetes with autoscaling, health checks, and fault-tolerant design.
• Implement observability solutions using OpenTelemetry for full pipeline visibility.

🎯 Requirements

• Proven experience building production-grade agentic AI systems using LangChain, LangGraph, or AutoGPT.
• Deep understanding of ReAct agent architectures, tool use, memory systems, and multi-agent orchestration.
• Hands-on integration with LLM APIs such as OpenAI GPT-4o/GPT-5, Anthropic Claude, and Groq Llama 4.
• Expertise in prompt engineering, few-shot learning, and system prompt optimization.
• Experience managing function calling pipelines, latency, hallucination control, and streaming responses.
• 2+ years developing voice AI systems with Deepgram, OpenAI Whisper, ElevenLabs, or similar providers.
• Knowledge of audio codecs (MULAW, PCM), VAD, noise cancellation, and real-time audio streaming.
• Experience with WebRTC, LiveKit, Twilio, or Telnyx for real-time communications.
• Familiarity with multimodal AI models like GPT-4o or Gemini for cross-modal reasoning.
• Strong proficiency in Node.js (22+) and TypeScript, using modern async and event-driven patterns.
• Experience with Express.js and MongoDB (Mongoose) for high-write and time-series workloads.
• Hands-on experience with Kubernetes and Docker for scalable deployments.
• Familiarity with AWS services such as Secrets Manager and S3.
• Experience using GitHub Actions, Kustomize, pnpm workspaces, and Changesets for CI/CD.
• Understanding of distributed systems fundamentals: idempotency, retries, circuit breakers, and high availability.

🏖️ Benefits

• Health insurance
• Flexible work arrangements
• Professional development opportunities
