Lead Software Engineer II, AI Operations

🕒 January 19

Best Egg

501 - 1000 employees

Founded 2014

💸 Finance

💳 Fintech

👥 B2C

Best Egg is a lending company that offers a range of financial products and services, including personal loans, debt consolidation, credit card refinancing, and secured loans using vehicle equity or homeowner discounts. The company emphasizes hassle-free and fast financial service, with an online platform that allows users to manage their accounts, view offers, and access various financial health resources. Best Egg aims to support customers in building a brighter financial future by offering competitive rates and personalized loan terms. Best Egg loans are facilitated through partnerships with banks and are available in various amounts, with flexible terms and conditions.

📋 Description

• Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one.
• Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics.
• Design and implement scalable AWS architectures, including AWS AI features such as Bedrock, IAM, knowledge bases, secure secrets and policy enforcement, automated provisioning, and resource-usage governance as core platform capabilities.
• Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests.
• Enforce PII redaction, safety filters, role-based access, audit logs, and human-in-the-loop review paths to control quality and risk.
• Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys with automatic rollback triggers.
• Cut run-rate spend through caching, truncation, batching, autoscaling, and model routing; establish clear unit economics per workflow.
• Provide templates, SDKs, and high-quality abstractions that let product teams ship safely without bespoke plumbing; improve developer experience.
• Build primarily in Python and Metaflow (Outerbounds); deploy on AWS (Bedrock + core services) and OpenAI; use Cursor in daily workflows; help evaluate and, when appropriate, run on Databricks.
• Participate in on-call, author runbooks, and remove single-thread risk for AI services; drive reliability and resilience akin to ML Ops.

🎯 Requirements

• 5–10 years of professional software engineering experience (or equivalent), including 2+ years building AI/LLM applications; portfolio of shipped AI projects (links to code, demos, or case studies).
• Demonstrated passion for exploring the latest AI models, frameworks, and tooling, and for adopting state-of-the-art innovations in the workflow.
• Hands-on experience with some or all of OpenAI, Bedrock, Hugging Face/Ollama/vLLM; MCP servers, function/tool calling, multi-turn orchestration, streaming, and prompt/version management.
• Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking), integrating with vector databases, and measuring retrieval quality.
• Comfortable building APIs/services and simple UIs where needed; strong fundamentals in Python and modern packaging/testing.
• CI/CD, containers, cloud fundamentals (AWS), and runtime performance tuning; experience operating services in production.
• Metaflow (Outerbounds) preferred; Databricks familiarity is a plus; ability to integrate data/feature pipelines and schedule/operate flows.
• Tracing and logging experience; expertise in tools such as Datadog, Dynatrace, or Grafana for AI monitoring is essential.
• Comfortable optimizing latency/throughput/cost, and implementing guardrails for PII/safety/compliance.
• Partner effectively with data scientists, analysts, and engineers; promote best practices and high-leverage abstractions.
• Fine-tuning or distillation experience; Kubernetes or FastAPI exposure; familiarity with Snowflake or similar warehousing for retrieval sources.

🏖️ Benefits

• Pre-tax and post-tax retirement savings plans with a competitive company matching program
• Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
• Multiple health care plans to choose from, including dental and vision options
• Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
• Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
• Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!
