Fernando – AI agent orchestration, Multi-agent systems architecture, AI telemetry
Fernando is a senior AI Agent Architect with 8 years of experience and deep expertise in Python, LLMs, multi-agent systems architecture, AI agent orchestration, and RAG. He has led end-to-end delivery of AI-driven platforms in healthtech and legal domains, demonstrating strong product judgment, stakeholder communication, and technical ownership.
Direct hire: Possible
Experience Highlights
Senior AI Engineer
A personal AI assistant and multi-agent orchestration layer built around Claude Code and a LangGraph-based agent core, coordinating multiple simultaneous LLM agent instances on long-running development tasks. The system combines an autonomous "brain" agent with a fleet of Claude Code sessions, exposing them through a FastAPI backend (REST + WebSocket), a Typer CLI, and a Flutter client for desktop and mobile. It supports always-on, wake-word-activated voice interaction and event-based triggers, real-time streaming of agent output, self-evolving tool creation, permission-gated tool execution, autonomous decision-making, long-term semantic memory, and multi-LLM routing across providers, all packaged as a one-command installable distribution.
The project explores production patterns for agentic systems beyond single-agent prompting, focusing on observability, controlled autonomy, extensibility, and reliability in multi-agent workflows.
- Architected and built a multi-agent orchestration layer coordinating multiple Claude Code instances in parallel, with an autonomous "brain" agent on top of LangGraph with persistent checkpointing, capable of monitoring sessions and deciding when to act, respond, or escalate.
- Designed an extensible tool ecosystem combining a unified tool registry (filesystem, shell, web, PTY, semantic memory), a self-evolving system where the agent autonomously designs and registers new tools at runtime, and an MCP (Model Context Protocol) client for interoperability with external tool servers.
- Implemented a multi-LLM routing layer dispatching tasks across providers (Anthropic, OpenAI, local models) based on cost, latency, and task profile.
- Built long-term semantic memory with RAG on a local vector store (Chroma + sentence-transformers), and applied advanced prompt engineering with persona/config-driven behavior (YAML-based prompts, tools, and triggers) to improve tool selection accuracy and decision quality.
- Developed real-time WebSocket streaming for agent output with per-session subscribe/unsubscribe and connection lifecycle handling, alongside a permission-gated tool execution model with explicit approval policies for sensitive actions.
- Integrated a full voice interaction stack: speech-to-text and text-to-speech (OpenAI Whisper + ElevenLabs) with a bidirectional audio I/O pipeline and always-on wake-word activation (e.g. "Hey Charles") via a continuous low-power keyword spotter for hands-free use.
- Built async backend infrastructure with FastAPI for concurrent agent session management, including metrics, admin, and event-trigger subsystems.
- Developed a Flutter client (mobile/desktop) consuming the same REST + WebSocket API as the CLI, and packaged the system as a one-command installable distribution with a macOS launchd service for background operation.
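The multi-LLM routing bullet above can be illustrated with a minimal sketch. It is not the project's actual code: the provider names, prices, latencies, and the `route` function are hypothetical, and the selection rule (cheapest eligible provider within a latency budget) is only one plausible way to combine cost, latency, and task profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str                  # hypothetical label, not a real model ID
    cost_per_1k_tokens: float  # assumed figure, for illustration only
    latency_ms: float          # assumed typical latency
    strengths: frozenset       # task profiles this provider is assumed to suit


def route(task_profile: str, providers: list, max_latency_ms: float) -> Provider:
    """Pick the cheapest provider that suits the task within the latency budget."""
    eligible = [
        p for p in providers
        if task_profile in p.strengths and p.latency_ms <= max_latency_ms
    ]
    if not eligible:
        # Nothing matches: fall back to the fastest provider instead of failing.
        return min(providers, key=lambda p: p.latency_ms)
    return min(eligible, key=lambda p: p.cost_per_1k_tokens)


# Hypothetical provider table; all values are made up.
PROVIDERS = [
    Provider("hosted-large", 0.015, 1200.0, frozenset({"reasoning", "coding"})),
    Provider("hosted-small", 0.001, 400.0, frozenset({"summarization", "coding"})),
    Provider("local-model", 0.0, 2500.0, frozenset({"summarization"})),
]

choice = route("coding", PROVIDERS, max_latency_ms=1500.0)
```

With this table, a coding task under a 1500 ms budget goes to the cheaper of the two coding-capable providers, and a task with no eligible provider falls back to the fastest one.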
Tech Lead / AI & Full-Stack Engineer
A healthtech platform for pediatricians in Brazil that combines a specialized clinical forum, AI-driven decision support, and intelligent case management. It bridges the gap between static medical content platforms and real clinical workflows by integrating AI directly into forum discussions, with autonomous participation, RAG over historical cases, and clinical modules for diagnostic analysis, medical chat, and case triage. The main challenge was creating AI that clinicians would trust and adopt in a high-stakes domain, while balancing technical reliability with product judgment.
- Led end-to-end architecture and delivery of the platform, owning AI integration, backend, and product decisions;
- Designed and implemented AI clinical modules (diagnostic analysis, intelligent medical chat, case triage) with structured prompt engineering, chain-of-thought reasoning, output validation, and safety guardrails for medical content;
- Refactored the AI layer onto LangChain to standardize prompt templates, chains, and structured output parsing across all modules;
- Built an autonomous AI agent that monitors forum discussions via Cloud Functions (used as an async message queue) and applies contribution heuristics to decide when to act versus stay silent;
- Implemented a RAG pipeline over the forum's historical case base with embedding pipelines and vector search, grounding AI responses in past clinical discussions and surfacing similar cases at posting time;
- Implemented subscription-based access via Stripe Billing, role-based authentication, and normalized transactional data modeling;
- Built CI/CD pipelines with unit and integration testing, plus performance observability;
- Shaped the product UX of AI participation, moving away from generic AI-style responses toward concise doctor-to-doctor communication patterns.
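The "surfacing similar cases at posting time" step in the RAG bullet above can be sketched as a top-k cosine-similarity lookup. This is an illustrative assumption, not the platform's implementation: the case IDs, the three-dimensional vectors, and the `similar_cases` helper are hypothetical, and a real system would use an embedding model and a vector database rather than an in-memory dictionary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_cases(query_vec, case_index, k=2, min_score=0.5):
    """Return up to k case IDs whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, vec), case_id) for case_id, vec in case_index.items()]
    # Drop weak matches so unrelated cases are never surfaced.
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(reverse=True)
    return [case_id for _, case_id in scored[:k]]


# Toy embeddings standing in for historical forum cases (made-up values).
CASE_INDEX = {
    "case-001": [0.9, 0.1, 0.0],
    "case-002": [0.0, 1.0, 0.0],
    "case-003": [0.8, 0.2, 0.1],
}

matches = similar_cases([1.0, 0.0, 0.0], CASE_INDEX)
```

The `min_score` threshold reflects the same idea as the agent's act-versus-stay-silent behavior: when nothing is similar enough, returning nothing is preferable to surfacing a weak match.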
Lead AI Engineer / Co-founder
An on-premises RAG assistant for law firms that cannot send confidential client data to external LLM providers. The product runs entirely locally, including a hosted LLM and a retrieval stack tuned for legal documents, and supports the full workflow from ingestion and semantic chunking to embedding generation, vector search, and retrieval-aware response generation. The main challenge was delivering production-grade RAG infrastructure under strict data isolation requirements, with no external API dependencies in the critical path.
- Co-founded the project and led technical design end-to-end, including the full RAG architecture;
- Designed and implemented the embedding generation pipeline tailored to legal document structure;
- Built the vector search layer on Qdrant with semantic chunking strategies optimized for legal content;
- Deployed a locally hosted LLM (Qwen3-32B) on Apple Silicon, evaluating model trade-offs against task and latency requirements;
- Implemented retrieval-aware prompt engineering with citation grounding, ensuring responses reference specific source passages;
- Integrated MCP-based document ingestion from Google Drive, enabling automated and structured document onboarding;
- Architected the system to operate fully air-gapped, meeting the data isolation requirements of legal clients;
- Defined product positioning and go-to-market approach in partnership with stakeholders, including hardware selection and operational cost modeling.
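The citation-grounding bullet above can be illustrated with a small sketch: number the retrieved passages in the prompt, then check that every citation in the answer points at a passage that was actually supplied. The prompt wording, the `[n]` citation format, and both function names are assumptions made for this example, not details of the product.

```python
import re


def build_grounded_prompt(question, passages):
    """Number each retrieved passage so the model can cite it as [n]."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer using only the passages below. "
        "Cite each claim with its passage number in square brackets.\n\n"
        f"Passages:\n{numbered}\n\nQuestion: {question}"
    )


def invalid_citations(answer, passage_count):
    """Return cited passage numbers that do not match any supplied passage."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= passage_count)


# Made-up passages and answer, for illustration only.
passages = ["The lease terminates on notice.", "Notice must be given in writing."]
prompt = build_grounded_prompt("How is the lease terminated?", passages)
bad = invalid_citations("Written notice is required [2], see also [3].", len(passages))
```

Here the answer cites passage 3, which was never supplied, so the check flags it; a caller could then regenerate the answer or strip the unsupported claim.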
Senior AI & Full-Stack Engineer
A no-code landing page builder that lets users describe what they need in natural language and generates fully structured, editable landing pages. It was built to turn LLM output into reliable, deterministic UI content that the editor can consume directly, with validation, fallbacks, and idempotent updates instead of free-form text. The product was designed and delivered as a single-founder effort, from the AI generation pipeline to the editor and deployment infrastructure.
- Built the frontend and backend of the platform, from initial concept to production deployment;
- Designed the product strategy, user experience, and technical architecture end-to-end;
- Engineered the LLM layer that translates natural language input into deterministic JSON/YAML mapped to UI components (layout, copy, CTAs, forms);
- Implemented validation, fallback handling, content scoring, and idempotent updates to ensure reliable generation across varied user inputs;
- Defined the structured output contract between the LLM and the rendering layer, enforcing schema validity end-to-end;
- Built the editing flow that allows end users to iterate on AI-generated pages without breaking structural integrity;
- Set up automated deployment pipelines with GitHub Actions for reliable releases;
- Balanced model selection, prompt design, and latency to maintain responsive UX during interactive AI generation.
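The structured output contract and fallback handling described above can be sketched as a validator that sits between the LLM and the rendering layer. The component names, the page shape, and the fallback page are hypothetical; the sketch assumes JSON output and shows only the general pattern of rejecting anything the editor could not render.

```python
import json

# Hypothetical component whitelist and fallback page, for illustration only.
ALLOWED_COMPONENTS = {"hero", "features", "cta", "form"}
FALLBACK_PAGE = {"components": [{"type": "hero", "props": {"headline": "Welcome"}}]}


def parse_page(llm_output: str) -> dict:
    """Return the parsed page if it satisfies the contract, else the fallback."""
    try:
        page = json.loads(llm_output)
    except json.JSONDecodeError:
        return FALLBACK_PAGE  # free-form text never reaches the renderer
    components = page.get("components") if isinstance(page, dict) else None
    if not isinstance(components, list) or not components:
        return FALLBACK_PAGE
    for component in components:
        if (
            not isinstance(component, dict)
            or component.get("type") not in ALLOWED_COMPONENTS
            or not isinstance(component.get("props"), dict)
        ):
            return FALLBACK_PAGE
    return page


valid = parse_page('{"components": [{"type": "cta", "props": {"label": "Sign up"}}]}')
broken = parse_page("Sure! Here is your landing page:")
```

Because the function always returns a page that satisfies the contract, the editor can consume the result directly regardless of what the model produced.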