Vinay
From Ireland (UTC+1)
Vinay – AI agent development, LLM, LangChain
Vinay brings 10 years of experience in AI/ML engineering, with a strong focus on production-ready AI agent systems, RAG architectures, and LLM integration. He's built enterprise-scale solutions across healthcare and fintech — including custom frameworks — and is comfortable working across the full stack from Python and FastAPI to vector databases and compliance-sensitive design. What sets Vinay apart is how he combines deep technical chops with a consultative, client-facing style. He's led hands-on projects in both startup and enterprise environments, and tends to gravitate toward pragmatic, end-to-end ownership — from architecture decisions all the way through to delivery.
Direct hire: Possible
Experience Highlights
AI Engineer / Founder
An AI hiring platform for SMB retail and QSR teams in the US and Canada. It enables low-latency voice-based candidate screening, signal-based scoring across stated, demonstrated, and behavioral dimensions, and hybrid human-in-the-loop interview workflows.


- Implemented signals-based candidate scoring across stated, demonstrated, and behavioral dimensions;
- Built voice AI interviewer agents (Avon and Avey) with low-latency turn-taking and interruption handling;
- Designed a pluggable workflow engine using strategy and factory patterns for customizable hiring pipelines;
- Integrated ATS providers via Merge.dev for unified candidate data synchronization;
- Implemented credit-based usage tracking and billing integration;
- Hardened voice interviewers against prompt injection and jailbreak attacks;
- Shipped a human-in-the-loop interview room for hybrid AI and human screening;
- Built a multi-tenant FastAPI backend with PostgreSQL and pgvector for semantic candidate matching;
- Implemented the voice pipeline with streaming STT, LLM routing, and ElevenLabs TTS;
- Integrated Langfuse observability across voice and scoring pipelines;
- Containerized services with Docker and deployed to AWS with CI/CD via GitHub Actions;
- Tuned retrieval and prompt strategy to keep per-interview LLM costs predictable.
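For illustration, the pluggable workflow engine described above could be structured roughly as follows. This is a minimal sketch and not the platform's actual code: the step classes, `STEP_REGISTRY`, `build_pipeline`, and `run_pipeline` are hypothetical names, and real steps would call the voice and scoring services instead of appending a label.

```python
from typing import Callable, Dict, List, Protocol


class PipelineStep(Protocol):
    """Strategy interface: each hiring step transforms a candidate record."""

    def run(self, candidate: dict) -> dict: ...


class VoiceScreenStep:
    # Hypothetical step; a real one would invoke the voice interviewer.
    def run(self, candidate: dict) -> dict:
        return {**candidate, "stages": candidate.get("stages", []) + ["voice_screen"]}


class HumanReviewStep:
    # Hypothetical step; a real one would open a human-in-the-loop room.
    def run(self, candidate: dict) -> dict:
        return {**candidate, "stages": candidate.get("stages", []) + ["human_review"]}


# Factory registry: maps a step name from tenant config to a constructor.
STEP_REGISTRY: Dict[str, Callable[[], PipelineStep]] = {
    "voice_screen": VoiceScreenStep,
    "human_review": HumanReviewStep,
}


def build_pipeline(step_names: List[str]) -> List[PipelineStep]:
    """Factory: build the ordered steps for one customer's pipeline."""
    unknown = [name for name in step_names if name not in STEP_REGISTRY]
    if unknown:
        raise ValueError(f"unknown pipeline steps: {unknown}")
    return [STEP_REGISTRY[name]() for name in step_names]


def run_pipeline(steps: List[PipelineStep], candidate: dict) -> dict:
    """Run each strategy in order, passing the candidate record along."""
    for step in steps:
        candidate = step.run(candidate)
    return candidate
```

With this shape, a customer's pipeline is just an ordered list of step names, so adding a new screening stage means registering one more class rather than editing the engine.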
AI Engineer / Consultant
An AI recruitment platform used by multiple clients on a daily basis. As an AI consultant and engineer, he helped standardize multilingual support across 7+ AI services, established Langfuse as the main observability and prompt-governance layer, improved voice AI cost efficiency through prompt caching, and strengthened context handling and safety defenses across the platform. He also built a golden-dataset evaluation framework for prompt regression testing and resolved several production incidents through formal root-cause analysis, contributing to a 35% overall LLM cost reduction per candidate interview.
- Introduced AI governance and traceability with Langfuse across 10+ microservices;
- Implemented layered memory for agents to learn user-specific behavior via tool calling;
- Implemented prompt caching for conversational Voice AI, reducing LLM costs by 30%;
- Implemented strict guardrails and safety boundaries to prevent system abuse;
- Improved RAG retrieval with agentic RAG loops and RRF combining BM25 and semantic matches;
- Shipped to production with complete observability and tracing;
- Implemented PII redaction using Presidio with a local LLM to protect user data;
- Added multilingual Voice AI support for Spanish, French, and German;
- Implemented intelligent model selection per use case, reducing overall LLM costs by 5%.
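The reciprocal rank fusion (RRF) mentioned above merges a BM25 ranking and a semantic ranking by summing 1 / (k + rank) for each document across the lists. A minimal sketch, assuming each retriever returns an ordered list of document IDs; the function name and the constant k = 60 (a commonly used default) are assumptions, not details taken from the project:

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical BM25 result order
dense_hits = ["b", "d", "a"]   # hypothetical semantic result order
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["b", "a", "d", "c"]
```

Because RRF only uses ranks, it needs no score normalization between the sparse and dense retrievers.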
AI Engineer
An FCA-compliant multi-agent AI platform for UK fintech users. It routes requests through an orchestrator agent to specialized workers for transaction analysis, subscription tracking, savings coaching, and money leak detection, with guardrails for PII redaction and compliance. The platform integrates with open banking and transaction enrichment services, uses token-based subscription tiers, and runs in production on AWS with Langfuse observability.
- Designed a multi-agent architecture in LangGraph using orchestrator-worker and hierarchical-agent patterns;
- Built specialized agents for transaction analysis, subscription tracking, savings coaching, and money leak detection;
- Implemented FCA-compliant input and output guardrails with PII redaction and prompt injection defense;
- Integrated TrueLayer for open banking and Ntropy for transaction enrichment;
- Developed a FastAPI backend with async LLM orchestration and streaming responses;
- Deployed on AWS ECS with Auto Scaling Groups and Celery workers for background processing;
- Built Langfuse observability for per-agent tracing, token cost tracking, and latency monitoring;
- Designed token-based subscription economics across Free, Plus, and Pro tiers;
- Implemented a PostgreSQL data layer and Redis session caching with rate limiting;
- Wrote unit and integration tests for agent flows and compliance guardrails;
- Set up CI/CD pipelines via GitHub Actions with infrastructure as code.
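The orchestrator-worker routing described above can be pictured with a small plain-Python sketch. It deliberately does not use LangGraph, and the keyword-based `route` function is a stand-in assumption for whatever classification the real orchestrator agent performs; all function names and intent labels are hypothetical.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]


# Hypothetical workers; real ones would be LLM-backed agents.
def transaction_worker(query: str) -> str:
    return f"[transactions] {query}"


def subscription_worker(query: str) -> str:
    return f"[subscriptions] {query}"


def savings_worker(query: str) -> str:
    return f"[savings] {query}"


def money_leak_worker(query: str) -> str:
    return f"[money_leaks] {query}"


WORKERS: Dict[str, Worker] = {
    "transactions": transaction_worker,
    "subscriptions": subscription_worker,
    "savings": savings_worker,
    "money_leaks": money_leak_worker,
}

# Assumed keyword map standing in for the orchestrator's intent detection.
KEYWORDS: Dict[str, str] = {
    "subscription": "subscriptions",
    "save": "savings",
    "leak": "money_leaks",
}


def route(query: str) -> str:
    """Pick a worker name; fall back to transaction analysis."""
    lowered = query.lower()
    for keyword, intent in KEYWORDS.items():
        if keyword in lowered:
            return intent
    return "transactions"


def orchestrate(query: str) -> str:
    """Orchestrator: route the request, then delegate to one worker."""
    return WORKERS[route(query)](query)
```

In a production setup, input and output guardrails such as PII redaction would wrap the `orchestrate` call rather than live inside each worker.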
Senior AI Engineer / Lead
A production RAG system for healthcare insurance documentation, built to support executive-facing Q&A over 100,000+ policy and regulatory documents. It uses hybrid retrieval, reranking, metadata-preserving chunking, source citation, and confidence-based refusal logic to improve answer quality and reduce hallucinations. The platform also includes HIPAA-aligned data handling, PII protection, and an evaluation pipeline for retrieval recall and answer faithfulness.
- Built a production RAG pipeline over 100,000+ healthcare insurance documents;
- Implemented hybrid retrieval combining BM25 sparse search and dense semantic search;
- Added a reranking layer to improve top-k relevance for complex policy queries;
- Designed a metadata-preserving chunking strategy for regulatory and policy documents;
- Implemented source citation and confidence-threshold-based refusal logic to control hallucinations;
- Implemented a self-evaluation loop using LLM-as-a-judge for relevance checks;
- Ensured HIPAA-aligned data handling and PII protection across the pipeline;
- Built an evaluation pipeline measuring retrieval recall and answer faithfulness;
- Developed a FastAPI backend with pgvector for embedding storage and similarity search;
- Integrated observability and tracing across retrieval and generation stages;
- Tuned prompts and retrieval parameters to optimize cost and latency for executive-facing use cases.
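As an illustration of the source citation and confidence-threshold refusal logic listed above, here is a minimal sketch. It assumes the reranker emits relevance scores in the range 0 to 1; the `RetrievedChunk` type, the 0.6 threshold, the refusal wording, and the `generate` callback (standing in for the LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetrievedChunk:
    source: str   # document ID or title preserved in chunk metadata
    text: str
    score: float  # reranker relevance, assumed to lie in [0, 1]


REFUSAL = "I can't answer that confidently from the available documents."


def select_evidence(
    chunks: List[RetrievedChunk], threshold: float = 0.6, top_k: int = 3
) -> List[RetrievedChunk]:
    """Keep only chunks at or above the threshold, best first."""
    kept = [chunk for chunk in chunks if chunk.score >= threshold]
    return sorted(kept, key=lambda chunk: chunk.score, reverse=True)[:top_k]


def answer_or_refuse(
    chunks: List[RetrievedChunk],
    generate: Callable[[List[RetrievedChunk]], str],
    threshold: float = 0.6,
) -> str:
    """Refuse when no chunk is confident enough; otherwise answer with citations."""
    evidence = select_evidence(chunks, threshold)
    if not evidence:
        return REFUSAL
    answer = generate(evidence)
    citations = ", ".join(chunk.source for chunk in evidence)
    return f"{answer} [sources: {citations}]"
```

Refusing before generation, rather than filtering afterwards, means the model is never asked to answer from weak evidence, which is one common way to limit hallucinated answers.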