Luca – Python, OpenAI, LLM expert at Lemon.io

Luca

From Italy (UTC+2)

AI Engineer | Middle-to-senior


Luca is a strong mid-level to early-senior AI engineer specializing in applied LLM, RAG, and agent systems. He is highly proficient in Python and has practical experience with modern LLM frameworks, retrieval pipelines, and cloud deployment. Luca is recognized for his systems thinking, clear technical communication, and collaborative approach; advanced ML theory and defensive coding are areas for further growth.

6 years of commercial experience in
AI
Machine learning
AI software
NLP software
Main technologies
Python: 9 years
OpenAI: 3 years
LLM: 3 years
LangChain: 2 years
AI agent development: 3 years
GCP: 3.5 years
AWS: 3.5 years
Additional skills
DigitalOcean
FastAPI
Hugging Face
Redis
Docker
PostgreSQL
PyTorch
RAG
LangGraph
Machine learning
Ansible
NumPy
Direct hire: Possible

Experience Highlights

Lead Engineer
Aug 2025 - Feb 2026 (6 months)
Project Overview

Developed a distributed inference engine that splits large neural networks across multiple FPGA accelerators on edge devices, enabling models too large or slow for a single chip to run efficiently. The system quantizes PyTorch models to int8, partitions the graph into parallel fragments, compiles each for the DPU, and orchestrates execution across a multi-node FPGA cluster via TCP.

A critical compiler boundary penalty was resolved with a novel output fix operator, making the system production-ready. HPC-grade FPGA resource management was integrated using SLURM, and benchmarks across CNN and ConvViT architectures provided clear scaling guidance: distribution is beneficial for models above ~100M parameters on 3 nodes and ~25M on 2 nodes.
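To illustrate the int8 step mentioned above, here is a minimal sketch of generic symmetric per-tensor int8 quantization in NumPy. It is an assumption-laden illustration, not the Vitis AI quantizer: that tool applies its own calibration and scale-selection scheme, which is not reproduced here. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
import numpy as np


def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization (generic illustration, not Vitis AI)."""
    max_abs = float(np.max(np.abs(x)))
    # Map the largest magnitude to 127; fall back to scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from int8 codes and the shared scale."""
    return q.astype(np.float32) * scale


weights = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, scale = quantize_int8(weights)
print(q, scale, dequantize_int8(q, scale))
```

Each dequantized value lands within one quantization step (`scale`) of the original, which is the precision trade-off that lets int8 kernels run on the DPU.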

Responsibilities:
  • Designed and built, end-to-end, a distributed model-parallel inference system across FPGA edge devices (AMD Xilinx Kria KV260);
  • Discovered and resolved a previously undocumented Vitis AI compiler boundary penalty that caused a ~60x slowdown; a novel output fix operator brought CPU-offloaded convolutions down to 0%;
  • Built a custom XIR graph splitter that partitions quantized neural networks into stem/experts/tail fragments while preserving int8 quantization boundaries;
  • Achieved up to 1.93x speedup and 64% parallel efficiency on 3-node clusters for large models (>100M parameters);
  • Conducted 28 micro-model experiments to empirically characterize Vitis AI compiler behavior, producing a reusable "empirical oracle" for safe graph cut points;
  • Integrated SLURM for HPC-style FPGA orchestration with only 0.13% overhead, enabling automated resource management via GRES;
  • Validated cross-architecture generalization on ConvViT (convolutional Vision Transformer) with 86-97% DPU utilization;
  • Automated the full deployment pipeline with Ansible playbooks for cluster setup, model distribution, and benchmark execution.
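Assuming the usual definition of parallel efficiency as speedup divided by node count, the 1.93x speedup and 64% efficiency reported for 3-node clusters are consistent with each other. Here T_1 (single-node latency) and T_N (N-node latency) are assumed symbols, not figures from the project:

```latex
\[
S_N = \frac{T_1}{T_N}, \qquad
E_N = \frac{S_N}{N}, \qquad
E_3 = \frac{1.93}{3} \approx 0.643 \approx 64\%
\]
```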
Project Tech stack:
PyTorch
Python
Ansible
NumPy
Distributed Systems
Senior AI Engineer
Sep 2023 - Jan 2026 (2 years 4 months)
Project Overview

Core maintainer of semantic-router, a high-performance open-source library (3,200+ GitHub stars) that acts as a decision layer for LLM applications. It enables deterministic routing of user queries to the correct pipeline, tool, or model, optimizing for cost, latency, and reliability at scale.

Additionally, developed and delivered production RAG systems for enterprise clients, covering the full stack: retrieval pipelines, semantic routing, evaluation frameworks, and inference serving. The work focused on making LLM applications predictable, debuggable, and cost-efficient in production environments.

Responsibilities:
  • Core maintainer of semantic-router, an open-source LLM decision layer with 3,200+ GitHub stars;
  • Built production RAG systems for enterprise clients including retrieval, routing, and evaluation pipelines;
  • Designed deterministic routing logic to optimize cost, latency, and reliability for production LLM apps;
  • Shipped end-to-end enterprise NLP solutions from prototyping to production deployment;
  • Managed community contributions, issue triage, and release cycles for the open-source project.
Project Tech stack:
Python
OpenAI
LangChain
LangGraph
Hugging Face
LLM
Pinecone
Senior AI Engineer
Sep 2024 - Nov 2025 (1 year 2 months)
Project Overview

Rebuilt a large-scale conversational AI engine designed to provide engaging and coherent dialogue for millions of users. The project focused on improving the quality and reliability of conversations by implementing a hybrid RAG + RLHF architecture, optimizing the inference pipeline for lower latency, and ensuring robust safety and quality guardrails. The work involved end-to-end architecture design, model integration, reliability engineering, and close collaboration with product and safety teams to deliver a production-ready system.

Responsibilities:
  • Owned the full rebuild of the conversational AI engine serving 35M+ users globally;
  • Architected a hybrid RAG + RLHF stack that boosted engagement by 40% and dialogue coherence by 25%;
  • Reduced inference latency by 30% while maintaining safety and quality guardrails;
  • Designed and maintained the retrieval pipeline, prompt orchestration, and model serving infrastructure;
  • Coordinated with product and safety teams to ship iterative improvements on a weekly release cycle.
Project Tech stack:
Python
PyTorch
Hugging Face
RAG
Redis
Docker
PostgreSQL
FastAPI
Co-founder & AI Lead
Dec 2023 - Dec 2024 (1 year)
Project Overview

The project focused on building an AI-powered platform that optimizes eCommerce operations by automatically classifying and organizing large product catalogs. The goal was to help online retailers improve product discovery, streamline catalog management, and scale. The platform used machine learning to categorize products accurately and surface catalog insights, supporting faster and better-informed business decisions.

Responsibilities:
  • Co-founded the company and secured €1.1M in pre-seed funding;
  • Built the core product classification engine using cross-encoders, achieving 95%+ accuracy;
  • Designed and implemented ML pipelines processing 100K+ product catalogs end-to-end;
  • Led technical strategy, architecture decisions, and investor-facing technical presentations;
  • Transitioned to an advisory role after a successful product launch.
Project Tech stack:
Python
FastAPI
LangChain
Machine learning

Education

Bachelor in Computer Engineering (2020)

Languages

English: Advanced

Copyright © 2026 lemon.io. All rights reserved.