Luca

From Italy (UTC+2)

AI Engineer | Middle-to-senior

Skills and seniority verified on Mar 12, 2026

Luca – Python, OpenAI, LLM

Luca is a strong mid-level to early-senior AI engineer specializing in applied LLM, RAG, and agent systems. He is highly proficient in Python and has practical experience with modern LLM frameworks, retrieval pipelines, and cloud deployment. He is recognized for his systems thinking, clear technical communication, and collaborative approach; advanced ML theory and defensive coding are areas for further growth.

6 years of commercial experience in

Machine learning

AI software

NLP software

Main technologies

Python

9 years

OpenAI

3 years

LLM

3 years

LangChain

2 years

AI agent development

3 years

GCP

3.5 years

AWS

3.5 years

Additional skills

DigitalOcean

Redis

Docker

FastAPI

Hugging Face

PyTorch

PostgreSQL

RAG

LangGraph

Machine learning

NumPy

Ansible

Direct hire

Possible


Experience Highlights

Lead Engineer

Aug 2025 - Feb 2026 (6 months)

Project Overview

Developed a distributed inference engine that splits large neural networks across multiple FPGA accelerators on edge devices, enabling models too large or slow for a single chip to run efficiently. The system quantizes PyTorch models to int8, partitions the graph into parallel fragments, compiles each for the DPU, and orchestrates execution across a multi-node FPGA cluster via TCP.

A critical compiler boundary penalty was resolved with a novel output fix operator, making the system production-ready. HPC-grade FPGA resource management was integrated using SLURM, and benchmarks across CNN and ConvViT architectures provided clear scaling guidance: distribution is beneficial for models above ~100M parameters on 3 nodes and ~25M on 2 nodes.

Responsibilities:

Designed and built a distributed model-parallel inference system across FPGA edge devices (AMD Xilinx Kria KV260) end-to-end;
Discovered and solved a previously undocumented Vitis AI compiler boundary penalty that caused a ~60x slowdown, introducing a novel output fix operator that reduced CPU-offloaded convolutions to 0%;
Built a custom XIR graph splitter that partitions quantized neural networks into stem/experts/tail fragments while preserving int8 quantization boundaries;
Achieved up to 1.93x speedup and 64% parallel efficiency on 3-node clusters for large models (>100M parameters);
Conducted 28 micro-model experiments to empirically characterize Vitis AI compiler behavior, producing a reusable "empirical oracle" for safe graph cut points;
Integrated SLURM for HPC-style FPGA orchestration with only 0.13% overhead, enabling automated resource management via GRES;
Validated cross-architecture generalization on ConvViT (convolutional Vision Transformer) with 86-97% DPU utilization;
Automated the full deployment pipeline with Ansible playbooks for cluster setup, model distribution, and benchmark execution.

Project Tech stack:

PyTorch

Python

Ansible

NumPy

Distributed Systems

Senior AI Engineer

Sep 2023 - Jan 2026 (2 years 4 months)

Project Overview

Core maintainer of semantic-router, a high-performance open-source library (3,200+ GitHub stars) that acts as a decision layer for LLM applications. It enables deterministic routing of user queries to the correct pipeline, tool, or model, optimizing for cost, latency, and reliability at scale.

Additionally, developed and delivered production RAG systems for enterprise clients, covering the full stack: retrieval pipelines, semantic routing, evaluation frameworks, and inference serving. The work focused on making LLM applications predictable, debuggable, and cost-efficient in production environments.

Responsibilities:

Core maintainer of semantic-router, an open-source LLM decision layer with 3,200+ GitHub stars;
Built production RAG systems for enterprise clients including retrieval, routing, and evaluation pipelines;
Designed deterministic routing logic to optimize cost, latency, and reliability for production LLM apps;
Shipped end-to-end enterprise NLP solutions from prototyping to production deployment;
Managed community contributions, issue triage, and release cycles for the open-source project.

Project Tech stack:

Python

OpenAI

LangChain

LangGraph

Hugging Face

LLM

Pinecone

Senior AI Engineer

Sep 2024 - Nov 2025 (1 year 2 months)

Project Overview

Rebuilt a large-scale conversational AI engine designed to provide engaging and coherent dialogue for millions of users. The project focused on improving the quality and reliability of conversations by implementing a hybrid RAG + RLHF architecture, optimizing the inference pipeline for lower latency, and ensuring robust safety and quality guardrails. The work involved end-to-end architecture design, model integration, reliability engineering, and close collaboration with product and safety teams to deliver a production-ready system.

Responsibilities:

Owned the full rebuild of the conversational AI engine serving 35M+ users globally;
Architected a hybrid RAG + RLHF stack that boosted engagement by 40% and dialogue coherence by 25%;
Reduced inference latency by 30% while maintaining safety and quality guardrails;
Designed and maintained the retrieval pipeline, prompt orchestration, and model serving infrastructure;
Coordinated with product and safety teams to ship iterative improvements on a weekly release cycle.

Project Tech stack:

Python

PyTorch

Hugging Face

RAG

Redis

Docker

PostgreSQL

FastAPI

Co-founder & AI Lead

Dec 2023 - Dec 2024 (1 year)

Project Overview

The project focused on building an AI-powered platform that optimizes eCommerce operations by automatically classifying and organizing large product catalogs. The goal was to help online retailers improve product discovery, streamline catalog management, and support scalable growth. The platform used machine learning to deliver highly accurate product categorization and insights, enabling faster, more efficient business decisions.

Responsibilities:

Co-founded the company and secured €1.1M in pre-seed funding;
Built the core product classification engine using cross-encoders, achieving 95%+ accuracy;
Designed and implemented ML pipelines processing 100K+ product catalogs end-to-end;
Led technical strategy, architecture decisions, and investor-facing technical presentations;
Transitioned to an advisory role after a successful product launch.

Project Tech stack:

Python

FastAPI

LangChain

Machine learning

Keep in mind that the experience summary may exclude non-relevant projects.

Education

2020

Computer Engineering

Bachelor

Languages

English

Advanced
