Andrea – SQL, Apache Spark, Apache Kafka
Andrea is a Senior Data Engineer with extensive experience in distributed data pipelines, Spark/Databricks, and fraud detection systems, primarily in large-scale fintech environments. He demonstrates strong skills in Apache Spark, Kafka, AWS, and the medallion architecture, with hands-on experience in ML lifecycle tooling. Feedback from Lemon.io vetting highlights his calm, collaborative communication and real-world enterprise expertise!
15 years of commercial experience
Experience Highlights
Lead AI Engineer
An intelligent AI agent platform for insurance automation, streamlining B2B insurtech workflows and reducing operational costs by over 50% through AI-driven process optimization (e.g., quoting, product selection).
- Built FastAPI microservices with Pydantic validation to receive and act on automated quoting requests (a minimal sketch of this flow follows this list);
- Integrated Gemini API for AI reasoning and autonomous decisions across insurance tasks;
- Deployed on AWS with containerized services for elastic scaling and fault tolerance;
- Added Nylas integration for email/calendar automation and customer workflow streamlining;
- Implemented automated quote retrieval using Browser-Use and custom Selenium workflows;
- Architected Celery processing for asynchronous quoting requests, enabling scalable task queuing;
- Persisted workflow data and metadata in PostgreSQL.
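As an illustration of the pattern described above, a minimal sketch of a quoting endpoint that validates the payload with Pydantic and hands it to a Celery worker might look like the following. The `QuoteRequest` fields, the Redis broker, and the endpoint path are assumptions for illustration, not details from the project.

```python
# Hypothetical sketch: a FastAPI endpoint validating a quoting request with
# Pydantic and handing it off to a Celery worker for asynchronous processing.
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel, Field

# Broker choice is an assumption; the real deployment details are not specified.
celery_app = Celery("quoting", broker="redis://localhost:6379/0")
app = FastAPI()


class QuoteRequest(BaseModel):
    customer_id: str
    product_line: str
    coverage_amount: float = Field(gt=0)


@celery_app.task
def process_quote(payload: dict) -> None:
    # Placeholder for the actual quoting workflow (AI reasoning, browser
    # automation, persistence to PostgreSQL, etc.).
    ...


@app.post("/quotes", status_code=202)
def submit_quote(request: QuoteRequest):
    # Pydantic has already validated the payload; enqueue the task and
    # return immediately so the API stays responsive.
    task = process_quote.delay(request.model_dump())
    return {"task_id": task.id, "status": "queued"}
```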
Tech Lead
An ML-as-a-service platform on Databricks for internal teams to accelerate model development and deployment. The platform enabled large-scale data engineering to ingest raw data from S3, Snowflake, and Kafka, with templated data pipelines for cleaning, feature engineering, encoding, and sampling following the medallion model. It leveraged Ray APIs for distributed model training and tuning, and included integrated model serving and monitoring for seamless production readiness.
- Owned all technical decisions, defining architecture, tools, and frameworks to ensure scalability and performance;
- Led a global team to deliver high-quality, collaborative code aligned with business goals;
- Delivered scalable data engineering pipelines handling large volumes from S3, Snowflake, and Kafka, using the medallion model architecture;
- Developed custom feature encoding libraries for distributed processing on Databricks and optimized PySpark feature engineering to calculate critical time-based features (see the sketch after this list);
- Implemented and productionized Ray on Databricks for distributed model training and tuning, leveraging its richer APIs for superior training efficiency and performance;
- Applied MLOps best practices, including multi-environment setups and champion-challenger strategies for robust production workflows.
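To make the time-based feature engineering concrete, here is a rough PySpark sketch of a rolling 24-hour window over per-card transactions; the table names, columns, and window length are hypothetical and only illustrate the general approach.

```python
# Illustrative sketch: rolling time-based features in PySpark, in the spirit
# of the silver-layer feature engineering described above. Table and column
# names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

txns = spark.table("silver.transactions")  # assumed medallion silver table

# 24-hour rolling window per card, ordered by event time in epoch seconds.
day_window = (
    Window.partitionBy("card_id")
    .orderBy(F.col("event_ts").cast("long"))
    .rangeBetween(-24 * 3600, 0)
)

features = txns.select(
    "card_id",
    "event_ts",
    F.count("*").over(day_window).alias("txn_count_24h"),
    F.sum("amount").over(day_window).alias("txn_amount_24h"),
    F.avg("amount").over(day_window).alias("txn_avg_amount_24h"),
)

features.write.mode("overwrite").saveAsTable("gold.card_time_features")
```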
Senior Data Engineer
A customized CDAP platform (the open-source project behind Google Cloud Data Fusion) for designing no-to-low-code ETL pipelines. It leveraged AWS services such as EMR, Kinesis, and S3, and included custom serverless solutions for encryption and tokenization. The platform featured a visual pipeline builder, reusable plugins, and lifecycle management, enabling external teams to efficiently design and manage complex data workflows.
- Helped customers productionize a handful of data pipelines within a few months, covering both batch and real-time streaming use cases;
- Developed a custom Spark-based plugin to add metadata, validate partitioning, and ensure transformation consistency (see the sketch after this list);
- Customized the AWS Kinesis plugin, tuning shard configuration to maximize throughput and minimize ingestion latency to Snowflake;
- Created a custom plugin to integrate serverless AWS Lambda functions for data tokenization and encryption of sensitive PCI and PII data.
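Actual CDAP plugins are typically Java classes built against the CDAP plugin API, but the core transform logic of a metadata/partition-validation plugin like the one above could look roughly like this PySpark sketch; the bucket paths, column names, and `source_system` value are placeholders.

```python
# Hypothetical sketch of the plugin's core logic (real CDAP plugins are
# usually Java; this only illustrates the transformation itself in PySpark).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("s3://example-bucket/raw/events/")  # assumed input

# Add lineage/metadata columns so downstream consumers can trace each record.
enriched = (
    df.withColumn("ingestion_ts", F.current_timestamp())
    .withColumn("source_system", F.lit("pos-gateway"))       # assumed name
    .withColumn("pipeline_run_id", F.lit("run-2024-01-01"))  # injected per run
)

# Validate partitioning: every record must carry a non-null partition date.
bad_rows = enriched.filter(F.col("partition_date").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} records are missing partition_date")

enriched.write.partitionBy("partition_date").mode("append").parquet(
    "s3://example-bucket/curated/events/"
)
```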
Senior Data Engineer
The project migrated an on-premises Hadoop/Hive platform to a cloud architecture built on AWS, Snowflake, and Databricks. Data from Kafka was streamed into S3, then ingested and processed by Databricks for heavy transformations and ML workloads. Snowflake hosted BI-ready data with dbt transformations, serving key stakeholders. All data pipelines that fed the on-premises platform were updated and migrated to the Snowflake and Databricks environments. This hybrid platform balanced advanced data engineering in Databricks with scalable analytics in Snowflake, catering to diverse user needs.
- Actively led end-to-end migration, redesigning data ingestion pipelines from Kafka and databases via AWS S3;
- Implemented a custom Kafka Connect component to ingest data into S3 efficiently;
- Migrated Spark pipelines from Hadoop to Databricks for scalable processing and advanced transformations;
- Ported Hive-based data pipelines into dbt ELT workflows running on Snowflake, optimizing for BI stakeholder needs;
- Orchestrated automated ELT pipelines with Airflow for reliable and monitored data flows (see the sketch after this list);
- Coordinated data quality and performance tuning across both dbt and Spark pipelines;
- Integrated Snowflake’s app/gold data layers with Tableau and delivered custom dashboards and reporting solutions.
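A rough sketch of the Airflow orchestration pattern described above, assuming the Databricks provider's `DatabricksRunNowOperator` followed by a `dbt run` step via `BashOperator`; the job ID, connection IDs, schedule, and model selector are placeholders rather than project specifics.

```python
# Illustrative Airflow DAG: trigger the Databricks transformation job, then
# run the dbt models that build the BI-ready layers in Snowflake.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
)

with DAG(
    dag_id="kafka_s3_databricks_snowflake",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    run_databricks_job = DatabricksRunNowOperator(
        task_id="run_databricks_transformations",
        databricks_conn_id="databricks_default",
        job_id=12345,  # placeholder job id
    )

    run_dbt_models = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --select staging+ --profiles-dir /opt/dbt",
    )

    run_databricks_job >> run_dbt_models
```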
Senior Data Engineer
A containerized (Docker) monitoring platform orchestrated on AWS Elastic Kubernetes Service (EKS), designed to oversee fraud model performance in real time. It processed streaming data via Kafka Streams while computing batch KPIs with Spark, ensuring comprehensive health tracking.
- Implemented custom Spark processing libraries for hourly/daily batch KPIs, including model drift via KL divergence (see the sketch after this list), and produced results to Kafka for downstream consumption;
- Designed and developed a Kafka Streams solution for real-time KPI generation, processing streaming metrics with low-latency aggregations and anomaly thresholds;
- Built Kafka consumers to push metrics into Prometheus and Grafana, plus multiple dashboards visualizing drift, accuracy, latency, and uptime;
- Integrated Prometheus and Grafana with corporate observability platforms, configuring alerts via email and Slack for rapid incident response;
- Coordinated optimization of container orchestration on EKS, ensuring high availability and fault-tolerant metric ingestion;
- Designed end-to-end integration tests validating real-time Kafka Streams, batch Spark KPIs, Prometheus metrics flow, and Grafana dashboards in staging before prod promotion;
- Introduced PostgreSQL to maintain state for static and non-temporal KPIs.
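As a minimal sketch of the drift KPI idea, the snippet below compares current model scores against a baseline window using KL divergence and publishes the metric to Kafka with the kafka-python client; the bucketing, topic name, and sample data are assumptions.

```python
# Minimal sketch: score-distribution drift via KL divergence, published to
# Kafka for downstream dashboards. All names and thresholds are placeholders.
import json

import numpy as np
from kafka import KafkaProducer  # kafka-python client, assumed here


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    """KL(P || Q) over two histograms normalized to probabilities."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def score_drift(baseline_scores, current_scores, bins: int = 20) -> float:
    # Bucket scores into a shared [0, 1] grid so the histograms are comparable.
    edges = np.linspace(0.0, 1.0, bins + 1)
    baseline_hist, _ = np.histogram(baseline_scores, bins=edges)
    current_hist, _ = np.histogram(current_scores, bins=edges)
    return kl_divergence(current_hist.astype(float), baseline_hist.astype(float))


producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Synthetic example data standing in for baseline vs. current model scores.
drift = score_drift(np.random.beta(2, 5, 10_000), np.random.beta(2, 4, 10_000))
producer.send("model-kpis", {"metric": "score_kl_divergence", "value": drift})
producer.flush()
```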
Senior Data Engineer
A machine learning feedback system that closed the loop between fraud analysis in the on-prem Hadoop data lake and the company's AWS-hosted FraudSight API, enabling continuous model retraining with confirmed fraud cases (CHB/RFIs). It processed massive transaction volumes to identify confirmed fraud, chargebacks, and refunds, then fed these back to the API as negative feedback.
- Optimized fraud matching pipeline from source tables through Hive partitioning and Spark resource tuning;
- Implemented a Spark producer with Confluent Schema Registry enforcement, using Avro for the FraudSight API contract (see the sketch after this list);
- Developed custom Kafka Connect HTTP sink consuming schema-validated messages with batching and serialization;
- Implemented API throttling & retry logic with rate limiting, backpressure, and dead letter queue;
- Added Hadoop pipeline monitoring, tracking job SLAs, data freshness, and failure alerts;
- Implemented Kafka Connect monitoring via Prometheus/Grafana dashboards.
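The production pipeline used a Spark producer, but the schema-enforcement idea can be sketched with the confluent-kafka Python client as below; the Avro schema, topic, and registry/broker URLs are placeholders.

```python
# Sketch: schema-enforced feedback publishing via Confluent Schema Registry.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

# Hypothetical Avro contract for a confirmed-fraud feedback record.
FEEDBACK_SCHEMA = """
{
  "type": "record",
  "name": "FraudFeedback",
  "fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "outcome", "type": "string"},
    {"name": "confirmed_at", "type": "long"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(schema_registry, FEEDBACK_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

topic = "fraud-feedback"
record = {"transaction_id": "txn-123", "outcome": "chargeback", "confirmed_at": 1700000000}

# Serialization fails fast if the record violates the registered schema.
producer.produce(
    topic=topic,
    value=serializer(record, SerializationContext(topic, MessageField.VALUE)),
)
producer.flush()
```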
Senior Data Engineer
A custom data processing solution that parsed and loaded the company's Point-of-Sale (POS) transaction log files into the enterprise Hadoop data lake, providing analytics visibility to the finance department.
- Designed the end-to-end ingestion pipeline from SFTP landing through tokenization, encryption, and Hadoop data lake loading;
- Developed a custom Scala parser with concurrent multi-threaded processing for high-throughput PTLF file handling;
- Implemented multi-stage recovery logic enabling partial parsing of malformed files while skipping corrupt records;
- Implemented SFTP server integration with automated file discovery, secure credential management, and incremental fetching;
- Integrated PCI-DSS compliant encryption for data at rest/transit with field-level tokenization of cardholder data;
- Optimized Hive storage with the ORC format for columnar compression, predicate pushdown, and 10x query performance gains (see the sketch below).
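The parser itself was written in Scala; the PySpark sketch below only illustrates the ORC-backed, partitioned Hive storage layout choice, with hypothetical table, column, and path names.

```python
# Sketch of the ORC storage layout for parsed POS records (names are
# placeholders; the production parser was a Scala component).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .enableHiveSupport()
    .getOrCreate()
)

parsed = spark.read.json("hdfs:///landing/pos/parsed/")  # output of the parser

(
    parsed.write
    .format("orc")
    .option("compression", "zlib")   # columnar compression
    .partitionBy("business_date")    # enables partition pruning at query time
    .mode("append")
    .saveAsTable("finance.pos_transactions")
)
```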