
Andrea

From Italy (UTC+2)

Senior Data Engineer

Andrea – SQL, Apache Spark, Apache Kafka

Andrea is a Senior Data Engineer with extensive experience in distributed data pipelines, Spark/Databricks, and fraud detection systems, primarily in large-scale fintech environments. He has strong skills in Apache Spark, Kafka, AWS, and the medallion architecture, with hands-on ML lifecycle tooling. Feedback from Lemon.io's vetting highlights his calm, collaborative communication and real-world enterprise expertise.

15 years of commercial experience in
Data analytics
Fintech
Insurance
Machine learning
AI software
Enterprise software
Software development
Main technologies
SQL
13 years
Apache Spark
9 years
Apache Kafka
7.5 years
Python
9 years
AWS
9 years
Additional skills
HTML
Java
JavaScript
LangChain
Scikit-learn
Airflow
CrewAI
GCP
Microsoft Azure
Snowflake
Direct hire
Possible

Experience Highlights

Lead AI Engineer
Jun 2025 - Nov 2025 · 5 months
Project Overview

An intelligent AI agent platform for insurance automation, streamlining B2B workflows and reducing insurtech operational costs by over 50% through AI-driven process optimization (e.g., quoting, product selection).

Responsibilities:
  • Built FastAPI microservices with Pydantic validation to receive and action automated quoting requests;
  • Integrated Gemini API for AI reasoning and autonomous decisions across insurance tasks;
  • Deployed on AWS with containerized services for elastic scaling and fault tolerance;
  • Added Nylas integration for email/calendar automation and customer workflow streamlining;
  • Implemented automated quote retrieval using Browser-Use and custom Selenium workflows;
  • Architected Celery processing for asynchronous quoting requests, enabling scalable task queuing;
  • Persisted workflow data and metadata in PostgreSQL.
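
A minimal sketch of the asynchronous quoting flow described above, assuming FastAPI, Pydantic v2, and a Redis-backed Celery broker; QuoteRequest, process_quote, and the broker URL are illustrative names, not the production ones:

# Endpoint validates the request, enqueues it, and returns immediately;
# the Celery worker handles the slow portal scraping and persistence.
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel, Field

celery_app = Celery("quotes", broker="redis://localhost:6379/0")  # assumed broker
app = FastAPI()

class QuoteRequest(BaseModel):
    customer_id: str
    product_line: str = Field(pattern="^(auto|home|life)$")  # illustrative products
    coverage_amount: float = Field(gt=0)

@celery_app.task
def process_quote(payload: dict) -> None:
    # Worker side: fetch quotes (e.g., via Browser-Use/Selenium), persist the
    # result to PostgreSQL, and trigger Nylas email/calendar follow-ups.
    ...

@app.post("/quotes", status_code=202)
def submit_quote(req: QuoteRequest):
    task = process_quote.delay(req.model_dump())
    return {"task_id": task.id}
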
Project Tech stack:
FastAPI
Python
GoogleAPI
Selenium
QA Automation
Pydantic
Swagger
Celery
SQL
PostgreSQL
Tech Lead
Feb 2024 - May 2025 · 1 year 3 months
Project Overview

An ML-as-a-service platform on Databricks for internal teams to accelerate model development and deployment. The platform enabled large-scale data engineering to ingest raw data from S3, Snowflake, and Kafka, with templated data pipelines for cleaning, feature engineering, encoding, and sampling following the medallion model. It leveraged Ray APIs for distributed model training and tuning, and included integrated model serving and monitoring for seamless production readiness.

Responsibilities:
  • Owned all technical decisions, defining architecture, tools, and frameworks to ensure scalability and performance;
  • Led a global team to deliver high-quality, collaborative code aligned with business goals;
  • Delivered scalable data engineering pipelines handling large volumes from S3, Snowflake, and Kafka, using the medallion model architecture;
  • Developed custom feature encoding libraries for distributed processing on Databricks and optimized feature engineering in PySpark for the calculation of critical time-based features;
  • Implemented and productionized Ray on Databricks for distributed model training and tuning, leveraging its richer APIs for superior training efficiency and performance;
  • Applied MLOps best practices, including multi-environment setups and champion-challenger strategies for robust production workflows.
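
A minimal PySpark sketch of a bronze-to-silver step in the medallion flow described above, assuming Databricks with Delta Lake; the table names, dedup key, and time-based feature are illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw events as landed from S3/Snowflake/Kafka, untouched.
bronze = spark.read.table("bronze.events")  # hypothetical table

# Silver: deduplicated, typed, with a simple time-based feature of the kind
# the templated pipelines computed before encoding and sampling.
silver = (
    bronze
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .filter(F.col("event_ts").isNotNull())
    .withColumn("hour_of_day", F.hour("event_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")
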
Project Tech stack:
Databricks
Apache Spark
PySpark
Python
SQL
Ray
MLflow
Apache Kafka
Snowflake
Scikit-learn
Pandas
Senior Data Engineer
Aug 2023 - Feb 2024 · 6 months
Project Overview

Customized CDAP (open source version of Google Data Fusion) platform for designing no-to-low-code ETL pipelines. It leveraged AWS services like EMR, Kinesis, S3, and included custom serverless solutions for encryption and tokenization. The platform featured a visual pipeline builder, reusable plugins, and lifecycle management, enabling external teams to efficiently design and manage complex data workflows.

Responsibilities:
  • Helped customers productionize a handful of data pipelines within a few months, covering both batch and real-time streaming use cases;
  • Developed a custom Spark-based plugin to add metadata, validate partitioning, and ensure transformation consistency;
  • Customized the AWS Kinesis plugin to optimize shard allocation, maximizing throughput and minimizing ingestion latency to Snowflake;
  • Created a custom plugin to integrate serverless AWS Lambda functions for data tokenization and encryption of sensitive PCI and PII data.
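
The CDAP plugins themselves were Java, but the serverless tokenization step can be sketched as a Python Lambda handler; the deterministic HMAC scheme, the field list, and the TOKEN_SECRET environment variable are assumptions for illustration, not the production design:

import hashlib
import hmac
import json
import os

SENSITIVE_FIELDS = ("card_number", "ssn")  # illustrative PCI/PII fields
TOKEN_SECRET = os.environ["TOKEN_SECRET"]  # assumed to come from a secret store

def tokenize(value: str) -> str:
    # Deterministic HMAC token: the raw value never leaves the function,
    # yet downstream joins on the tokenized field still work.
    return hmac.new(TOKEN_SECRET.encode(), value.encode(), hashlib.sha256).hexdigest()

def handler(event, context):
    records = json.loads(event["body"])
    for record in records:
        for field in SENSITIVE_FIELDS:
            if field in record:
                record[field] = tokenize(record[field])
    return {"statusCode": 200, "body": json.dumps(records)}
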
Project Tech stack:
Java
Apache Spark
AWS Lambda
AWS
Amazon S3
Kubernetes
Grafana
Snowflake
SQL
Senior Data Engineer
May 2022 - Aug 2023 · 1 year 3 months
Project Overview

The project migrated an on-premises Hadoop/Hive platform to a cloud architecture built on AWS, Snowflake, and Databricks. Data from Kafka was streamed into S3, then ingested and processed by Databricks for heavy transformations and ML workloads. Snowflake hosted BI-ready data with dbt transformations, serving key stakeholders. All data pipelines that had fed the on-premises platform were updated and migrated to the Snowflake and Databricks environments. The hybrid platform balanced advanced data engineering in Databricks with scalable analytics in Snowflake, catering to diverse user needs.

Responsibilities:
  • Actively led end-to-end migration, redesigning data ingestion pipelines from Kafka and databases via AWS S3;
  • Implemented a custom Kafka Connect component to ingest data into S3 efficiently;
  • Migrated Spark pipelines from Hadoop to Databricks for scalable processing and advanced transformations;
  • Ported Hive-based data pipelines into dbt ELT workflows running on Snowflake, optimizing for BI stakeholder needs;
  • Orchestrated automated ELT pipelines with Airflow for reliable and monitored data flows;
  • Coordinated data quality and performance tuning across both dbt and Spark pipelines;
  • Integrated Snowflake’s app/gold data layers with Tableau and delivered custom dashboards and reporting solutions.
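
A hedged sketch of the orchestration pattern, assuming Airflow 2.4+ and the legacy Databricks CLI; the DAG id, job id, and dbt paths are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="elt_snowflake_refresh",  # hypothetical DAG
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    # Heavy transformations run on Databricks first...
    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command="databricks jobs run-now --job-id 123",  # placeholder job id
    )
    # ...then dbt refreshes the BI-facing Snowflake models.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --target prod",
    )
    spark_transform >> dbt_run
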
Project Tech stack:
Snowflake
dbt
Python
Scala
Apache Spark
Amazon S3
AWS
Airflow
Tableau
SQL
Senior Data Engineer
Sep 2020 - Sep 2021 · 1 year
Project Overview

Containerized monitoring platform orchestrated with Docker on AWS Elastic Kubernetes Service (EKS), designed to oversee fraud model performance in real time. It processed streaming data via Kafka Streams while computing batch KPIs with Spark, ensuring comprehensive health tracking.

Responsibilities:
  • Implemented custom Spark processing libraries for hourly/daily batch KPIs, including model drift via KL divergence, and produced results to Kafka for downstream consumption;
  • Designed and developed a Kafka Streams solution for real-time KPI generation, processing streaming metrics with low-latency aggregations and anomaly thresholds;
  • Built Kafka consumers to push metrics into Prometheus and Grafana, plus multiple dashboards visualizing drift, accuracy, latency, and uptime;
  • Integrated Prometheus and Grafana with corporate observability platforms, configuring alerts via email and Slack for rapid incident response;
  • Coordinated optimization of container orchestration on EKS, ensuring high availability and fault-tolerant metric ingestion;
  • Designed end-to-end integration tests validating real-time Kafka Streams, batch Spark KPIs, Prometheus metrics flow, and Grafana dashboards in staging before prod promotion;
  • Introduced PostgreSQL to maintain state for static and non-temporal KPIs.
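
An illustrative NumPy version of the model-drift KPI, KL divergence between a reference score distribution and the current batch; the production computation ran in Spark, and the bin count and Laplace smoothing here are assumptions:

import numpy as np

def kl_divergence(reference: np.ndarray, current: np.ndarray, bins: int = 20) -> float:
    # Bin both samples on edges derived from the reference distribution.
    edges = np.histogram_bin_edges(reference, bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    # Laplace smoothing so empty bins don't produce infinities.
    p = (p + 1) / (p + 1).sum()
    q = (q + 1) / (q + 1).sum()
    return float(np.sum(p * np.log(p / q)))

# In the batch job, the resulting KPI would be produced to a Kafka topic for
# the consumers feeding Prometheus/Grafana.
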
Project Tech stack:
Scala
Kafka
Grafana
Prometheus
Kubernetes
AWS
Helm
Python
Apache Spark
PySpark
PostgreSQL
SQL
Senior Data Engineer
Oct 2019 - Sep 2020 · 10 months
Project Overview

A machine learning feedback system that closed the loop between fraud analysis on the on-prem Hadoop data lake and the company's AWS-hosted FraudSight API, enabling continuous model retraining with confirmed fraud cases (chargebacks/RFIs). It processed massive transaction volumes to identify confirmed fraud, chargebacks, and refunds, then fed that negative feedback back to the API.

Responsibilities:
  • Optimized fraud matching pipeline from source tables through Hive partitioning and Spark resource tuning;
  • Implemented Spark producer with Confluent Schema Registry enforcement using Avro for FraudSight API contract;
  • Developed custom Kafka Connect HTTP sink consuming schema-validated messages with batching and serialization;
  • Implemented API throttling & retry logic with rate limiting, backpressure, and dead letter queue;
  • Added Hadoop pipeline monitoring, tracking job SLAs, data freshness, and failure alerts;
  • Implemented Kafka Connect monitoring via Prometheus/Grafana dashboards.
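
The production producer was Spark/Scala, but the Schema Registry pattern can be sketched with confluent-kafka in Python; the schema, topic, and endpoints are illustrative:

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Minimal stand-in for the FraudSight feedback contract.
SCHEMA = """
{
  "type": "record",
  "name": "FraudFeedback",
  "fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "label", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "broker:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(registry, SCHEMA),
})

# Serialization fails fast if the payload violates the registered schema.
producer.produce(topic="fraud-feedback", key="txn-1",
                 value={"transaction_id": "txn-1", "label": "chargeback"})
producer.flush()
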
Project Tech stack:
Apache Kafka
Apache Spark
Scala
Python
Java
Senior Data Engineer
Apr 2019 - Oct 2019 · 6 months
Project Overview

A custom data processing solution that parsed and loaded the company's Point-of-Sale (POS) transaction log files into the enterprise Hadoop data lake, giving the finance department analytics visibility.

Responsibilities:
  • Designed end-to-end ingestion pipeline from SFTP landing through tokenization, encryption, and Hadoop data lake loading;
  • Developed a custom Scala parser with concurrent multi-threaded processing for high-throughput PTLF file handling;
  • Implemented multi-stage recovery logic enabling partial parsing of malformed files while skipping corrupt records;
  • Implemented SFTP server integration with automated file discovery, secure credential management, and incremental fetching;
  • Integrated PCI-DSS compliant encryption for data at rest/transit with field-level tokenization of cardholder data;
  • Optimized Hive storage with ORC format for columnar compression, predicate pushdown, and 10x query performance gains.
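
An illustrative Python rendition of the multi-stage recovery idea (the production parser was concurrent Scala); parse_record and the pipe-delimited layout are hypothetical stand-ins for the PTLF format:

from typing import Iterable, Tuple

def parse_record(line: str) -> dict:
    # Hypothetical record layout: store_id|txn_id|amount
    fields = line.rstrip("\n").split("|")
    if len(fields) < 3:
        raise ValueError(f"expected >=3 fields, got {len(fields)}")
    return {"store_id": fields[0], "txn_id": fields[1], "amount": float(fields[2])}

def parse_file(lines: Iterable[str]) -> Tuple[list, list]:
    good, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            good.append(parse_record(line))
        except (ValueError, IndexError):
            # Skip the corrupt record but keep parsing the rest of the file,
            # mirroring the partial-parse recovery described above.
            rejected.append((lineno, line))
    return good, rejected
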
Project Tech stack:
Scala
Python
Apache Spark
Hive
Apache Hadoop
SQL

Education

2007
Computer Engineering
Bachelor's
2009
Computing Systems Engineering
Master's
2010
Computer Science
Master's

Languages

Italian
Advanced
Spanish
Pre-intermediate
English
Advanced
