Daniel – Pandas, Tableau, SQL, experts in Lemon.io

Daniel

From United Kingdom (UTC+1)

Data Analyst|Middle-to-senior
Data Scientist|Senior


Daniel is a senior Data Scientist and Data Analyst with over 7 years of hands-on experience in Python, Pandas, Tableau, and classical machine learning (scikit-learn, gradient boosting). He has led end-to-end analytics projects in manufacturing, supply chain, and fintech, building robust ETL pipelines and dashboards. Daniel demonstrates strong communication, stakeholder alignment, and ownership of production, consistently translating complex data into actionable insights. He would be an excellent addition to any team, bringing not only technical expertise but also a collaborative mindset, mentoring capabilities, and a proactive approach to problem-solving.

8 years of commercial experience in
Accounting
Advertising
AI
Apparel
Asset management
Banking
Beauty
Customer support
Data analytics
E-commerce
Electronics
Fintech
Machine learning
Manufacturing
Supply chain
Geospatial software
Main technologies
Pandas
7.5 years
Tableau
5 years
SQL
7.5 years
Python
7 years
Additional skills
Scikit-learn
SQL Server
GCP
Data visualization
Microsoft Azure
Machine learning
AWS
ETL
Data Warehouse
Data Modeling
Plotly
Cloud Computing
Data analysis
Data Science
Docker
AI
Airflow
Databricks
Cron
Linux
Direct hire
Possible

Experience Highlights

Lead Data Scientist
Jun 2025 - Dec 2025 (6 months)
Project Overview

In semiconductor wafer manufacturing, pick tools may collect defective dies because of their close proximity to functional ones on the wafer. To minimize this risk, dies are often manually screened to guide the tool toward clusters with a higher likelihood of reliability.

This project focused on developing an automated die screening approach using machine learning techniques. The objective was to identify reliable die clusters more efficiently and consistently, reducing reliance on manual inspection while improving the scalability and overall efficiency of the manufacturing workflow.

Responsibilities:
  • Re-architected the screening solution by shifting decision-making from manual review to a machine-learning-based approach, and gathered requirements, including data collected from manual screening for ML modelling;
  • Performed advanced feature engineering to generate spatial die features reflecting patterns previously applied in manual screening;
  • Trained and validated an XGBoost classification model to identify dies suitable for picking;
  • Applied SHAP analysis to interpret model predictions at the die level and confirm that the model captured physically meaningful screening patterns;
  • Evaluated model performance using precision–recall curves and ROC analysis, selecting a probability threshold aligned with the operational objective of high-precision screening;
  • Packaged the trained model as a reusable pickle bundle and deployed it to production by integrating it into the existing ETL pipeline.
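The profile does not say how the probability threshold was chosen. As an illustration only, the sketch below picks the lowest threshold that still meets a target precision, which keeps recall as high as possible under that constraint. The function names, the toy labels and scores, and the 0.9 precision target are assumptions made for this example, not details from the project.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    # With no positive predictions, precision is treated as 1.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_precision: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the lowest qualifying threshold.

    Recall can only fall as the threshold rises, so the first threshold
    (in ascending order) that meets min_precision also has the best recall.
    Returns None if no candidate threshold reaches the target precision.
    """
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(y_true, scores, threshold)
        if precision >= min_precision:
            return threshold, precision, recall
    return None


# Hypothetical toy data: 1 = die suitable for picking, 0 = die to avoid.
y_true = [0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.65, 0.8, 0.9, 0.95]

result = pick_threshold(y_true, scores, min_precision=0.9)
print(result)
```

Lowering `min_precision` trades precision for recall; in a high-precision screening setting the target would typically be set by the acceptable rate of defective dies being picked.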
Project Tech stack:
Python
Scikit-learn
Machine learning
GCP
Data Science
ETL
Pandas
Cloud Computing
Microsoft SQL Server
Data analysis
Linux
Lead Data Scientist
Dec 2024 - Jun 2025 (6 months)
Project Overview

A machine learning solution for early-stage screening of low-yield semiconductor wafers. The approach enabled teams to prioritize high-potential wafers, reduce unnecessary testing, optimize resource allocation, and accelerate production ramp-up.

Responsibilities:
  • Collaborated with test engineers and operations teams to gather requirements, propose the solution, and ensure alignment with production needs;
  • Built an ETL pipeline to collect and integrate wafer parametric test data and historical yield data from multiple sources;
  • Performed data profiling, cleaning, exploratory analysis, and feature engineering to prepare datasets and identify key parameters influencing yield;
  • Developed, evaluated, and validated a supervised machine learning model to predict wafer yield using identified parametric features and appropriate performance metrics;
  • Built dashboards to show predictions, parameter trends, and insights to engineering teams and trained them on how to independently use the tool;
  • Documented modelling approaches, assumptions, and workflows to ensure reproducibility and support future improvements.
Project Tech stack:
Python
Machine learning
ETL
Pandas
scikit-learn
Plotly
AWS
Data visualization
Data Modeling
Data Warehouse
Azure DevOps
Microsoft SQL Server
GitHub
API
Machine Learning Engineer
Feb 2024 - Sep 2024 (7 months)
Project Overview

A machine learning model to detect fraudulent transactions in a fintech application. The solution aimed to improve transaction security, reduce financial risk, and enhance user trust while supporting early-stage product development.

Responsibilities:
  • Performed exploratory data analysis to understand transaction patterns, fraud distribution, and key risk indicators across transaction types;
  • Engineered and selected meaningful features, including transaction balances, log-transformed transaction amounts, and temporal variables. Encoded categorical variables and prepared structured datasets for model training and evaluation;
  • Implemented strategies to address class imbalance, including balanced class weights during training;
  • Trained a fraud detection model using XGBoost, optimised for imbalanced classification, and evaluated its performance using recall as a primary metric;
  • Structured the project into a reproducible pipeline with modules for ETL, feature engineering, and prediction;
  • Deployed the model in the app, stress-tested it by simulating fraudulent transactions, and validated its functionality.
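Two of the steps above, log-transforming transaction amounts and weighting classes to counter imbalance, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names and the 95/5 label split are hypothetical, and the profile does not say which weighting mechanism was used. The "balanced" formula shown is n_samples / (n_classes * class_count).

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def log_amount(amount: float) -> float:
    """log(1 + amount): compresses large amounts and is defined at zero."""
    return math.log1p(amount)


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def negative_to_positive_ratio(labels: Sequence[int]) -> float:
    """Ratio of negative (0) to positive (1) examples."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1).
labels = [0] * 95 + [1] * 5

weights = balanced_class_weights(labels)
ratio = negative_to_positive_ratio(labels)
print(weights, ratio, log_amount(1000.0))
```

With gradient-boosted models, the negative-to-positive ratio is a common starting value for XGBoost's `scale_pos_weight` parameter, while per-class weights can be passed as sample weights during training.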
Project Tech stack:
Python
AI
Scikit-learn
Database Management Systems
Airflow
Pandas
Plotly
AWS
Data Warehouse
Data Modeling
Data visualization
Senior Data Scientist
Jan 2023 - Aug 2023 (7 months)
Project Overview

A machine learning solution to predict potential delays along the supply chain’s critical path. The system provides early alerts to stakeholders, enabling proactive mitigation, improving operational efficiency, and minimizing disruption risks.

Responsibilities:
  • Collaborated with stakeholders to understand the causes and impact of process delays and define requirements for a predictive solution;
  • Collected and integrated historical supply chain data, including process timelines, order information, and operational metrics from multiple sources;
  • Performed data profiling, cleaning, and feature engineering to structure time-series datasets suitable for sequential modelling;
  • Developed and trained an LSTM (Long Short-Term Memory) neural network to model temporal dependencies and predict delays in critical supply chain processes;
  • Evaluated and validated model performance using historical outcomes and appropriate forecasting metrics to ensure reliability;
  • Collaborated with app developers to integrate the model into the team's app and to develop a delay notification feature.
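Structuring a time series for sequential modelling usually means cutting it into fixed-length input windows, each paired with a later value to predict. The sketch below shows one common way to do this; the function name, window length, horizon, and the toy delay values are assumptions for illustration and are not taken from the project.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` past values with the value `horizon` steps later.

    Returns an empty list when the series is too short for a single sample.
    """
    samples = []
    n_samples = len(series) - window - horizon + 1
    for start in range(n_samples):  # range() is empty if n_samples <= 0
        past = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        samples.append((past, target))
    return samples


# Hypothetical daily delay values (in days) for one supply chain process.
delays = [1, 2, 3, 4, 5, 6]

for past, target in make_windows(delays, window=3):
    print(past, "->", target)
```

Each `(past, target)` pair can then be stacked into the 3-D input (samples, time steps, features) that recurrent layers such as an LSTM typically expect.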
Project Tech stack:
Python
Machine learning
Data Modeling
Data Warehouse
Docker
Microsoft SQL Server
Cloud Computing
Data visualization
Database Management Systems
Data analysis
scikit-learn
Microsoft Azure
Lead Data Analyst
Sep 2022 - Feb 2023 (5 months)
Project Overview

Developed the foundational analytics infrastructure by creating automated data pipelines, stored procedures, a custom Python module, and interactive dashboards. The solution standardized data processing and reporting, enabling more efficient analysis, consistent insights, and scalable decision-making.

Responsibilities:
  • Built the foundational analytics infrastructure, including automated data pipelines and stored procedures;
  • Designed and developed a custom Python library to standardise common data processing, analysis, and modelling tasks, enabling reusable and efficient workflows across projects;
  • Built interactive dashboards and reporting tools to visualise key business metrics and provide stakeholders with real-time operational insights;
  • Established documentation and coding practices to ensure reproducibility, maintainability, and scalability of analytical solutions.
Project Tech stack:
Python
Tableau
ETL
Data Warehouse
Data visualization
Data Modeling
eCommerce
Pandas
Databricks
Cron

Education

2023
Data Science (MSc)
Master's Degree
2018
Electronics and Computer Engineering
Bachelor's

Languages

English
Advanced

Copyright © 2026 lemon.io. All rights reserved.