Faizan – Python, SQL, AWS expert at Lemon.io

Faizan

From Pakistan (UTC+3)

Data Scientist | Senior


Faizan is a senior-level engineer who is responsible and thoughtful. He has solid theoretical knowledge of ML and can think outside the box. He specializes in NLP, reinforcement learning, and machine learning on tabular data. Faizan will be a great addition to your team.

13 years of commercial experience in
Asset management
Cybersecurity
Data analytics
AI software
Enterprise software
NLP software
Main technologies
Python
6.5 years
SQL
6.5 years
AWS
3 years
Lambda
3 years
Flask
2.5 years
NLP
6 years
NumPy
6 years
Data Science
6 years
Pandas
6 years
TensorFlow
4 years
Additional skills
Data Modeling
GCP
Keras
MySQL
REST API
Scikit-learn
NLTK
BigQuery
PyTorch
GPT-3
Git
GitHub
Amazon ECS
Amazon EC2
Docker
Big Data
Redshift
AI
GCP Compute Engine
AWS Lambda
API Gateway
Neo4j
Google Analytics
Microsoft SQL Server
Microsoft Azure
Direct hire
Possible

Experience Highlights

Data Scientist
Jul 2022 - Ongoing (3 years 1 month)
Project Overview

Auguria.io is a startup that aims to revolutionize security data management and analysis, providing a robust, efficient, and future-proof platform that empowers security analysts to excel in their roles. The main features of the product are as follows:

  • Enhanced data ingestion capabilities that accommodate a broader spectrum of data sources;
  • Proprietary analytical features that equip security analysts to detect, analyze, and respond to security threats;
  • Streamlined workflows that improve analyst efficiency and reduce response times;
  • A cost-effective solution for data storage and analysis.
Responsibilities:
  • developed a scalable clustering algorithm using advanced feature extraction techniques and locality-sensitive hashing, and enriched the clusters with representative topic words and descriptions generated by LLMs;
  • applied an efficient cluster-scoring technique to detect outliers and flag anomalies;
  • dockerized the above and deployed it on Amazon Elastic Container Service;
  • integrated the system with AWS Redshift, Google BigQuery, and Neo4j to perform advanced analytics;
  • sanitized data by removing personal information such as names, email addresses, and dates of birth using Named Entity Recognition;
  • implemented batch pipelines with various sources and sinks using Mage AI.
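The locality-sensitive hashing step above can be illustrated with a minimal stdlib-only sketch: MinHash signatures over character shingles are split into bands, and lines whose bands collide land in the same candidate cluster. This is an illustrative toy (hypothetical log lines, Python's built-in `hash`), not the production system's feature extraction:

```python
def shingles(text, k=3):
    """Character k-shingles of a log line."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, seeds):
    """One min-hash value per seeded hash function."""
    return [min(hash((seed, s)) for s in shingle_set) for seed in seeds]

def lsh_clusters(lines, num_hashes=8, bands=4):
    """Group line indices whose banded signatures collide in any band."""
    rows = num_hashes // bands
    buckets = {}
    for i, line in enumerate(lines):
        sig = minhash_signature(shingles(line), range(num_hashes))
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, set()).add(i)
    return [grp for grp in buckets.values() if len(grp) > 1]

logs = [
    "failed login for user alice from 10.0.0.17",
    "failed login for user alice from 10.0.0.17",  # duplicate event
    "disk usage at 91 percent on /dev/sda1",
]
clusters = lsh_clusters(logs)  # indices 0 and 1 always share a cluster
```

Identical lines always produce identical signatures and therefore always cluster; near-duplicates cluster with high probability, which is what makes LSH scale to large log volumes without pairwise comparison.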
Project Tech stack:
Python
PyTorch
GPT-3
Docker
Amazon ECS
Amazon EC2
GitHub
Git
GCP Compute Engine
Redshift
BigQuery
Big Data
AWS Lambda
API
API Gateway
AI
Machine learning
Neo4j
SQL
Data Scientist
Jan 2021 - Jan 2023 (2 years)
Project Overview

The objective of this project was to deliver an auto-assignment algorithm that takes as input a panel list and an equipment list containing all power circuits required to run the equipment in a factory, and outputs the best available panel to assign to each circuit. For an assignment to be successful, several criteria had to be met:

  • Maintaining minimum distance between panel and circuit
  • Limiting the maximum amount of connections to a panel
  • Panel and circuit voltages must be compatible
  • Load balancing across the panels.

The project was deployed as an async web application and is currently used in one of Intel's production factories.
Responsibilities:
  • implemented an arc builder script that generated all the potential connections between various panels and circuits;
  • formulated and implemented the problem as a mixed integer linear program;
  • created sparse matrices representing the coefficient of the required constraints;
  • implemented a post-processing algorithm that identified why a particular circuit was not assigned;
  • maintained the codebase, implemented new features, and fixed bugs.
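The constraints listed above can be made concrete with a toy version of the assignment problem. The real project formulated it as a mixed-integer linear program solved with CPLEX; this sketch instead brute-forces a tiny hypothetical instance (made-up panels, circuits, and distances) to show the voltage, capacity, and distance-minimization criteria at work:

```python
from itertools import product

# Toy instance (hypothetical data, for illustration only).
panels = {                 # panel -> (voltage, max connections)
    "P1": (120, 2),
    "P2": (240, 2),
}
circuits = {               # circuit -> required voltage
    "C1": 120, "C2": 240, "C3": 120,
}
distance = {               # (circuit, panel) -> cable-run length
    ("C1", "P1"): 5, ("C1", "P2"): 9,
    ("C2", "P1"): 4, ("C2", "P2"): 3,
    ("C3", "P1"): 6, ("C3", "P2"): 8,
}

def best_assignment():
    """Exhaustively search panel choices; a MILP solver scales this up."""
    best, best_cost = None, float("inf")
    for choice in product(panels, repeat=len(circuits)):
        plan = dict(zip(circuits, choice))
        # Voltage compatibility constraint.
        if any(panels[p][0] != circuits[c] for c, p in plan.items()):
            continue
        # Panel capacity constraint.
        if any(list(plan.values()).count(p) > panels[p][1] for p in panels):
            continue
        cost = sum(distance[c, p] for c, p in plan.items())
        if cost < best_cost:
            best, best_cost = plan, cost
    return best, best_cost
```

In the MILP formulation, each (circuit, panel) pair becomes a binary decision variable, the capacity and voltage checks become sparse constraint rows, and the distance sum becomes the objective; the brute-force loop here enumerates exactly the same feasible set.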
Project Tech stack:
Pandas
Python
NumPy
networkx
cplex
Microsoft SQL Server
Microsoft Azure
Machine learning
Data Scientist
Feb 2021 - Jul 2021 (5 months)
Project Overview

This project was intended to conduct assessments for students to gauge their knowledge of a particular topic. The problem was formulated as a two-stage pipeline: a first-stage sequence-to-sequence language model (Pegasus or T5) generated question-answer pairs, and a second-stage language model (BERT) filtered out generations that didn't meet our quality criteria. The project was deployed as an end-to-end REST API consumed by the client's web app. The main features included data preparation, usage of state-of-the-art language models, and text generation based on an input passage.

Responsibilities:
  • created a custom training dataset from question-answering datasets such as SQuAD, NewsQA, RACE, and SciQ;
  • trained large transformer-based language models (Pegasus, T5, and BERT) using TPUs and GPUs on Google Cloud Platform;
  • deployed the ML models using a Hugging Face deep learning container on AWS;
  • deployed the end-to-end system using AWS API Gateway, Lambda functions, and SageMaker;
  • performed data preparation, hyperparameter tuning, and model optimization.
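The two-stage generate-then-filter structure described above can be sketched without the heavy models. Here placeholder functions stand in for the seq2seq generator (Pegasus/T5) and the BERT quality filter; the templated questions and the length heuristic are illustrative assumptions, only the pipeline shape matches the project:

```python
def generate_qa_pairs(passage):
    """Stand-in for the seq2seq stage (Pegasus/T5 in the real system):
    template one question per sentence of the passage."""
    pairs = []
    for sent in passage.split(". "):
        sent = sent.strip(". ")
        if sent:
            pairs.append((f"What does the passage say about '{sent[:30]}'?", sent))
    return pairs

def quality_filter(pairs, min_answer_words=5):
    """Stand-in for the BERT re-ranking stage: keep only pairs whose
    answer is long enough to be informative."""
    return [(q, a) for q, a in pairs if len(a.split()) >= min_answer_words]

passage = ("Transformers process tokens in parallel. "
           "Attention weighs token pairs. "
           "Pretraining uses large unlabeled corpora for self-supervision.")
kept = quality_filter(generate_qa_pairs(passage))  # short answer dropped
```

Keeping generation and filtering as separate stages lets each model be trained, tuned, and deployed independently, which is why the project served them behind one REST API.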
Project Tech stack:
Python
Machine learning
NLP
Pandas
AWS
NumPy
Data Modeling
GCP
transformers
Machine Learning Engineer
May 2020 - Aug 2020 (3 months)
Project Overview

The aim of the project was to predict the future value of an asset in order to make informed investment decisions. Contrary to the conventional forecasting approach, this project involved developing a deep reinforcement learning agent that trades automatically to maximize profits. The final product was deployed as a web service on a local server. The key features included price forecasting (long or short position), automated signal generation (buy, sell, or hold), robustness to price fluctuations, and adaptability to various assets.

Responsibilities:
  • prepared OHLC time series data;
  • implemented a Deep Q-learning based agent to perform automated trading;
  • formulated the problem as control-based to withstand large amounts of price fluctuations;
  • deployed the agent as a web API.
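The Q-learning update at the heart of such an agent can be shown in a tabular toy. The real project used a deep Q-network over OHLC features; this sketch (hypothetical two-state market, made-up reward scheme) only demonstrates the update rule and the buy/hold/sell action space:

```python
import random

ACTIONS = ["buy", "hold", "sell"]

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def train(prices, episodes=200, seed=0):
    """State = direction of the last price move; reward = next move
    if long, its negation if short, zero if flat."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        for t in range(1, len(prices) - 1):
            state = "up" if prices[t] > prices[t - 1] else "down"
            action = rng.choice(ACTIONS)  # pure exploration, for brevity
            move = prices[t + 1] - prices[t]
            reward = move if action == "buy" else -move if action == "sell" else 0.0
            next_state = "up" if prices[t + 1] > prices[t] else "down"
            q_update(q, state, action, reward, next_state)
    return q

q = train([1, 2, 3, 4, 5, 6])  # steadily rising prices favor "buy"
```

A deep variant replaces the dictionary with a neural network mapping state features to the three action values, but the update target is the same expression.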
Project Tech stack:
Data Science
TensorFlow
Keras
Python
Pandas
Machine learning
gym
Data Science Intern
Aug 2019 - Nov 2019 (3 months)
Project Overview

The United Nations Development Programme (UNDP) reviews the national plans and sector strategies of various countries in order to align them with the Sustainable Development Goals (SDGs). This process involves manually reading national plan documents containing hundreds of pages and then mapping each paragraph to one or more of the 169 targets of the 17 SDGs. This project employed various machine learning and Natural Language Processing (NLP) techniques to automatically map new sentences from a national plan to the relevant SDG. The key features of the application were identifying the interlinkages between various SDGs, extracting topic words from documents, and classifying new sentences into the relevant SDG category.

Responsibilities:
  • performed ETL on thousands of Word and PDF project documents;
  • implemented semantic search over a collection of documents;
  • implemented Named Entity Recognition on source text;
  • implemented keyword-based search over a collection of documents;
  • trained language models (BERT, RoBERTa) for sentence/paragraph classification.
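The keyword-based matching step can be sketched with plain TF-IDF and cosine similarity, stdlib only. The SDG target texts below are paraphrased placeholders and the scoring is far simpler than the trained BERT classifier, but the retrieval shape is the same: vectorize the targets, vectorize a new sentence, return the closest target:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors for a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(wt * v.get(w, 0.0) for w, wt in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(query, docs):
    """Index of the document most similar to the query."""
    vecs = tfidf_vectors(docs + [query])
    qv, dvs = vecs[-1], vecs[:-1]
    return max(range(len(docs)), key=lambda i: cosine(qv, dvs[i]))

sdg_targets = [  # paraphrased stand-ins for real SDG target texts
    "end poverty in all its forms everywhere",
    "ensure inclusive and equitable quality education for all children",
    "take urgent action to combat climate change and its impacts",
]
idx = best_match("quality education for rural children", sdg_targets)
```

The production system replaced this lexical scoring with transformer embeddings and a trained classifier, which handle paraphrase and synonymy that TF-IDF misses.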
Project Tech stack:
Python
SQL
REST API
Machine learning
NLP
TensorFlow
Pandas
NumPy
Scikit-learn
NLTK
transformers

Education

2012
Computer Engineering
Bachelor of Science
2019
Computer Science
Master of Science
2023
Computer Science
Doctor of Philosophy

Languages

English
Advanced

Copyright © 2025 lemon.io. All rights reserved.