Faizan – Python, SQL, AWS
Faizan is a senior-level engineer who is responsible and thoughtful. He has strong theoretical knowledge of ML and thinks outside the box. He specializes in NLP, reinforcement learning, and machine learning on tabular data. Faizan will be a great addition to your team.
13 years of commercial experience
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Data Scientist
Auguria.io is a startup that aims to revolutionize security data management and analysis, providing a robust, efficient, and future-proof platform that empowers security analysts to excel in their roles. The main features of the product are as follows:
- Enhanced data ingestion capabilities to accommodate a broader spectrum of data sources;
- Proprietary analytical features equipping security analysts with powerful tools to detect, analyze, and respond to security threats;
- Improved efficiency for security analysts through streamlined workflows and reduced response times;
- A cost-effective solution for data storage and analysis.
- developed a scalable clustering algorithm using advanced feature extraction techniques and locality-sensitive hashing, then enriched the clusters with representative topic words and descriptions generated by LLMs;
- implemented an efficient cluster-scoring technique to detect outliers and flag anomalies;
- dockerized the above and deployed it on Amazon Elastic Container Service;
- integrated the system with Amazon Redshift, Google BigQuery, and Neo4j to perform advanced analytics;
- sanitized the data by removing personal information such as names, email IDs, and dates of birth using Named Entity Recognition;
- implemented batch pipelines with various sources and sinks using Mage AI.
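A minimal sketch of the locality-sensitive hashing step mentioned above, using MinHash signatures with banding so that similar token sets land in the same bucket. All data and names here are hypothetical; the real system used richer feature extraction.

```python
import hashlib
from collections import defaultdict

def minhash_signature(tokens, num_hashes=32):
    """For each seeded hash function, keep the minimum hash over the token set."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    ]

def lsh_buckets(docs, num_hashes=32, bands=8):
    """Group documents whose signatures collide in at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for doc_id, tokens in docs.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].append(doc_id)
    # Keep only buckets that actually group more than one document.
    return [ids for ids in buckets.values() if len(ids) > 1]

# Toy security-event token sets (hypothetical).
logs = {
    "e1": {"failed", "login", "admin"},
    "e2": {"failed", "login", "root"},
    "e3": {"disk", "usage", "warning"},
    "e4": {"failed", "login", "admin"},  # duplicate of e1
}
clusters = lsh_buckets(logs)
```

Duplicate events share every band and are guaranteed to co-occur in a bucket; near-duplicates co-occur with probability rising steeply with their Jaccard similarity.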
Data Scientist
The objective of this project was to deliver an auto-assignment algorithm that takes as input a list of panels and an equipment list containing all power circuits required to run the equipment in a factory, and outputs the best available panel to assign to each circuit. For an assignment to be successful, several criteria had to be met:
- Maintaining minimum distance between panel and circuit
- Limiting the maximum amount of connections to a panel
- Panel and circuit voltages must be compatible
- Load balancing across the panels

The project was deployed as an async web application and is currently used in one of Intel's production factories.
- implemented an arc builder script that generated all the potential connections between various panels and circuits;
- formulated and implemented the problem as a mixed integer linear program;
- created sparse matrices representing the coefficient of the required constraints;
- implemented a post-processing algorithm that identified why a particular circuit could not be assigned;
- maintained the codebase, implemented new features, and fixed bugs.
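The mixed-integer linear programming formulation above can be sketched on a toy instance. Everything here (the distance matrix, the capacity of 2, the use of `scipy.optimize.milp`) is illustrative; the real project's constraint matrices and solver setup were more involved.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance: x[p, c] = 1 if circuit c is assigned to panel p.
# Distances between panels and circuits are made up.
dist = np.array([[1.0, 4.0, 2.0],
                 [3.0, 1.0, 5.0]])
n_panels, n_circuits = dist.shape
n = n_panels * n_circuits
cost = dist.ravel()  # minimize total panel-to-circuit distance

# Constraint: each circuit is assigned to exactly one panel.
A_circ = np.zeros((n_circuits, n))
for c in range(n_circuits):
    A_circ[c, c::n_circuits] = 1
circ_con = LinearConstraint(A_circ, lb=1, ub=1)

# Constraint: each panel accepts at most 2 connections.
A_cap = np.zeros((n_panels, n))
for p in range(n_panels):
    A_cap[p, p * n_circuits:(p + 1) * n_circuits] = 1
cap_con = LinearConstraint(A_cap, ub=2)

# Binary decision variables: integrality 1, bounds [0, 1].
res = milp(cost, constraints=[circ_con, cap_con],
           integrality=np.ones(n), bounds=Bounds(0, 1))
assignment = res.x.reshape(n_panels, n_circuits).round()
```

On this instance the optimum assigns circuits 0 and 2 to panel 0 and circuit 1 to panel 1, for a total distance of 4.0. Sparse matrices (e.g. `scipy.sparse.csr_matrix`) slot in for the dense constraint arrays at scale.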
Data Scientist
This project was intended to conduct assessments that gauge students' knowledge of a particular topic. The problem was formulated as a two-stage pipeline: a first sequence-to-sequence language model (Pegasus or T5) generated question-answer pairs, and a second language model (BERT) filtered out generations that didn't meet the quality criteria. The project was deployed as an end-to-end REST API consumed by the client's web app. The main features included data preparation, the use of state-of-the-art language models, and text generation based on an input passage.
- created a custom training dataset from question-answering datasets such as SQuAD, NewsQA, RACE, and SciQ;
- trained large transformer-based language models (Pegasus, T5, and BERT) using TPUs and GPUs on Google Cloud Platform;
- deployed the ML models using Hugging Face deep learning containers on AWS;
- deployed the end-to-end system using AWS API Gateway, Lambda functions, and SageMaker;
- performed data preparation, hyperparameter tuning, and model optimization.
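The generate-then-filter structure of the two-stage pipeline can be sketched as follows. The stub generator and scorer stand in for the fine-tuned Pegasus/T5 and BERT models; their names, outputs, and the 0.5 threshold are all hypothetical.

```python
def two_stage_qa(passage, generate_pairs, score_pair, threshold=0.5):
    """Stage 1: a seq2seq model proposes question-answer pairs for a passage.
    Stage 2: a classifier scores each pair; low-quality pairs are dropped."""
    candidates = generate_pairs(passage)
    return [(q, a) for q, a in candidates
            if score_pair(passage, q, a) >= threshold]

# Stubs standing in for the fine-tuned generator and filter models.
def fake_generator(passage):
    return [("What is Python?", "A programming language"),
            ("What is Python?", "???")]  # a deliberately bad generation

def fake_scorer(passage, question, answer):
    return 0.9 if answer != "???" else 0.1

kept = two_stage_qa("Python is a programming language.",
                    fake_generator, fake_scorer)
```

In production, `generate_pairs` would wrap a seq2seq `generate` call and `score_pair` a classification head; keeping both behind plain callables makes the pipeline testable without loading either model.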
Machine Learning Engineer
The aim of the project was to predict the future value of an asset in order to make informed investment decisions. Contrary to the conventional approach, the project involved developing a deep reinforcement learning agent that trades automatically to maximize profit. The final product was deployed as a web service on a local server. The key features of the product included price forecasting (long or short positions), automated signal generation (buy, sell, or hold), robustness to price fluctuations, and adaptability to various assets.
- prepared OHLC time series data;
- implemented a Deep Q-learning based agent to perform automated trading;
- formulated the problem as a control task to withstand large price fluctuations;
- deployed the agent as a web API.
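A tabular Q-learning toy illustrates the buy/sell/hold decision loop described above; it is a simplified stand-in for the deep Q-network the project actually used, with a made-up price series and a one-step trend as the state.

```python
import random

ACTIONS = ["buy", "sell", "hold"]

def train_q_table(prices, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a discretized price trend. State = sign of the
    last price move; reward = next price move if long, its negative if short."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in (-1, 0, 1)}
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(episodes):
        for t in range(1, len(prices) - 1):
            state = (prices[t] > prices[t - 1]) - (prices[t] < prices[t - 1])
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(q[state], key=q[state].get)
            move = prices[t + 1] - prices[t]
            reward = {"buy": move, "sell": -move, "hold": 0.0}[action]
            nxt = (prices[t + 1] > prices[t]) - (prices[t + 1] < prices[t])
            # Standard Q-learning update.
            q[state][action] += alpha * (
                reward + gamma * max(q[nxt].values()) - q[state][action])
    return q

# Toy momentum series: up-moves tend to follow up-moves.
prices = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
q = train_q_table(prices)
```

On this series the learned values favor going long after an up-move and short after a down-move; a deep Q-network replaces the table when the state includes full OHLC windows.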
Data Science Intern
The United Nations Development Programme (UNDP) reviews the national plans and sector strategies of various countries in order to align them with the Sustainable Development Goals (SDGs). This process involves manually reading national plan documents hundreds of pages long and mapping each paragraph to one or more of the 169 targets of the 17 SDGs. This project employed various machine learning and Natural Language Processing (NLP) techniques to automatically map new sentences from a national plan to the relevant SDG. The key features of the application were identifying the interlinkages between various SDGs, extracting topic words from documents, and classifying new sentences into the relevant SDG category.
- performed ETL on thousands of Word and PDF project documents;
- implemented semantic search on a collection of documents;
- implemented Named Entity Recognition on the source text;
- implemented keyword-based search on a collection of documents;
- trained language models (BERT, RoBERTa) for sentence- and paragraph-level classification.
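The document-search idea above can be sketched with TF-IDF vectors and cosine similarity; this is a deliberately simplified stand-in (the real project used transformer embeddings for the semantic variant), and the SDG-flavored documents below are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs):
    """Return the index of the document most similar to the query."""
    vecs, idf = tfidf_vectors(docs)
    q = Counter(query)
    qvec = {t: c * idf.get(t, 0.0) for t, c in q.items()}
    scores = [cosine(qvec, v) for v in vecs]
    return max(range(len(docs)), key=scores.__getitem__)

docs = [
    "eradicate extreme poverty everywhere".split(),
    "ensure quality education for all".split(),
    "combat climate change and its impacts".split(),
]
best = search("access to education".split(), docs)
```

Swapping the TF-IDF vectors for sentence embeddings turns this keyword search into the semantic search used in the project, with the same ranking skeleton.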