Faizan – Python, SQL, AWS
Faizan is a Senior-level Engineer who is responsible, thoughtful, and able to think outside the box. He has a strong theoretical grounding in machine learning and specializes in NLP, reinforcement learning, and machine learning on tabular data. Faizan will be a great addition to your team.
13 years of commercial experience
Main technologies
Additional skills
Direct hire: Possible
Experience Highlights
Data Scientist
Basesite's primary product, Fab Builder, is a facility design automation tool that uses proprietary algorithms and AI to optimize the design of complex manufacturing facilities, primarily in the semiconductor industry. Its key functions and benefits include:
- Solving Complex Design Problems: Fab Builder is specifically designed to handle the immense complexity of designing a semiconductor fabrication plant ("fab"). It can analyze millions of data points to determine the optimal layout for electrical panels, process laterals, and chemical cabinets, which are connected by thousands of miles of cables and pipes. This process is far too complex to carry out efficiently with manual, human-based design methods.
- Reducing Costs and Materials: By using its optimization algorithms, the tool reduces the "overscoping" of materials. A case study shows it saved a fab over $17 million by reducing the number of electrical panels from 792 to 534 and the total cable length from 221 km to 204 km.
- Improving Efficiency and Speed: It significantly shortens the design phase. A manual design process that could take around six weeks can be completed in as little as one week using The Optimizer. This speed allows engineers to quickly adapt to changes in a project, saving time and resources.
- Empowering Engineers: By automating repetitive, data-heavy tasks like calculating load requirements and finding panel locations, Fab Builder allows highly skilled engineers to focus on more complex, high-value engineering problems and innovation.
- implemented an arc builder script that generated all the potential connections between various panels and circuits;
- developed an advanced Mixed-Integer Programming (MIP) model to solve a complex tool-to-equipment assignment problem (a simplified formulation is sketched after this list);
- engineered a multi-objective optimization solution to minimize costs and maximize equipment utilization while satisfying strict constraints;
- managed the data preparation phase, including ETL jobs, to process raw data from databases and spreadsheets for the optimization model;
- implemented a post-processing algorithm that identified why a particular circuit was not assigned;
- maintained the codebase, implemented new features, and fixed bugs.
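To illustrate the kind of formulation involved, below is a minimal sketch of a tool-to-panel assignment MIP using PuLP; the tool and panel names, costs, loads, and capacities are invented for illustration and do not reflect the project's actual data or constraints.

```python
# Minimal sketch of a tool-to-panel assignment MIP (illustrative data only).
import pulp

tools = ["T1", "T2", "T3"]
panels = ["P1", "P2"]
# Hypothetical connection cost (e.g. cable length) and per-panel load capacity.
cost = {("T1", "P1"): 4, ("T1", "P2"): 7,
        ("T2", "P1"): 6, ("T2", "P2"): 3,
        ("T3", "P1"): 5, ("T3", "P2"): 5}
load = {"T1": 10, "T2": 8, "T3": 12}
capacity = {"P1": 20, "P2": 18}

model = pulp.LpProblem("tool_to_panel_assignment", pulp.LpMinimize)
# x[(t, p)] = 1 if tool t is assigned to panel p.
x = pulp.LpVariable.dicts("x", list(cost.keys()), cat="Binary")

# Objective: minimise total connection cost.
model += pulp.lpSum(cost[k] * x[k] for k in cost)

# Each tool must be assigned to exactly one panel.
for t in tools:
    model += pulp.lpSum(x[(t, p)] for p in panels) == 1

# Panel capacity constraints.
for p in panels:
    model += pulp.lpSum(load[t] * x[(t, p)] for t in tools) <= capacity[p]

model.solve(pulp.PULP_CBC_CMD(msg=False))
for (t, p), var in x.items():
    if var.value() > 0.5:
        print(f"{t} -> {p}")
```

A real model would add the multi-objective terms and circuit-level constraints mentioned above; the structure (binary assignment variables, capacity constraints, linear objective) stays the same.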
Data Scientist/Machine Learning Engineer
Auguria.io's primary product acts as a sophisticated data processing engine for security operations (SecOps) teams. Its core purpose is to tackle the challenges of data overload, alert fatigue, and high costs associated with security telemetry. The platform intelligently ingests massive volumes of data from various sources like SIEMs and XDRs and uses an AI-powered approach to refine it. Key features of the product are as follows:
- AI-Powered Noise Reduction: Filters out up to 99% of non-actionable data, allowing analysts to focus on genuine threats.
- Cost Optimization: Reduces SIEM and XDR data ingestion costs by intelligently routing data to appropriate storage tiers.
- Data Normalization: Automatically standardizes disparate security data into a single, unified format (OCSF schema) for seamless integration and analysis.
- Contextual Enrichment: Enriches every security event with relevant context and a priority score, making it easier to understand and act upon.
- Vendor Agnostic: Acts as a flexible and interoperable layer that connects various security tools, helping to prevent vendor lock-in.
- developed a scalable clustering algorithm using advanced feature extraction techniques and locality-sensitive hashing, and enriched the clusters with representative topic words and descriptions using LLMs (a simplified LSH sketch follows this list);
- applied an efficient cluster scoring technique to detect outliers and flag anomalies;
- dockerized the above and deployed it on Amazon Elastic Container Service;
- engineered and deployed high-performance Text Embeddings Inference pipelines for scalable text embedding and sequence classification models;
- integrated the system with Amazon Redshift, Google BigQuery, and Neo4j to perform advanced analytics;
- performed data sanitization by removing personal data such as names, email IDs, and dates of birth using Named Entity Recognition;
- implemented batch pipelines with various sources and sinks using Mage AI.
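As a rough sketch of the locality-sensitive hashing step, the snippet below groups near-duplicate events with MinHash LSH from the datasketch library; the event strings and the similarity threshold are illustrative, not the production pipeline's.

```python
# Sketch: group near-duplicate security events with MinHash LSH (illustrative data).
from datasketch import MinHash, MinHashLSH

events = [
    "failed login for user admin from 10.0.0.5",
    "failed login for user admin from 10.0.0.9",
    "scheduled backup completed successfully",
]

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.split():
        m.update(token.encode("utf8"))
    return m

lsh = MinHashLSH(threshold=0.5, num_perm=128)
signatures = {i: minhash(e) for i, e in enumerate(events)}
for i, sig in signatures.items():
    lsh.insert(str(i), sig)

# Events whose query results overlap fall into the same candidate cluster.
for i, sig in signatures.items():
    print(events[i], "->", lsh.query(sig))
```

In practice the candidate clusters would then be scored, labeled with topic words, and summarized by an LLM, as described in the bullets above.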
Data Scientist
This project was intended to conduct assessments that gauge students' knowledge of a particular topic. The problem was formulated as a two-stage pipeline, where the first sequence-to-sequence language model (Pegasus and T5) was used to generate question-and-answer pairs, and the second language model (BERT) was used to filter out generations that didn't meet our quality criteria. The project was deployed as an end-to-end REST API consumed by the client's web app. The main features included data preparation, usage of state-of-the-art language models, and text generation based on an input passage.
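As a rough illustration of the two-stage idea, the sketch below chains a seq2seq generator with a classifier filter using the Hugging Face pipeline API; the model names and the "ACCEPT" label are hypothetical placeholders for fine-tuned checkpoints, not the project's actual artifacts.

```python
# Sketch of the two-stage pipeline: generate QA pairs, then filter them with a classifier.
# The model names below are placeholders for fine-tuned checkpoints (hypothetical).
from transformers import pipeline

generator = pipeline("text2text-generation", model="your-org/t5-question-generator")   # hypothetical
quality_filter = pipeline("text-classification", model="your-org/bert-quality-filter")  # hypothetical

passage = "The mitochondrion is the powerhouse of the cell."

# Stage 1: the seq2seq model proposes candidate question-answer pairs from the passage.
candidates = generator(f"generate questions: {passage}",
                       num_return_sequences=3, do_sample=True)

# Stage 2: a BERT-style classifier keeps only candidates that meet the quality criteria.
kept = [c["generated_text"] for c in candidates
        if quality_filter(c["generated_text"])[0]["label"] == "ACCEPT"]
print(kept)
```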
- created a custom training dataset using question-answering datasets such as SQuAD, NewsQA, RACE, and SciQ;
- trained large transformer-based language models (Pegasus, T5, and BERT) using TPUs and GPUs on Google Cloud Platform;
- deployed the ML models using the Hugging Face Deep Learning Containers on AWS;
- deployed the end-to-end system using AWS API Gateway, Lambda functions, and SageMaker (a simplified handler is sketched after this list);
- performed data preparation, hyperparameter tuning, and model optimization.
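The deployment pattern in the last bullets can be sketched as a Lambda handler that forwards API Gateway requests to a SageMaker endpoint; the endpoint name and payload shape below are illustrative assumptions.

```python
# Sketch of a Lambda handler that forwards API Gateway requests to a SageMaker endpoint.
# Endpoint name and payload shape are illustrative assumptions.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "question-generation-endpoint"  # hypothetical endpoint name

def lambda_handler(event, context):
    body = json.loads(event["body"])            # e.g. {"passage": "..."} from API Gateway
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": body["passage"]}),
    )
    predictions = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(predictions)}
```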
Machine Learning Engineer
The aim of the project was to predict the future value of an asset in order to make informed investment decisions. Contrary to the conventional approach, this project involved developing a deep reinforcement learning agent that trades automatically to maximize profits. The final product was deployed as a web service on a local server. The key features of the product included price forecasting (long or short position), automated signal generation (buy, sell, or hold), robustness to price fluctuations, and adaptability to various assets.
- prepared OHLC time series data;
- implemented a Deep Q-learning-based agent to perform automated trading (a minimal Q-network sketch follows this list);
- formulated the problem as a control task so the agent could withstand large price fluctuations;
- deployed the agent as a web API.
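A minimal sketch of the kind of Q-network behind such an agent is shown below, assuming the state is a flattened window of recent OHLC bars; the dimensions and the buy/sell/hold encoding are illustrative, not the project's actual configuration.

```python
# Sketch of a Q-network for buy/sell/hold decisions over an OHLC window (illustrative sizes).
import torch
import torch.nn as nn

WINDOW, FEATURES, ACTIONS = 30, 4, 3  # 30 OHLC bars; actions: 0=buy, 1=sell, 2=hold

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WINDOW * FEATURES, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, ACTIONS),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.randn(1, WINDOW * FEATURES)       # stand-in for a normalised OHLC window
with torch.no_grad():
    action = q_net(state).argmax(dim=1).item()  # greedy action under the current Q-values
print(action)
```

Training would add experience replay, a target network, and an epsilon-greedy policy, with rewards derived from realized profit and loss.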
Data Science Intern
The United Nations Development Programme (UNDP) reviews national plans and sector strategies of various countries in order to align them with the Sustainable Development Goals (SDGs). This process involves manually reading national plan documents containing hundreds of pages and then mapping each paragraph to one or more of the 169 targets of the 17 SDGs. This project employed various machine learning and Natural Language Processing (NLP) techniques to automatically map new sentences from a national plan to the relevant SDG. The key features of the application were identifying the interlinkages between various SDGs, extracting topic words from documents, and classifying new sentences into the relevant SDG category.
- performed data ETL on thousands of Word and PDF project documents;
- implemented semantic search on a collection of documents (a minimal sketch follows this list);
- implemented Named Entity Recognition on source text;
- implemented keyword-based search on a collection of documents;
- trained language models (BERT, RoBERTa) for sentence/paragraph classification.
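As a simple illustration of the semantic-search component, the sketch below ranks SDG descriptions against a plan paragraph with sentence-transformers; the model choice and the example texts are illustrative assumptions, not the project's actual data.

```python
# Sketch: rank SDG descriptions against a plan paragraph by semantic similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sdg_descriptions = [
    "End poverty in all its forms everywhere",
    "Ensure inclusive and equitable quality education",
    "Take urgent action to combat climate change",
]
paragraph = "The national plan expands free primary schooling to rural districts."

corpus_emb = model.encode(sdg_descriptions, convert_to_tensor=True)
query_emb = model.encode(paragraph, convert_to_tensor=True)

# Rank SDG descriptions by cosine similarity to the plan paragraph.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(sdg_descriptions[hit["corpus_id"]], round(hit["score"], 3))
```

The classification bullets above go a step further by fine-tuning BERT/RoBERTa so that each paragraph receives an explicit SDG label rather than only a similarity ranking.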