Faizan – Python, SQL, AWS
Faizan is a senior-level engineer who is responsible and thoughtful. He has strong theoretical knowledge of ML and thinks outside the box. He specializes in NLP, reinforcement learning, and machine learning on tabular data. Faizan will be a great addition to your team.
13 years of commercial experience
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Data Scientist
Auguria.io is a startup that aims to revolutionize security data management and analysis, providing a robust, efficient, and future-proof platform that empowers security analysts to excel in their roles. The main features of the product are as follows:
- Enhanced data ingestion capabilities to accommodate a broader spectrum of data sources;
- Proprietary analytical features equipping security analysts with powerful tools to detect, analyze, and respond to security threats;
- Improved efficiency for security analysts through streamlined workflows and reduced response times;
- A cost-effective solution for data storage and analysis.
- developed a scalable clustering algorithm using advanced feature extraction techniques and locality-sensitive hashing, then enriched the clusters with representative topic words and descriptions generated by LLMs;
- implemented an efficient cluster-scoring technique to detect outliers and flag anomalies;
- dockerized the above and deployed it on Amazon Elastic Container Service;
- integrated the system with Amazon Redshift, Google BigQuery, and Neo4j to perform advanced analytics;
- sanitized the data by removing personal information such as names, email IDs, and dates of birth using Named Entity Recognition;
- implemented batch pipelines with various sources and sinks using Mage AI.
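A minimal sketch of the locality-sensitive hashing step mentioned above, using MinHash signatures with banding so that similar token sets land in the same bucket. All data and names here are hypothetical; the real system used richer feature extraction.

```python
import hashlib
from collections import defaultdict

def minhash_signature(tokens, num_hashes=32):
    """For each seeded hash function, keep the minimum hash over the token set."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    ]

def lsh_buckets(docs, num_hashes=32, bands=8):
    """Group documents whose signatures collide in at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for doc_id, tokens in docs.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].append(doc_id)
    # Keep only buckets that actually group more than one document.
    return [ids for ids in buckets.values() if len(ids) > 1]

# Toy security-event token sets (hypothetical).
logs = {
    "e1": {"failed", "login", "admin"},
    "e2": {"failed", "login", "root"},
    "e3": {"disk", "usage", "warning"},
    "e4": {"failed", "login", "admin"},  # duplicate of e1
}
clusters = lsh_buckets(logs)
```

Duplicate events share every band and are guaranteed to co-occur in a bucket; near-duplicates co-occur with probability rising steeply with their Jaccard similarity.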
Data Scientist
The objective of this project was to deliver an auto-assignment algorithm that takes as input a list of panels and an equipment list containing all power circuits required to run the equipment in a factory, and outputs the best available panel to assign to each circuit. For an assignment to be successful, several criteria had to be met:
- Maintaining minimum distance between panel and circuit
- Limiting the maximum amount of connections to a panel
- Panel and circuit voltages must be compatible
- Load balancing across the panels

The project was deployed as an async web application and is currently used in one of Intel's production factories.
- implemented an arc builder script that generated all the potential connections between various panels and circuits;
- formulated and implemented the problem as a mixed integer linear program;
- created sparse matrices representing the coefficient of the required constraints;
- implemented a post-processing algorithm that identified why a particular circuit could not be assigned;
- maintained the codebase, implemented new features, and fixed bugs.
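The mixed-integer linear programming formulation above can be sketched on a toy instance. Everything here (the distance matrix, the capacity of 2, the use of `scipy.optimize.milp`) is illustrative; the real project's constraint matrices and solver setup were more involved.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance: x[p, c] = 1 if circuit c is assigned to panel p.
# Distances between panels and circuits are made up.
dist = np.array([[1.0, 4.0, 2.0],
                 [3.0, 1.0, 5.0]])
n_panels, n_circuits = dist.shape
n = n_panels * n_circuits
cost = dist.ravel()  # minimize total panel-to-circuit distance

# Constraint: each circuit is assigned to exactly one panel.
A_circ = np.zeros((n_circuits, n))
for c in range(n_circuits):
    A_circ[c, c::n_circuits] = 1
circ_con = LinearConstraint(A_circ, lb=1, ub=1)

# Constraint: each panel accepts at most 2 connections.
A_cap = np.zeros((n_panels, n))
for p in range(n_panels):
    A_cap[p, p * n_circuits:(p + 1) * n_circuits] = 1
cap_con = LinearConstraint(A_cap, ub=2)

# Binary decision variables: integrality 1, bounds [0, 1].
res = milp(cost, constraints=[circ_con, cap_con],
           integrality=np.ones(n), bounds=Bounds(0, 1))
assignment = res.x.reshape(n_panels, n_circuits).round()
```

On this instance the optimum assigns circuits 0 and 2 to panel 0 and circuit 1 to panel 1, for a total distance of 4.0. Sparse matrices (e.g. `scipy.sparse.csr_matrix`) slot in for the dense constraint arrays at scale.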
Data Scientist
This project was intended to conduct assessments that gauge students' knowledge of a particular topic. The problem was formulated as a two-stage pipeline: a first sequence-to-sequence language model (Pegasus or T5) generated question-answer pairs, and a second language model (BERT) filtered out generations that didn't meet the quality criteria. The project was deployed as an end-to-end REST API consumed by the client's web app. The main features included data preparation, the use of state-of-the-art language models, and text generation based on an input passage.
- created a custom training dataset from question-answering datasets such as SQuAD, NewsQA, RACE, and SciQ;
- trained large transformer-based language models (Pegasus, T5, and BERT) using TPUs and GPUs on Google Cloud Platform;
- deployed the ML models using Hugging Face deep learning containers on AWS;
- deployed the end-to-end system using AWS API Gateway, Lambda functions, and SageMaker;
- performed data preparation, hyperparameter tuning, and model optimization.
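The generate-then-filter structure of the two-stage pipeline can be sketched as follows. The stub generator and scorer stand in for the fine-tuned Pegasus/T5 and BERT models; their names, outputs, and the 0.5 threshold are all hypothetical.

```python
def two_stage_qa(passage, generate_pairs, score_pair, threshold=0.5):
    """Stage 1: a seq2seq model proposes question-answer pairs for a passage.
    Stage 2: a classifier scores each pair; low-quality pairs are dropped."""
    candidates = generate_pairs(passage)
    return [(q, a) for q, a in candidates
            if score_pair(passage, q, a) >= threshold]

# Stubs standing in for the fine-tuned generator and filter models.
def fake_generator(passage):
    return [("What is Python?", "A programming language"),
            ("What is Python?", "???")]  # a deliberately bad generation

def fake_scorer(passage, question, answer):
    return 0.9 if answer != "???" else 0.1

kept = two_stage_qa("Python is a programming language.",
                    fake_generator, fake_scorer)
```

In production, `generate_pairs` would wrap a seq2seq `generate` call and `score_pair` a classification head; keeping both behind plain callables makes the pipeline testable without loading either model.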
Machine Learning Engineer
The aim of the project was to predict the future value of an asset in order to make informed investment decisions. Contrary to the conventional approach, the project involved developing a deep reinforcement learning agent that trades automatically to maximize profit. The final product was deployed as a web service on a local server. The key features of the product included price forecasting (long or short positions), automated signal generation (buy, sell, or hold), robustness to price fluctuations, and adaptability to various assets.
- prepared OHLC time series data;
- implemented a Deep Q-learning based agent to perform automated trading;
- formulated the problem as a control task to withstand large price fluctuations;
- deployed the agent as a web API.
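A tabular Q-learning toy illustrates the buy/sell/hold decision loop described above; it is a simplified stand-in for the deep Q-network the project actually used, with a made-up price series and a one-step trend as the state.

```python
import random

ACTIONS = ["buy", "sell", "hold"]

def train_q_table(prices, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a discretized price trend. State = sign of the
    last price move; reward = next price move if long, its negative if short."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in (-1, 0, 1)}
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(episodes):
        for t in range(1, len(prices) - 1):
            state = (prices[t] > prices[t - 1]) - (prices[t] < prices[t - 1])
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(q[state], key=q[state].get)
            move = prices[t + 1] - prices[t]
            reward = {"buy": move, "sell": -move, "hold": 0.0}[action]
            nxt = (prices[t + 1] > prices[t]) - (prices[t + 1] < prices[t])
            # Standard Q-learning update.
            q[state][action] += alpha * (
                reward + gamma * max(q[nxt].values()) - q[state][action])
    return q

# Toy momentum series: up-moves tend to follow up-moves.
prices = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
q = train_q_table(prices)
```

On this series the learned values favor going long after an up-move and short after a down-move; a deep Q-network replaces the table when the state includes full OHLC windows.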
Data Science Intern
The United Nations Development Programme (UNDP) reviews the national plans and sector strategies of various countries in order to align them with the Sustainable Development Goals (SDGs). This process involves manually reading national plan documents hundreds of pages long and mapping each paragraph to one or more of the 169 targets of the 17 SDGs. This project employed various machine learning and Natural Language Processing (NLP) techniques to automatically map new sentences from a national plan to the relevant SDG. The key features of the application were identifying the interlinkages between various SDGs, extracting topic words from documents, and classifying new sentences into the relevant SDG category.
- performed ETL on thousands of Word and PDF project documents;
- implemented semantic search on a collection of documents;
- implemented Named Entity Recognition on the source text;
- implemented keyword-based search on a collection of documents;
- trained language models (BERT, RoBERTa) for sentence- and paragraph-level classification.
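The document-search idea above can be sketched with TF-IDF vectors and cosine similarity; this is a deliberately simplified stand-in (the real project used transformer embeddings for the semantic variant), and the SDG-flavored documents below are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs):
    """Return the index of the document most similar to the query."""
    vecs, idf = tfidf_vectors(docs)
    q = Counter(query)
    qvec = {t: c * idf.get(t, 0.0) for t, c in q.items()}
    scores = [cosine(qvec, v) for v in vecs]
    return max(range(len(docs)), key=scores.__getitem__)

docs = [
    "eradicate extreme poverty everywhere".split(),
    "ensure quality education for all".split(),
    "combat climate change and its impacts".split(),
]
best = search("access to education".split(), docs)
```

Swapping the TF-IDF vectors for sentence embeddings turns this keyword search into the semantic search used in the project, with the same ranking skeleton.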