Faisal – AWS, Apache Spark, Python expert at Lemon.io

Faisal

From Singapore (UTC+7)

Data Engineer|Senior
Back-end Web Developer|Senior
DevOps|Middle
AI Engineer|Senior
Lemon.io stats
2
projects done
2560
hours worked
1
offers now 🔥

Faisal – AWS, Apache Spark, Python

Faisal is a seasoned Senior Data Engineer with deep expertise in SQL, Python, and AWS services, and a solid command of complex data queries and the trade-offs between approaches. He communicates clearly and concisely, articulating complex technical concepts accurately. With strong analytical skills and broad knowledge, he is well prepared for a Senior Data Engineer role and adapts seamlessly to diverse technical environments.

9 years of commercial experience in
AI
Analytics
Architecture
Banking
Biotech
Cryptocurrency
Data analytics
E-commerce
Edtech
Fintech
Healthcare
Healthtech
Insurance
Main technologies
AWS
5 years
Apache Spark
3 years
Python
6 years
Apache Kafka
2 years
Apache Airflow
4 years
SQL
6 years
FastAPI
6 years
Additional skills
Node.js
AWS CloudFormation
Terraform
Big Data
Amazon S3
BigQuery
Vector Databases
LLM
MLOps
LangChain
MongoDB
PySpark
AWS Lambda
GCP
MySQL
DynamoDB
AWS SageMaker
Amazon ECS
Databricks
Docker
API
Microsoft Azure
Kubernetes
Data Warehouse
ETL
NLP
Amazon SQS
PyTorch
Amazon EC2
Vertex AI
TensorFlow
OpenAI API
LangGraph
Snowflake
DBT
Dagster
Datadog
Rewards and achievements
Tech interviewer
Direct hire
Possible

Experience Highlights

Senior Software Engineer
Aug 2024 - Ongoing · 1 year 7 months
Project Overview

To build a deep learning model for semantic segmentation of tissue images (identifying tumors, normal cells, and transitional regions), Faisal automated and optimized the ML training pipeline. He addressed the inefficiencies of Google Colab by implementing a cost-effective, on-demand EC2 instance via GitHub Actions, enabling background training, better collaboration, and centralized logging.

Responsibilities:
  • Set up GitHub Action Workflow Dispatch to run the Deep Learning Training on demand;
  • Set up on-demand EC2 instance as a GitHub Actions runner;
  • Set up instance termination post training to avoid idle instance cost;
  • Conducted research to identify the best instance type (based on CPU, memory, and GPU) and the AMI with the most compatible CUDA and PyTorch versions.
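The on-demand runner pattern above can be sketched as follows. This is a minimal illustration assuming boto3 and GitHub's ephemeral self-hosted runner mode; the AMI ID, instance type, and paths are hypothetical placeholders, not values from the project:

```python
# Sketch of launching a short-lived EC2 instance as an ephemeral GitHub
# Actions runner for a training job. AMI ID, instance type, and runner
# install path are illustrative placeholders.

def build_user_data(repo_url: str, runner_token: str) -> str:
    """Cloud-init script that registers the instance as an ephemeral
    GitHub Actions runner, so a dispatched training workflow lands on it."""
    return "\n".join([
        "#!/bin/bash",
        "cd /home/ubuntu/actions-runner",
        # --ephemeral deregisters the runner after a single job
        f"./config.sh --url {repo_url} --token {runner_token} --ephemeral --unattended",
        "./run.sh",
        # self-terminate once the job finishes, avoiding idle-instance cost
        "sudo shutdown -h now",
    ])

def launch_training_runner(repo_url: str, runner_token: str,
                           ami_id: str = "ami-0abcdef1234567890",  # hypothetical GPU AMI
                           instance_type: str = "g4dn.xlarge") -> str:
    """Start a GPU instance that picks up the dispatched workflow.
    Returns the instance ID so a later step can confirm termination."""
    import boto3  # requires AWS credentials at call time
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1, MaxCount=1,
        UserData=build_user_data(repo_url, runner_token),
        # makes the in-instance `shutdown -h` a full terminate, not a stop
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return resp["Instances"][0]["InstanceId"]
```

The `InstanceInitiatedShutdownBehavior="terminate"` setting is what ties the post-training shutdown to cost avoidance: the instance removes itself rather than idling in a stopped-but-billed-EBS state.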
Project Tech stack:
Python
GitHub Actions
Amazon EC2
Ubuntu
PyTorch
Senior MLOps Engineer
Oct 2024 - Oct 2025 · 1 year
Project Overview

As a Senior MLOps Engineer, Faisal developed an LLM-powered news feed to support investment decisions. The system, built on Google Cloud Platform (GCP), automates the ingestion of financial documents, uses LLMs (including OpenAI and Groq) to extract relevant companies and generate summaries with market-impact metrics, and then ranks the insights by relevance to a user's investment portfolio.

Responsibilities:
  • Cut OpenAI token costs by 50% by migrating all async OpenAI API calls to the OpenAI Batch API;
  • Standardized the pipeline with reusable Kubeflow components so every team member could productionize their pipelines seamlessly;
  • Set up a CI/CD pipeline to test and deploy from GitHub to the GCP ecosystem;
  • Set up GitHub Actions Workflow Dispatch to submit and schedule Vertex AI Pipelines;
  • Provisioned and managed GCP infrastructure using IaC tools such as Pulumi;
  • Refactored data scientists' experimental code to meet production and deployment standards;
  • Integrated services and components built by other team members so the system ran smoothly and efficiently;
  • Wrote and maintained technical documentation for the system;
  • Provided technical support and guidance to the team;
  • Participated in the design and architecture of the system.
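The Batch API migration behind the 50% cost cut can be illustrated as below: instead of issuing many async chat-completion calls, requests are serialized to a JSONL file and submitted as one batch, which OpenAI bills at half the synchronous price. The model name, prompt, and `custom_id` scheme here are assumptions for illustration, not taken from the project:

```python
import json

def to_batch_requests(articles: dict[str, str], model: str = "gpt-4o-mini") -> list[str]:
    """Turn {doc_id: text} into Batch API request lines (one JSON object per line)."""
    lines = []
    for doc_id, text in articles.items():
        lines.append(json.dumps({
            "custom_id": doc_id,  # used to join batch results back to documents
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system",
                     "content": "Extract companies and summarize market impact."},
                    {"role": "user", "content": text},
                ],
            },
        }))
    return lines

def write_batch_file(articles: dict[str, str], path: str) -> int:
    """Write the JSONL input file the Batch API expects; returns request count."""
    lines = to_batch_requests(articles)
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return len(lines)

# Submission (requires credentials, shown for context only):
#   batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
#   client.batches.create(input_file_id=batch_file.id,
#                         endpoint="/v1/chat/completions",
#                         completion_window="24h")
```

The trade-off is latency: batches complete within a 24-hour window rather than seconds, which suits a periodic news-ingestion pipeline but not interactive use.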
Project Tech stack:
Vertex AI
GCP
Cloud Firestore
Python
OpenAI API
LangChain
LangGraph
Serverless Computing
Pulumi
GitHub Actions
Senior Software Engineer
Jun 2024 - Jul 2024 · 1 month
Project Overview

The project involved deploying a Deep Learning model on a serverless AWS architecture designed to handle high-resolution image processing at scale. To manage the lifecycle and consistency of this infrastructure, Faisal utilized Terraform to implement Infrastructure as Code (IaC), ensuring reproducible environments across the development lifecycle.

Architectural Implementation:

  • Preprocessing & Orchestration: The workflow began with DZI Tiling to break down large-scale input images into manageable tiles, aligning the inference data with the model's training format. To drive high throughput, Faisal implemented a decoupled architecture where tile S3 URIs were pushed to Amazon SQS.
  • Compute & Scaling: These messages were consumed by a distributed fleet of Kubernetes (EKS) pods. By leveraging Kubernetes, he was able to manage container orchestration effectively, ensuring that the model inference tasks scaled dynamically based on queue depth.
  • State Management: Faisal utilized Amazon DynamoDB as a distributed state store to track the processing status of individual tiles, providing a real-time indicator of overall inference completion.

Observability & Monitoring

To ensure system health and performance, Faisal integrated Datadog for full-stack observability. This included:

  • APM & Tracing: Monitoring the latency of inference tasks across the Kubernetes clusters.
  • Infrastructure Metrics: Tracking SQS visibility timeouts and DynamoDB RCU/WCU utilization.
  • Log Management: Centralizing logs to quickly troubleshoot bottlenecks in the tiling or inference stages.
Responsibilities:
  • Infrastructure as Code: Deployed deep learning inference on AWS EKS and Fargate using Terraform for reproducible infrastructure.
  • Parallelization & Scaling: Engineered high-throughput tile processing using SQS to decouple ingestion from model inference.
  • Observability: Integrated Datadog for real-time monitoring of queue depth, pod health, and inference latency.
  • State Management: Utilized DynamoDB to track tile processing status and ensure idempotent execution of long-running jobs.
  • Data Synthesis: Developed post-inference scripts to aggregate tile-level predictions into consolidated result summaries.
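The queue-driven, idempotent tile-processing pattern described above can be sketched as follows. The in-memory `TileStateStore` stands in for the DynamoDB table (a real worker would use `update_item` with a `ConditionExpression`, and receive messages via boto3's SQS client); all names are illustrative:

```python
class TileStateStore:
    """Tracks per-tile status so re-delivered SQS messages stay idempotent.
    Mimics a DynamoDB table keyed on the tile's S3 URI."""

    def __init__(self):
        self._status: dict[str, str] = {}

    def claim(self, tile_id: str) -> bool:
        # DynamoDB equivalent: update_item(..., ConditionExpression=
        # "attribute_not_exists(tile_id)") — only one worker wins the claim.
        if tile_id in self._status:
            return False
        self._status[tile_id] = "PROCESSING"
        return True

    def complete(self, tile_id: str) -> None:
        self._status[tile_id] = "DONE"

    def progress(self) -> float:
        """Fraction of claimed tiles finished — the real-time
        'overall inference completion' indicator."""
        if not self._status:
            return 0.0
        done = sum(1 for s in self._status.values() if s == "DONE")
        return done / len(self._status)

def handle_message(store: TileStateStore, tile_uri: str, run_inference) -> bool:
    """Process one SQS message carrying a tile's S3 URI; skip duplicates.
    Returning False means the message is a redelivery and can be deleted."""
    if not store.claim(tile_uri):
        return False
    run_inference(tile_uri)   # e.g. the PyTorch forward pass on the tile
    store.complete(tile_uri)
    return True
```

Because SQS guarantees at-least-once delivery, the conditional claim is what makes long-running inference safe to retry: a redelivered message is detected and dropped instead of re-running the model on the same tile.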
Project Tech stack:
Python
PyTorch
Amazon ECS
Amazon SQS
DynamoDB
AWS SageMaker
Senior Data Engineer
Feb 2024 - Jun 2024 · 4 months
Project Overview

An AI data platform startup for educators: a predictive, generative AI platform that delivers meaningful insights by re-imagining analytics for K-12 education. Faisal was assigned to migrate their data pipeline to a modern data orchestration tool, moving an existing pipeline from an API source to BigQuery and adding a new pipeline from Snowflake to BigQuery.

Responsibilities:
  • Migrated a legacy data pipeline, which lacked orchestration and monitoring, to Mage;
  • Switched ingestion from NDJSON to Parquet, improving loading time roughly threefold;
  • Implemented ingestion from Snowflake to BigQuery and resolved large-volume ingestion issues, especially for backfills;
  • Implemented multi-region job execution to avoid quota exhaustion.
Project Tech stack:
Python
AI
Serverless Computing
Terraform
Docker
Docker Compose
API
BigQuery
Cloud Firestore
Firestore
Google API and Services
Senior Data Engineer
Aug 2023 - Nov 2023 · 2 months
Project Overview

Led the development of an internal BI data platform on AWS, implementing a medallion architecture. The platform ingests data from diverse sources (databases, APIs, S3 files) into a Snowflake data warehouse. Orchestrated with Dagster, pipelines use dbt to transform raw data through bronze, silver, and gold layers, with a Cube.js semantic layer for governed data marts.

Responsibilities:
  • Deployed Dagster on EKS;
  • Implemented medallion architecture on Snowflake with dbt to handle the transformation;
  • Implemented a semantic layer using Cube.dev to act as data marts and allow centralized governance;
  • Optimized query and storage on Snowflake.
Project Tech stack:
Dagster
Python
Kubernetes
Amazon S3
DBT
Snowflake
Lead Data Engineer
Nov 2022 - Feb 2023 · 3 months
Project Overview

Deployed a data quality validation pipeline to automate validation of data before delivery.

Responsibilities:
  • Eliminated manual data validation steps before sending data to the client.
Project Tech stack:
CI
CD
Python
Apache Spark
PySpark
Senior Data Engineer
Jun 2022 - Nov 2022 · 5 months
Project Overview

Collected data on property prices by location and size.

Responsibilities:
  • Collected POI (Points of Interest) data using the Google Maps API to enhance property value estimation;
  • Integrated with government-provided property valuation data based on location and tax valuation APIs;
  • Deployed the integration flow to the production environment for operational use.
Project Tech stack:
Python
Apache Airflow
Google Maps API
Senior Data Engineer
Nov 2021 - May 2022 · 6 months
Project Overview

The project involved developing a data pipeline from various sources to a data lake on S3, then connecting it to Athena for BI use cases.

Responsibilities:
  • Implemented complex data transformations using Spark for efficient processing;
  • Enabled streaming functionality to support real-time data processing for the use case.
Project Tech stack:
Python
AWS
AWS CloudFormation
Apache Spark
PySpark
Apache Airflow
Amazon S3
Senior Data Engineer
Jul 2021 - Nov 2021 · 3 months
Project Overview

The project involved adding partitioning and clustering to the BigQuery data warehouse tables.

Responsibilities:
  • Reduced analytics costs roughly 300-fold, from approximately 20,000 USD per month to around 60–70 USD per month;
  • Enhanced query performance by 600%.
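The change behind this cost reduction is that a partitioned, clustered table lets BigQuery scan only the partitions a query touches instead of the whole table. A minimal sketch of generating the corresponding DDL is below; the table and column names are illustrative, not from the project:

```python
def partitioned_table_ddl(table: str, partition_col: str,
                          cluster_cols: list[str]) -> str:
    """Build a BigQuery DDL statement that recreates a table partitioned
    by day on a timestamp/date column and clustered on the given columns."""
    return (
        f"CREATE TABLE {table}_partitioned\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}\n"
        f"AS SELECT * FROM {table}"
    )
```

Queries then filter on the partition column (e.g. `WHERE DATE(event_ts) = '2021-07-01'`), so billed bytes drop in proportion to the partitions pruned, and clustering further narrows the blocks read within each partition.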
Project Tech stack:
BigQuery
Apache Airflow
Big Data
Apache Spark
PySpark
Senior Data Engineer
Mar 2021 - Aug 2021 · 5 months
Project Overview

The project involved migrating analytics from a MySQL replication database to BigQuery.

Responsibilities:
  • Improved data and report retrieval speed by 180 times;
  • Decreased database replication workload and infrastructure-related issues by 80%;
  • Collaborated with the BI team to design data models.
Project Tech stack:
Python
BigQuery
CI
CD
GCP
MySQL
Data Engineer
Dec 2019 - Nov 2020 · 10 months
Project Overview

A batch-processing data pipeline from various sources (AWS databases and several third-party tools) to a BigQuery data warehouse.

Responsibilities:
  • Ingested data from various sources, including RDBMS, NoSQL databases, 3rd party APIs, and Google Sheets;
  • Preprocessed data in flight using batch and stream processing frameworks;
  • Managed storage to optimize cost and performance;
  • Designed and maintained data warehouse models in BigQuery.
Project Tech stack:
Python
BigQuery
AWS Lambda
PySpark
Apache Spark
Apache Airflow
SQL
MongoDB
Amazon S3

Education

2019
Computer Science
Bachelor

Languages

Arabic
Pre-intermediate
Sundanese
Pre-intermediate
Indonesian
Advanced
Javanese
Intermediate
English
Advanced

Hire Faisal or someone with similar qualifications in days
All developers are ready for interview and are just waiting for your request.
Copyright © 2026 lemon.io. All rights reserved.