
Faisal

From Indonesia (GMT+7)

Data Engineer (Senior)
DevOps (Middle)
8 years of commercial experience
AI
Analytics
Architecture
Banking
Biotech
Cryptocurrency
Data analytics
E-commerce
Edtech
Healthcare
Healthtech
Insurance
Lemon.io stats
2 projects done
920 hours worked
Open to new offers

Faisal – Apache Spark, Python, AWS

Faisal is a seasoned Senior Data Engineer with deep proficiency in SQL, Python, and AWS services, and a solid grasp of complex data queries and the varied approaches to them. He communicates efficiently, delivering clear, concise answers and articulating complex technical concepts accurately. With strong analytical skills and broad knowledge, he is well prepared for a Senior Data Engineer role and adapts seamlessly to diverse technical environments.

Main technologies
Apache Spark
3 years
Python
5 years
AWS
4 years
Apache Kafka
2 years
Apache Airflow
4 years
SQL
5 years
Additional skills
AWS CloudFormation
Terraform
Big Data
BigQuery
LLM
MLOps
LangChain
MongoDB
PySpark
GCP
CI/CD
MySQL
AWS SageMaker
Docker
Kubernetes
Data Warehouse
ETL
NLP
PyTorch
Amazon EC2
Vertex AI
TensorFlow
Rewards and achievements
Tech interviewer
Ready to start: To be verified
Direct hire: Potentially possible


Experience Highlights

Senior Software Engineer
Aug 2024 - Ongoing (4 months)
Project Overview

The company is working on a deep-learning semantic segmentation model to classify microscope-captured tissue images. The model is expected to classify whether a region of tissue contains tumour or normal cells, and it can also detect tumour transitions and distinct cell areas such as stroma and lymphocytes.

Faisal addressed Google Colab's inefficiencies: training processes couldn't run in the background and required constant monitoring by data scientists. To improve the workflow, he implemented a GitHub Actions workflow that launches an EC2 instance on demand as a GitHub Actions runner to execute the training code, so logs can be viewed in the GitHub Actions UI. Using Git for collaboration also made the process more engineering-friendly. Faisal additionally tuned the EC2 instance choice for cost-performance efficiency to better meet the project's requirements.
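The launch-and-terminate pattern described above can be sketched as follows. This is an illustrative reconstruction, not Faisal's actual code: the AMI ID, instance type, repository URL, and runner token are all placeholders, and the resulting dict is what would be passed to boto3's `ec2.run_instances(**params)` in a real workflow.

```python
# Hypothetical sketch: compose launch parameters for an on-demand EC2
# instance that self-registers as an ephemeral GitHub Actions runner.

def runner_user_data(repo_url: str, token: str, labels: str) -> str:
    """Cloud-init script that registers the instance as a repo runner."""
    return "\n".join([
        "#!/bin/bash",
        "cd /home/ubuntu/actions-runner",
        f"./config.sh --url {repo_url} --token {token} --labels {labels} --ephemeral",
        "./run.sh",
    ])

def launch_params(ami: str, instance_type: str, user_data: str) -> dict:
    """Parameters for a single on-demand instance running the training job."""
    return {
        "ImageId": ami,
        "InstanceType": instance_type,  # e.g. a GPU type picked for cost/perf
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
        # Terminate-on-shutdown so a post-training `shutdown -h now`
        # cleans the instance up and avoids idle cost.
        "InstanceInitiatedShutdownBehavior": "terminate",
    }

params = launch_params(
    ami="ami-0123456789abcdef0",   # placeholder AMI with CUDA + PyTorch
    instance_type="g4dn.xlarge",   # placeholder GPU instance type
    user_data=runner_user_data(
        "https://github.com/example/tissue-segmentation",  # hypothetical repo
        "RUNNER_TOKEN", "gpu,on-demand",
    ),
)
```

The `--ephemeral` flag and terminate-on-shutdown behavior together mean each training run gets a fresh runner that disappears when the job ends, which is the idle-cost safeguard the profile mentions.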

Responsibilities:
  • Set up GitHub Action Workflow Dispatch to run the Deep Learning Training on demand;
  • Set up on-demand EC2 instance as a GitHub Actions runner;
  • Set up instance termination post training to avoid idle instance cost;
  • Conducted research to find the best instance type (based on CPU, memory, and GPU) and the AMI with the most compatible CUDA and PyTorch versions.
Project Tech stack:
Python
GitHub Actions
Amazon EC2
Ubuntu
PyTorch
Senior Software Engineer
Jun 2024 - Jul 2024 (1 month)
Project Overview

The project required a deep learning model to be deployed on serverless infrastructure in the AWS ecosystem. The input data was mostly large images, which needed DZI tiling as a preprocessing step; since the model had been trained on tiles, the same tiling strategy made sense at inference time. To introduce parallelism, Faisal pushed the tiles' S3 URIs to SQS, where they were later consumed by multiple ECS task instances that ran the model inference. He also used DynamoDB to track which tiles had been processed and to signal whether inference was complete.
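The fan-out step of this design can be sketched as pure Python: split the image's pixel grid into fixed-size tiles and generate one S3 URI per tile to enqueue. The bucket name, tile size, and key layout here are illustrative assumptions, not the project's real values.

```python
# Hypothetical sketch of generating per-tile S3 URIs for SQS fan-out.
import math

def tile_keys(width: int, height: int, tile: int, bucket: str, prefix: str):
    """Yield one s3:// URI per tile covering a width x height image."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    for row in range(rows):
        for col in range(cols):
            yield f"s3://{bucket}/{prefix}/tile_{row}_{col}.png"

uris = list(tile_keys(10_000, 8_000, 2_048, "example-bucket", "slide-01"))
# Each URI would become one SQS message; each ECS task pulls a message,
# runs inference on that tile, and marks it processed in DynamoDB.
```

Because each message is independent, the number of concurrent ECS tasks can be scaled freely, and the DynamoDB progress table makes the whole job resumable after a partial failure.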

Responsibilities:
  • Deployed deep learning model inference on AWS ECS Fargate;
  • Implemented parallelization using an SQS queue to store tiles;
  • Stored inference progress on DynamoDB table;
  • Implemented post-inference script to combine and summarize the inference results.
Project Tech stack:
Python
PyTorch
Amazon ECS
Amazon SQS
DynamoDB
AWS SageMaker
Senior Data Engineer
Feb 2024 - Jun 2024 (4 months)
Project Overview

An AI data platform/startup for educators: a predictive, generative AI data platform that delivers meaningful insights to educators by re-imagining analytics for K-12 education. Faisal was assigned to migrate their data pipeline to a modern data orchestration tool. The project involved moving an existing pipeline from an API source to BigQuery and adding a new data source from Snowflake to BigQuery.

Responsibilities:
  • Migrated the legacy data pipeline, which lacked orchestration and monitoring, to Mage;
  • Switched ingestion from NDJSON to Parquet, improving loading time roughly threefold;
  • Implemented ingestion from Snowflake to BigQuery and resolved large-volume ingestion issues, especially for backfills;
  • Implemented multi-region job executions to avoid quota exhaustion.
Project Tech stack:
Python
AI
Serverless Computing
Terraform
Docker
Docker Compose
API
BigQuery
Cloud Firestore
Google API and Services
Senior Machine Learning Engineer
Dec 2023 - Jan 2024 (1 month)
Project Overview

The project required a machine learning model to be deployed in the GCP ecosystem. Faisal deployed it via the Vertex AI Model Registry and launched an online prediction endpoint using the Vertex AI endpoint service, which the backend team then integrated into their application. The client also needed batch prediction so that large volumes of data could be processed at once, outside the application, without flooding the online endpoint's traffic, so Faisal set up Vertex AI batch inference.

Responsibilities:
  • Set up online prediction using Vertex AI Endpoints Service;
  • Set up batch prediction using Vertex AI Batch Inference service.
Project Tech stack:
Vertex AI
GCP
Cloud Firestore
Python
TensorFlow
Lead Data Engineer
Sep 2023 - Dec 2023 (3 months)
Project Overview

This project was focused on creating a monitoring and alert system to ensure data quality and timely delivery.

Responsibilities:
  • Streamlined daily responsibilities for engineers by automating daily delivery checks;
  • Enhanced engineers' productivity by allocating more time for development and improvement tasks.
Project Tech stack:
Python
Amazon ECS
EventBus
Apache Spark
PySpark
Lead Data Engineer
Apr 2023 - Aug 2023 (4 months)
Project Overview

Prepared a CI/CD pipeline to allow standardized, simplified data preparation executions.

Responsibilities:
  • Reduced client request workload from 70% to 30% of overall responsibilities, thereby allocating more time for the Data Engineering team to focus on development and improvement tasks;
  • Minimized human errors resulting from non-automated and non-standard processes.
Project Tech stack:
Python
CI/CD
GitHub
GitHub Actions
Amazon S3
Lead Data Engineer
Dec 2022 - Mar 2023 (3 months)
Project Overview

Deployed a data quality validation pipeline to automate data validation.

Responsibilities:
  • Eliminated the extra manual validation step before sending data to the client.
Project Tech stack:
CI/CD
Python
Apache Spark
PySpark
Senior Data Engineer
Jul 2022 - Dec 2022 (5 months)
Project Overview

Collected data on property prices based on location and size.

Responsibilities:
  • Collected POI (Points of Interest) data using the Google Maps API to enhance property value estimation;
  • Integrated with government-provided property valuation data based on location and tax valuation APIs;
  • Deployed the integration flow to the production environment for operational use.
Project Tech stack:
Python
Apache Airflow
Google Maps API
Senior Data Engineer
Dec 2021 - Jun 2022 (6 months)
Project Overview

The project involved developing a data pipeline from various sources to a data lake on S3, connected to Athena for BI use cases.

Responsibilities:
  • Implemented complex data transformations using Spark for efficient processing;
  • Enabled streaming functionality to support real-time data processing for the use case.
Project Tech stack:
Python
AWS
AWS CloudFormation
Apache Spark
PySpark
Apache Airflow
Amazon S3
Senior Data Engineer
Aug 2021 - Dec 2021 (4 months)
Project Overview

The project involved the implementation of Partitions and Clustering to the BigQuery Data warehouse tables.

Responsibilities:
  • Reduced analytics costs by 300 times, from approximately 20k USD per month to around 60 to 70 USD per month;
  • Enhanced query performance by 600%.
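The cost reduction above follows directly from how BigQuery's on-demand pricing bills per byte scanned: partition pruning and clustering shrink the bytes each query touches. A back-of-the-envelope illustration, with hypothetical numbers chosen only to reproduce the stated order of magnitude:

```python
# Illustrative arithmetic (assumed figures, not the project's real ones):
# on-demand BigQuery bills per byte scanned, so pruning to a small slice
# of the table cuts every query's bill proportionally.
PRICE_PER_TB = 5.0        # assumed on-demand USD per TiB scanned
full_scan_tb = 4.0        # assumed: whole table scanned per query
queries_per_month = 1_000  # assumed query volume

cost_unpartitioned = full_scan_tb * queries_per_month * PRICE_PER_TB
# With daily partitions plus clustering, suppose each query now touches
# roughly 1/300 of the data:
cost_partitioned = cost_unpartitioned / 300

print(round(cost_unpartitioned), round(cost_partitioned, 2))
```

Under these assumptions the monthly bill drops from about 20,000 USD to roughly 67 USD, matching the scale of the reduction reported above.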
Project Tech stack:
BigQuery
Apache Airflow
Big Data
Apache Spark
PySpark
Senior Data Engineer
Apr 2021 - Sep 2021 (5 months)
Project Overview

The project involved utilizing Migration Analytics to transfer data from a MySQL replication database to BigQuery.

Responsibilities:
  • Improved data and report retrieval speed by 180 times;
  • Decreased database replication workload and infrastructure-related issues by 80%;
  • Collaborated with the BI team to design data models.
Project Tech stack:
Python
BigQuery
CI
CD
GCP
MySQL
Data Engineer
Jan 2021 - Apr 2021 (3 months)
Project Overview

Anomaly detection in real-time data using DynamoDB, Lambda, and SageMaker.
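A pipeline like this typically wires a DynamoDB stream into a Lambda that scores each new transaction. The minimal sketch below uses a fixed threshold as a stand-in for the actual SageMaker model, and the event shape and attribute names (`tx_id`, `amount`) are assumptions for illustration.

```python
# Hypothetical sketch of a Lambda handler flagging suspicious transactions
# from a DynamoDB stream; the real project scored records with a SageMaker
# model rather than this placeholder threshold rule.
def handler(event, context=None):
    """Return the transaction ids whose amount exceeds a simple threshold."""
    alerts = []
    for record in event.get("Records", []):
        item = record["dynamodb"]["NewImage"]
        amount = float(item["amount"]["N"])  # DynamoDB streams encode numbers as strings
        if amount > 100_000:                 # placeholder anomaly rule
            alerts.append(item["tx_id"]["S"])
    return alerts

# A fabricated stream event with one suspicious and one normal transaction:
sample_event = {"Records": [
    {"dynamodb": {"NewImage": {"tx_id": {"S": "tx-1"}, "amount": {"N": "250000"}}}},
    {"dynamodb": {"NewImage": {"tx_id": {"S": "tx-2"}, "amount": {"N": "50"}}}},
]}
alerts = handler(sample_event)
```

In production, the flagged ids would feed an alerting channel, which is how suspicious transactions could be caught quickly enough to protect the crypto assets mentioned below.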

Responsibilities:
  • Successfully alerted suspicious transactions that secured hundreds of thousands of dollars worth of crypto assets.
Project Tech stack:
Python
AWS Lambda
DynamoDB
AWS SageMaker
Data Engineer
Jan 2020 - Dec 2020 (11 months)
Project Overview

A batch-processing data pipeline from various AWS databases and third-party tools to a BigQuery data warehouse.

Responsibilities:
  • Ingested data from various sources, including RDBMS, NoSQL databases, 3rd party APIs, and Google Sheets;
  • Preprocessed data on the go using batch and stream processing frameworks;
  • Managed storage to optimize cost and performance;
  • Designed and maintained data warehouse models in BigQuery.
Project Tech stack:
Python
BigQuery
AWS Lambda
PySpark
Apache Spark
Apache Airflow
SQL
MongoDB
Amazon S3

Education

2019
Bachelor's degree, Computer Science

Languages

Arabic
Pre-intermediate
Sundanese
Pre-intermediate
Javanese
Intermediate
Indonesian
Advanced
English
Advanced

Copyright © 2025 lemon.io. All rights reserved.