Faisal – AWS, Apache Spark, Python expert at Lemon.io

Faisal

From Singapore (UTC+7)

Data Engineer|Senior
Back-end Web Developer|Senior
DevOps|Middle
AI Engineer|Senior
Lemon.io stats
2
projects done
2560
hours worked
1
offers now 🔥

Faisal – AWS, Apache Spark, Python

Faisal is a seasoned Senior Data Engineer with deep expertise in SQL, Python, and AWS services, and a solid command of complex data queries and the trade-offs between approaches. He communicates clearly and concisely, articulating complex technical concepts accurately. With strong analytical skills and broad knowledge, he is well prepared for a Senior Data Engineer role and adapts seamlessly to diverse technical environments.

9 years of commercial experience in
AI
Analytics
Architecture
Banking
Biotech
Cryptocurrency
Data analytics
E-commerce
Edtech
Fintech
Healthcare
Healthtech
Insurance
Main technologies
AWS
5 years
Apache Spark
3 years
Python
6 years
Apache Kafka
2 years
Apache Airflow
4 years
SQL
6 years
FastAPI
6 years
Additional skills
Node.js
AWS CloudFormation
Terraform
Big Data
Amazon S3
BigQuery
Vector Databases
LLM
MLOps
LangChain
MongoDB
PySpark
AWS Lambda
GCP
MySQL
DynamoDB
AWS SageMaker
Amazon ECS
Databricks
Docker
API
Microsoft Azure
Kubernetes
Data Warehouse
ETL
NLP
Amazon SQS
PyTorch
Amazon EC2
Vertex AI
TensorFlow
OpenAI API
LangGraph
Snowflake
DBT
Dagster
Datadog
Rewards and achievements
Tech interviewer
Direct hire
Possible

Experience Highlights

Senior Software Engineer
Aug 2024 - Ongoing · 1 year 7 months
Project Overview

To build a deep learning model for semantic segmentation of tissue images (identifying tumors, normal cells, and transitional regions), Faisal automated and optimized the ML training pipeline. He addressed the inefficiencies of Google Colab by implementing a cost-effective, on-demand EC2 instance via GitHub Actions, enabling background training, better collaboration, and centralized logging.

Responsibilities:
  • Set up GitHub Action Workflow Dispatch to run the Deep Learning Training on demand;
  • Set up on-demand EC2 instance as a GitHub Actions runner;
  • Set up instance termination post training to avoid idle instance cost;
  • Conducted research to identify the best instance type (based on CPU, memory, and GPU) and the AMI with the most compatible CUDA and PyTorch versions.
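The on-demand runner pattern above can be sketched as follows. This is a minimal illustration assuming boto3 and GitHub's ephemeral self-hosted runner mode; the AMI ID, instance type, and paths are hypothetical placeholders, not values from the project:

```python
# Sketch of launching a short-lived EC2 instance as an ephemeral GitHub
# Actions runner for a training job. AMI ID, instance type, and runner
# install path are illustrative placeholders.

def build_user_data(repo_url: str, runner_token: str) -> str:
    """Cloud-init script that registers the instance as an ephemeral
    GitHub Actions runner, so a dispatched training workflow lands on it."""
    return "\n".join([
        "#!/bin/bash",
        "cd /home/ubuntu/actions-runner",
        # --ephemeral deregisters the runner after a single job
        f"./config.sh --url {repo_url} --token {runner_token} --ephemeral --unattended",
        "./run.sh",
        # self-terminate once the job finishes, avoiding idle-instance cost
        "sudo shutdown -h now",
    ])

def launch_training_runner(repo_url: str, runner_token: str,
                           ami_id: str = "ami-0abcdef1234567890",  # hypothetical GPU AMI
                           instance_type: str = "g4dn.xlarge") -> str:
    """Start a GPU instance that picks up the dispatched workflow.
    Returns the instance ID so a later step can confirm termination."""
    import boto3  # requires AWS credentials at call time
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1, MaxCount=1,
        UserData=build_user_data(repo_url, runner_token),
        # makes the in-instance `shutdown -h` a full terminate, not a stop
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return resp["Instances"][0]["InstanceId"]
```

The `InstanceInitiatedShutdownBehavior="terminate"` setting is what ties the post-training shutdown to cost avoidance: the instance removes itself rather than idling in a stopped-but-billed-EBS state.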
Project Tech stack:
Python
GitHub Actions
Amazon EC2
Ubuntu
PyTorch
Senior MLOps Engineer
Oct 2024 - Oct 2025 · 1 year
Project Overview

As a Senior MLOps Engineer, Faisal developed an LLM-powered news feed to support investment decisions. The system, built on Google Cloud Platform (GCP), automates the ingestion of financial documents, uses LLMs (including OpenAI and Groq) to extract relevant companies and generate summaries with market-impact metrics, and then ranks the insights by relevance to a user's investment portfolio.

Responsibilities:
  • Cut OpenAI token costs by 50% by migrating all async OpenAI API calls to the OpenAI Batch API;
  • Standardized the pipeline with reusable Kubeflow components so every team member could productionize their pipelines seamlessly;
  • Set up a CI/CD pipeline to test and deploy from GitHub to the GCP ecosystem;
  • Set up GitHub Actions Workflow Dispatch to submit and schedule Vertex AI Pipelines;
  • Provisioned and managed GCP infrastructure using IaC tools such as Pulumi;
  • Refactored data scientists' experimental code to meet production and deployment standards;
  • Integrated services and components built by other team members so the system ran smoothly and efficiently;
  • Wrote and maintained technical documentation for the system;
  • Provided technical support and guidance to the team;
  • Participated in the design and architecture of the system.
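The Batch API migration behind the 50% cost cut can be illustrated as below: instead of issuing many async chat-completion calls, requests are serialized to a JSONL file and submitted as one batch, which OpenAI bills at half the synchronous price. The model name, prompt, and `custom_id` scheme here are assumptions for illustration, not taken from the project:

```python
import json

def to_batch_requests(articles: dict[str, str], model: str = "gpt-4o-mini") -> list[str]:
    """Turn {doc_id: text} into Batch API request lines (one JSON object per line)."""
    lines = []
    for doc_id, text in articles.items():
        lines.append(json.dumps({
            "custom_id": doc_id,  # used to join batch results back to documents
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system",
                     "content": "Extract companies and summarize market impact."},
                    {"role": "user", "content": text},
                ],
            },
        }))
    return lines

def write_batch_file(articles: dict[str, str], path: str) -> int:
    """Write the JSONL input file the Batch API expects; returns request count."""
    lines = to_batch_requests(articles)
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return len(lines)

# Submission (requires credentials, shown for context only):
#   batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
#   client.batches.create(input_file_id=batch_file.id,
#                         endpoint="/v1/chat/completions",
#                         completion_window="24h")
```

The trade-off is latency: batches complete within a 24-hour window rather than seconds, which suits a periodic news-ingestion pipeline but not interactive use.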
Project Tech stack:
Vertex AI
GCP
Cloud Firestore
Python
OpenAI API
LangChain
LangGraph
Serverless Computing
Pulumi
GitHub Actions
Senior Software Engineer
Jun 2024 - Jul 2024 · 1 month
Project Overview

The project involved deploying a Deep Learning model on a serverless AWS architecture designed to handle high-resolution image processing at scale. To manage the lifecycle and consistency of this infrastructure, Faisal utilized Terraform to implement Infrastructure as Code (IaC), ensuring reproducible environments across the development lifecycle.

Architectural Implementation:

  • Preprocessing & Orchestration: The workflow began with DZI Tiling to break down large-scale input images into manageable tiles, aligning the inference data with the model's training format. To drive high throughput, Faisal implemented a decoupled architecture where tile S3 URIs were pushed to Amazon SQS.
  • Compute & Scaling: These messages were consumed by a distributed fleet of Kubernetes (EKS) pods. By leveraging Kubernetes, he was able to manage container orchestration effectively, ensuring that the model inference tasks scaled dynamically based on queue depth.
  • State Management: Faisal utilized Amazon DynamoDB as a distributed state store to track the processing status of individual tiles, providing a real-time indicator of overall inference completion.

Observability & Monitoring

To ensure system health and performance, Faisal integrated Datadog for full-stack observability. This included:

  • APM & Tracing: Monitoring the latency of inference tasks across the Kubernetes clusters.
  • Infrastructure Metrics: Tracking SQS visibility timeouts and DynamoDB RCU/WCU utilization.
  • Log Management: Centralizing logs to quickly troubleshoot bottlenecks in the tiling or inference stages.
Responsibilities:
  • Infrastructure as Code: Deployed deep learning inference on AWS EKS and Fargate using Terraform for reproducible infrastructure.
  • Parallelization & Scaling: Engineered high-throughput tile processing using SQS to decouple ingestion from model inference.
  • Observability: Integrated Datadog for real-time monitoring of queue depth, pod health, and inference latency.
  • State Management: Utilized DynamoDB to track tile processing status and ensure idempotent execution of long-running jobs.
  • Data Synthesis: Developed post-inference scripts to aggregate tile-level predictions into consolidated result summaries.
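The queue-driven, idempotent tile-processing pattern described above can be sketched as follows. The in-memory `TileStateStore` stands in for the DynamoDB table (a real worker would use `update_item` with a `ConditionExpression`, and receive messages via boto3's SQS client); all names are illustrative:

```python
class TileStateStore:
    """Tracks per-tile status so re-delivered SQS messages stay idempotent.
    Mimics a DynamoDB table keyed on the tile's S3 URI."""

    def __init__(self):
        self._status: dict[str, str] = {}

    def claim(self, tile_id: str) -> bool:
        # DynamoDB equivalent: update_item(..., ConditionExpression=
        # "attribute_not_exists(tile_id)") — only one worker wins the claim.
        if tile_id in self._status:
            return False
        self._status[tile_id] = "PROCESSING"
        return True

    def complete(self, tile_id: str) -> None:
        self._status[tile_id] = "DONE"

    def progress(self) -> float:
        """Fraction of claimed tiles finished — the real-time
        'overall inference completion' indicator."""
        if not self._status:
            return 0.0
        done = sum(1 for s in self._status.values() if s == "DONE")
        return done / len(self._status)

def handle_message(store: TileStateStore, tile_uri: str, run_inference) -> bool:
    """Process one SQS message carrying a tile's S3 URI; skip duplicates.
    Returning False means the message is a redelivery and can be deleted."""
    if not store.claim(tile_uri):
        return False
    run_inference(tile_uri)   # e.g. the PyTorch forward pass on the tile
    store.complete(tile_uri)
    return True
```

Because SQS guarantees at-least-once delivery, the conditional claim is what makes long-running inference safe to retry: a redelivered message is detected and dropped instead of re-running the model on the same tile.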
Project Tech stack:
Python
PyTorch
Amazon ECS
Amazon SQS
DynamoDB
AWS SageMaker
Senior Data Engineer
Feb 2024 - Jun 2024 · 4 months
Project Overview

An AI data platform startup for educators: a predictive, generative AI platform that delivers meaningful insights by re-imagining analytics for K-12 education. Faisal was assigned to migrate their data pipeline to a modern data orchestration tool, moving an existing pipeline from an API source to BigQuery and adding a new pipeline from Snowflake to BigQuery.

Responsibilities:
  • Migrated a legacy data pipeline, which lacked orchestration and monitoring, to Mage;
  • Switched ingestion from NDJSON to Parquet, improving loading time roughly threefold;
  • Implemented ingestion from Snowflake to BigQuery and resolved large-volume ingestion issues, especially for backfills;
  • Implemented multi-region job execution to avoid quota exhaustion.
Project Tech stack:
Python
AI
Serverless Computing
Terraform
Docker
Docker Compose
API
BigQuery
Cloud Firestore
Firestore
Google API and Services
Senior Data Engineer
Aug 2023 - Nov 2023 · 2 months
Project Overview

Led the development of an internal BI data platform on AWS, implementing a medallion architecture. The platform ingests data from diverse sources (databases, APIs, S3 files) into a Snowflake data warehouse. Orchestrated with Dagster, pipelines use dbt to transform raw data through bronze, silver, and gold layers, with a Cube.js semantic layer for governed data marts.

Responsibilities:
  • Deployed Dagster on EKS;
  • Implemented medallion architecture on Snowflake with dbt to handle the transformation;
  • Implemented a semantic layer using Cube.dev to act as data marts and allow centralized governance;
  • Optimized query and storage on Snowflake.
Project Tech stack:
Dagster
Python
Kubernetes
Amazon S3
DBT
Snowflake
Lead Data Engineer
Nov 2022 - Feb 2023 · 3 months
Project Overview

Deployed a data quality validation pipeline to automate validation of data before delivery.

Responsibilities:
  • Eliminated manual data validation steps before sending data to the client.
Project Tech stack:
CI
CD
Python
Apache Spark
PySpark
Senior Data Engineer
Jun 2022 - Nov 2022 · 5 months
Project Overview

Collected data on property prices by location and size.

Responsibilities:
  • Collected POI (Points of Interest) data using the Google Maps API to enhance property value estimation;
  • Integrated with government-provided property valuation data based on location and tax valuation APIs;
  • Deployed the integration flow to the production environment for operational use.
Project Tech stack:
Python
Apache Airflow
Google Maps API
Senior Data Engineer
Nov 2021 - May 2022 · 6 months
Project Overview

The project involved developing a data pipeline from various sources to a data lake on S3, then connecting it to Athena for BI use cases.

Responsibilities:
  • Implemented complex data transformations using Spark for efficient processing;
  • Enabled streaming functionality to support real-time data processing for the use case.
Project Tech stack:
Python
AWS
AWS CloudFormation
Apache Spark
PySpark
Apache Airflow
Amazon S3
Senior Data Engineer
Jul 2021 - Nov 2021 · 3 months
Project Overview

The project involved adding partitioning and clustering to the BigQuery data warehouse tables.

Responsibilities:
  • Reduced analytics costs roughly 300-fold, from approximately 20,000 USD per month to around 60–70 USD per month;
  • Enhanced query performance by 600%.
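The change behind this cost reduction is that a partitioned, clustered table lets BigQuery scan only the partitions a query touches instead of the whole table. A minimal sketch of generating the corresponding DDL is below; the table and column names are illustrative, not from the project:

```python
def partitioned_table_ddl(table: str, partition_col: str,
                          cluster_cols: list[str]) -> str:
    """Build a BigQuery DDL statement that recreates a table partitioned
    by day on a timestamp/date column and clustered on the given columns."""
    return (
        f"CREATE TABLE {table}_partitioned\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}\n"
        f"AS SELECT * FROM {table}"
    )
```

Queries then filter on the partition column (e.g. `WHERE DATE(event_ts) = '2021-07-01'`), so billed bytes drop in proportion to the partitions pruned, and clustering further narrows the blocks read within each partition.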
Project Tech stack:
BigQuery
Apache Airflow
Big Data
Apache Spark
PySpark
Senior Data Engineer
Mar 2021 - Aug 2021 · 5 months
Project Overview

The project involved migrating analytics from a MySQL replication database to BigQuery.

Responsibilities:
  • Improved data and report retrieval speed by 180 times;
  • Decreased database replication workload and infrastructure-related issues by 80%;
  • Collaborated with the BI team to design data models.
Project Tech stack:
Python
BigQuery
CI
CD
GCP
MySQL
Data Engineer
Dec 2019 - Nov 2020 · 10 months
Project Overview

A batch-processing data pipeline from various sources (AWS databases and several third-party tools) to a BigQuery data warehouse.

Responsibilities:
  • Ingested data from various sources, including RDBMS, NoSQL databases, 3rd party APIs, and Google Sheets;
  • Preprocessed data in flight using batch and stream processing frameworks;
  • Managed storage to optimize cost and performance;
  • Designed and maintained data warehouse models in BigQuery.
Project Tech stack:
Python
BigQuery
AWS Lambda
PySpark
Apache Spark
Apache Airflow
SQL
MongoDB
Amazon S3

Education

2019
Computer Science
Bachelor

Languages

Arabic
Pre-intermediate
Sundanese
Pre-intermediate
Indonesian
Advanced
Javanese
Intermediate
English
Advanced

Hire Faisal or someone with similar qualifications in days
All developers are ready for interview and are just waiting for your request.
Copyright © 2026 lemon.io. All rights reserved.