Faisal
From Indonesia (UTC+7)
Lemon.io stats
2 projects done
2560 hours worked
1 ongoing project
Faisal – Apache Spark, Python, AWS
Faisal is a seasoned Senior Data Engineer with extensive proficiency in SQL, Python, and AWS services, and a solid understanding of complex data queries and the varied approaches to them. He is an effective communicator, delivering clear, concise responses and accurately articulating complex technical concepts. With strong analytical abilities and broad knowledge, he is primed for success in a Senior Data Engineer role and adapts seamlessly to diverse technical environments.
8 years of commercial experience
Main technologies
Additional skills
Rewards and achievements
Direct hire
Possible
Experience Highlights
Senior Software Engineer
To build a deep learning model for semantic segmentation of tissue images (identifying tumors, normal cells, and transitional regions), Faisal automated and optimized the ML training pipeline. He addressed the inefficiencies of Google Colab by implementing a cost-effective, on-demand EC2 instance via GitHub Actions, enabling background training, better collaboration, and centralized logging.
- Set up a GitHub Actions workflow dispatch to run the deep learning training on demand;
- Set up an on-demand EC2 instance as a GitHub Actions runner;
- Set up instance termination after training to avoid idle instance costs (see the sketch after this list);
- Researched the best instance type (based on CPU, memory, and GPU) and the AMI with the most compatible CUDA and PyTorch versions.
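Provisioning and tearing down the runner can be done with a few boto3 calls; the sketch below shows the general idea, assuming boto3 is available and using a placeholder AMI ID, instance type, and user-data script rather than the project's actual configuration.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

def launch_training_runner() -> str:
    """Start a GPU instance whose user data registers it as a self-hosted runner."""
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder deep-learning AMI (CUDA + PyTorch)
        InstanceType="g4dn.xlarge",        # example GPU instance type
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "dl-training-runner"}],
        }],
        # User data would install/register the GitHub Actions runner and start the job.
        UserData="#!/bin/bash\n# register self-hosted runner, pull code, run training\n",
    )
    return response["Instances"][0]["InstanceId"]

def terminate_training_runner(instance_id: str) -> None:
    """Terminate the instance after training so it does not accrue idle cost."""
    ec2.terminate_instances(InstanceIds=[instance_id])
```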
Senior MLOps Engineer
As a Senior MLOps Engineer, Faisal developed an LLM-powered news feed to support investment decisions. The system, running on Google Cloud Platform (GCP), automates the ingestion of financial documents, uses LLMs (including OpenAI and Groq models) to extract relevant companies and generate summaries with market-impact metrics, and then ranks the insights by their relevance to a user's investment portfolio.
- Cut OpenAI token consumption costs by 50% by migrating all asynchronous OpenAI API calls to the OpenAI Batch API (see the sketch after this list);
- Standardized the pipeline with multiple reusable Kubeflow components so all team members could productionize their pipelines seamlessly;
- Set up a CI/CD pipeline to test and deploy from GitHub to the GCP ecosystem;
- Set up GitHub Actions workflow dispatch to submit and schedule Vertex AI Pipelines;
- Provisioned and managed GCP infrastructure using IaC tools like Pulumi;
- Refactored experimental code from data scientists to meet production and deployment standards;
- Integrated services and components built by other team members so the system runs smoothly and efficiently;
- Wrote and maintained technical documentation for the system;
- Provided technical support and guidance to the team;
- Participated in the design and architecture of the system.
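The Batch API migration in the first highlight amounts to collecting the prompts into a JSONL file and submitting it as a single discounted job instead of many live calls. Below is a minimal sketch using the official openai Python SDK; the model name, file name, and request shape are illustrative assumptions, not the project's actual values.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def submit_news_batch(requests: list[dict], jsonl_path: str = "news_batch.jsonl") -> str:
    """Write one JSONL line per document, upload it, and create a batch job."""
    with open(jsonl_path, "w") as f:
        for i, req in enumerate(requests):
            f.write(json.dumps({
                "custom_id": f"doc-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": "gpt-4o-mini",          # placeholder model
                         "messages": req["messages"]},
            }) + "\n")

    batch_file = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",  # batch jobs trade latency for lower cost
    )
    return batch.id
```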
Senior Software Engineer
The project required a deep learning model to be deployed on serverless infrastructure in the AWS ecosystem. The input data consisted mostly of large images that needed DZI tiling as a preprocessing step; since the model had been trained on tiles, the same tiling strategy made sense for inference. To introduce parallelism, Faisal pushed the tiles' S3 URIs to SQS, where they were consumed by multiple ECS task instances that ran the model inference. He also used DynamoDB to track which tiles had been processed and to indicate whether the inference was complete.
- Deployed deep learning model inference on AWS ECS Fargate;
- Implemented parallelization using an SQS queue to distribute tile URIs (see the sketch after this list);
- Stored inference progress in a DynamoDB table;
- Implemented post-inference script to combine and summarize the inference results.
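The fan-out pattern described above can be sketched with boto3 as below; the queue URL, table name, and the run_inference stub are hypothetical placeholders, and the real ECS tasks would of course load the actual model.

```python
import json
import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")
progress_table = dynamodb.Table("tile-inference-progress")  # assumed table name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tile-queue"  # placeholder

def run_inference(tile_uri: str) -> None:
    """Stub standing in for the actual model inference on one tile."""

def enqueue_tiles(job_id: str, tile_uris: list[str]) -> None:
    """Producer: push each tile's S3 URI onto the queue for the ECS workers."""
    for uri in tile_uris:
        sqs.send_message(QueueUrl=QUEUE_URL,
                         MessageBody=json.dumps({"job_id": job_id, "tile_uri": uri}))

def worker_loop() -> None:
    """Consumer (one per ECS task): infer per tile and record progress in DynamoDB."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            run_inference(body["tile_uri"])
            progress_table.put_item(Item={"job_id": body["job_id"],
                                          "tile_uri": body["tile_uri"],
                                          "status": "done"})
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```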
Senior Data Engineer
An AI data startup for educators: a predictive, generative AI data platform that delivers meaningful insights to educators by re-imagining analytics for K-12 education. Faisal was assigned to migrate their data pipeline to a modern data orchestration tool. The project involved moving an existing data pipeline from an API source to BigQuery and adding a new data source from Snowflake to BigQuery.
- Migrated a legacy data pipeline that lacked orchestration and monitoring to Mage;
- Switched ingestion from NDJSON to Parquet, improving loading time roughly threefold (see the load-job sketch after this list);
- Implemented ingestion from Snowflake to BigQuery and resolved large-volume ingestion issues, especially for backfill purposes;
- Implemented multi-region job executions to avoid quota exhaustion.
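The NDJSON-to-Parquet switch mainly changes the load job configuration; the sketch below uses the google-cloud-bigquery client, with bucket, dataset, and table names as placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

def load_parquet(uri: str = "gs://example-bucket/exports/*.parquet",
                 table_id: str = "example-project.analytics.events") -> None:
    """Load Parquet (columnar, self-describing) instead of NDJSON for faster loads."""
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()
```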
Lead Data Engineer
This project was focused on creating a monitoring and alert system to ensure data quality and timely delivery.
- Streamlined engineers' daily responsibilities by automating the daily delivery checks (a minimal sketch follows this list);
- Enhanced engineers' productivity by allocating more time for development and improvement tasks.
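A delivery check of this kind can be as small as listing today's expected prefix and alerting when it is empty. The sketch below assumes boto3 with an S3 bucket and an SNS topic; the bucket name, prefix layout, and topic ARN are hypothetical.

```python
from datetime import date
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-delivery-alerts"  # placeholder

def check_daily_delivery(bucket: str = "client-deliveries") -> None:
    """Alert if today's delivery prefix is empty instead of checking by hand."""
    prefix = f"exports/{date.today():%Y/%m/%d}/"
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    if resp.get("KeyCount", 0) == 0:
        sns.publish(TopicArn=ALERT_TOPIC_ARN,
                    Subject="Missing daily delivery",
                    Message=f"No files found under s3://{bucket}/{prefix}")
```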
Senior Data Engineer
Led the development of an internal BI data platform on AWS, implementing a medallion architecture. The platform ingests data from diverse sources (databases, APIs, S3 files) into a Snowflake data warehouse. Orchestrated with Dagster, pipelines use dbt to transform raw data through bronze, silver, and gold layers, with a Cube.js semantic layer for governed data marts.
- Deployed Dagster on EKS;
- Implemented a medallion architecture on Snowflake, with dbt handling the transformations (see the asset sketch after this list);
- Implemented a semantic layer using Cube.js to act as data marts and allow centralized governance;
- Optimized queries and storage on Snowflake.
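The bronze/silver/gold layering can be expressed as a chain of Dagster assets; the sketch below is a toy illustration of that dependency structure, assuming only the dagster package. Asset names and transformation bodies are made up, and in the actual platform dbt models performed the transformations against Snowflake.

```python
from dagster import Definitions, asset

@asset
def bronze_orders() -> list[dict]:
    """Raw ingested records, landed as-is (bronze layer)."""
    return [{"order_id": 1, "amount": "42.5"}, {"order_id": 2, "amount": "17.0"}]

@asset
def silver_orders(bronze_orders: list[dict]) -> list[dict]:
    """Cleaned and typed records (silver layer)."""
    return [{"order_id": r["order_id"], "amount": float(r["amount"])} for r in bronze_orders]

@asset
def gold_daily_revenue(silver_orders: list[dict]) -> float:
    """Business-level aggregate exposed to the semantic layer (gold layer)."""
    return sum(r["amount"] for r in silver_orders)

defs = Definitions(assets=[bronze_orders, silver_orders, gold_daily_revenue])
```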
Lead Data Engineer
Prepared a CI/CD pipeline to allow standardized and simplified data preparation runs.
- Reduced the client-request workload from 70% to 30% of overall responsibilities, freeing more time for the Data Engineering team to focus on development and improvement tasks;
- Minimized human errors resulting from non-automated and non-standard processes.
Lead Data Engineer
Deployed a data quality validation pipeline to automate validation of outgoing data.
- Removed the extra manual validation steps previously required before sending data to the client (illustrated in the sketch below).
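The validation step could be as simple as a set of rule checks that must pass before data ships; the sketch below uses plain pandas with made-up column names and rules, since the pipeline's actual rules are not documented here.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return the list of failed checks; an empty list means the data can ship."""
    failures = []
    if df.empty:
        failures.append("dataset is empty")
    if df["id"].duplicated().any():
        failures.append("duplicate primary keys")
    if df["amount"].lt(0).any():
        failures.append("negative amounts")
    if df["created_at"].isna().any():
        failures.append("missing timestamps")
    return failures

if __name__ == "__main__":
    sample = pd.DataFrame({"id": [1, 2], "amount": [10.0, 5.5],
                           "created_at": pd.to_datetime(["2024-01-01", "2024-01-02"])})
    assert validate(sample) == []
```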
Senior Data Engineer
Collected data related to property prices based on location and size.
- Collected POI (points of interest) data using the Google Maps API to enhance property value estimation (see the sketch after this list);
- Integrated with government-provided property valuation data based on location and tax valuation APIs;
- Deployed the integration flow to the production environment for operational use.
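POI collection with the Google Maps API typically boils down to nearby-search calls per category; the sketch below assumes the googlemaps Python client, with the API key, coordinates, and place types as placeholders.

```python
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")  # placeholder key

def nearby_poi_counts(lat: float, lng: float, radius_m: int = 1000) -> dict[str, int]:
    """Count nearby points of interest per category as features for valuation."""
    counts = {}
    for place_type in ("school", "hospital", "train_station"):  # example categories
        result = gmaps.places_nearby(location=(lat, lng), radius=radius_m,
                                     type=place_type)
        counts[place_type] = len(result.get("results", []))
    return counts
```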
Senior Data Engineer
The project involved developing a data pipeline from various sources to a data lake on S3, then connecting it to Athena for BI use cases.
- Implemented complex data transformations using Spark for efficient processing;
- Enabled streaming functionality to support real-time data processing for the use case (see the sketch after this list).
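The streaming leg might look roughly like the PySpark structured-streaming sketch below, which reads from Kafka and writes Parquet into the S3 data lake queried by Athena; the broker, topic, and paths are assumptions, since the actual sources are not specified here.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Requires the spark-sql-kafka connector on the classpath.
spark = SparkSession.builder.appName("events-to-datalake").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "events")                     # placeholder topic
          .load()
          .select(col("value").cast("string").alias("payload"), col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-datalake/events/")            # read by Athena
         .option("checkpointLocation", "s3a://example-datalake/_chk/events/")
         .start())

query.awaitTermination()
```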
Senior Data Engineer
The project involved implementing partitioning and clustering on the BigQuery data warehouse tables (a DDL sketch follows the highlights below).
- Reduced analytics costs by 300 times, from approximately 20k USD per month to around 60 to 70 USD per month;
- Enhanced query performance by 600%.
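Partitioning and clustering are declared when the table is (re)created; a minimal DDL sketch, run here through the google-cloud-bigquery client with placeholder project, dataset, and column names, is shown below.

```python
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE OR REPLACE TABLE `example-project.analytics.events_partitioned`
PARTITION BY DATE(event_timestamp)    -- prune scans to the queried dates
CLUSTER BY customer_id, event_type    -- co-locate rows that are commonly filtered together
AS SELECT * FROM `example-project.analytics.events`
"""

client.query(ddl).result()  # downstream queries then scan only the relevant partitions
```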
Senior Data Engineer
The project involved migrating analytics from a MySQL replication database to BigQuery.
- Improved data and report retrieval speed by 180 times;
- Decreased database replication workload and infrastructure-related issues by 80%;
- Collaborated with the BI team to design data models.
Data Engineer
Anomaly detection in real-time data using DynamoDB, Lambda, and SageMaker.
- Successfully alerted on suspicious transactions, helping secure hundreds of thousands of dollars' worth of crypto assets (the scoring step is sketched below).
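One common shape for this is a Lambda on the DynamoDB stream that scores each new record against a SageMaker endpoint; the sketch below assumes that wiring, with the endpoint name, feature fields, response shape, and threshold as hypothetical placeholders.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
sns = boto3.client("sns")
ENDPOINT_NAME = "txn-anomaly-detector"                                  # placeholder endpoint
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:anomaly-alerts"   # placeholder topic

def handler(event, context):
    """Score each new transaction from the DynamoDB stream and alert on anomalies."""
    for record in event.get("Records", []):
        txn = record["dynamodb"]["NewImage"]                  # DynamoDB Streams image
        features = [float(txn["amount"]["N"]), float(txn["velocity"]["N"])]  # assumed fields
        resp = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType="application/json",
                                       Body=json.dumps({"instances": [features]}))
        score = json.loads(resp["Body"].read())["scores"][0]  # assumed response shape
        if score > 3.0:                                       # illustrative threshold
            sns.publish(TopicArn=ALERT_TOPIC_ARN,
                        Message=f"Suspicious transaction: {txn}")
```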
Data Engineer
Batch-processing data pipeline from various sources (AWS databases and third-party tools) to a BigQuery data warehouse.
- Ingested data from various sources, including RDBMS, NoSQL databases, 3rd party APIs, and Google Sheets;
- Preprocessed data on the go using batch and stream processing frameworks;
- Managed storage to optimize cost and performance;
- Designed and maintained data warehouse models in BigQuery.