Faisal
From Indonesia (UTC+7)
8 years of commercial experience
Lemon.io stats
2 projects done
1000 hours worked
Open to new offers

Faisal – AWS, Apache Spark, Python
Faisal is a seasoned Senior Data Engineer proficient in SQL, Python, and AWS services, with a solid grasp of complex data queries and the trade-offs between different approaches. He is an effective communicator, delivering clear, concise responses and accurately articulating complex technical concepts. With strong analytical skills and broad technical knowledge, he is primed for success in a Senior Data Engineer role and adapts seamlessly to diverse technical environments.
Ready to start: To be verified
Direct hire: Potentially possible

Experience Highlights
Senior Software Engineer
The company was building a deep-learning semantic segmentation model to classify microscope-captured tissue images, determining whether a given area of tissue contains tumour or normal cells. The model can also detect tumour transitions and distinguish different cell areas such as stroma and lymphocytes.
Faisal addressed inefficiencies with Google Colab, where training processes could not run in the background and required constant monitoring by the data scientists. To improve the workflow, he implemented a GitHub Actions workflow that launches an EC2 instance on demand as a GitHub Actions runner to execute the training code, with logs viewable in the GitHub Actions UI. Using Git for collaboration also made the process more engineering-friendly, and Faisal optimized the EC2 instance choice for cost-performance efficiency. A sketch of the runner launch step follows the list below.
- Set up GitHub Action Workflow Dispatch to run the Deep Learning Training on demand;
- Set up on-demand EC2 instance as a GitHub Actions runner;
- Set up post-training instance termination to avoid idle instance costs;
- Conducted research to find the best instance type (based on CPU, memory, and GPU requirements) and the AMI with the most compatible CUDA and PyTorch versions.
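As an illustration of the launch step, here is a minimal boto3 sketch. The AMI ID, instance type, repository URL, and runner registration token are placeholders, and the actual project drove this from a GitHub Actions workflow rather than a standalone script.

```python
# Illustrative sketch: launch an on-demand EC2 instance that registers
# itself as an ephemeral GitHub Actions runner, then terminates.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# User data registers the instance as an ephemeral self-hosted runner,
# so it picks up exactly one training job and then shuts down.
user_data = """#!/bin/bash
cd /home/ubuntu/actions-runner
./config.sh --url https://github.com/ORG/REPO --token RUNNER_TOKEN --ephemeral
./run.sh
shutdown -h now
"""

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: CUDA/PyTorch-compatible AMI
    InstanceType="g4dn.xlarge",       # placeholder: GPU type chosen for cost-performance
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    # Make "shutdown" terminate the instance rather than stop it,
    # so no idle instance cost accrues after training.
    InstanceInitiatedShutdownBehavior="terminate",
)
print("Launched runner:", response["Instances"][0]["InstanceId"])
```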
Senior Software Engineer
The project required a deep learning model to be deployed on serverless infrastructure in the AWS ecosystem. The input data was mostly large images, which needed DZI tiling as a preprocessing step; since the model was trained on tiles, this tiling strategy was a natural fit. To introduce parallelism, Faisal pushed the tiles' S3 URIs to SQS, which were later consumed by multiple ECS task instances that performed the model inference. He also used a DynamoDB table to track which tiles had been processed and to indicate whether inference was complete (a sketch of this fan-out pattern follows the list below).
- Deployed deep learning model inference on AWS ECS Fargate;
- Implemented parallelization using an SQS queue to distribute tile S3 URIs;
- Stored inference progress on DynamoDB table;
- Implemented post-inference script to combine and summarize the inference results.
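A minimal sketch of the producer and progress-tracking pieces, assuming hypothetical queue, table, and key names:

```python
# Sketch of the fan-out pattern described above; the queue URL, table
# name, and key names are placeholders, not the project's real identifiers.
import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tile-queue"
progress_table = dynamodb.Table("tile-inference-progress")

def enqueue_tiles(slide_id: str, tile_uris: list[str]) -> None:
    """Push each tile's S3 URI to SQS and seed a progress row per tile."""
    for uri in tile_uris:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=uri)
        progress_table.put_item(
            Item={"slide_id": slide_id, "tile_uri": uri, "status": "PENDING"}
        )

def mark_done(slide_id: str, tile_uri: str) -> None:
    """Called by an ECS task after inference on a tile succeeds."""
    progress_table.update_item(
        Key={"slide_id": slide_id, "tile_uri": tile_uri},
        # "status" is a DynamoDB reserved word, hence the name alias.
        UpdateExpression="SET #s = :done",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":done": "DONE"},
    )
```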
Senior Data Engineer
An AI data platform startup for educators: a predictive, generative AI data platform that delivers meaningful insights by re-imagining analytics for K-12 education. Faisal was assigned to migrate their data pipeline to a modern data orchestration tool. The project involved moving an existing pipeline from an API source to BigQuery and adding a new pipeline from Snowflake to BigQuery (the Parquet-based load path is sketched after the list below).
- Migrated the legacy data pipeline, which lacked orchestration and monitoring, to Mage;
- Switched ingestion from NDJSON to Parquet, improving loading time roughly 3x;
- Implemented ingestion from Snowflake to BigQuery and resolved large-volume ingestion issues, especially for backfills;
- Implemented multi-region job executions to avoid quota exhaustion.
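A hedged sketch of the Parquet load path using the google-cloud-bigquery client; the bucket, dataset, and table names are placeholders. Parquet's columnar, compressed layout is what typically drives this kind of loading-time improvement over NDJSON.

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/exports/events/*.parquet",  # placeholder URI
    "example-project.analytics.events",              # placeholder table
    job_config=job_config,
)
load_job.result()  # block until the load completes
print(f"Loaded {load_job.output_rows} rows")
```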
Senior Machine Learning Engineer
The project required a machine learning model to be deployed in the GCP ecosystem. Faisal registered the model in the Vertex AI Model Registry and served it for online prediction via a Vertex AI endpoint, which the client's backend team then integrated with their application. They also needed batch prediction so that large volumes of data could be processed at once, outside the application, without overloading the online endpoint, so Faisal set up Vertex AI batch inference (both paths are sketched after the list below).
- Set up online prediction using Vertex AI Endpoints Service;
- Set up batch prediction using Vertex AI Batch Inference service.
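A sketch of both serving paths with the google-cloud-aiplatform SDK; the project, region, container image, and GCS paths are illustrative assumptions, not the project's real values.

```python
from google.cloud import aiplatform

aiplatform.init(project="example-project", location="us-central1")

# Register the trained model in the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="example-model",
    artifact_uri="gs://example-bucket/model/",  # placeholder
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)

# Online path: deploy to an endpoint for the backend team to call.
endpoint = model.deploy(machine_type="n1-standard-4")

# Batch path: process large volumes offline, outside endpoint traffic.
batch_job = model.batch_predict(
    job_display_name="example-batch-inference",
    gcs_source="gs://example-bucket/batch-input/instances.jsonl",  # placeholder
    gcs_destination_prefix="gs://example-bucket/batch-output/",
    machine_type="n1-standard-4",
)
batch_job.wait()
```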
Lead Data Engineer
This project was focused on creating a monitoring and alert system to ensure data quality and timely delivery.
- Streamlined daily responsibilities for engineers by automating daily delivery checks;
- Enhanced engineers' productivity by allocating more time for development and improvement tasks.
Lead Data Engineer
Prepared a CI/CD pipeline to standardize and simplify data preparation runs.
- Reduced client request workload from 70% to 30% of overall responsibilities, freeing the Data Engineering team to focus on development and improvement tasks;
- Minimized human errors resulting from non-automated and non-standard processes.
Lead Data Engineer
Deployed a data quality validation pipeline to automate data validation.
- Removed the extra manual validation steps previously required before sending data to the client.
Senior Data Engineer
Collected data on property prices based on location and size (the POI lookup is sketched after the list below).
- Collected POI (Points of Interest) data using the Google Maps API to enhance property value estimation;
- Integrated with government-provided property valuation data based on location and tax valuation APIs;
- Deployed the integration flow to the production environment for operational use.
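A minimal sketch of a POI lookup against the Google Maps Places Nearby Search API; the API key, coordinates, radius, and helper name are hypothetical.

```python
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def fetch_nearby_pois(lat: float, lng: float, poi_type: str) -> list[dict]:
    """Return POIs of a given type within 1 km of a property location."""
    params = {
        "location": f"{lat},{lng}",
        "radius": 1000,          # metres; placeholder search radius
        "type": poi_type,        # e.g. "school", "hospital", "train_station"
        "key": "YOUR_API_KEY",   # placeholder
    }
    response = requests.get(PLACES_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json().get("results", [])

# Example: nearby schools often correlate with residential property value.
schools = fetch_nearby_pois(-6.2088, 106.8456, "school")
print(f"Found {len(schools)} schools nearby")
```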
Senior Data Engineer
The project involved developing a data pipeline from various sources to a data lake on S3, connected to Athena for BI use cases (a sketch of the batch leg follows the list below).
- Implemented complex data transformations using Spark for efficient processing;
- Enabled streaming functionality to support real-time data processing for the use case.
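A hedged PySpark sketch of the batch transformation leg; the bucket paths, schema, and aggregation are illustrative, not the project's actual logic. Writing partitioned Parquet to S3 is what lets Athena prune data and query the lake efficiently.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-transform").getOrCreate()

# Placeholder source: raw JSON events landed in S3.
raw = spark.read.json("s3a://example-raw-bucket/events/")

# Example transformation: derive a date column and aggregate per customer.
daily = (
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .groupBy("customer_id", "event_date")
       .agg(F.count("*").alias("event_count"))
)

# Partitioning by date keeps Athena scans small for typical BI filters.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-lake-bucket/curated/daily_events/"
)
```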
Senior Data Engineer
The project involved implementing partitioning and clustering on the BigQuery data warehouse tables (the table setup is sketched after the list below).
- Reduced analytics costs by 300 times, from approximately 20k USD per month to around 60 to 70 USD per month;
- Enhanced query performance by 600%.
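Savings of this kind follow from BigQuery billing on bytes scanned: daily partitions let queries prune to only the days they filter on, and clustering orders the data within each partition. A sketch of such a table definition, with placeholder project, dataset, and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "example-project.analytics.orders",  # placeholder table ID
    schema=[
        bigquery.SchemaField("order_id", "STRING"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)

# Daily partitions let queries filtering on created_at scan one day
# instead of the whole table; clustering sorts rows within partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="created_at",
)
table.clustering_fields = ["customer_id"]

client.create_table(table)
```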
Senior Data Engineer
The project involved utilizing Migration Analytics to transfer data from a MySQL replication database to BigQuery.
- Improved data and report retrieval speed by 180 times;
- Decreased database replication workload and infrastructure-related issues by 80%;
- Collaborated with the BI team to design data models.
Data Engineer
Built real-time anomaly detection using DynamoDB, Lambda, and SageMaker (the scoring path is sketched after the list below).
- Successfully alerted on suspicious transactions, securing hundreds of thousands of dollars' worth of crypto assets.
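A minimal sketch of how such a scoring path can be wired together, assuming a Lambda subscribed to the table's DynamoDB Stream and hypothetical endpoint, topic, and attribute names; the model's response format would depend on the actual serving container.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
sns = boto3.client("sns")

ENDPOINT_NAME = "anomaly-detector"  # placeholder
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"  # placeholder
THRESHOLD = 0.9  # placeholder anomaly-score cutoff

def handler(event, context):
    """Triggered by DynamoDB Streams on each new transaction record."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        item = record["dynamodb"]["NewImage"]
        amount = item["amount"]["N"]  # stream attributes carry type wrappers

        # Score the transaction against the deployed SageMaker endpoint.
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"instances": [[float(amount)]]}),
        )
        # Response shape is model-dependent; assumed here for illustration.
        score = json.loads(response["Body"].read())["predictions"][0]

        if score > THRESHOLD:
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="Suspicious transaction detected",
                Message=json.dumps(item),
            )
```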
Data Engineer
Built a batch-processing data pipeline from various sources (AWS databases and several third-party tools) to a BigQuery data warehouse; one ingestion leg is sketched after the list below.
- Ingested data from various sources, including RDBMS, NoSQL databases, 3rd party APIs, and Google Sheets;
- Preprocessed data in flight using batch and stream processing frameworks;
- Managed storage to optimize cost and performance;
- Designed and maintained data warehouse models in BigQuery.
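A hedged sketch of one ingestion leg (RDBMS to BigQuery) using Spark's JDBC reader and the spark-bigquery connector; the hosts, credentials, and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-ingest").getOrCreate()

# Extract: read a source table from an RDBMS over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://example-host:3306/shop")  # placeholder
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "REDACTED")
    .load()
)

# Load: write to BigQuery via the spark-bigquery connector, staging
# through a GCS bucket as the connector requires for indirect writes.
(
    orders.write.format("bigquery")
    .option("temporaryGcsBucket", "example-staging-bucket")  # placeholder
    .mode("overwrite")
    .save("example-project.warehouse.orders")                # placeholder
)
```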