Brian
From Canada (GMT-4)
7 years of commercial experience
Lemon.io stats
1 project done
840 hours worked
1 offer now 🔥
Brian – Machine learning, Data Science, Big Data
This engineer has experience with Python, SQL, cloud services, and various tools across the data science ecosystem. He also has a strong understanding of cloud-related MLOps concepts. Brian is adept at managing non-technical stakeholders effectively and communicating complex ideas clearly. Outside of daily work, Brian can be found practicing sports, including Muay Thai!
Main technologies
Additional skills
Ready to start
ASAP
Direct hire
Potentially possible
Experience Highlights
Senior AI Engineer
A multi-strategy hedge fund management firm that now focuses on delivering a financial platform for investors.
- Designed and built from scratch the end-to-end architecture for integrating multiple data sources into a centralized data warehouse, optimizing data flow, accessibility, and scalability to support business analytics and decision-making.
- Developed and deployed custom machine learning models and large language models (LLMs) from the ground up to classify and categorize report topics, significantly improving reporting accuracy and speeding up the delivery of actionable insights (a minimal classification sketch follows below).
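Below is a minimal sketch of the LLM-driven topic-classification step, assuming a public Hugging Face zero-shot model stands in for the proprietary models used on the project; the topic labels are illustrative.

```python
# Minimal sketch: LLM-based report topic classification.
# A public zero-shot model stands in for the project's custom models;
# the candidate labels are illustrative assumptions.
from transformers import pipeline

CANDIDATE_TOPICS = ["earnings", "risk", "market outlook", "compliance"]  # hypothetical labels

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_report(text: str) -> str:
    """Return the most likely topic label for a report snippet."""
    result = classifier(text, candidate_labels=CANDIDATE_TOPICS)
    return result["labels"][0]  # labels come back sorted by score, highest first

if __name__ == "__main__":
    print(classify_report("Q3 revenue exceeded guidance on strong fixed-income flows."))
```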
Software Engineer Lead, Machine Learning Engineer
The world’s leading digital cross-device graph. It enables marketers to identify a brand’s customers or related households across multiple devices, unlocking critical use cases across programmatic targeting, media measurement, attribution, and personalization globally.
- Spearheaded the design and deployment of a cutting-edge model-serving system using Ray Serve, orchestrating distributed request-response workflows to deliver real-time predictions at scale with 99.99% uptime (see the deployment sketch after this list).
- Developed a production-grade SDK for seamless model deployment, empowering Data Scientists to iterate models 2x faster and ensuring smooth integration with the serving infrastructure, driving adoption across multiple teams.
- Integrated advanced model optimization techniques such as pruning, quantization, and batching within the Ray Serve framework, achieving a 20% reduction in latency and cutting GPU utilization by 10%, resulting in significant cost savings.
- Architected a scalable and resilient inference pipeline, leveraging Kubernetes, Docker, and Ray Serve, to handle real-time prediction workflows, delivering a 10x improvement in scalability for mission-critical applications.
- Enhanced model observability and reliability by integrating monitoring tools like Prometheus, Grafana, and Ray Dashboard, providing real-time insights into cluster performance, drift detection, and model behavior.
- Spun up and optimized distributed Ray clusters, enabling dynamic resource scaling and fault tolerance for high-throughput inference workloads, reducing infrastructure costs by 20% while maintaining sub-50ms response latency.
- Mentored engineers on best practices for distributed systems and ML engineering, increasing team productivity.
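A minimal Ray Serve sketch of the serving pattern described above, combining a replicated deployment with dynamic request batching; the stand-in model, replica counts, and route are placeholder assumptions rather than the production configuration.

```python
# Minimal Ray Serve sketch: a replicated deployment with dynamic request
# batching. The model, replica counts, and route are hypothetical.
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 1})
class Predictor:
    def __init__(self):
        # In the real system a trained model would be loaded here.
        self.model = lambda batch: [len(x) for x in batch]  # stand-in model

    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def predict_batch(self, inputs: list) -> list:
        # Ray Serve collects concurrent single-item calls into one batched call.
        return self.model(inputs)

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        prediction = await self.predict_batch(payload["text"])
        return {"prediction": prediction}

# Deploy behind an HTTP route (requires a running Ray cluster).
serve.run(Predictor.bind(), route_prefix="/predict")
```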
Senior Software Engineer, Machine Learning
Data & AI platform solutions for various external IBM clients across diverse industries, ensuring the scalability of their data and machine learning models.
- Designed and implemented a real-time nearest neighbor retrieval system leveraging Redis for precomputed results and dynamic FAISS search as a fallback for unseen queries, reducing latency while ensuring adaptability to new data (sketched after this list).
- Implemented a dynamic FAISS partition update strategy using Kafka to stream new data, maintaining availability during updates by hot-swapping versioned indices.
- Optimized a large-scale FAISS-based nearest neighbor search system by implementing a two-level query pre-filtering mechanism using IVF clustering for coarse-grained filtering and Locality-Sensitive Hashing (LSH) for fine-grained search, reducing query latency by 50% and improving system scalability to millions of vectors.
- Deployed and maintained Apache Airflow on Kubernetes to orchestrate end-to-end machine learning pipelines for terabyte-scale data, automating data pre-processing, model training, and deployment workflows (a minimal DAG sketch follows below).
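A minimal sketch of the cache-then-search retrieval pattern from the first bullet: precomputed neighbors are looked up in Redis, with a FAISS search as the fallback for unseen queries. The dimensions, key names, and flat index are illustrative assumptions; the bullets above describe partitioned (IVF) indices at production scale.

```python
# Minimal sketch: Redis cache for precomputed neighbors, FAISS fallback
# for unseen queries. Dimensions and key names are illustrative.
import json

import faiss
import numpy as np
import redis

DIM, K = 128, 10

r = redis.Redis(host="localhost", port=6379)
index = faiss.IndexFlatL2(DIM)  # exact search; IVF partitions would be used at scale
index.add(np.random.rand(10_000, DIM).astype("float32"))  # stand-in corpus

def nearest_neighbors(query_id: str, query_vec: np.ndarray) -> list:
    cached = r.get(f"nn:{query_id}")
    if cached is not None:
        return json.loads(cached)  # precomputed result path
    # Fallback: dynamic FAISS search for unseen queries.
    _, ids = index.search(query_vec.reshape(1, -1).astype("float32"), K)
    result = ids[0].tolist()
    r.set(f"nn:{query_id}", json.dumps(result), ex=3600)  # cache for next time
    return result
```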
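And a minimal Airflow sketch of the preprocess → train → deploy orchestration from the last bullet; the task bodies, schedule, and DAG id are hypothetical placeholders.

```python
# Minimal Airflow sketch of a preprocess -> train -> deploy pipeline.
# Task callables, schedule, and DAG id are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess():  # stand-ins for the real pipeline steps
    ...

def train():
    ...

def deploy():
    ...

with DAG(
    dag_id="ml_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="preprocess", python_callable=preprocess)
    t2 = PythonOperator(task_id="train", python_callable=train)
    t3 = PythonOperator(task_id="deploy", python_callable=deploy)
    t1 >> t2 >> t3
```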
Data Scientist
The team provided data modeling solutions to various external clients in multiple industries through IBM, addressing their specific business use cases.
- Trained and optimized machine learning models, including Logistic Regression, Naive Bayes, Random Forest, LSTM, and Transformer-based architectures, to solve NLP tasks such as text classification and summarization, enabling better insights for stakeholders' decision-making (a baseline classification sketch follows this list).
- Applied core NLP preprocessing techniques (tokenization, stopword removal, n-grams, TF-IDF, text normalization, part-of-speech tagging, text summarization, and word embeddings) to transform unstructured text into structured features for model development.
- Designed and implemented a synthetic data generation framework for structured datasets using statistical sampling, Gaussian Copulas, and GAN-based models (CTGAN), preserving data distributions and feature relationships for downstream tasks.
- Developed custom evaluation metrics, including semantic similarity scores and token-level accuracy, to rigorously assess and optimize NLP classification and summarization models.
- Applied PySpark for distributed data transformations on terabyte-scale datasets, reducing data pipeline runtime by 40% (see the PySpark sketch after this list).
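A minimal sketch of a classical text-classification baseline of the kind listed in the first bullet (TF-IDF features plus Logistic Regression); the tiny inline dataset and labels are placeholders for the client corpora.

```python
# Minimal text-classification baseline: TF-IDF features + Logistic Regression.
# The inline data and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["invoice overdue", "great product, thanks", "payment failed again"]
labels = ["billing", "feedback", "billing"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["thanks for the quick support"]))
```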
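And a minimal PySpark sketch of a distributed aggregation in the spirit of the last bullet; the input path, column names, and output location are hypothetical.

```python
# Minimal PySpark sketch of a distributed transformation; paths and
# column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

events = spark.read.parquet("s3://bucket/events/")  # placeholder path
daily = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("day", "client_id")
    .agg(
        F.count("*").alias("n_events"),
        F.avg("latency_ms").alias("avg_latency_ms"),
    )
)
daily.write.mode("overwrite").parquet("s3://bucket/aggregates/daily/")  # placeholder path
```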
Data Analyst
The revenue service of the Canadian federal government, and most provincial and territorial governments. The CRA collects taxes, administers tax law and policy, and delivers benefit programs and tax credits.
- Developed a machine learning pipeline to recommend non-compliant businesses for CRA audits using financial data.
- Implemented a web scraping solution using Scrapy and BeautifulSoup, with features like rotating proxies, dynamic user-agents, and rate limiting to handle anti-scraping mechanisms and ensure reliable data extraction to support auditors (a minimal spider sketch follows below).
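A minimal Scrapy sketch showing the throttling and user-agent rotation mentioned above; the target URL, CSS selectors, and proxy handling are hypothetical placeholders.

```python
# Minimal Scrapy sketch with rate limiting and user-agent rotation.
# The target site, selectors, and proxy handling are hypothetical.
import random

import scrapy

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0)",
]

class BusinessSpider(scrapy.Spider):
    name = "business_listings"
    start_urls = ["https://example.com/listings"]  # placeholder URL
    custom_settings = {
        "DOWNLOAD_DELAY": 1.0,           # rate limiting between requests
        "AUTOTHROTTLE_ENABLED": True,    # back off when the server slows down
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
    }

    def start_requests(self):
        for url in self.start_urls:
            # Rotate user agents per request; proxy rotation would hook in
            # similarly via request meta or a downloader middleware.
            yield scrapy.Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})

    def parse(self, response):
        for row in response.css("div.listing"):  # hypothetical selector
            yield {
                "name": row.css("h2::text").get(),
                "address": row.css(".address::text").get(),
            }
```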