Shubhankar – Data Science, Python, MLOps, experts in Lemon.io

Shubhankar

From United States (UTC-5)

Data Scientist|Senior
MLOps Engineer|Senior
Machine Learning Engineer|Senior

Shubhankar – Data Science, Python, MLOps

Shubhankar is a senior Data Scientist, MLOps Engineer, and Machine Learning Engineer with strong experience in large-scale ML systems, remote sensing, and climate data pipelines. He has led teams and architected distributed solutions for terabyte-scale scientific datasets, demonstrating practical production awareness. His strengths lie in scalable data processing, MLOps practices, and domain-driven feature engineering, with solid exposure to real-world scientific and environmental use cases.

11 years of commercial experience in
AI
Analytics
Climate tech
Data analytics
Scientific research
Main technologies
Data Science
5 years
Python
5 years
MLOps
4 years
Machine learning
5 years
NumPy
7 years
Additional skills
Claude LLM
Pandas
Snowflake
Keras
TensorFlow
LLM
Tableau
SQL
Scikit-learn
PyTorch
Terraform
MLflow
Direct hire
Possible

Experience Highlights

Lead ML & MLOps Architect
Nov 2025 - Mar 2026 (4 months)
Project Overview

An internal full-stack simulation platform for long-horizon forest ecosystem modeling under real-world climate change scenarios. It supports 100-year iLand forest projections, experiment tracking, and artifact management for internal research teams working across insurance, finance, and forest management.

Responsibilities:
  • Architected and built an end-to-end experiment management platform for iLand forest simulations from scratch, including a custom launcher UI, Azure Blob Storage artifact pipeline, and MLflow 3.7 tracking integration;
  • Designed simulation orchestration for 100-year forest projections using ICHEC-EC-EARTH RCP8.5 climate scenarios with disturbance modeling enabled;
  • Instrumented and visualized 11 forest ecosystem metrics, including carbon sequestration, tree volume, basal area, height, and NPP, across factorial and batch experiment runs;
  • Delivered a reproducible ML experimentation system with run comparison, metric visualization, and artifact versioning for internal research teams.
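The "factorial and batch experiment runs" mentioned above can be illustrated with a minimal sketch that expands a set of factors into one parameter set per run. The factor names and levels here are hypothetical placeholders, not taken from the project, and the sketch does not call iLand or MLflow.

```python
from itertools import product


def factorial_runs(factors):
    """Return one parameter dict per combination of factor levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors and levels, for illustration only.
factors = {
    "climate_scenario": ["RCP8.5"],
    "disturbance": [True, False],
    "management": ["none", "thinning", "clearcut"],
}

runs = factorial_runs(factors)
print(len(runs))  # 1 * 2 * 3 = 6 runs
```

In a setup like the one described, each resulting dict would typically become the parameter set logged for one tracked run, which is what makes run comparison across the full grid possible.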
Project Tech stack:
AI system design
AI API integration
AI deployment
Data analysis
Python
Microsoft Azure
Docker
MLflow
Lead ML & MLOps Architect
Nov 2025 - Mar 2026 (4 months)
Project Overview

A large-scale data orchestration platform for ingesting, processing, and delivering geospatial and environmental datasets. It pulls data from satellite imagery providers, climate model outputs, and government land registries, then transforms raw inputs into analysis-ready formats for GIS and analytics consumers across Europe.

Responsibilities:
  • Architected a production Airflow environment managing 45+ scheduled pipelines with full health monitoring and 128 concurrent task slots;
  • Built automated ingestion pipelines pulling from satellite providers, climate agencies, and government land registries, including European and French national sources;
  • Developed format conversion pipelines for Cloud-Optimized GeoTIFF and PMTiles to optimize large raster datasets for web and tile-based delivery;
  • Integrated Google Earth Engine and cloud blob storage as data sources across multiple pipeline families;
  • Implemented tagging, scheduling, and dependency strategies to coordinate 45+ DAGs with varying cadences;
  • Monitored pipeline health and resolved failures to maintain high success rates across all scheduled runs.
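As a rough illustration of the dependency coordination described above, the sketch below derives a valid execution order from a dependency map using Python's standard `graphlib` module. This is not Airflow's API, and the pipeline names are hypothetical; it only shows the ordering idea that a scheduler applies when one pipeline must wait for another.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline names. Each key depends on the pipelines in its set.
dependencies = {
    "ingest_satellite": set(),
    "ingest_land_registry": set(),
    "convert_to_cog": {"ingest_satellite"},
    "build_pmtiles": {"convert_to_cog", "ingest_land_registry"},
}


def execution_order(deps):
    """Return the pipelines in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Any order returned this way runs ingestion before format conversion and conversion before tile building. A cycle in the map raises `graphlib.CycleError`, which is a useful early check when many pipelines reference each other.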
Project Tech stack:
Apache Airflow
Python
GoogleAPI
Azure DevOps
Docker
PostgreSQL
CTO
Dec 2024 - Nov 2025 (10 months)
Project Overview

An AI meteorology platform for automated climate and solar forecasting. It combines LLM agents, MCP tools, and an interactive global weather map to support time-series forecasts, solar analysis, and climate data exploration.

Responsibilities:
  • Led a 5-person team in architecting and shipping an AI meteorologist product with a natural language chat interface, interactive global solar radiation map, and time-series forecast playback powered by Claude LLM agents and MCP tools;
  • Built a distributed climate data pipeline using Dask, Ray, and GCP, processing 40TB of ECMWF global climate data and reducing processing time from months to days;
  • Achieved 10-15% RMSE improvement in temperature forecasts and 5% RMSE improvement in solar power forecasts through large-scale bias correction and ensemble modeling;
  • Integrated pvlib ModelChain with CEC models and 17,544 hours of historical weather reanalysis (2023-2024) for professional-grade solar energy analysis;
  • Secured Techstars 2025 ($120k), Stanford StartX, and Stanford TomKat Sustainability sponsorship to scale R&D.
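To make the RMSE figures above concrete, here is a minimal sketch of the simplest form of bias correction: estimate the mean forecast error on a training period, subtract it from later forecasts, and compare RMSE before and after. The numbers are made up for illustration, and the project's actual correction and ensemble methods are not described in this profile.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mean_bias(pred, obs):
    """Average signed forecast error (forecast minus observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


# Made-up temperatures in degrees Celsius, for illustration only.
train_obs = [10.0, 12.0, 14.0, 16.0]
train_pred = [12.0, 13.0, 17.0, 18.0]
test_obs = [11.0, 15.0]
test_pred = [13.5, 16.5]

bias = mean_bias(train_pred, train_obs)      # estimated on training data only
corrected = [p - bias for p in test_pred]    # applied to unseen forecasts

rmse_before = rmse(test_pred, test_obs)
rmse_after = rmse(corrected, test_obs)
improvement_pct = 100 * (rmse_before - rmse_after) / rmse_before
print(round(rmse_before, 3), round(rmse_after, 3), round(improvement_pct, 1))
```

The improvement in this toy case is far larger than the 5-15% reported above because the made-up forecasts carry a large constant offset. Real forecast errors are mostly not a constant offset, so gains from correction are much smaller.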
Project Tech stack:
PyTorch
Python
XGBoost
Claude API
Claude LLM
MCP
MLOps
LLM
GCP Compute Engine
Ray
Dask
CTO
Nov 2024 - Oct 2025 (11 months)
Project Overview

A high-performance distributed ML preprocessing pipeline for climate and weather data stored in Google Cloud Storage. It processes 160TB of data across a small distributed cluster and reduced end-to-end preprocessing time from 8 days to 1.3 days.

Responsibilities:
  • Reduced ML preprocessing time from 8 days to 1.3 days by architecting a distributed pipeline across 3 machines processing 160TB+ of GCS-hosted climate data;
  • Achieved a 35x GCS read speedup (70 min → 2 min) through aggressive gcsfuse tuning, including a 512GB file cache, 80 parallel connections per host, and 200 parallel downloads;
  • Designed distributed workload splitting across 680 variables (land and ocean features) with data locality optimization, writing approximately 17TB of preprocessed output per machine;
  • Built full observability into the pipeline with automated progress tracking, ETA estimation, and structured logging for 30+ hour production runs;
  • Developed a test-mode framework validating the full pipeline in 40 minutes before committing to multi-day production runs.
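The figures above can be checked with a short sketch that splits 680 variables across 3 machines and recomputes the quoted speedups. The even contiguous split is an assumption for illustration and ignores the data locality optimization mentioned above. The variable names are placeholders.

```python
def split_evenly(items, n_workers):
    """Split items into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


variables = [f"var_{i:03d}" for i in range(680)]  # placeholder variable names
chunks = split_evenly(variables, 3)
chunk_sizes = [len(c) for c in chunks]
print(chunk_sizes)  # [227, 227, 226]

read_speedup = 70 / 2        # GCS read time: 70 min -> 2 min
pipeline_speedup = 8 / 1.3   # end-to-end preprocessing: 8 days -> 1.3 days
total_output_tb = 3 * 17     # about 17TB written per machine
print(read_speedup, round(pipeline_speedup, 2), total_output_tb)
```

The read path improves by 35x while the end-to-end run improves by roughly 6x, which suggests that stages other than GCS reads account for much of the remaining runtime.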
Project Tech stack:
Python
GCP
Dask
Distributed Systems

Education

2018
Computer Science
Masters

Languages

English
Advanced

Copyright © 2026 lemon.io. All rights reserved.