Shubhankar
From United States (UTC-5)
Shubhankar – Data Science, Python, MLOps
Shubhankar is a senior data scientist and ML/MLOps engineer with strong experience in large-scale ML systems, remote sensing, and climate data pipelines. He has led teams and architected distributed solutions for terabyte-scale scientific datasets, with practical production awareness. His strengths lie in scalable data processing, MLOps practices, and domain-driven feature engineering, with solid exposure to real-world scientific and environmental use cases.
11 years of commercial experience in
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Lead ML & MLOps Architect
An internal full-stack simulation platform for long-horizon forest ecosystem modeling under real-world climate change scenarios. It supports 100-year iLand forest projections, experiment tracking, and artifact management for internal research teams working across insurance, finance, and forest management.


- Architected and built an end-to-end experiment management platform for iLand forest simulations from scratch, including a custom launcher UI, Azure Blob Storage artifact pipeline, and MLflow 3.7 tracking integration;
- Designed simulation orchestration for 100-year forest projections using ICHEC-EC-EARTH RCP8.5 climate scenarios with disturbance modeling enabled;
- Instrumented and visualized 11 forest ecosystem metrics, including carbon sequestration, tree volume, basal area, height, and net primary production (NPP), across factorial and batch experiment runs;
- Delivered a reproducible ML experimentation system with run comparison, metric visualization, and artifact versioning for internal research teams.
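A factorial experiment batch of the kind mentioned above can be sketched in a few lines. This is only an illustration: the factor names, levels, and run-naming scheme are hypothetical and are not taken from the actual platform, and the step that would launch a simulation or log to a tracking server is left out.

```python
from itertools import product

# Hypothetical factors and levels; the real experiment design is not described in the profile.
FACTORS = {
    "climate_scenario": ["ICHEC-EC-EARTH_RCP8.5"],
    "disturbance": [True, False],
    "replicate": [1, 2, 3],
}


def factorial_runs(factors):
    """Return one parameter dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def run_name(params):
    """Build a stable run name from the parameters, with keys in sorted order."""
    return "__".join(f"{key}={params[key]}" for key in sorted(params))


runs = factorial_runs(FACTORS)
for params in runs:
    # A real launcher would start a simulation here and record params and metrics.
    print(run_name(params))
```

Deriving the run name from the parameters alone keeps repeated batches comparable, since the same combination always maps to the same name.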
Lead ML & MLOps Architect
A large-scale data orchestration platform for ingesting, processing, and delivering geospatial and environmental datasets. It pulls data from satellite imagery providers, climate model outputs, and government land registries, then transforms raw inputs into analysis-ready formats for GIS and analytics consumers across Europe.


- Architected a production Airflow environment managing 45+ scheduled pipelines with full health monitoring and 128 concurrent task slots;
- Built automated ingestion pipelines pulling from satellite providers, climate agencies, and government land registries, including European and French national sources;
- Developed format conversion pipelines for Cloud-Optimized GeoTIFF and PMTiles to optimize large raster datasets for web and tile-based delivery;
- Integrated Google Earth Engine and cloud blob storage as data sources across multiple pipeline families;
- Implemented tagging, scheduling, and dependency strategies to coordinate 45+ DAGs with varying cadences;
- Monitored pipeline health and resolved failures to maintain high success rates across all scheduled runs.
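The idea of coordinating dependent pipelines under a fixed number of concurrent slots can be illustrated with the Python standard library. This is a simplified sketch and not Airflow's scheduler: the pipeline names and dependency graph are hypothetical, and real DAG scheduling also involves cadences, retries, and pools.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines: each key depends on the pipelines in its set.
DEPENDENCIES = {
    "ingest_satellite": set(),
    "ingest_climate": set(),
    "convert_to_cog": {"ingest_satellite"},
    "build_pmtiles": {"convert_to_cog"},
    "publish_catalog": {"build_pmtiles", "ingest_climate"},
}


def schedule_in_waves(dependencies, max_slots):
    """Group pipelines into waves of at most max_slots.

    A pipeline only appears in a wave after all of its upstream
    pipelines have appeared in earlier waves.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Pipelines that are ready together are mutually independent,
        # so they can be split into slot-sized chunks in any order.
        for start in range(0, len(ready), max_slots):
            waves.append(ready[start:start + max_slots])
        sorter.done(*ready)
    return waves


for number, wave in enumerate(schedule_in_waves(DEPENDENCIES, max_slots=2), start=1):
    print(number, wave)
```

With `max_slots=2` the two ingestion pipelines run together in the first wave, and each downstream step waits for its upstream to finish.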
CTO
An AI meteorology platform for automated climate and solar forecasting. It combines LLM agents, MCP tools, and an interactive global weather map to support time-series forecasts, solar analysis, and climate data exploration.


- Led a 5-person team in architecting and shipping an AI meteorologist product with a natural language chat interface, interactive global solar radiation map, and time-series forecast playback powered by Claude LLM agents and MCP tools;
- Built a distributed climate data pipeline using Dask, Ray, and GCP, processing 40TB of ECMWF global climate data and reducing processing time from months to days;
- Achieved a 10-15% RMSE improvement in temperature forecasts and a 5% RMSE improvement in solar power forecasts through large-scale bias correction and ensemble modeling;
- Integrated pvlib ModelChain with CEC models and 17,544 hours of historical weather reanalysis (2023-2024) for professional-grade solar energy analysis;
- Secured Techstars 2025 ($120k), Stanford StartX, and Stanford TomKat Sustainability sponsorship to scale R&D.
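Two of the figures above can be checked with simple arithmetic. The sketch below counts the hourly time steps in 2023-2024 and shows one common way to express an RMSE improvement, as a relative reduction against a baseline. Reading the reported percentages as relative reductions is an assumption, and the helper names and sample values are illustrative only.

```python
import math
from datetime import datetime


def hourly_steps(start_year, end_year):
    """Count hourly time steps from Jan 1 of start_year through Dec 31 of end_year."""
    delta = datetime(end_year + 1, 1, 1) - datetime(start_year, 1, 1)
    return int(delta.total_seconds() // 3600)


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, corrected_rmse):
    """Fractional RMSE reduction relative to the baseline (0.10 means 10%)."""
    return (baseline_rmse - corrected_rmse) / baseline_rmse


# 2023 has 365 days and 2024 is a leap year with 366: (365 + 366) * 24 hours.
print(hourly_steps(2023, 2024))  # 17544

# Illustrative values only: a baseline RMSE of 2.0 reduced to 1.8.
print(round(relative_improvement(2.0, 1.8), 2))  # 0.1
```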
CTO
A high-performance distributed ML preprocessing pipeline for climate and weather data stored in Google Cloud Storage. It processes 160TB of data across a small distributed cluster, cutting end-to-end preprocessing time from 8 days to 1.3 days.


- Reduced ML preprocessing time from 8 days to 1.3 days by architecting a distributed pipeline across 3 machines processing 160TB+ of GCS-hosted climate data;
- Achieved a 35x GCS read speedup (70 min → 2 min) through aggressive gcsfuse tuning, including a 512GB file cache, 80 parallel connections per host, and 200 parallel downloads;
- Designed distributed workload splitting across 680 variables (land and ocean features) with data locality optimization, writing approximately 17TB of preprocessed output per machine;
- Built full observability into the pipeline with automated progress tracking, ETA estimation, and structured logging for 30+ hour production runs;
- Developed a test-mode framework validating the full pipeline in 40 minutes before committing to multi-day production runs.
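The workload split and the headline ratios above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the variable names are placeholders, the split is a plain contiguous partition that ignores the data locality optimization mentioned above, and the ETA is a simple linear estimate.

```python
def split_evenly(items, n_workers):
    """Split items into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def eta_hours(done, total, elapsed_hours):
    """Linear ETA: remaining items at the average rate observed so far."""
    if done == 0:
        return None  # no rate information yet
    return elapsed_hours * (total - done) / done


# Placeholder names standing in for the 680 land and ocean variables.
variables = [f"var_{index:03d}" for index in range(680)]
chunks = split_evenly(variables, 3)
print([len(chunk) for chunk in chunks])  # [227, 227, 226]

# Ratios implied by the figures quoted above.
print(70 / 2)             # 35.0, the GCS read speedup
print(round(8 / 1.3, 1))  # 6.2, the end-to-end preprocessing speedup

# Illustrative ETA: 170 of 680 variables finished after 7.5 hours.
print(eta_hours(170, 680, 7.5))  # 22.5
```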