Amit – Python, LangChain, LLM, experts in Lemon.io

Amit

From Japan (UTC+9)

Data Engineer | Strong senior

Amit is a strong senior data engineer with deep expertise in Python, SQL, Airflow, Snowflake, and cloud platforms (AWS, GCP). He has led end-to-end delivery of streaming, lakehouse, and AI-powered data platforms, demonstrating strong architectural judgment and operational maturity. Screenings confirm his ability to communicate technical decisions with business context, operate autonomously, and lead teams in complex, ambiguous environments. He is fluent in English and comfortable in client-facing, international settings.

9 years of commercial experience in
AI
Analytics
Business intelligence
Data analytics
Software development

Main technologies
Python: 8 years
LangChain: 2 years
LLM: 4 years
SQL: 10 years

Additional skills
Snowflake

Direct hire: Possible

Experience Highlights

Lead Software Engineer (AI)
Jun 2025 - Jan 2026 (7 months)
Project Overview

An AI-powered assistant that lets data analysts and BI users query enterprise databases and a data catalog in natural language. The product turns plain-language questions into SQL, surfaces relevant tables, columns, and lineage, and supports multi-turn follow-up questions, enabling users to refine and explore iteratively. It targets data teams who need fast access to information stored across multiple databases without writing SQL by hand.

Responsibilities:
  • Led the initiative end-to-end from concept to MVP delivery;
  • Defined the technical roadmap and translated product requirements into specifications;
  • Designed and built the multi-turn conversational engine, including dialogue state management and context tracking across turns;
  • Implemented schema and catalog metadata grounding to reduce LLM hallucination on database queries;
  • Designed and implemented the RAG architecture using LangChain for orchestration and pgvector for embedding storage and similarity search;
  • Evaluated FAISS and Pinecone as alternative vector stores during technology selection;
  • Developed the natural language to SQL generation pipeline and the catalog search functionality;
  • Managed BI testing and validation cycles with internal users;
  • Reported progress, risks, and mitigation plans to leadership weekly.
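
To illustrate the schema and catalog grounding mentioned in the responsibilities above, here is a minimal sketch, not the project's actual implementation. It assumes a hypothetical in-memory catalog (`CATALOG`) and uses plain keyword overlap to pick relevant tables, where the real system used pgvector similarity search. The function names `relevant_tables` and `build_prompt` are illustrative only. The idea is that the LLM prompt lists only tables that exist and look relevant, which limits the room for invented table or column names.

```python
import re

# Hypothetical catalog metadata: table name -> column names (illustrative only).
CATALOG = {
    "orders": ["order_id", "customer_id", "total_amount", "created_at"],
    "customers": ["customer_id", "name", "country"],
    "web_events": ["event_id", "session_id", "event_type", "occurred_at"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevant_tables(question, catalog, top_k=2):
    """Rank tables by keyword overlap with the question.

    This is a stand-in for embedding similarity search; tables with no
    overlap are dropped entirely.
    """
    question_tokens = tokenize(question)
    scored = []
    for table, columns in catalog.items():
        vocabulary = tokenize(table + " " + " ".join(columns))
        score = len(question_tokens & vocabulary)
        if score > 0:
            scored.append((score, table))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:top_k]]


def build_prompt(question, catalog, top_k=2):
    """Build an LLM prompt that exposes only the selected schema."""
    tables = relevant_tables(question, catalog, top_k)
    schema_lines = [f"- {t}({', '.join(catalog[t])})" for t in tables]
    return (
        "Use only the tables and columns listed below.\n"
        + "\n".join(schema_lines)
        + f"\nQuestion: {question}\nSQL:"
    )


if __name__ == "__main__":
    print(build_prompt("What is the total amount of orders per customer country?", CATALOG))
```

A production version would typically also check the generated SQL against the same catalog before executing it.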
Project Tech stack:
Python
PostgreSQL
LangChain
RAG
OpenAI
Anthropic
Docker
Vector Databases

Lead Software Engineer (Data Platform)
Mar 2024 - Jun 2025 (1 year 2 months)
Project Overview

A cloud-based platform that connects to enterprise databases, scans and ingests their metadata, and exposes a unified catalog with lineage, quality sampling, and governance controls. It supports 10+ database types including Snowflake, Oracle, and PostgreSQL, and serves enterprise customers who need centralized visibility into their data estate for compliance and discovery.

Responsibilities:
  • Led the platform from prototype to production, coordinating backend, frontend, and data platform teams;
  • Developed the core backend services in Go, including high-throughput metadata collection agents, API services, and distributed task orchestration;
  • Architected and implemented a high-performance metadata ingestion engine connecting to 10+ database types, including Snowflake, Oracle, and PostgreSQL;
  • Reduced pipeline runtimes by 80% (from hours to minutes) through parallelized distributed processing and schema-level optimization;
  • Designed and deployed cloud-agnostic ETL pipelines on AWS Batch with automated data-quality sampling and end-to-end lineage tracking;
  • Built Avro-based data workflows and optimized storage schemas for efficient transformation and querying;
  • Evaluated platform tooling and migrated from Azure Synapse to Databricks (Delta Lake) based on scalability and lakehouse architecture requirements;
  • Introduced observability infrastructure, including deployment automation, CloudWatch dashboards, and alerting pipelines;
  • Managed AWS budgeting and infrastructure optimization across Batch, Lambda, S3, CloudWatch, ECR, and Fargate.
Project Tech stack:
Docker
Golang
Golang REST API
AWS
AWS CloudFormation
AWS Lambda
Amazon EC2
Amazon Cognito
Docker Compose
Apache Airflow
Apache Kafka

Data Engineering Consultant
Oct 2022 - Jul 2023 (8 months)
Project Overview

A streaming and batch data pipeline built on Google Cloud for an enterprise customer who needed low-latency analytics across structured and semi-structured data sources. The system ingests data continuously from operational databases via change-data-capture, transforms it through real-time and batch paths, and lands it in a unified BigQuery warehouse for downstream BI and analytics. It processes millions of records daily and supports a hybrid multi-cloud setup spanning GCP and AWS to reduce vendor lock-in.

Responsibilities:
  • Designed and operated streaming data pipelines using Dataflow and Datastream for low-latency change-data-capture and real-time analytics;
  • Built and optimized batch ETL workflows on BigQuery and Dataproc, processing millions of records daily;
  • Architected the pipeline to support both structured and semi-structured data, including schema evolution handling for upstream changes;
  • Contributed to the hybrid multi-cloud architecture spanning GCP and AWS, improving scalability and reducing vendor lock-in;
  • Investigated and resolved global-level product issues affecting pipeline reliability, working directly with Google engineering teams;
  • Tuned BigQuery query performance and storage layout to reduce cost and improve downstream analytics latency;
  • Established monitoring and alerting on pipeline health, freshness, and SLA compliance;
  • Mentored team members on async Python patterns, distributed systems design, and cloud-native data engineering best practices.
Project Tech stack:
BigQuery
SQL
GCP
Apache Spark
Apache Airflow
Distributed Systems

Data Engineer / Platform Automation
Jul 2021 - May 2022 (10 months)
Project Overview

A platform-automation initiative inside a global internet services company covering two production systems. The first replaced a brittle manual enrollment process with an automated ETL pipeline that integrates the company's LMS as a SaaS source into the internal data warehouse, eliminating recurring human error and weeks of administrative work. The second built an automated security analytics pipeline that ingested penetration test results from multiple scanners, normalized them, and delivered analyst-ready reporting through BigQuery and DOMO, compressing security reporting cycles from weeks to days.

Responsibilities:
  • Designed and automated Python-based ETL pipelines integrating the platform's LMS as a SaaS data source, cutting manual enrollment workload by 70% and improving downstream data accuracy;
  • Built and deployed data pipelines on GCP (BigQuery, DOMO) to ingest and process penetration test results, accelerating the security reporting cycle from weeks to days;
  • Partnered with engineering and security teams to migrate on-premises databases to BigQuery and Azure SQL, reducing infrastructure cost and improving scalability;
  • Designed normalization and schema strategies for heterogeneous security tool outputs to support unified reporting and trend analysis;
  • Established monitoring, validation, and data quality checks on critical reporting pipelines;
  • Mentored 20+ employees on Tableau and SQL best practices, expanding internal data self-service adoption and team-wide data literacy;
  • Documented pipeline architectures and operational runbooks for ongoing handover and team scalability.
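
As an illustration of the normalization strategy for heterogeneous security tool outputs described above, the sketch below maps two made-up scanner formats onto one shared record type. Everything here is an assumption for the example: the scanner names, the raw field names (`ip`, `score`, `target`, `risk`), and the severity thresholds, which loosely follow the CVSS v3 qualitative bands. The source does not describe the actual scanners or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Unified record shared by all scanners (hypothetical schema)."""
    scanner: str
    host: str
    title: str
    severity: str  # one of: low, medium, high, critical


def from_scanner_a(record):
    """Assumed format A: numeric score from 0 to 10 under the key 'score'."""
    score = float(record["score"])
    if score >= 9.0:
        severity = "critical"
    elif score >= 7.0:
        severity = "high"
    elif score >= 4.0:
        severity = "medium"
    else:
        severity = "low"
    return Finding("scanner_a", record["ip"], record["name"], severity)


# Assumed format B: free-text risk labels with inconsistent casing and spacing.
LABEL_MAP = {
    "info": "low",
    "low": "low",
    "moderate": "medium",
    "medium": "medium",
    "high": "high",
    "severe": "critical",
    "critical": "critical",
}


def from_scanner_b(record):
    """Map scanner B's text label onto the shared severity scale."""
    label = record["risk"].strip().lower()
    return Finding("scanner_b", record["target"], record["issue"], LABEL_MAP[label])


if __name__ == "__main__":
    findings = [
        from_scanner_a({"ip": "10.0.0.5", "name": "Outdated TLS", "score": "7.5"}),
        from_scanner_b({"target": "10.0.0.5", "issue": "Open port", "risk": " Severe "}),
    ]
    for finding in findings:
        print(finding)
```

Once every scanner is mapped to the same `Finding` shape, the rows can be loaded into a single warehouse table, and severity trends can be compared across tools.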
Project Tech stack:
BigQuery
SQL
GCP
Tableau
ETL
API

Education

2021
Graduate School of Interdisciplinary Information Studies
M.A.S
2017
Asian Studies, Sociology, Anthropology
B.A

Languages

Japanese
Upper-intermediate
English
Advanced

Copyright © 2026 lemon.io. All rights reserved.