Amit – Python, LangChain, LLM, experts in Lemon.io

Amit

From Japan (UTC+9)

Data Engineer | Strong senior

Skills and seniority verified on Jun 3, 2026

Amit is a strong senior data engineer with deep expertise in Python, SQL, Airflow, Snowflake, and cloud platforms (AWS, GCP). He has led end-to-end delivery of streaming, lakehouse, and AI-powered data platforms, demonstrating strong architectural judgment and operational maturity. Screenings confirm his ability to communicate technical decisions with business context, operate autonomously, and lead teams in complex, ambiguous environments. He is fluent in English and comfortable in client-facing, international settings.

9 years of commercial experience in

AI

Analytics

Business intelligence

Data analytics

Software development

Main technologies

Python

8 years

LangChain

2 years

LLM

4 years

SQL

10 years

Additional skills

Snowflake

MongoDB

Direct hire

Possible

Experience Highlights

Lead Software Engineer (AI)

Jun 2025 - Jan 2026 (7 months)

Project Overview

An AI-powered assistant that lets data analysts and BI users query enterprise databases and a data catalog in natural language. The product turns plain-language questions into SQL, surfaces relevant tables, columns, and lineage, and supports multi-turn follow-up questions, enabling users to refine and explore iteratively. It targets data teams who need fast access to information stored across multiple databases without writing SQL by hand.

Responsibilities:

Led the initiative end-to-end from concept to MVP delivery;
Defined the technical roadmap and translated product requirements into specifications;
Designed and built the multi-turn conversational engine, including dialogue state management and context tracking across turns;
Implemented schema and catalog metadata grounding to reduce LLM hallucination on database queries;
Designed and implemented the RAG architecture using LangChain for orchestration and pgvector for embedding storage and similarity search;
Evaluated FAISS and Pinecone as alternative vector stores during technology selection;
Developed the natural language to SQL generation pipeline and the catalog search functionality;
Managed BI testing and validation cycles with internal users;
Reported progress, risks, and mitigation plans to leadership weekly.
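
As an illustration of the schema-grounding idea mentioned above, the following is a minimal sketch only, not the project's actual code. It assumes a hypothetical in-memory catalog (`CATALOG`), builds a prompt that lists the real tables, and runs a deliberately naive regex check that flags table names the generated SQL references but the catalog does not contain. The function names and the sample schema are invented for this example, and the check ignores subqueries, CTEs, and schema-qualified names.

```python
import re

# Hypothetical catalog metadata: table name -> column names.
CATALOG = {
    "orders": ["order_id", "customer_id", "total", "created_at"],
    "customers": ["customer_id", "name", "region"],
}


def build_grounded_prompt(question: str, catalog: dict[str, list[str]]) -> str:
    """Embed the known schema in the prompt so the model sees only real tables."""
    schema_lines = [
        f"- {table}({', '.join(columns)})" for table, columns in catalog.items()
    ]
    return (
        "Answer with a single SQL query using only these tables:\n"
        + "\n".join(schema_lines)
        + f"\nQuestion: {question}"
    )


def unknown_tables(sql: str, catalog: dict[str, list[str]]) -> set[str]:
    """Return table names after FROM/JOIN that are missing from the catalog.

    Naive by design: a production check would parse the SQL properly.
    """
    referenced = re.findall(
        r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, re.IGNORECASE
    )
    return {name.lower() for name in referenced} - set(catalog)
```

A caller could send `build_grounded_prompt(...)` to whichever model is in use and reject or retry any answer for which `unknown_tables(...)` is non-empty.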

Project Tech stack:

Python

PostgreSQL

LangChain

RAG

OpenAI

Anthropic

Docker

Vector Databases

Lead Software Engineer (Data Platform)

Mar 2024 - Jun 2025 (1 year 2 months)

Project Overview

A cloud-based platform that connects to enterprise databases, scans and ingests their metadata, and exposes a unified catalog with lineage, quality sampling, and governance controls. It supports 10+ database types including Snowflake, Oracle, and PostgreSQL, and serves enterprise customers who need centralized visibility into their data estate for compliance and discovery.

Responsibilities:

Led the platform from prototype to production, coordinating backend, frontend, and data platform teams;
Developed the core backend services in Go, including high-throughput metadata collection agents, API services, and distributed task orchestration;
Architected and implemented a high-performance metadata ingestion engine connecting 10+ databases (Snowflake, Oracle, PostgreSQL);
Reduced pipeline runtimes by 80% (from hours to minutes) through parallelized distributed processing and schema-level optimization;
Designed and deployed cloud-agnostic ETL pipelines on AWS Batch with automated data-quality sampling and end-to-end lineage tracking;
Built Avro-based data workflows and optimized storage schemas for efficient transformation and querying;
Evaluated platform tooling and migrated from Azure Synapse to Databricks (Delta Lake) based on scalability and lakehouse architecture requirements;
Introduced observability infrastructure, including deployment automation, CloudWatch dashboards, and alerting pipelines;
Managed AWS budgeting and infrastructure optimization across Batch, Lambda, S3, CloudWatch, ECR, and Fargate.
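
The parallelized metadata collection described above can be pictured with a small sketch. This is an assumption-laden illustration, not the platform's implementation: the profile states the backend services were written in Go, and Python is used here only for brevity. `scan_schema` is a hypothetical stand-in for an I/O-bound metadata query, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_schema(schema: str) -> dict:
    """Stand-in for an I/O-bound metadata query against one schema (hypothetical)."""
    return {"schema": schema, "tables": [f"{schema}.table_{i}" for i in range(3)]}


def scan_all(schemas: list[str], max_workers: int = 8) -> list[dict]:
    """Scan schemas concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_schema, schemas))
```

Threads suit this shape of work because each scan mostly waits on the database; the per-schema split is one plausible unit of parallelism among several.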

Project Tech stack:

Docker

Golang

Golang REST API

AWS

AWS CloudFormation

AWS Lambda

Amazon EC2

Amazon Cognito

Docker Compose

Apache Airflow

Apache Kafka

Data Engineering Consultant

Oct 2022 - Jul 2023 (8 months)

Project Overview

A streaming and batch data pipeline built on Google Cloud for an enterprise customer who needed low-latency analytics across structured and semi-structured data sources. The system ingests data continuously from operational databases via change-data-capture, transforms it through real-time and batch paths, and lands it in a unified BigQuery warehouse for downstream BI and analytics. It processes millions of records daily and supports a hybrid multi-cloud setup spanning GCP and AWS to reduce vendor lock-in.

Responsibilities:

Designed and operated streaming data pipelines using Dataflow and Datastream for low-latency change-data-capture and real-time analytics;
Built and optimized batch ETL workflows on BigQuery and Dataproc, processing millions of records daily;
Architected the pipeline to support both structured and semi-structured data, including schema evolution handling for upstream changes;
Contributed to the hybrid multi-cloud architecture spanning GCP and AWS, improving scalability and reducing vendor lock-in;
Investigated and resolved global-level product issues affecting pipeline reliability, working directly with Google engineering teams;
Tuned BigQuery query performance and storage layout to reduce cost and improve downstream analytics latency;
Established monitoring and alerting on pipeline health, freshness, and SLA compliance;
Mentored team members on async Python patterns, distributed systems design, and cloud-native data engineering best practices.

Project Tech stack:

BigQuery

SQL

GCP

Apache Spark

Apache Airflow

Distributed Systems

Data Engineer / Platform Automation

Jul 2021 - May 2022 (10 months)

Project Overview

A platform-automation initiative inside a global internet services company covering two production systems. The first replaced a brittle manual enrollment process with an automated ETL pipeline that integrates the company's LMS as a SaaS source into the internal data warehouse, eliminating recurring human error and weeks of administrative work. The second built an automated security analytics pipeline that ingested penetration test results from multiple scanners, normalized them, and delivered analyst-ready reporting through BigQuery and DOMO, compressing security reporting cycles from weeks to days.

Responsibilities:

Designed and automated Python-based ETL pipelines integrating the platform's LMS as a SaaS data source, cutting manual enrollment workload by 70% and improving downstream data accuracy;
Built and deployed data pipelines on GCP (BigQuery, DOMO) to ingest and process penetration test results, accelerating the security reporting cycle from weeks to days;
Partnered with engineering and security teams to migrate on-premises databases to BigQuery and Azure SQL, reducing infrastructure cost and improving scalability;
Designed normalization and schema strategies for heterogeneous security tool outputs to support unified reporting and trend analysis;
Established monitoring, validation, and data quality checks on critical reporting pipelines;
Mentored 20+ employees on Tableau and SQL best practices, expanding internal data self-service adoption and team-wide data literacy;
Documented pipeline architectures and operational runbooks for ongoing handover and team scalability.
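
To make the normalization of heterogeneous scanner outputs concrete, here is a hedged sketch under invented assumptions. The scanner names, field mappings, and severity vocabulary below are hypothetical and do not come from the project; the point is only to show one common pattern, mapping each tool's fields and severity labels onto a single record type before loading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Unified record shape shared by all scanners."""
    scanner: str
    host: str
    severity: str  # one of: low, medium, high, critical
    title: str


# Hypothetical per-scanner field names.
FIELD_MAPS = {
    "scanner_a": {"host": "ip", "severity": "risk", "title": "name"},
    "scanner_b": {"host": "target", "severity": "level", "title": "summary"},
}

# Hypothetical severity vocabularies (numeric and textual) mapped to one scale.
SEVERITY_MAP = {
    "1": "low", "2": "medium", "3": "high", "4": "critical",
    "low": "low", "medium": "medium", "high": "high", "critical": "critical",
}


def normalize(scanner: str, record: dict) -> Finding:
    """Translate one raw scanner record into the unified Finding shape."""
    fields = FIELD_MAPS[scanner]
    raw_severity = str(record[fields["severity"]]).strip().lower()
    return Finding(
        scanner=scanner,
        host=record[fields["host"]],
        severity=SEVERITY_MAP[raw_severity],
        title=record[fields["title"]],
    )
```

Unknown scanners or severity labels raise `KeyError` here, which keeps unmapped data from silently entering reports; a real pipeline might route such records to a quarantine table instead.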

Project Tech stack:

BigQuery

SQL

GCP

Tableau

ETL

API

Keep in mind, the experience summary might exclude non-relevant projects

Education

2021

Graduate School of Interdisciplinary Information Studies

M.A.S.

2017

Asian Studies, Sociology, Anthropology

B.A.

Languages

Japanese

Upper-intermediate

English

Advanced

Copyright © 2026 lemon.io. All rights reserved.
