Amit – Python, LangChain, LLM
Amit is a strong senior data engineer with deep expertise in Python, SQL, Airflow, Snowflake, and cloud platforms (AWS, GCP). He has led end-to-end delivery of streaming, lakehouse, and AI-powered data platforms, demonstrating strong architectural judgment and operational maturity. Screenings confirm his ability to communicate technical decisions with business context, operate autonomously, and lead teams in complex, ambiguous environments. He is fluent in English and comfortable in client-facing, international settings.
9 years of commercial experience in
Direct hire
Possible
Experience Highlights
Lead Software Engineer (AI)
An AI-powered assistant that lets data analysts and BI users query enterprise databases and a data catalog in natural language. The product turns plain-language questions into SQL, surfaces relevant tables, columns, and lineage, and supports multi-turn follow-up questions, enabling users to refine and explore iteratively. It targets data teams who need fast access to information stored across multiple databases without writing SQL by hand.
- Led the initiative end-to-end from concept to MVP delivery;
- Defined the technical roadmap and translated product requirements into specifications;
- Designed and built the multi-turn conversational engine, including dialogue state management and context tracking across turns;
- Implemented schema and catalog metadata grounding to reduce LLM hallucination on database queries;
- Designed and implemented the RAG architecture using LangChain for orchestration and pgvector for embedding storage and similarity search;
- Evaluated FAISS and Pinecone as alternative vector stores during technology selection;
- Developed the natural language to SQL generation pipeline and the catalog search functionality;
- Managed BI testing and validation cycles with internal users;
- Reported progress, risks, and mitigation plans to leadership weekly.
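The schema-grounding and multi-turn ideas in the bullets above can be illustrated with a minimal sketch. It is not the project's implementation: the `CATALOG` contents, the function names, and the toy bag-of-words similarity are all assumptions made for illustration, standing in for real embeddings stored in pgvector and orchestration through LangChain. The point is only that the prompt sent to the model lists the tables judged relevant to the question and the earlier turns, and nothing else.

```python
import math
import re
from collections import Counter

# Hypothetical catalog metadata: table name -> columns (illustrative only).
CATALOG = {
    "orders": ["order_id", "customer_id", "order_date", "total_amount"],
    "customers": ["customer_id", "name", "country"],
    "web_sessions": ["session_id", "page_url", "started_at"],
}


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_tables(question, history, k=2):
    """Rank tables against the question plus earlier turns (context tracking)."""
    query = embed(" ".join(history + [question]))
    scored = []
    for table, columns in CATALOG.items():
        doc = embed(table + " " + " ".join(columns))
        scored.append((cosine(query, doc), table))
    scored.sort(reverse=True)
    # Drop tables with no overlap at all so they never reach the prompt.
    return [table for score, table in scored[:k] if score > 0]


def build_prompt(question, history):
    """Build a prompt that exposes only the retrieved schema to the model."""
    tables = relevant_tables(question, history)
    schema_lines = [f"{t}({', '.join(CATALOG[t])})" for t in tables]
    turns = [f"User: {h}" for h in history]
    return "\n".join(
        ["Use only these tables and columns:"]
        + schema_lines
        + ["Conversation so far:"]
        + turns
        + [f"Question: {question}", "SQL:"]
    )
```

Calling `build_prompt("break that down by country", ["show orders from last month"])` would include both `orders` and `customers` in the schema section, because the earlier turn is folded into the retrieval query.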
Lead Software Engineer (Data Platform)
A cloud-based platform that connects to enterprise databases, scans and ingests their metadata, and exposes a unified catalog with lineage, quality sampling, and governance controls. It supports 10+ database types including Snowflake, Oracle, and PostgreSQL, and serves enterprise customers who need centralized visibility into their data estate for compliance and discovery.
- Led the platform from prototype to production, coordinating backend, frontend, and data platform teams;
- Developed the core backend services in Go, including high-throughput metadata collection agents, API services, and distributed task orchestration;
- Architected and implemented a high-performance metadata ingestion engine connecting to 10+ database types (including Snowflake, Oracle, and PostgreSQL);
- Reduced pipeline runtimes by 80% (from hours to minutes) through parallelized distributed processing and schema-level optimization;
- Designed and deployed cloud-agnostic ETL pipelines on AWS Batch with automated data-quality sampling and end-to-end lineage tracking;
- Built Avro-based data workflows and optimized storage schemas for efficient transformation and querying;
- Evaluated platform tooling and migrated from Azure Synapse to Databricks (Delta Lake) based on scalability and lakehouse architecture requirements;
- Introduced observability infrastructure, including deployment automation, CloudWatch dashboards, and alerting pipelines;
- Managed AWS budgeting and infrastructure optimization across Batch, Lambda, S3, CloudWatch, ECR, and Fargate.
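As a rough illustration of the parallelized metadata collection mentioned above, the sketch below fans a per-schema scan out across a thread pool. The profile states the production services were written in Go and does not describe how the work was partitioned, so everything here is an assumption: `scan_schema` is a hypothetical placeholder for a query against a source database's system catalog, and Python is used only to keep the examples in one language.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_schema(schema):
    """Hypothetical per-schema scan; a real agent would query the source catalog."""
    return {"schema": schema, "tables": [f"{schema}.t{i}" for i in range(3)]}


def scan_all(schemas, max_workers=8):
    """Scan schemas concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_schema, schemas))
```

Threads suit this kind of workload because catalog scans are I/O-bound: each worker spends most of its time waiting on the database rather than holding the interpreter.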
Data Engineering Consultant
A streaming and batch data pipeline built on Google Cloud for an enterprise customer who needed low-latency analytics across structured and semi-structured data sources. The system ingests data continuously from operational databases via change-data-capture, transforms it through real-time and batch paths, and lands it in a unified BigQuery warehouse for downstream BI and analytics. It processes millions of records daily and supports a hybrid multi-cloud setup spanning GCP and AWS to reduce vendor lock-in.
- Designed and operated streaming data pipelines using Dataflow and Datastream for low-latency change-data-capture and real-time analytics;
- Built and optimized batch ETL workflows on BigQuery and Dataproc, processing millions of records daily;
- Architected the pipeline to support both structured and semi-structured data, including schema evolution handling for upstream changes;
- Contributed to the hybrid multi-cloud architecture spanning GCP and AWS, improving scalability and reducing vendor lock-in;
- Investigated and resolved global-level product issues affecting pipeline reliability, working directly with Google engineering teams;
- Tuned BigQuery query performance and storage layout to reduce cost and improve downstream analytics latency;
- Established monitoring and alerting on pipeline health, freshness, and SLA compliance;
- Mentored team members on async Python patterns, distributed systems design, and cloud-native data engineering best practices.
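The change-data-capture and schema-evolution handling described above can be sketched in a few lines. This is an assumed, simplified model rather than the Dataflow or Datastream API: the event shape (`op`, `key`, `data`) and the function name are hypothetical. It replays insert, update, and delete events into a keyed table and widens the column set whenever an upstream change introduces a new field.

```python
def apply_changes(events):
    """Replay CDC events into current rows, widening the schema as columns appear."""
    rows = {}      # primary key -> latest column values
    columns = []   # known columns, in order of first appearance
    for event in events:
        key = event["key"]
        if event["op"] == "delete":
            rows.pop(key, None)
            continue
        # Schema evolution: register columns not seen before.
        for col in event["data"]:
            if col not in columns:
                columns.append(col)
        # Inserts and updates are both treated as upserts.
        rows.setdefault(key, {}).update(event["data"])
    # Rows written before a column existed get None for that column.
    table = [
        {"id": key, **{c: row.get(c) for c in columns}}
        for key, row in sorted(rows.items())
    ]
    return columns, table
```

Treating inserts and updates as upserts keeps the replay idempotent for repeated events, which is a common design choice when delivery is at-least-once.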
Data Engineer / Platform Automation
A platform-automation initiative inside a global internet services company covering two production systems. The first replaced a brittle manual enrollment process with an automated ETL pipeline that integrates the company's LMS as a SaaS source into the internal data warehouse, eliminating recurring human error and weeks of administrative work. The second built an automated security analytics pipeline that ingested penetration test results from multiple scanners, normalized them, and delivered analyst-ready reporting through BigQuery and DOMO, compressing security reporting cycles from weeks to days.
- Designed and automated Python-based ETL pipelines integrating the platform's LMS as a SaaS data source, cutting manual enrollment workload by 70% and improving downstream data accuracy;
- Built and deployed data pipelines using BigQuery on GCP and DOMO to ingest and process penetration test results, accelerating the security reporting cycle from weeks to days;
- Partnered with engineering and security teams to migrate on-premises databases to BigQuery and Azure SQL, reducing infrastructure cost and improving scalability;
- Designed normalization and schema strategies for heterogeneous security tool outputs to support unified reporting and trend analysis;
- Established monitoring, validation, and data quality checks on critical reporting pipelines;
- Mentored 20+ employees on Tableau and SQL best practices, expanding internal data self-service adoption and team-wide data literacy;
- Documented pipeline architectures and operational runbooks for ongoing handover and team scalability.
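To make the normalization of heterogeneous security tool outputs concrete, here is a small sketch that maps two differently shaped scanner records onto one reporting schema. The scanner names, field names, and severity scales are invented for illustration; the profile does not name the actual tools or their formats.

```python
# Hypothetical mapping from a numeric risk scale to shared severity labels.
NUMERIC_SEVERITY = {1: "low", 2: "medium", 3: "high", 4: "critical"}


def normalize(source, record):
    """Map one raw scanner record onto a unified finding schema."""
    if source == "scanner_a":
        # Assumed shape: {"host": ..., "sev": "High", "title": ...}
        return {
            "source": source,
            "asset": record["host"],
            "severity": record["sev"].lower(),
            "finding": record["title"],
        }
    if source == "scanner_b":
        # Assumed shape: {"target": ..., "risk": 3, "name": ...}
        return {
            "source": source,
            "asset": record["target"],
            "severity": NUMERIC_SEVERITY[record["risk"]],
            "finding": record["name"],
        }
    raise ValueError(f"unknown scanner: {source}")


def severity_counts(findings):
    """Count findings per severity, the kind of rollup a trend report needs."""
    counts = {}
    for finding in findings:
        counts[finding["severity"]] = counts.get(finding["severity"], 0) + 1
    return counts
```

Once every record shares the same fields, rollups such as `severity_counts` work across tools without per-scanner logic.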