Harish – Python, LangChain, LLM expert at Lemon.io

Harish

From United States

AI Engineer | Senior

Harish – Python, LangChain, LLM

Harish is a Senior AI Engineer with strong expertise in Python, RAG architectures, LLM systems, and production-scale AI pipelines. He has hands-on experience with LangChain, LangGraph, PyTorch, and inference optimization using TensorRT and Triton, primarily in enterprise environments. His strengths include methodical system design, embedding-model consistency, and robust failure handling. His communication is structured, clear, and well suited to client-facing roles.

6 years of commercial experience in
AI
Machine learning
AI software
NLP software
Main technologies
Python
5 years
LangChain
2.5 years
LLM
2 years
AWS
3 years
Additional skills
AI agent development
GCP
LangGraph
Pinecone
MLOps
RAG
PyTorch
MLflow
Vector Databases
OpenAI
Amazon S3
Weights & Biases
LLaMA
Direct hire
Possible

Experience Highlights

AI/ML Engineer
Dec 2024 - Feb 2026 (1 year 2 months)
Project Overview

A high-performance inference platform for large language models, focusing on reducing latency and maximizing GPU utilization in production environments. The system supports optimized deployment of LLMs with advanced techniques such as quantization, speculative decoding, and distributed inference.

Responsibilities:
  • Built optimized inference pipelines using TensorRT-LLM and Triton Inference Server;
  • Reduced inference latency by 50% using quantization and speculative decoding;
  • Improved GPU utilization through DeepSpeed and CUDA optimizations;
  • Implemented dynamic batching and parallel inference using vLLM;
  • Deployed scalable infrastructure using Kubernetes for high-throughput workloads;
  • Integrated monitoring using Prometheus and Grafana for performance tracking;
  • Enabled production-grade LLM serving for enterprise-scale applications.
Project Tech stack:
LLM
Triton
Kubernetes
Prometheus
Grafana
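The dynamic batching mentioned above can be sketched in plain Python. This is an illustrative toy, not the production setup: real serving ran on Triton/vLLM, and `fake_generate`, the queue parameters, and the request format here are all assumptions for the sketch.

```python
import queue
import threading
import time

def fake_generate(prompts):
    # Stand-in for one batched LLM forward pass; a real server would call
    # vLLM or Triton here. Returns one output string per prompt.
    return [p.upper() for p in prompts]

class DynamicBatcher:
    """Collects requests until max_batch_size is reached or max_wait_s
    elapses, then runs them through the model as a single batch."""

    def __init__(self, max_batch_size=4, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()

    def submit(self, prompt):
        # Each request carries an Event so the caller can block on its result.
        slot = {"prompt": prompt, "done": threading.Event(), "result": None}
        self.requests.put(slot)
        return slot

    def run_once(self):
        # Drain up to max_batch_size requests within the wait window.
        batch = []
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=timeout))
            except queue.Empty:
                break
        if not batch:
            return 0
        outputs = fake_generate([s["prompt"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["result"] = out
            slot["done"].set()
        return len(batch)

batcher = DynamicBatcher()
slots = [batcher.submit(p) for p in ["hello", "world", "llm"]]
batcher.run_once()  # serves all three queued prompts in one batch
results = [s["result"] for s in slots]
```

The design choice is the trade-off the wait window encodes: a longer `max_wait_s` yields fuller batches and better GPU utilization, at the cost of added per-request latency.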
AI/ML Engineer
Apr 2024 - Nov 2024 (7 months)
Project Overview

A multi-modal generative AI system capable of processing and understanding text, images, and structured data to generate contextual insights for enterprise workflows. The platform enables intelligent document processing, summarization, and cross-modal reasoning, supporting use cases such as document intelligence and automated reporting.

Responsibilities:
  • Developed multi-modal pipelines combining text, image, and tabular data using NeMo and PyTorch;
  • Implemented document understanding workflows using OCR and layout-aware models;
  • Integrated LLMs for summarization and contextual reasoning across modalities;
  • Improved data extraction and analysis accuracy by 35%;
  • Built scalable APIs using FastAPI and deployed using Docker and Kubernetes;
  • Collaborated with enterprise teams to integrate AI solutions into production workflows;
  • Optimized model performance for real-time processing scenarios.
Project Tech stack:
PyTorch
OpenCV
FastAPI
Docker
Kubernetes
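The cross-modal pattern described above, handling each modality separately and merging the partial results into one report, can be sketched as follows. The `Document` fields, the summarizer stubs, and the sample data are all hypothetical; the real pipeline used NeMo/PyTorch models and OCR in place of these stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str = ""                                    # OCR'd body text
    table_rows: list = field(default_factory=list)    # structured/tabular data
    image_captions: list = field(default_factory=list)

def summarize_text(doc):
    # Stand-in for an LLM summarizer; here we just take the first sentence.
    return doc.text.split(".")[0].strip()

def summarize_table(doc):
    # Stand-in for tabular reasoning: report row count and column names.
    if not doc.table_rows:
        return "no tabular data"
    cols = sorted(doc.table_rows[0].keys())
    return f"{len(doc.table_rows)} rows with columns {cols}"

def build_report(doc):
    """Cross-modal merge: each modality is processed by its own handler,
    then the partial results are combined into one contextual report."""
    return {
        "text": summarize_text(doc),
        "tables": summarize_table(doc),
        "images": f"{len(doc.image_captions)} captioned figure(s)",
    }

doc = Document(
    text="Q3 revenue grew 12%. Costs were flat.",
    table_rows=[{"region": "EU", "revenue": 10}, {"region": "US", "revenue": 14}],
    image_captions=["Revenue by region"],
)
report = build_report(doc)
```

Routing each modality through a dedicated handler keeps the merge step model-agnostic, which is what makes swapping in stronger per-modality models straightforward.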
AI/ML Engineer
Dec 2022 - Jun 2023 (6 months)
Project Overview

A retrieval-augmented generation (RAG) system for enterprise knowledge retrieval, enabling users to query internal documents and receive context-aware, accurate responses. The system integrates LLMs with enterprise search to provide grounded answers, improving information accessibility across teams and reducing dependency on manual search processes.

Responsibilities:
  • Built RAG pipelines using Azure OpenAI and Azure Cognitive Search for contextual retrieval;
  • Integrated LangChain for prompt orchestration and multi-step query handling;
  • Improved answer accuracy and relevance by 30% through optimized retrieval strategies;
  • Developed APIs using FastAPI for real-time enterprise usage;
  • Fine-tuned LLMs for domain-specific tasks such as summarization and Q&A;
  • Collaborated with product teams to integrate GenAI features into enterprise applications;
  • Implemented evaluation metrics for response quality and latency.
Project Tech stack:
Microsoft Azure
OpenAI
LangChain
Python
FastAPI
Hugging Face
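The core RAG loop behind this project, embed the query, retrieve the closest documents, and inject them into the prompt, can be sketched with a toy bag-of-words "embedding" and cosine similarity. The document corpus, `embed`, and the prompt template are illustrative assumptions; production used real embeddings via Azure OpenAI and Azure Cognitive Search for retrieval.

```python
import math
from collections import Counter

DOCS = {
    "leave-policy": "Employees accrue 20 vacation days per year.",
    "expense-policy": "Expenses above 100 USD require manager approval.",
    "security-policy": "Laptops must use full disk encryption.",
}

def embed(text):
    # Toy bag-of-words vector; a real system would call an embedding model.
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Grounding step: retrieved passages are injected into the LLM prompt
    # so the answer stays anchored to enterprise content.
    context = "\n".join(DOCS[d] for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

top = retrieve("How many vacation days do employees get?")
```

Grounding the prompt in retrieved passages, rather than relying on the model's parametric knowledge, is what the "30% relevance improvement" work targets: better retrieval directly improves answer accuracy.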
AI/ML Engineer
Mar 2021 - Nov 2022 (1 year 8 months)
Project Overview

An end-to-end machine learning platform designed for enterprise use cases, enabling automated data ingestion, model training, and real-time inference. The system supports batch and streaming pipelines, allowing scalable deployment of ML models for business-critical applications. Internal teams used it to generate predictions and insights in near real-time, improving decision-making efficiency.

Responsibilities:
  • Designed end-to-end ML pipelines using Azure ML, reducing model lifecycle time by 35%;
  • Built real-time inference APIs using FastAPI and deployed on AKS for scalable serving;
  • Developed data pipelines using Azure Data Factory and Databricks for large-scale processing;
  • Implemented model tracking, versioning, and monitoring using MLflow;
  • Improved prediction latency by 40% through optimized deployment strategies;
  • Collaborated with cross-functional teams to productionize ML models;
  • Ensured data quality and consistency through automated validation pipelines.
Project Tech stack:
Microsoft Azure
Databricks
PyTorch
Scikit-learn
FastAPI
Kubernetes
MLflow
Python
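The model tracking and versioning described above follows the MLflow pattern of logging each training run's parameters and metrics, then selecting the best run for deployment. This minimal in-memory stand-in (class name, fields, and sample metrics are all hypothetical) shows the idea without the MLflow dependency.

```python
import time

class RunTracker:
    """Minimal stand-in for MLflow-style tracking: each run logs its
    params and metrics, and the registry can return the best run."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = {
            "id": len(self.runs),          # simple incrementing version
            "time": time.time(),           # when the run was recorded
            "params": dict(params),
            "metrics": dict(metrics),
        }
        self.runs.append(run)
        return run["id"]

    def best_run(self, metric, maximize=True):
        # Pick the run to promote, e.g. highest validation AUC.
        sign = 1 if maximize else -1
        return max(self.runs, key=lambda r: sign * r["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"lr": 0.1}, {"val_auc": 0.81})
tracker.log_run({"lr": 0.01}, {"val_auc": 0.88})
best = tracker.best_run("val_auc")
```

In MLflow itself the equivalent calls are `mlflow.log_param`/`mlflow.log_metric` inside a run, with the Model Registry handling version promotion.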

Education

2025
Computer Science
Master's

Languages

English
Advanced

Hire Harish or someone with similar qualifications in days
All developers are ready for interview and are just waiting for your request.
Copyright © 2026 lemon.io. All rights reserved.