Harish – Python, LangChain, LLM expert at Lemon.io

Harish

From United States

AI Engineer | Senior

Harish is a senior AI engineer with strong expertise in Python, RAG architectures, LLM systems, and production-scale AI pipelines. He has hands-on experience with LangChain, LangGraph, PyTorch, and inference optimization using TensorRT and Triton, primarily in enterprise environments. His strengths include methodical system design, embedding model consistency, and robust failure handling. So far, Harish has demonstrated structured, clear communication that is well suited to client-facing roles.

6 years of commercial experience in:
  • AI
  • Machine learning
  • AI software
  • NLP software

Main technologies:
  • Python: 5 years
  • LangChain: 2.5 years
  • LLM: 2 years
  • AWS: 3 years

Additional skills: AI agent development, GCP, LangGraph, Pinecone, MLOps, RAG, PyTorch, MLflow, Vector Databases, OpenAI, Amazon S3, Weights & Biases, LLaMA

Direct hire: Possible

Experience Highlights

AI/ML Engineer
Dec 2024 - Feb 2026 (1 year 2 months)
Project Overview

Built a high-performance inference platform for large language models, focused on reducing latency and maximizing GPU utilization in production environments through techniques such as quantization, speculative decoding, and distributed inference.

Responsibilities:
  • Built optimized inference pipelines using TensorRT-LLM and Triton Inference Server for large-scale LLM serving
  • Reduced inference latency by 50% by applying quantization and speculative decoding while balancing performance against model accuracy
  • Improved GPU utilization using DeepSpeed and CUDA optimizations for efficient large-scale deployments
  • Implemented dynamic batching and parallel inference using vLLM to support high-throughput workloads
  • Designed and deployed scalable infrastructure on Kubernetes for production-grade AI systems
  • Partnered with product and platform teams to align performance improvements with business requirements
  • Translated complex optimization strategies into clear insights for non-technical stakeholders
  • Led code reviews and shared best practices for performance tuning across the team
  • Integrated monitoring using Prometheus and Grafana to track system performance and reliability
Project tech stack: LLM, Triton, Kubernetes, Prometheus, Grafana

AI/ML Engineer
Apr 2024 - Nov 2024 (7 months)
Project Overview

Developed a multi-modal generative AI platform capable of processing text, images, and structured data to generate contextual enterprise insights, supporting document intelligence and automated reporting workflows.

Responsibilities:
  • Developed multi-modal pipelines combining text, image, and tabular data using NeMo and PyTorch
  • Built document understanding workflows using OCR and layout-aware models for structured data extraction
  • Integrated LLMs for summarization and cross-modal reasoning across enterprise datasets
  • Improved data extraction and analysis accuracy by 35% through model and pipeline optimization
  • Built scalable APIs using FastAPI and deployed services using Docker and Kubernetes
  • Collaborated with enterprise stakeholders to integrate AI capabilities into real-world workflows
  • Translated business requirements into AI-driven solutions aligned with operational goals
  • Optimized model performance for real-time processing, balancing latency and accuracy constraints
  • Contributed to internal knowledge sharing and system design discussions
Project tech stack: PyTorch, OpenCV, FastAPI, Docker, Kubernetes

AI/ML Engineer
Dec 2022 - Jun 2023 (6 months)
Project Overview

Built a retrieval-augmented generation system enabling enterprise users to query internal knowledge bases and receive accurate, context-aware responses, improving information accessibility across teams.

Responsibilities:
  • Built RAG pipelines using Azure OpenAI and Azure Cognitive Search for contextual information retrieval
  • Integrated LangChain for prompt orchestration and multi-step reasoning workflows
  • Improved answer accuracy by 30% through optimized retrieval strategies and prompt engineering
  • Developed FastAPI-based services for real-time enterprise usage
  • Fine-tuned LLMs for domain-specific use cases such as summarization and Q&A
  • Partnered with product teams to integrate GenAI features into enterprise applications
  • Communicated system capabilities and limitations to non-technical stakeholders
  • Implemented evaluation frameworks to measure response quality, latency, and reliability
Project tech stack: Microsoft Azure, OpenAI, LangChain, Python, FastAPI, Hugging Face

AI/ML Engineer
Mar 2021 - Nov 2022 (1 year 8 months)
Project Overview

Designed and developed an end-to-end ML platform supporting automated data pipelines, model training, and real-time inference for enterprise applications.

Responsibilities:
  • Designed end-to-end ML pipelines using Azure ML, reducing model lifecycle time by 35%
  • Built real-time inference APIs using FastAPI and deployed on AKS for scalable serving
  • Developed large-scale data pipelines using Azure Data Factory and Databricks
  • Implemented model tracking, versioning, and monitoring using MLflow
  • Improved prediction latency by 40% through optimized deployment strategies
  • Collaborated with cross-functional teams to productionize machine learning solutions
  • Ensured data quality and consistency through automated validation pipelines
  • Evaluated architectural choices to balance scalability, cost, and performance
Project tech stack: Microsoft Azure, Databricks, PyTorch, Scikit-learn, FastAPI, Kubernetes, MLflow, Python

Education

Master's in Computer Science (2025)

Languages

English: Advanced

Copyright © 2026 lemon.io. All rights reserved.