Harish – Python, LangChain, LLM
Harish is a Senior AI Engineer with strong expertise in Python, RAG architectures, LLM systems, and production-scale AI pipelines. He has hands-on experience with LangChain, LangGraph, PyTorch, and inference optimization using TensorRT and Triton, primarily in enterprise environments. His strengths include methodical system design, embedding model consistency, and robust failure handling. His communication is structured, clear, and well-suited for client-facing roles.
6 years of commercial experience
Main technologies
Additional skills
Direct hire: possible
Experience Highlights
AI/ML Engineer
Built a high-performance inference platform for large language models focused on reducing latency and maximizing GPU utilization in production environments, leveraging techniques like quantization, speculative decoding, and distributed inference.
- Built optimized inference pipelines using TensorRT-LLM and Triton Inference Server for large-scale LLM serving
- Reduced inference latency by 50% by applying quantization and speculative decoding, balancing performance and model accuracy trade-offs
- Improved GPU utilization using DeepSpeed and CUDA optimizations for efficient large-scale deployments
- Implemented dynamic batching and parallel inference using vLLM to support high-throughput workloads
- Designed and deployed scalable infrastructure on Kubernetes for production-grade AI systems
- Partnered with product and platform teams to align performance improvements with business requirements
- Translated complex optimization strategies into clear insights for non-technical stakeholders
- Led code reviews and shared best practices for performance tuning across the team
- Integrated monitoring using Prometheus and Grafana to track system performance and reliability
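The dynamic-batching bullet above can be illustrated in plain Python. This is a minimal sketch of the queueing idea (release a batch when it fills up or when the oldest request has waited too long), not vLLM's actual continuous-batching scheduler; the names `BatchCollector`, `max_batch_size`, and `max_wait_s` are illustrative:

```python
import time
from collections import deque

class BatchCollector:
    """Collects incoming requests and releases them as batches.

    A batch is emitted when it reaches max_batch_size, or when the
    oldest queued request has waited longer than max_wait_s. This is
    the core trade-off behind dynamic batching in LLM serving:
    throughput (bigger batches) versus tail latency (bounded waiting).
    """

    def __init__(self, max_batch_size=8, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = deque()  # (arrival_time, request) pairs

    def submit(self, request):
        self._queue.append((time.monotonic(), request))

    def maybe_batch(self):
        """Return a batch of requests, or None if it is not time yet."""
        if not self._queue:
            return None
        full = len(self._queue) >= self.max_batch_size
        oldest_wait = time.monotonic() - self._queue[0][0]
        if full or oldest_wait >= self.max_wait_s:
            n = min(len(self._queue), self.max_batch_size)
            return [self._queue.popleft()[1] for _ in range(n)]
        return None

collector = BatchCollector(max_batch_size=4, max_wait_s=0.05)
for i in range(5):
    collector.submit(f"prompt-{i}")
first = collector.maybe_batch()   # full batch of 4 released immediately
time.sleep(0.06)                  # let the leftover request age past the deadline
second = collector.maybe_batch()  # timeout path releases the remaining request
```

In a real serving stack this collector would sit in front of a GPU worker; tuning `max_batch_size` against `max_wait_s` is exactly the latency/throughput balancing described above.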
AI/ML Engineer
Developed a multi-modal generative AI platform capable of processing text, images, and structured data to generate contextual enterprise insights, supporting document intelligence and automated reporting workflows.
- Developed multi-modal pipelines combining text, image, and tabular data using NeMo and PyTorch
- Built document understanding workflows using OCR and layout-aware models for structured data extraction
- Integrated LLMs for summarization and cross-modal reasoning across enterprise datasets
- Improved data extraction and analysis accuracy by 35% through model and pipeline optimization
- Built scalable APIs using FastAPI and deployed services using Docker and Kubernetes
- Collaborated with enterprise stakeholders to integrate AI capabilities into real-world workflows
- Translated business requirements into AI-driven solutions aligned with operational goals
- Optimized model performance for real-time processing, balancing latency and accuracy constraints
- Contributed to internal knowledge sharing and system design discussions
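As a rough illustration of the cross-modal idea in the bullets above (not the actual NeMo/PyTorch pipeline), OCR text, tabular records, and image-derived captions can be normalized into a single context string before being passed to an LLM. All names here (`build_multimodal_context`, the sample invoice data) are hypothetical:

```python
def build_multimodal_context(ocr_text, table_rows, image_captions):
    """Flatten text, tabular, and image-derived inputs into one
    LLM-ready context string. Purely illustrative; a production
    pipeline would also handle chunking, token budgets, and layout."""
    parts = ["## Document text", ocr_text.strip()]
    if table_rows:
        # Render the extracted table with a simple pipe-separated layout.
        header = " | ".join(table_rows[0].keys())
        parts.append("## Extracted table")
        parts.append(header)
        for row in table_rows:
            parts.append(" | ".join(str(v) for v in row.values()))
    if image_captions:
        parts.append("## Figures")
        parts.extend(f"- {c}" for c in image_captions)
    return "\n".join(parts)

context = build_multimodal_context(
    "Invoice #1042 issued to Acme Corp.",
    [{"item": "GPU hours", "qty": 120, "cost_usd": 960}],
    ["Bar chart of monthly spend"],
)
```

The resulting `context` would then be the input to a summarization or Q&A prompt, which is the "cross-modal reasoning across enterprise datasets" step in spirit.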
AI/ML Engineer
Built a retrieval-augmented generation system enabling enterprise users to query internal knowledge bases and receive accurate, context-aware responses, improving information accessibility across teams.
- Built RAG pipelines using Azure OpenAI and Azure Cognitive Search for contextual information retrieval
- Integrated LangChain for prompt orchestration and multi-step reasoning workflows
- Improved answer accuracy by 30% through optimized retrieval strategies and prompt engineering
- Developed FastAPI-based services for real-time enterprise usage
- Fine-tuned LLMs for domain-specific use cases such as summarization and Q&A
- Partnered with product teams to integrate GenAI features into enterprise applications
- Communicated system capabilities and limitations to non-technical stakeholders
- Implemented evaluation frameworks to measure response quality, latency, and reliability
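The retrieval step of a RAG pipeline like the one above can be sketched with nothing but cosine similarity and top-k selection. This is a stdlib-only toy, assuming precomputed embeddings (the 3-d vectors stand in for a real embedding model, and the corpus texts are invented), not the Azure Cognitive Search or LangChain implementation itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k=2):
    """Rank (vector, text) chunks by similarity to the query, keep top-k."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Ground the LLM in the retrieved chunks only."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\nQuestion: {question}"

# Toy 3-d "embeddings" stand in for a real embedding model.
corpus = [
    ([1.0, 0.0, 0.0], "VPN setup guide"),
    ([0.9, 0.1, 0.0], "Remote access policy"),
    ([0.0, 1.0, 0.0], "Cafeteria menu"),
]
chunks = retrieve([1.0, 0.05, 0.0], corpus, k=2)
prompt = build_prompt("How do I set up the VPN?", chunks)
```

Swapping the toy list for a vector index and the prompt template for a LangChain chain gives the shape of the production pipeline described above; the "optimized retrieval strategies" bullet is largely about improving this ranking step.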
AI/ML Engineer
Designed and developed an end-to-end ML platform supporting automated data pipelines, model training, and real-time inference for enterprise applications.
- Designed end-to-end ML pipelines using Azure ML, reducing model lifecycle time by 35%
- Built real-time inference APIs using FastAPI and deployed on AKS for scalable serving
- Developed large-scale data pipelines using Azure Data Factory and Databricks
- Implemented model tracking, versioning, and monitoring using MLflow
- Improved prediction latency by 40% through optimized deployment strategies
- Collaborated with cross-functional teams to productionize machine learning solutions
- Ensured data quality and consistency through automated validation pipelines
- Evaluated architectural choices to balance scalability, cost, and performance
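The automated-validation bullet above can be sketched as a schema check that collects errors instead of failing fast, which is the usual shape of a data-quality gate in an ML pipeline. This is a minimal stdlib illustration, not the Azure Data Factory/Databricks implementation; the schema and sample rows are invented:

```python
def validate_rows(rows, schema):
    """Check each record against a {column: (type, required)} schema
    and collect human-readable errors rather than raising on the
    first problem, so a pipeline run can report all issues at once."""
    errors = []
    for i, row in enumerate(rows):
        for col, (typ, required) in schema.items():
            if col not in row or row[col] is None:
                if required:
                    errors.append(f"row {i}: missing required '{col}'")
            elif not isinstance(row[col], typ):
                errors.append(f"row {i}: '{col}' should be {typ.__name__}")
    return errors

schema = {"user_id": (int, True), "score": (float, True), "note": (str, False)}
rows = [
    {"user_id": 1, "score": 0.92, "note": "ok"},
    {"user_id": "2", "score": 0.88},   # wrong type for user_id; note is optional
    {"user_id": 3, "score": None},     # missing required score
]
issues = validate_rows(rows, schema)
```

In a pipeline, a non-empty `issues` list would block promotion of the batch to training or inference, which is the "data quality and consistency" gate in miniature.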