Harish – Python, LangChain, LLM
Harish is a Senior AI Engineer with strong expertise in Python, RAG architectures, LLM systems, and production-scale AI pipelines. He has hands-on experience with LangChain, LangGraph, PyTorch, and inference optimization using TensorRT and Triton, primarily in enterprise environments. His strengths include methodical system design, embedding-model consistency, and robust failure handling. His communication is structured, clear, and well-suited for client-facing roles.
6 years of commercial experience
Main technologies
Additional skills
Direct hire: Possible
Experience Highlights
AI/ML Engineer
A high-performance inference platform for large language models, focusing on reducing latency and maximizing GPU utilization in production environments. The system supports optimized deployment of LLMs with advanced techniques such as quantization, speculative decoding, and distributed inference.
- Built optimized inference pipelines using TensorRT-LLM and Triton Inference Server;
- Reduced inference latency by 50% using quantization and speculative decoding;
- Improved GPU utilization through DeepSpeed and CUDA optimizations;
- Implemented dynamic batching and parallel inference using vLLM (see the serving sketch after this list);
- Deployed scalable infrastructure using Kubernetes for high-throughput workloads;
- Integrated monitoring using Prometheus and Grafana for performance tracking;
- Enabled production-grade LLM serving for enterprise-scale applications.
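Below is a minimal sketch of the dynamic-batching serving pattern this project relied on, using vLLM's offline engine. The model name, prompts, and sampling settings are illustrative assumptions, not details from the project:

```python
# Batched LLM inference with vLLM; the engine applies continuous (dynamic)
# batching internally, so one generate() call over many prompts keeps the GPU busy.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the quarterly report in one sentence.",
    "List three risks mentioned in the contract.",
]
sampling_params = SamplingParams(temperature=0.2, max_tokens=128)

# The model is an assumption for illustration; a production deployment would
# point at a quantized checkpoint served behind Triton or Kubernetes.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

In production the same engine typically runs as an OpenAI-compatible server, which is the kind of setup that dynamic batching and Kubernetes-based scaling build on.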
AI/ML Engineer
A multi-modal generative AI system capable of processing and understanding text, images, and structured data to generate contextual insights for enterprise workflows. The platform enables intelligent document processing, summarization, and cross-modal reasoning, supporting use cases such as document intelligence and automated reporting.
- Developed multi-modal pipelines combining text, image, and tabular data using NeMo and PyTorch;
- Implemented document understanding workflows using OCR and layout-aware models;
- Integrated LLMs for summarization and contextual reasoning across modalities;
- Improved data extraction and analysis accuracy by 35%;
- Built scalable APIs using FastAPI and deployed using Docker and Kubernetes (see the sketch after this list);
- Collaborated with enterprise teams to integrate AI solutions into production workflows;
- Optimized model performance for real-time processing scenarios.
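A minimal sketch of a document-intelligence endpoint in the spirit of this project: OCR an uploaded page image with pytesseract, then hand the text to a summarization step. The route path and the `summarize()` helper are hypothetical placeholders for the project's LLM integration:

```python
# FastAPI endpoint: OCR an uploaded document image, then summarize the text.
import io

import pytesseract
from fastapi import FastAPI, UploadFile
from PIL import Image

app = FastAPI()


def summarize(text: str) -> str:
    # Hypothetical stand-in for the LLM summarization call used in the project.
    return text[:200]


@app.post("/documents/summarize")
async def summarize_document(file: UploadFile):
    image = Image.open(io.BytesIO(await file.read()))
    extracted = pytesseract.image_to_string(image)  # OCR step
    return {"summary": summarize(extracted)}
```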
AI/ML Engineer
A retrieval-augmented generation (RAG) system for enterprise knowledge retrieval, enabling users to query internal documents and receive context-aware, accurate responses. The system integrates LLMs with enterprise search to provide grounded answers, improving information accessibility across teams and reducing dependency on manual search processes.
- Built RAG pipelines using Azure OpenAI and Azure Cognitive Search for contextual retrieval;
- Integrated LangChain for prompt orchestration and multi-step query handling (a minimal sketch follows this list);
- Improved answer accuracy and relevance by 30% through optimized retrieval strategies;
- Developed APIs using FastAPI for real-time enterprise usage;
- Fine-tuned LLMs for domain-specific tasks such as summarization and Q&A;
- Collaborated with product teams to integrate GenAI features into enterprise applications;
- Implemented evaluation metrics for response quality and latency.
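A minimal RAG sketch with LangChain over Azure OpenAI and Azure AI Search (the successor to Azure Cognitive Search). The index name, deployment name, and top-k value are assumptions; endpoints and keys are read from environment variables:

```python
# Retrieval-augmented answering: fetch grounding passages from Azure AI Search,
# then prompt an Azure OpenAI chat model with the retrieved context.
from langchain_community.retrievers import AzureAISearchRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI

retriever = AzureAISearchRetriever(index_name="enterprise-docs", top_k=5)   # assumed index
llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01")  # assumed deployment

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def answer(question: str) -> str:
    docs = retriever.invoke(question)  # contextual retrieval
    context = "\n\n".join(doc.page_content for doc in docs)
    return (prompt | llm).invoke({"context": context, "question": question}).content
```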
AI/ML Engineer
An end-to-end machine learning platform designed for enterprise use cases, enabling automated data ingestion, model training, and real-time inference. The system supports batch and streaming pipelines, allowing scalable deployment of ML models for business-critical applications. Internal teams used it to generate predictions and insights in near real-time, improving decision-making efficiency.
- Designed end-to-end ML pipelines using Azure ML, reducing model lifecycle time by 35%;
- Built real-time inference APIs using FastAPI and deployed on AKS for scalable serving;
- Developed data pipelines using Azure Data Factory and Databricks for large-scale processing;
- Implemented model tracking, versioning, and monitoring using MLflow (see the tracking sketch after this list);
- Improved prediction latency by 40% through optimized deployment strategies;
- Collaborated with cross-functional teams to productionize ML models;
- Ensured data quality and consistency through automated validation pipelines.
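A minimal sketch of the MLflow tracking-and-versioning pattern referenced above; the experiment name, model, and registered model name are illustrative. Azure ML exposes an MLflow-compatible tracking API, so the same pattern applies there:

```python
# Log parameters, metrics, and a versioned model with MLflow.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

mlflow.set_experiment("demand-forecasting")  # assumed experiment name

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mean_absolute_error(y_test, model.predict(X_test)))

    # Registering the model creates a new version in the MLflow model registry.
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="demand-forecaster")
```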