Harish – Python, LangChain, LLM
Harish is a Senior AI Engineer with strong expertise in Python, RAG architectures, LLM systems, and production-scale AI pipelines. He has hands-on experience with LangChain, LangGraph, PyTorch, and inference optimization using TensorRT and Triton, primarily in enterprise environments. His strengths include methodical system design, embedding model consistency, and robust failure handling. His communication is structured, clear, and well-suited for client-facing roles.
6 years of commercial experience
Main technologies
Additional skills
Direct hire: possible
Experience Highlights
AI/ML Engineer
Built a high-performance inference platform for large language models focused on reducing latency and maximizing GPU utilization in production environments, leveraging techniques like quantization, speculative decoding, and distributed inference.
- Built optimized inference pipelines using TensorRT-LLM and Triton Inference Server for large-scale LLM serving
- Reduced inference latency by 50% by applying quantization and speculative decoding, balancing performance and model accuracy trade-offs
- Improved GPU utilization using DeepSpeed and CUDA optimizations for efficient large-scale deployments
- Implemented dynamic batching and parallel inference using vLLM to support high-throughput workloads
- Designed and deployed scalable infrastructure on Kubernetes for production-grade AI systems
- Partnered with product and platform teams to align performance improvements with business requirements
- Translated complex optimization strategies into clear insights for non-technical stakeholders
- Led code reviews and shared best practices for performance tuning across the team
- Integrated monitoring using Prometheus and Grafana to track system performance and reliability
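The dynamic-batching bullet above can be illustrated in plain Python. This is a minimal sketch of the queueing idea (release a batch when it fills up or when the oldest request has waited too long), not vLLM's actual continuous-batching scheduler; the names `BatchCollector`, `max_batch_size`, and `max_wait_s` are illustrative:

```python
import time
from collections import deque

class BatchCollector:
    """Collects incoming requests and releases them as batches.

    A batch is emitted when it reaches max_batch_size, or when the
    oldest queued request has waited longer than max_wait_s. This is
    the core trade-off behind dynamic batching in LLM serving:
    throughput (bigger batches) versus tail latency (bounded waiting).
    """

    def __init__(self, max_batch_size=8, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = deque()  # (arrival_time, request) pairs

    def submit(self, request):
        self._queue.append((time.monotonic(), request))

    def maybe_batch(self):
        """Return a batch of requests, or None if it is not time yet."""
        if not self._queue:
            return None
        full = len(self._queue) >= self.max_batch_size
        oldest_wait = time.monotonic() - self._queue[0][0]
        if full or oldest_wait >= self.max_wait_s:
            n = min(len(self._queue), self.max_batch_size)
            return [self._queue.popleft()[1] for _ in range(n)]
        return None

collector = BatchCollector(max_batch_size=4, max_wait_s=0.05)
for i in range(5):
    collector.submit(f"prompt-{i}")
first = collector.maybe_batch()   # full batch of 4 released immediately
time.sleep(0.06)                  # let the leftover request age past the deadline
second = collector.maybe_batch()  # timeout path releases the remaining request
```

In a real serving stack this collector would sit in front of a GPU worker; tuning `max_batch_size` against `max_wait_s` is exactly the latency/throughput balancing described above.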
AI/ML Engineer
Developed a multi-modal generative AI platform capable of processing text, images, and structured data to generate contextual enterprise insights, supporting document intelligence and automated reporting workflows.
- Developed multi-modal pipelines combining text, image, and tabular data using NeMo and PyTorch
- Built document understanding workflows using OCR and layout-aware models for structured data extraction
- Integrated LLMs for summarization and cross-modal reasoning across enterprise datasets
- Improved data extraction and analysis accuracy by 35% through model and pipeline optimization
- Built scalable APIs using FastAPI and deployed services using Docker and Kubernetes
- Collaborated with enterprise stakeholders to integrate AI capabilities into real-world workflows
- Translated business requirements into AI-driven solutions aligned with operational goals
- Optimized model performance for real-time processing, balancing latency and accuracy constraints
- Contributed to internal knowledge sharing and system design discussions
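As a rough illustration of the cross-modal idea in the bullets above (not the actual NeMo/PyTorch pipeline), OCR text, tabular records, and image-derived captions can be normalized into a single context string before being passed to an LLM. All names here (`build_multimodal_context`, the sample invoice data) are hypothetical:

```python
def build_multimodal_context(ocr_text, table_rows, image_captions):
    """Flatten text, tabular, and image-derived inputs into one
    LLM-ready context string. Purely illustrative; a production
    pipeline would also handle chunking, token budgets, and layout."""
    parts = ["## Document text", ocr_text.strip()]
    if table_rows:
        # Render the extracted table with a simple pipe-separated layout.
        header = " | ".join(table_rows[0].keys())
        parts.append("## Extracted table")
        parts.append(header)
        for row in table_rows:
            parts.append(" | ".join(str(v) for v in row.values()))
    if image_captions:
        parts.append("## Figures")
        parts.extend(f"- {c}" for c in image_captions)
    return "\n".join(parts)

context = build_multimodal_context(
    "Invoice #1042 issued to Acme Corp.",
    [{"item": "GPU hours", "qty": 120, "cost_usd": 960}],
    ["Bar chart of monthly spend"],
)
```

The resulting `context` would then be the input to a summarization or Q&A prompt, which is the "cross-modal reasoning across enterprise datasets" step in spirit.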
AI/ML Engineer
Built a retrieval-augmented generation system enabling enterprise users to query internal knowledge bases and receive accurate, context-aware responses, improving information accessibility across teams.
- Built RAG pipelines using Azure OpenAI and Azure Cognitive Search for contextual information retrieval
- Integrated LangChain for prompt orchestration and multi-step reasoning workflows
- Improved answer accuracy by 30% through optimized retrieval strategies and prompt engineering
- Developed FastAPI-based services for real-time enterprise usage
- Fine-tuned LLMs for domain-specific use cases such as summarization and Q&A
- Partnered with product teams to integrate GenAI features into enterprise applications
- Communicated system capabilities and limitations to non-technical stakeholders
- Implemented evaluation frameworks to measure response quality, latency, and reliability
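The retrieval step of a RAG pipeline like the one above can be sketched with nothing but cosine similarity and top-k selection. This is a stdlib-only toy, assuming precomputed embeddings (the 3-d vectors stand in for a real embedding model, and the corpus texts are invented), not the Azure Cognitive Search or LangChain implementation itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k=2):
    """Rank (vector, text) chunks by similarity to the query, keep top-k."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Ground the LLM in the retrieved chunks only."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\nQuestion: {question}"

# Toy 3-d "embeddings" stand in for a real embedding model.
corpus = [
    ([1.0, 0.0, 0.0], "VPN setup guide"),
    ([0.9, 0.1, 0.0], "Remote access policy"),
    ([0.0, 1.0, 0.0], "Cafeteria menu"),
]
chunks = retrieve([1.0, 0.05, 0.0], corpus, k=2)
prompt = build_prompt("How do I set up the VPN?", chunks)
```

Swapping the toy list for a vector index and the prompt template for a LangChain chain gives the shape of the production pipeline described above; the "optimized retrieval strategies" bullet is largely about improving this ranking step.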
AI/ML Engineer
Designed and developed an end-to-end ML platform supporting automated data pipelines, model training, and real-time inference for enterprise applications.
- Designed end-to-end ML pipelines using Azure ML, reducing model lifecycle time by 35%
- Built real-time inference APIs using FastAPI and deployed on AKS for scalable serving
- Developed large-scale data pipelines using Azure Data Factory and Databricks
- Implemented model tracking, versioning, and monitoring using MLflow
- Improved prediction latency by 40% through optimized deployment strategies
- Collaborated with cross-functional teams to productionize machine learning solutions
- Ensured data quality and consistency through automated validation pipelines
- Evaluated architectural choices to balance scalability, cost, and performance
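The automated-validation bullet above can be sketched as a schema check that collects errors instead of failing fast, which is the usual shape of a data-quality gate in an ML pipeline. This is a minimal stdlib illustration, not the Azure Data Factory/Databricks implementation; the schema and sample rows are invented:

```python
def validate_rows(rows, schema):
    """Check each record against a {column: (type, required)} schema
    and collect human-readable errors rather than raising on the
    first problem, so a pipeline run can report all issues at once."""
    errors = []
    for i, row in enumerate(rows):
        for col, (typ, required) in schema.items():
            if col not in row or row[col] is None:
                if required:
                    errors.append(f"row {i}: missing required '{col}'")
            elif not isinstance(row[col], typ):
                errors.append(f"row {i}: '{col}' should be {typ.__name__}")
    return errors

schema = {"user_id": (int, True), "score": (float, True), "note": (str, False)}
rows = [
    {"user_id": 1, "score": 0.92, "note": "ok"},
    {"user_id": "2", "score": 0.88},   # wrong type for user_id; note is optional
    {"user_id": 3, "score": None},     # missing required score
]
issues = validate_rows(rows, schema)
```

In a pipeline, a non-empty `issues` list would block promotion of the batch to training or inference, which is the "data quality and consistency" gate in miniature.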