Mahesh – SQL, Kubernetes, Python
Mahesh is a Senior Data Engineer with strong expertise in AWS, Spark, and SQL, and a proven ability to build scalable, company-wide data solutions. He designed and implemented a generalized data pipeline framework that empowers even non-engineers to create pipelines via configuration, demonstrating both technical depth and architectural foresight. A pragmatic problem solver with a product mindset, Mahesh brings a rare combination of infrastructure strength, automation skills, and cross-team enablement.
12 years of commercial experience
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Tech Lead
Mahesh designed and developed a low-code ETL framework to standardize and simplify the creation of data pipelines across the company’s data platform. The framework enables data engineers and analysts to define pipeline configurations declaratively (using YAML/JSON), eliminating repetitive coding and reducing maintenance overhead.
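To make the declarative approach concrete, here is a minimal, hypothetical sketch of what a configuration-driven pipeline can look like. The YAML layout (name, source, transform, target), the S3 paths, and the PyYAML/PySpark usage are illustrative assumptions, not the framework's actual schema or API.

```python
# Minimal sketch of a configuration-driven pipeline. All field names and
# paths are hypothetical; the real framework's schema is not shown here.
import yaml
from pyspark.sql import SparkSession

PIPELINE_CONFIG = """
name: orders_daily
source:
  format: json
  path: s3://example-bucket/raw/orders/          # hypothetical path
transform:
  sql: "SELECT order_id, amount, CAST(ts AS date) AS order_date FROM src"
target:
  format: parquet
  path: s3://example-bucket/curated/orders/      # hypothetical path
  mode: overwrite
"""

def run_pipeline(config: dict) -> None:
    spark = SparkSession.builder.appName(config["name"]).getOrCreate()
    # Read the source dataset exactly as declared in the config.
    df = spark.read.format(config["source"]["format"]).load(config["source"]["path"])
    # Apply the declared SQL transformation against a temp view named "src".
    df.createOrReplaceTempView("src")
    result = spark.sql(config["transform"]["sql"])
    # Write to the declared target with the declared mode.
    (result.write
        .format(config["target"]["format"])
        .mode(config["target"].get("mode", "append"))
        .save(config["target"]["path"]))

if __name__ == "__main__":
    run_pipeline(yaml.safe_load(PIPELINE_CONFIG))
```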
Before this project, pipeline development was highly manual — each data ingestion or transformation job required custom PySpark or SQL scripts. This led to inconsistent code patterns, longer development times, and high onboarding effort for new data engineers.
Solution:
- Built a configuration-driven ETL framework using Python and Apache Spark, where pipeline logic (sources, transformations, targets, schedules) is defined through metadata rather than code.
- Integrated with Airflow for orchestration, enabling automatic DAG generation from configurations (a sketch of this pattern follows the list).
- Added support for multiple data sources (REST APIs, S3, Snowflake, Kafka, Presto/Trino) and data targets (S3, Snowflake).
- Implemented data quality checks, schema validation, and error handling as reusable modules.
- Designed the framework to be extensible, allowing teams to plug in new connectors or transformations easily.
- Deployed on AWS EMR and Kubernetes (EKS) to support both batch and streaming workloads.
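The Airflow integration above refers to generating DAGs automatically from configurations. A minimal sketch of that general pattern, assuming Airflow 2.4+ with hypothetical config fields, DAG ids, and a placeholder run_pipeline callable, looks like this:

```python
# Illustrative sketch: generate one Airflow DAG per pipeline configuration.
# In the real framework the configs would come from YAML/JSON metadata.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

PIPELINE_CONFIGS = [
    {"name": "orders_daily", "schedule": "@daily"},    # hypothetical entries
    {"name": "clicks_hourly", "schedule": "@hourly"},
]

def run_pipeline(name: str) -> None:
    """Placeholder for the framework's config-driven execution entry point."""
    print(f"running pipeline {name}")

for cfg in PIPELINE_CONFIGS:
    dag = DAG(
        dag_id=f"etl_{cfg['name']}",
        schedule=cfg["schedule"],
        start_date=datetime(2023, 1, 1),
        catchup=False,
    )
    PythonOperator(
        task_id="run",
        python_callable=run_pipeline,
        op_kwargs={"name": cfg["name"]},
        dag=dag,
    )
    # Registering each DAG in the module namespace makes it discoverable
    # by the Airflow scheduler.
    globals()[dag.dag_id] = dag
```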
Impact:
- Adopted by 20+ teams and used in 200+ production pipelines across the organization.
- Reduced average pipeline development time from 2 weeks to less than 2 days.
- Standardized ETL development, improving maintainability and reducing operational incidents.
- Empowered non-engineering teams (like analysts) to onboard new data sources with minimal coding effort.
Tech Lead
Mahesh built the company’s data platform from scratch to centralize and streamline data collection, processing, and analytics. The goal was to enable faster reporting, improve data reliability, and support the company’s growing analytics and product needs.
Before this initiative, data was scattered across multiple operational systems with no single source of truth. Analysts and product teams faced delays due to manual data pulls and inconsistent data models. There was no unified ETL process or data lake.
Solution:
- Designed and implemented a data ingestion framework that could dynamically handle multiple data sources like MySQL, REST APIs, and application logs.
- Used AWS S3 as the foundation for a centralized data lake, ensuring scalable and cost-effective storage.
- Built modular PySpark and Airflow jobs for ETL workflows, supporting both incremental and full data loads (a sketch of the load pattern follows the list).
- Exposed curated datasets through Athena and Hive for analytics and BI teams.
- Integrated with Snowflake for downstream data warehousing and reporting.
- Automated pipeline deployments using Jenkins and version control via Bitbucket.
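The incremental/full load support mentioned above can be illustrated with a small PySpark sketch. The JDBC source, the customers table, the updated_at watermark column, and the S3 target path are hypothetical placeholders, not the platform's actual objects:

```python
# Hedged sketch of an incremental-vs-full load job. Table names, credentials,
# paths, and the watermark column are hypothetical examples.
from typing import Optional

from pyspark.sql import SparkSession, functions as F

SOURCE_JDBC_URL = "jdbc:mysql://example-host:3306/appdb"   # hypothetical
TARGET_PATH = "s3://example-datalake/curated/customers/"   # hypothetical

def load_customers(mode: str, last_watermark: Optional[str] = None) -> None:
    spark = SparkSession.builder.appName("customers_load").getOrCreate()
    df = (spark.read.format("jdbc")
          .option("url", SOURCE_JDBC_URL)
          .option("dbtable", "customers")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

    if mode == "incremental" and last_watermark is not None:
        # Only pick up rows changed since the previous successful run.
        df = df.filter(F.col("updated_at") > F.lit(last_watermark))

    # Full loads overwrite the target; incremental loads append a new partition.
    write_mode = "overwrite" if mode == "full" else "append"
    (df.withColumn("load_date", F.current_date())
       .write.mode(write_mode)
       .partitionBy("load_date")
       .parquet(TARGET_PATH))
```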
Impact:
- Established a fully operational data platform within months, reducing reporting delays from days to hours.
- Cut down manual data ingestion time by over 70% through framework automation.
- Empowered business and BI teams to self-serve data using Athena queries and dashboards.
- Created a foundation for future streaming and machine learning use cases.
Senior Data Engineer
The company's analytics teams required a consistent and scalable way to process vast volumes of clickstream, booking, and customer data. The existing pipelines lacked flexibility and required heavy manual maintenance, slowing down insight generation.
Mahesh developed and managed large-scale ETL and real-time data pipelines to support analytics, reporting, and product insights. He focused on improving data reliability, processing efficiency, and accessibility across multiple business domains.
Solution:
- Designed and implemented PySpark-based ETL pipelines on AWS EMR, processing terabytes of structured and semi-structured data daily.
- Integrated Snowflake as a unified data warehouse, automating data loading and schema management.
- Built real-time ingestion pipelines using Amazon Kinesis to support near real-time analytics and alerting (a sketch follows this project's impact list).
- Orchestrated and scheduled workflows through Apache Airflow, ensuring reliability and observability.
- Developed Looker dashboards to visualize KPIs and operational metrics for product and analytics teams.
- Implemented data validation and monitoring processes to improve data quality and reduce downstream errors.
- Served as a Scrum Master, facilitating agile ceremonies and improving sprint delivery consistency.
Impact:
- Reduced ETL pipeline failures and manual intervention by over 60%.
- Improved data freshness from daily to near real-time for critical product metrics.
- Enabled analysts and product managers to make data-driven decisions faster, increasing overall team productivity.
- Streamlined handoffs between data engineering and BI teams, improving collaboration.
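As a hedged illustration of the Kinesis-based ingestion mentioned in the solution above, the sketch below polls a single shard with boto3. The stream name, region, and record handling are hypothetical, and a production consumer would more likely use the Kinesis Client Library or a Spark/Flink connector:

```python
# Illustrative single-shard Kinesis reader; not the project's actual consumer.
import time

import boto3

STREAM_NAME = "clickstream-events"   # hypothetical stream name

def consume(stream_name: str) -> None:
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    # Read from the first shard only, starting at the tip of the stream.
    shard_id = kinesis.describe_stream(StreamName=stream_name)[
        "StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",
    )["ShardIterator"]

    while True:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            # In the real pipelines, records fed near real-time analytics/alerting.
            print(record["Data"])
        iterator = resp["NextShardIterator"]
        time.sleep(1)  # simple throttle for a demo loop

if __name__ == "__main__":
    consume(STREAM_NAME)
```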
Senior Data Engineer
Mahesh worked on building and optimizing Hadoop-based data pipelines, focusing on efficient data ingestion, transformation, and analytics. The goal was to enable scalable batch data processing and establish a solid foundation for future big data use cases.
Previously, the client relied heavily on traditional RDBMS systems, which made it difficult to handle large datasets and long-running analytical queries. Data ingestion and transformation were manual, slow, and lacked standardization across teams.
Solution:
- Developed Sqoop-based ingestion pipelines to extract data from multiple RDBMS systems (Oracle, MySQL, SQL Server) into HDFS.
- Created Hive-based data models and transformation scripts for data aggregation and reporting use cases.
- Wrote Python automation scripts to manage data ingestion schedules, reduce manual intervention, and streamline daily ETL processes (a sketch follows this project's impact list).
- Tuned Hive queries and partition strategies to improve performance and reduce query latency.
- Collaborated with business analysts to design OLAP data models for downstream reporting.
- Introduced basic data validation and reconciliation scripts to ensure data consistency between source and target systems.
Impact:
- Reduced data ingestion and transformation time by over 50% through automation and optimized Hive queries.
- Improved data accuracy and consistency across analytical systems.
- Established repeatable ETL workflows, enabling faster onboarding of new data sources.
- Laid the groundwork for migrating traditional ETL workloads to a modern big data ecosystem.
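The Python automation described in the solution above can be illustrated with a small wrapper that builds and runs a Sqoop import. The JDBC URL, credentials file, tables, and HDFS directories are hypothetical placeholders rather than the project's actual configuration:

```python
# Hedged sketch of a Python wrapper around Sqoop imports; connection details,
# tables, and target directories are hypothetical.
import subprocess

JDBC_URL = "jdbc:oracle:thin:@example-host:1521:ORCL"   # hypothetical
TARGET_ROOT = "/data/raw"                               # hypothetical HDFS root

def sqoop_import(table: str, split_by: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.password",  # avoids plaintext passwords
        "--table", table,
        "--split-by", split_by,
        "--target-dir", f"{TARGET_ROOT}/{table.lower()}",
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    # Fail loudly so the scheduler can retry or alert on ingestion errors.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for table, key in [("ORDERS", "ORDER_ID"), ("CUSTOMERS", "CUSTOMER_ID")]:
        sqoop_import(table, key)
```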