Mahesh – SQL, Kubernetes, Python
Mahesh is a Senior Data Engineer with strong expertise in AWS, Spark, and SQL, and a proven ability to build scalable, company-wide data solutions. He designed and implemented a generalized data pipeline framework that empowers even non-engineers to create pipelines via configuration, demonstrating both technical depth and architectural foresight. A pragmatic problem solver with a product mindset, Mahesh brings a rare combination of infrastructure strength, automation skills, and cross-team enablement.
12 years of commercial experience
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Tech Lead
Mahesh designed and developed a low-code ETL framework to standardize and simplify the creation of data pipelines across the company’s data platform. The framework enables data engineers and analysts to define pipeline configurations declaratively (using YAML/JSON), eliminating repetitive coding and reducing maintenance overhead.
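To make the declarative approach concrete, here is a minimal, hypothetical sketch of what a configuration-driven pipeline can look like. The YAML layout (name, source, transform, target), the S3 paths, and the PyYAML/PySpark usage are illustrative assumptions, not the framework's actual schema or API.

```python
# Minimal sketch of a configuration-driven pipeline. All field names and
# paths are hypothetical; the real framework's schema is not shown here.
import yaml
from pyspark.sql import SparkSession

PIPELINE_CONFIG = """
name: orders_daily
source:
  format: json
  path: s3://example-bucket/raw/orders/          # hypothetical path
transform:
  sql: "SELECT order_id, amount, CAST(ts AS date) AS order_date FROM src"
target:
  format: parquet
  path: s3://example-bucket/curated/orders/      # hypothetical path
  mode: overwrite
"""

def run_pipeline(config: dict) -> None:
    spark = SparkSession.builder.appName(config["name"]).getOrCreate()
    # Read the source dataset exactly as declared in the config.
    df = spark.read.format(config["source"]["format"]).load(config["source"]["path"])
    # Apply the declared SQL transformation against a temp view named "src".
    df.createOrReplaceTempView("src")
    result = spark.sql(config["transform"]["sql"])
    # Write to the declared target with the declared mode.
    (result.write
        .format(config["target"]["format"])
        .mode(config["target"].get("mode", "append"))
        .save(config["target"]["path"]))

if __name__ == "__main__":
    run_pipeline(yaml.safe_load(PIPELINE_CONFIG))
```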
Before this project, pipeline development was highly manual — each data ingestion or transformation job required custom PySpark or SQL scripts. This led to inconsistent code patterns, longer development times, and high onboarding effort for new data engineers.
Solution:
- Built a configuration-driven ETL framework using Python and Apache Spark, where pipeline logic (sources, transformations, targets, schedules) is defined through metadata rather than code.
- Integrated with Airflow for orchestration, enabling automatic DAG generation from configurations (a sketch of this pattern follows the list).
- Added support for multiple data sources (REST APIs, S3, Snowflake, Kafka, Presto/Trino) and data targets (S3, Snowflake).
- Implemented data quality checks, schema validation, and error handling as reusable modules.
- Designed the framework to be extensible, allowing teams to plug in new connectors or transformations easily.
- Deployed on AWS EMR and Kubernetes (EKS) to support both batch and streaming workloads.
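The Airflow integration above refers to generating DAGs automatically from configurations. A minimal sketch of that general pattern, assuming Airflow 2.4+ with hypothetical config fields, DAG ids, and a placeholder run_pipeline callable, looks like this:

```python
# Illustrative sketch: generate one Airflow DAG per pipeline configuration.
# In the real framework the configs would come from YAML/JSON metadata.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

PIPELINE_CONFIGS = [
    {"name": "orders_daily", "schedule": "@daily"},    # hypothetical entries
    {"name": "clicks_hourly", "schedule": "@hourly"},
]

def run_pipeline(name: str) -> None:
    """Placeholder for the framework's config-driven execution entry point."""
    print(f"running pipeline {name}")

for cfg in PIPELINE_CONFIGS:
    dag = DAG(
        dag_id=f"etl_{cfg['name']}",
        schedule=cfg["schedule"],
        start_date=datetime(2023, 1, 1),
        catchup=False,
    )
    PythonOperator(
        task_id="run",
        python_callable=run_pipeline,
        op_kwargs={"name": cfg["name"]},
        dag=dag,
    )
    # Registering each DAG in the module namespace makes it discoverable
    # by the Airflow scheduler.
    globals()[dag.dag_id] = dag
```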
Impact:
- Adopted by 20+ teams and used in 200+ production pipelines across the organization.
- Reduced average pipeline development time from 2 weeks to less than 2 days.
- Standardized ETL development, improving maintainability and reducing operational incidents.
- Empowered non-engineering teams (like analysts) to onboard new data sources with minimal coding effort.
Tech Lead
Mahesh built the company’s data platform from scratch to centralize and streamline data collection, processing, and analytics. The goal was to enable faster reporting, improve data reliability, and support the company’s growing analytics and product needs.
Before this initiative, data was scattered across multiple operational systems with no single source of truth. Analysts and product teams faced delays due to manual data pulls and inconsistent data models. There was no unified ETL process or data lake.
Solution:
- Designed and implemented a data ingestion framework that could dynamically handle multiple data sources like MySQL, REST APIs, and application logs.
- Used AWS S3 as the foundation for a centralized data lake, ensuring scalable and cost-effective storage.
- Built modular PySpark and Airflow jobs for ETL workflows, supporting both incremental and full data loads (a sketch of the load pattern follows the list).
- Exposed curated datasets through Athena and Hive for analytics and BI teams.
- Integrated with Snowflake for downstream data warehousing and reporting.
- Automated pipeline deployments using Jenkins and version control via Bitbucket.
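The incremental/full load support mentioned above can be illustrated with a small PySpark sketch. The JDBC source, the customers table, the updated_at watermark column, and the S3 target path are hypothetical placeholders, not the platform's actual objects:

```python
# Hedged sketch of an incremental-vs-full load job. Table names, credentials,
# paths, and the watermark column are hypothetical examples.
from typing import Optional

from pyspark.sql import SparkSession, functions as F

SOURCE_JDBC_URL = "jdbc:mysql://example-host:3306/appdb"   # hypothetical
TARGET_PATH = "s3://example-datalake/curated/customers/"   # hypothetical

def load_customers(mode: str, last_watermark: Optional[str] = None) -> None:
    spark = SparkSession.builder.appName("customers_load").getOrCreate()
    df = (spark.read.format("jdbc")
          .option("url", SOURCE_JDBC_URL)
          .option("dbtable", "customers")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

    if mode == "incremental" and last_watermark is not None:
        # Only pick up rows changed since the previous successful run.
        df = df.filter(F.col("updated_at") > F.lit(last_watermark))

    # Full loads overwrite the target; incremental loads append a new partition.
    write_mode = "overwrite" if mode == "full" else "append"
    (df.withColumn("load_date", F.current_date())
       .write.mode(write_mode)
       .partitionBy("load_date")
       .parquet(TARGET_PATH))
```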
Impact:
- Established a fully operational data platform within months, reducing reporting delays from days to hours.
- Cut down manual data ingestion time by over 70% through framework automation.
- Empowered business and BI teams to self-serve data using Athena queries and dashboards.
- Created a foundation for future streaming and machine learning use cases.
Senior Data Engineer
The company's analytics teams required a consistent and scalable way to process vast volumes of clickstream, booking, and customer data. The existing pipelines lacked flexibility and required heavy manual maintenance, slowing down insight generation.
Mahesh developed and managed large-scale ETL and real-time data pipelines to support analytics, reporting, and product insights. He focused on improving data reliability, processing efficiency, and accessibility across multiple business domains.
Solution:
- Designed and implemented PySpark-based ETL pipelines on AWS EMR, processing terabytes of structured and semi-structured data daily.
- Integrated Snowflake as a unified data warehouse, automating data loading and schema management.
- Built real-time ingestion pipelines using Amazon Kinesis to support near real-time analytics and alerting (a sketch follows this project's impact list).
- Orchestrated and scheduled workflows through Apache Airflow, ensuring reliability and observability.
- Developed Looker dashboards to visualize KPIs and operational metrics for product and analytics teams.
- Implemented data validation and monitoring processes to improve data quality and reduce downstream errors.
- Served as a Scrum Master, facilitating agile ceremonies and improving sprint delivery consistency.
Impact:
- Reduced ETL pipeline failures and manual intervention by over 60%.
- Improved data freshness from daily to near real-time for critical product metrics.
- Enabled analysts and product managers to make data-driven decisions faster, increasing overall team productivity.
- Streamlined handoffs between data engineering and BI teams, improving collaboration.
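As a hedged illustration of the Kinesis-based ingestion mentioned in the solution above, the sketch below polls a single shard with boto3. The stream name, region, and record handling are hypothetical, and a production consumer would more likely use the Kinesis Client Library or a Spark/Flink connector:

```python
# Illustrative single-shard Kinesis reader; not the project's actual consumer.
import time

import boto3

STREAM_NAME = "clickstream-events"   # hypothetical stream name

def consume(stream_name: str) -> None:
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    # Read from the first shard only, starting at the tip of the stream.
    shard_id = kinesis.describe_stream(StreamName=stream_name)[
        "StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",
    )["ShardIterator"]

    while True:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            # In the real pipelines, records fed near real-time analytics/alerting.
            print(record["Data"])
        iterator = resp["NextShardIterator"]
        time.sleep(1)  # simple throttle for a demo loop

if __name__ == "__main__":
    consume(STREAM_NAME)
```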
Senior Data Engineer
Mahesh worked on building and optimizing Hadoop-based data pipelines, focusing on efficient data ingestion, transformation, and analytics. The goal was to enable scalable batch data processing and establish a solid foundation for future big data use cases.
Previously, the client relied heavily on traditional RDBMS systems, which made it difficult to handle large datasets and long-running analytical queries. Data ingestion and transformation were manual, slow, and lacked standardization across teams.
Solution:
- Developed Sqoop-based ingestion pipelines to extract data from multiple RDBMS systems (Oracle, MySQL, SQL Server) into HDFS.
- Created Hive-based data models and transformation scripts for data aggregation and reporting use cases.
- Wrote Python automation scripts to manage data ingestion schedules, reduce manual intervention, and streamline daily ETL processes (a sketch follows this project's impact list).
- Tuned Hive queries and partition strategies to improve performance and reduce query latency.
- Collaborated with business analysts to design OLAP data models for downstream reporting.
- Introduced basic data validation and reconciliation scripts to ensure data consistency between source and target systems.
Impact:
- Reduced data ingestion and transformation time by over 50% through automation and optimized Hive queries.
- Improved data accuracy and consistency across analytical systems.
- Established repeatable ETL workflows, enabling faster onboarding of new data sources.
- Laid the groundwork for migrating traditional ETL workloads to a modern big data ecosystem.
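The Python automation described in the solution above can be illustrated with a small wrapper that builds and runs a Sqoop import. The JDBC URL, credentials file, tables, and HDFS directories are hypothetical placeholders rather than the project's actual configuration:

```python
# Hedged sketch of a Python wrapper around Sqoop imports; connection details,
# tables, and target directories are hypothetical.
import subprocess

JDBC_URL = "jdbc:oracle:thin:@example-host:1521:ORCL"   # hypothetical
TARGET_ROOT = "/data/raw"                               # hypothetical HDFS root

def sqoop_import(table: str, split_by: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.password",  # avoids plaintext passwords
        "--table", table,
        "--split-by", split_by,
        "--target-dir", f"{TARGET_ROOT}/{table.lower()}",
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    # Fail loudly so the scheduler can retry or alert on ingestion errors.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for table, key in [("ORDERS", "ORDER_ID"), ("CUSTOMERS", "CUSTOMER_ID")]:
        sqoop_import(table, key)
```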