Juan – SQL, Python, AWS
Juan is a Senior Data Engineer and Architect with strong hands-on expertise in SQL, Spark, Airflow, and multi-cloud ecosystems (AWS, GCP, Azure). He demonstrates solid knowledge of large-scale data processing, ETL design, and workflow orchestration, with clear technical reasoning. Juan brings 20+ years of experience building scalable, secure data platforms and integrating AI solutions, and combines deep engineering expertise with strategic insight into data architecture. He is also currently pursuing postgraduate studies in Artificial Intelligence and Machine Learning at The University of Texas at Austin.
Experience Highlights
Tech Lead
The company's mission is to empower Americans by providing access to factual and transparent data. By aggregating information from federal, state, and local government sources, it makes comprehensive government data easily accessible via its online platforms.
- Designed and optimized Databricks Lakehouse pipelines unifying 1,000+ federal, state, and local datasets, improving ETL performance by 45% and reducing compute costs by 30%;
- Implemented Delta Lake and Unity Catalog for reproducible, auditable data powering public dashboards on Builder.com and Flourish;
- Built API integrations and visualization feeds enabling near-real-time civic data access for millions of users.
Senior Data Architect
A medical data lakehouse importing data from several SQL Server and MySQL systems into Snowflake for patient and clinical data analytics. It handles data from over 30 cardiovascular practices across America, caring for 1.1 million patients.
- Engineered a Snowflake Data Lakehouse integrating multi-source data from SQL Server and MySQL systems across 30+ cardiovascular practices, consolidating 1.1M+ patient records for clinical and operational analytics;
- Designed and optimized ELT pipelines for patient, procedure, and EHR data, improving processing efficiency by 40% and enabling daily refreshes of key clinical KPIs;
- Implemented data quality, lineage, and governance frameworks, ensuring HIPAA compliance and consistent metrics across sites;
- Partnered with clinical and analytics teams to deliver interactive dashboards supporting physician performance tracking, patient outcomes, and RVU-based financial reporting.
Project Technical Manager
A hardware lifecycle management platform designed to support OEM operations and device division projects.
- Managed end-to-end delivery of a hardware lifecycle management platform, coordinating cross-functional teams across engineering, UX, and operations to streamline OEM device tracking and lifecycle visibility;
- Defined and governed Master Data Management (MDM) and UX requirements, standardizing device metadata, improving data quality, and unifying the user experience across multiple product lines;
- Established data governance frameworks ensuring secure, traceable, and ethical use of training and inference data across AI-enabled systems;
- Partnered with UX and engineering teams to refine AI-driven user flows, aligning interface design with model capabilities and business objectives;
- Led Agile project planning, stakeholder engagement, and sprint delivery, ensuring roadmap alignment and seamless integration with Microsoft’s global supply chain systems;
- Improved platform usability and data consistency, reducing manual reconciliation by ~35% and enhancing reporting accuracy across global operations.
Tech Lead
Migration of external file processing from Scala to PySpark on Databricks to modernize Mexico’s tax data infrastructure.
- Migrated legacy Scala-based ETL pipelines to PySpark within Databricks, modernizing SAT’s large-scale tax data processing framework and improving maintainability and performance;
- Optimized data ingestion and transformation workflows for high-volume fiscal datasets, reducing processing time by 40% and enabling more efficient reconciliation of taxpayer and fiscal records;
- Implemented Delta Lake architecture and parameterized notebooks for scalable, auditable, and reusable data pipelines across multiple tax data domains;
- Collaborated with internal data governance teams to ensure data lineage, compliance, and auditability within Mexico’s national tax data ecosystem.