Juan – SQL, Python, AWS
Juan is a Senior Data Engineer and Architect with strong hands-on expertise in SQL, Spark, Airflow, and multi-cloud ecosystems (AWS, GCP, Azure). He demonstrates solid knowledge of large-scale data processing, ETL design, and workflow orchestration, with clear technical reasoning. Juan brings 20+ years of experience building scalable, secure data platforms and integrating AI solutions, and combines deep engineering expertise with strategic insight into data architecture. He is also currently pursuing postgraduate studies in Artificial Intelligence and Machine Learning at The University of Texas at Austin.
Experience Highlights
Tech Lead
The company's mission is to empower Americans by providing access to factual and transparent data. By aggregating information from federal, state, and local government sources, it makes comprehensive government data easily accessible via its online platforms.
- Designed and optimized Databricks Lakehouse pipelines unifying 1,000+ federal, state, and local datasets, improving ETL performance by 45% and reducing compute costs by 30%;
- Implemented Delta Lake and Unity Catalog for reproducible, auditable data powering public dashboards on Builder.com and Flourish;
- Built API integrations and visualization feeds enabling near-real-time civic data access for millions of users.
Senior Data Architect
A medical data lakehouse importing data from several SQL Server and MySQL systems into Snowflake for patient and clinical data analytics. It handles data from over 30 cardiovascular practices across America, caring for 1.1 million patients.
- Engineered a Snowflake Data Lakehouse integrating multi-source data from SQL Server and MySQL systems across 30+ cardiovascular practices, consolidating 1.1M+ patient records for clinical and operational analytics;
- Designed and optimized ELT pipelines for patient, procedure, and EHR data, improving processing efficiency by 40% and enabling daily refreshes of key clinical KPIs;
- Implemented data quality, lineage, and governance frameworks, ensuring HIPAA compliance and consistent metrics across sites;
- Partnered with clinical and analytics teams to deliver interactive dashboards supporting physician performance tracking, patient outcomes, and RVU-based financial reporting.
Project Technical Manager
A hardware lifecycle management platform designed to support OEM operations and device division projects.
- Managed end-to-end delivery of a hardware lifecycle management platform, coordinating cross-functional teams across engineering, UX, and operations to streamline OEM device tracking and lifecycle visibility;
- Defined and governed Master Data Management (MDM) and UX requirements, standardizing device metadata, improving data quality, and unifying the user experience across multiple product lines;
- Established data governance frameworks ensuring secure, traceable, and ethical use of training and inference data across AI-enabled systems;
- Partnered with UX and engineering teams to refine AI-driven user flows, aligning interface design with model capabilities and business objectives;
- Led Agile project planning, stakeholder engagement, and sprint delivery, ensuring roadmap alignment and seamless integration with Microsoft’s global supply chain systems;
- Improved platform usability and data consistency, reducing manual reconciliation by ~35% and enhancing reporting accuracy across global operations.
Tech Lead
Migration of external file processing from Scala to PySpark on Databricks to modernize Mexico’s tax data infrastructure.
- Migrated legacy Scala-based ETL pipelines to PySpark within Databricks, modernizing SAT’s large-scale tax data processing framework and improving maintainability and performance;
- Optimized data ingestion and transformation workflows for high-volume fiscal datasets, reducing processing time by 40% and enabling more efficient reconciliation of taxpayer and fiscal records;
- Implemented Delta Lake architecture and parameterized notebooks for scalable, auditable, and reusable data pipelines across multiple tax data domains;
- Collaborated with internal data governance teams to ensure data lineage, compliance, and auditability within Mexico’s national tax data ecosystem.