Mario
From Guatemala (GMT-6)
18 years of commercial experience
Lemon.io stats
1 project done
1024 hours worked
Open to new offers

Mario – Python, AWS, ETL
Versatile Data Engineer with 17 years of expertise, including leadership roles such as Team Lead and CTO of a 15-person team. Mario has strong self-presentation skills, a business-oriented approach, and proven leadership qualities. During the technical interview, he justified his architectural decisions well and demonstrated solid SQL proficiency, both practical and theoretical.
Main technologies
Additional skills
Ready to start: To be verified
Direct hire: Potentially possible

Experience Highlights
Senior Data Engineer
The project focused on enhancing the company's datamart ETL processes through the development of stored procedures in Snowflake, AWS Lambda functions, and Data Definition Language (DDL) scripts.
- Migrated the Adyen and PayPal SFTP processing pipelines;
- Developed new Snowflake stored procedures;
- Built Lambda functions to import new Accounts Receivable Aging data;
- Created views and procedures for Intacct and Zuora data;
- Developed Lambda functions to process incoming files for storage in the AWS S3 data lake (see the sketch after this list);
- Wrote complex queries for views, ALTER statements, and new table creation as part of DDL scripting for ETL operations;
- Unit-tested tickets rigorously before they progressed to peer review and Quality Assurance (QA);
- Deciphered and modified codebases inherited from external teams, which required meticulous code comprehension and adaptation.
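A minimal sketch of what one of those file-ingestion Lambda functions could look like, assuming an S3-triggered handler. The bucket name, prefix layout, and routing convention are illustrative assumptions, not the project's actual code:

```python
import urllib.parse

import boto3  # bundled with the AWS Lambda Python runtime

s3 = boto3.client("s3")

# Hypothetical data lake bucket; the real project's names are not public.
DATA_LAKE_BUCKET = "example-datalake-bucket"


def lambda_handler(event, context):
    """Copy each file dropped into a landing bucket into the data lake,
    prefixed by source so downstream Snowflake loads can locate it."""
    for record in event.get("Records", []):
        src_bucket = record["s3"]["bucket"]["name"]
        # S3 event keys arrive URL-encoded.
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # e.g. "adyen/report.csv" -> "raw/adyen/report.csv"
        s3.copy_object(
            Bucket=DATA_LAKE_BUCKET,
            Key=f"raw/{src_key}",
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
```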
CTO, Senior Data Engineer, SRE
This project addressed the infrastructure upgrade needs of a small firm managing data from over 15 clients, received mostly via email or WhatsApp as CSV or Excel files. The initiative modernized the firm's practices to industry standards: it involved deploying 15 distinct organizations, each tailored to a specific client, and establishing S3 buckets for data lake storage. An ETL pipeline was implemented with Glue/Airflow, and RDS PostgreSQL served as the data warehouse because budget constraints ruled out Redshift. This overhaul enabled the firm to manage and process client data efficiently while aligning with contemporary data management practices.
- Created 15 distinct ETL pipelines, one per client (a simplified sketch follows this list);
- Established servers for the Data Scientist team to mitigate RAM issues on their local computers;
- Introduced QuickSight as a BI tool, replacing R for graph creation;
- Adopted Slack for internal communication, replacing email/WhatsApp;
- Deployed 1Password as a standard practice for password storage;
- Collaborated directly with clients to integrate ETL for each;
- Initiated migration to Jira for Agile methodology adoption.
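A minimal sketch of the shape one per-client Airflow DAG could take under this setup. The DAG id, bucket path, table, and connection id are hypothetical, and reading from S3 via pandas assumes s3fs is installed:

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def load_client_csv(**context):
    """Read one client's CSV drop and append it to the warehouse table."""
    # Hypothetical path and connection id, for illustration only.
    df = pd.read_csv("s3://example-client-bucket/incoming/latest.csv")
    hook = PostgresHook(postgres_conn_id="warehouse_postgres")
    df.to_sql(
        "client_sales_raw",
        con=hook.get_sqlalchemy_engine(),
        if_exists="append",
        index=False,
    )


with DAG(
    dag_id="client_acme_daily_load",  # hypothetical client name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword
    catchup=False,
):
    PythonOperator(task_id="load_csv", python_callable=load_client_csv)
```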
Senior Data Engineer
The project focused on the punctual delivery of data for corporate and shareholder meetings, the company's primary ETL process, which meant overseeing the functionality of numerous Airflow Directed Acyclic Graphs (DAGs). Mario also spearheaded the development of new ETL processes, including data extraction from APIs such as Shopify and iAuditor, and facilitated the migration from MSSQL stored procedures to Redshift for improved data management. Notably, he identified and implemented optimizations within Airflow operators, improving efficiency despite initial skepticism, and his advocacy for transitioning from Athena to Redshift yielded substantial performance gains for query operations, amplifying the project's impact beyond routine duties.
- Provided support for Airflow DAGs;
- Fixed DAGs in non-prod and prod environments;
- Created and maintained new Airflow operators (see the sketch after this list);
- Monitored over 400 DAGs in non-prod and prod daily;
- Established new data pipelines for Airflow;
- Managed and enhanced existing pipelines;
- Granted other team members access to the data lake;
- Served as the primary Git support for other team members;
- Provided AWS support on S3, Docker, Glue, Athena, EMR, Lake Formation, and Redshift.
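As an illustration of what a custom Airflow operator like those mentioned above could look like, here is a minimal sketch. The operator name, its behavior (running a Redshift query and logging the row count), and the connection id are assumptions, not the project's actual operators:

```python
from airflow.models.baseoperator import BaseOperator
from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook


class RedshiftRowCountOperator(BaseOperator):
    """Hypothetical operator: run a SQL statement on Redshift and log
    the row count, leaving an auditable trace in the task logs."""

    template_fields = ("sql",)  # let Airflow template the query

    def __init__(self, *, sql, redshift_conn_id="redshift_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.redshift_conn_id = redshift_conn_id

    def execute(self, context):
        hook = RedshiftSQLHook(redshift_conn_id=self.redshift_conn_id)
        records = hook.get_records(self.sql)
        self.log.info("Query returned %d rows", len(records))
        return len(records)
```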
Application Architect, Senior Data Engineer, SRE
The project aimed to eliminate a bottleneck in tenant data processing for a client with over 400 tenants. Inefficiencies at a service provider had created a backlog of 300+ pending installations, with delays of up to 10 months. To resolve this, the project replaced the service provider's processor and implemented two distinct pipelines, one for image files and one for text files. Each pipeline ran through sequential stages, including file sorting, text file cleansing/OCR, invoice detail extraction, and metadata addition, resulting in efficient data handling for the client's extensive tenant network (a sketch of the file-sorting stage follows the list below).
- Crafted the entire back-end architecture;
- Generated the initial reports on Sisense, our BI tool;
- Established everything from scratch in a new AWS Organization dedicated to this project;
- Transitioned from a batch process on EC2 servers to a serverless solution using Lambda functions;
- Increased file read accuracy from 45% to 95%-99%;
- Reduced configuration time for each tenant from 3 days to a maximum of 15 minutes;
- Implemented an approach to minimize reconfigurations compared to the provider queue;
- Enabled reading SKUs from invoices, a capability not previously available;
- Processed over 1.5 million files per month;
- Reduced the time from receiving a file to having it available in the Redshift data warehouse to 7 seconds;
- Developed a functional ETL Pipeline process within 2-3 months.
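A minimal sketch of the file-sorting stage described above, assuming an S3-triggered Lambda that routes files by extension into the image or text branch. The bucket layout, prefixes, and extension list are illustrative assumptions:

```python
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical routing rule: these extensions go to the image/OCR branch.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}


def lambda_handler(event, context):
    """Stage 1 (file sorting): route each uploaded file into the image
    (OCR) branch or the text-cleansing branch, based on its extension."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        ext = os.path.splitext(key)[1].lower()

        branch = "image-pipeline" if ext in IMAGE_EXTENSIONS else "text-pipeline"
        s3.copy_object(
            Bucket=bucket,
            Key=f"{branch}/{os.path.basename(key)}",
            CopySource={"Bucket": bucket, "Key": key},
        )
```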