
João
From Brazil (UTC-5)
10 years of commercial experience
João – SQL, Python, AWS
Meet our Senior Data Engineer - a seasoned problem solver with a passion for crafting impactful solutions. With experience across insurance, finance, banking, and sales, he brings a wealth of expertise. Proficient in Python, SQL, AWS, and data warehousing, João excels at developing source-to-target mappings, data analysis, and ETL processes. His skills span extraction, lineage, quality assurance, conversion, transformation, and loading. Beyond data, he enjoys the gym, books, and relaxation.
Ready to start: 40h (1–2 weeks), part-time when needed
Direct hire: Potentially possible
Experience Highlights
Sr Data Engineer
João is helping global clients—primarily in the United States—modernize their data platforms and optimize data pipelines. His focus is on building robust, scalable solutions using tools like Snowflake, dbt, Fivetran, AWS, and Azure.
Key projects & responsibilities:
- Data Warehouse Project with EDI Integration: Participated in a global initiative to build a modern Data Warehouse ingesting EDI files from major retail partners such as Target, Amazon, Best Buy US, and Best Buy Canada. The project centralized data related to returns, sell-through, sales, inventory, product performance, and more. João contributed from the initial setup of the pipelines to the development of data models and monitoring.
- Cloudera to Snowflake Migration: Led the migration of enterprise pipelines from Cloudera to Snowflake, with the goals of improving performance and modernizing the architecture.
- Cloudera & DB2 Modernization: Worked on a hybrid migration and modernization project involving Cloudera and DB2, moving workloads to a cloud-native stack.
Senior Data Engineer / Tech Lead
The Data Warehouse project aimed to create indicators mapping the duration of each stage/task across the thousands of products and services offered to customers. The main objective was to identify operational bottlenecks impacting the end customer and answer business questions about average lead time, SLA compliance, customer volumes, values, and the individual stages.
Target Audience: Operations teams, business analysts, and decision-makers at Santander Bank.
- Built the warehouse from the ground up, including requirements gathering and dashboard development support;
- Collaborated with Product Owners to identify and prioritize business requirements;
- Designed and implemented data models to analyze operational data effectively;
- Developed ETL processes using PostgreSQL and the BIX tool to extract and transform data;
- Established connections and scheduled updates with the PowerBI Report Server via ODBC;
- Automated procedure execution through job scheduling in Control-M;
- Contributed to identifying operational bottlenecks and providing actionable insights to improve business processes;
- Created dashboards and reports to visualize key performance indicators and support decision-making processes.
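Indicators like these typically reduce to aggregations over stage timestamps. A minimal sketch of the kind of lead-time and SLA query involved, using an in-memory SQLite stand-in for the actual PostgreSQL tables (table names, columns, and the 4-hour SLA threshold are illustrative, not the project's real schema):

```python
import sqlite3

# In-memory stand-in for the operational stage-tracking table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stage_events (
        request_id INTEGER,
        stage      TEXT,
        started    TEXT,   -- ISO timestamps
        finished   TEXT
    )
""")
conn.executemany(
    "INSERT INTO stage_events VALUES (?, ?, ?, ?)",
    [
        (1, "analysis", "2024-01-01 09:00", "2024-01-01 11:00"),
        (2, "analysis", "2024-01-01 09:00", "2024-01-01 15:00"),
        (1, "approval", "2024-01-01 11:00", "2024-01-02 11:00"),
    ],
)

# Average lead time per stage in hours, plus SLA compliance rate
# against an illustrative 4-hour threshold.
rows = conn.execute("""
    SELECT stage,
           AVG((julianday(finished) - julianday(started)) * 24)        AS avg_hours,
           AVG((julianday(finished) - julianday(started)) * 24 <= 4.0) AS sla_rate
    FROM stage_events
    GROUP BY stage
    ORDER BY stage
""").fetchall()

for stage, avg_hours, sla_rate in rows:
    print(stage, round(avg_hours, 1), round(sla_rate, 2))
```

In production the same aggregation would run as a scheduled procedure (here, via Control-M) feeding the Power BI dashboards.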
Senior Data Engineer
The project involved migrating on-premise data systems (Cloudera, Oracle, PostgreSQL) to a cloud-based infrastructure. The primary goal was to enhance the banking experience for customers by leveraging advanced, secure, and efficient digital solutions. This initiative significantly improved data accessibility and processing capabilities.
- Engineered batch data pipelines with AWS Glue and Azure Data Factory, facilitating efficient data lake ingestion;
- Developed Near Real-Time (NRT) and stream-processing pipelines using Kafka, Azure Event Hubs, Azure Stream Analytics, and AWS Kinesis, ensuring timely data availability;
- Optimized data processing and query execution using PySpark and SparkSQL;
- Automated testing processes for improved reliability and efficiency in data handling;
- Managed data storage and processing across various platforms, including Amazon S3, PostgreSQL, Oracle DB, Azure DataLake Storage, and Snowflake.
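The NRT pattern above (broker consumers feeding windowed aggregates) can be sketched in plain Python, with a static event list standing in for the Kafka/Kinesis client; event fields and the 60-second window are illustrative assumptions:

```python
from collections import defaultdict

# Illustrative event stream: (epoch_seconds, account, amount).
events = [
    (0,  "acc-1", 10.0),
    (30, "acc-2", 5.0),
    (65, "acc-1", 7.5),
    (70, "acc-1", 2.5),
]

def windowed_totals(events, window_s=60):
    """Tumbling-window aggregation: total amount per account per window."""
    totals = defaultdict(float)
    for ts, account, amount in events:
        window = ts // window_s  # index of the window the event falls in
        totals[(window, account)] += amount
    return dict(totals)

print(windowed_totals(events))
# window 0: acc-1 -> 10.0, acc-2 -> 5.0; window 1: acc-1 -> 10.0
```

A real deployment would replace the list with a consumer loop and let the streaming engine (Kinesis Analytics, Azure Stream Analytics) manage window state and late arrivals.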
Data Engineer
This project aimed to establish robust data management and governance practices within a global team using Azure DevOps. The work involved implementing data governance frameworks, developing data dictionaries, and creating data lineage to ensure data quality and compliance. The project targeted improving data management efficiency, facilitating data analysis, and enabling better decision-making processes for the organization.
- Collaborated with stakeholders to define data requirements and created a comprehensive Data Glossary;
- Developed a robust Data Dictionary and established Data Lineage to ensure effective data governance;
- Designed and validated Technical Data Quality Rules and Business Rules using SQL, contributing to improved data quality;
- Implemented validation and consistency pipelines using SSIS/SQL to automate data validation processes;
- Developed and implemented a normalized Data Warehouse model, resulting in significant performance improvements when connecting with PowerBI;
- Created pipelines using SSIS for ETL/ELT processes, enabling efficient data extraction, transformation, and loading;
- Designed and implemented Data Marts for different business areas, facilitating targeted data analysis;
- Conducted Data Modeling for both normalized and denormalized databases, ensuring efficient data storage and retrieval;
- Utilized data mining techniques to identify insights and trends, contributing to informed decision-making;
- Managed Master Data and worked as part of a DevOps team to ensure smooth deployment and maintenance of the solution.
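Technical data-quality rules of this kind usually boil down to named, declarative checks applied per record. A hedged Python sketch of the pattern (the rule names and fields are illustrative, not the project's actual rules, which were expressed in SQL/SSIS):

```python
# Minimal data-quality rule engine: each rule is a name plus a
# predicate applied row by row; failures are collected for review.
rules = {
    "customer_id_not_null": lambda r: r["customer_id"] is not None,
    "premium_non_negative": lambda r: r["premium"] >= 0,
    "status_in_domain":     lambda r: r["status"] in {"active", "lapsed", "cancelled"},
}

def run_rules(rows, rules):
    failures = []
    for i, row in enumerate(rows):
        for name, check in rules.items():
            if not check(row):
                failures.append((i, name))
    return failures

sample = [
    {"customer_id": 1,    "premium": 120.0, "status": "active"},
    {"customer_id": None, "premium": -5.0,  "status": "unknown"},
]
print(run_rules(sample, rules))
```

Keeping rules as named predicates mirrors how a Data Dictionary entry maps to an executable validation, which is what makes automating the checks in a pipeline straightforward.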
Developer Data Engineer
The project aimed to modernize and optimize data management and analysis processes for Life and Pension services.
- Contributed as a developer and data engineer within the development team;
- Conducted ETL/ELT processes to manipulate, transform, and load data for analysis;
- Played a key role in modernizing data management processes and infrastructure;
- Successfully migrated data warehouses to cloud platforms for improved scalability and accessibility;
- Led the migration of dashboards from PowerPoint to Tableau for enhanced visualization and interactivity;
- Implemented a centralized data warehouse for operational indicators, improving data consistency and accessibility;
- Demonstrated domain expertise in life and pension services;
- Worked collaboratively across the data, reports, indicators, and models teams;
- Provided support to teams in business/data analytics, UX/UI, and data architecture to influence solution design and implementation;
- Migrated System-of-Record (SOR) databases from Oracle and Microsoft SQL Server environments to Google Cloud Platform (GCP), ensuring seamless data storage and accessibility;
- Developed batch processing pipelines to efficiently handle large data volumes, facilitating timely data analysis and reporting;
- Utilized skills in Google Cloud Platform (GCP), Tableau, data governance, data engineering, SQL, ETL processes, Microsoft SQL Server, Oracle SQL Developer, Oracle Database, Python, PySpark, and SAS to contribute to project success.
Data Engineer
The project was aimed at developing a comprehensive data warehouse encompassing life and pension information. It included detailed sales data, comparisons of budget versus actuals, breakdowns by sales channels and branches, and commercial productivity. The primary objective was to centralize and streamline data for better business intelligence and decision-making.
- Designed and implemented the architecture of the data warehouse, ensuring scalability and efficiency.
- Optimized data processing pipelines using PySpark/SparkSQL, enhancing performance and data throughput.
- Conducted automated testing to ensure data integrity and reliability of the data warehouse.
- Utilized Google Cloud Platform (GCP) technologies, including BigQuery, Cloud Storage, and Dataflow, for robust data management and analysis.
- Employed Airflow for workflow orchestration, ensuring smooth and automated ETL processes.
- Integrated SQL Server for database management, supporting complex queries and data storage.
- Developed dashboards and reports in Tableau, providing insightful analytics and visualizations for business stakeholders.
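The orchestration role Airflow plays above comes down to running ETL tasks in dependency order. A minimal stand-in sketch in plain Python using the standard library's topological sorter (task names are illustrative; a real deployment would declare these as Airflow operators in a DAG file):

```python
from graphlib import TopologicalSorter

# Illustrative ETL dependency graph: each task maps to its upstream tasks.
dag = {
    "extract_sales":   set(),
    "extract_budget":  set(),
    "transform_facts": {"extract_sales", "extract_budget"},
    "load_warehouse":  {"transform_facts"},
    "refresh_tableau": {"load_warehouse"},
}

# A valid execution order: every task runs after all of its upstreams.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow adds scheduling, retries, and parallel execution of independent branches (the two extracts here) on top of exactly this ordering guarantee.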