Rodrigo
From Brazil (GMT-3)
7 years of commercial experience
Lemon.io stats
Rodrigo is a seasoned Data Engineer with over 5 years of experience. His areas of expertise include Python, Scala, and SQL. He is adept at designing and implementing data pipelines for real-time analytics and at deploying machine learning projects with Python, TensorFlow, and Pandas. His proficiency extends to database systems such as MS SQL Server, Databricks, Snowflake, and PostgreSQL, and his work has improved decision-making processes in sectors such as Finance and Telecommunications, among others. Rodrigo excels in dynamic environments and drives projects that increase operational efficiency and intelligence.
Main technologies
Python, Apache Airflow, SQL
Additional skills
Ready to start
ASAP
Direct hire
Potentially possible
Experience Highlights
Senior Technical Leader
A migration project for the world's largest company by market capitalization, which is also the largest smartphone manufacturer by volume. The project covers migrating Apache Airflow, including its workflows and related components, to AWS: assessing the current environment, designing the AWS infrastructure, planning the migration, deployment and configuration, data migration, testing, documentation, and post-migration support.
- Created and implemented ELT pipelines using Airflow, Snowflake, DBT, and AWS services like Glue/S3;
- Optimized SQL and Jinja code for data transformation within DBT, improving API response times by 25% through streamlined data processing and efficient query optimization;
- Developed data models that support business requirements, optimizing query performance by 20% while decreasing resource utilization by 10%;
- Implemented data quality checks and DBT tests to ensure the accuracy and completeness of the data, achieving a 15% increase in data reliability (a pipeline sketch follows this list).
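A minimal sketch of what such an Airflow-orchestrated DBT run can look like. The DAG id, schedule, and project path are placeholders, not the project's actual configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical ELT orchestration: raw data is assumed to land in Snowflake
# via Glue/S3; DBT then builds the models and runs the data quality tests.
with DAG(
    dag_id="elt_snowflake_dbt",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt",
    )
    dbt_run >> dbt_test  # tests run only after the models build successfully
```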
Senior Data Engineer
A biomedical engineering project built around code that reads an XML file and extracts data from it. The extracted data is then used to create nodes and relationships in a Neo4j graph database. The code used the py2neo library to connect to the database and the xmltodict library to parse the XML file (a minimal loading sketch follows the highlights below).
- Designed and implemented the ETL process to ingest XML data from the UniProt database and transform it for optimal storage and querying within a Neo4j graph database;
- Configured and managed the Neo4j database, ensuring data integrity and optimizing queries for performance enhancements;
- Utilized Apache Airflow to orchestrate the data pipeline workflows, ensuring orderly and efficient execution of data processing tasks;
- Worked with Docker containers for various project components, including Neo4j, Python applications, and Airflow, to maintain consistent environments across development and production;
- Implemented robust testing strategies to validate the data pipeline and its integration with the graph database, ensuring accurate data and reliable system performance;
- Created comprehensive documentation for the data pipeline architecture and setup and developed reports to communicate insights derived from the data to stakeholders.
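A compressed sketch of the core pattern (XML → dict → graph) using the two libraries named above. The file name, credentials, and element paths are invented; real UniProt entries have a richer, more nested structure:

```python
import xmltodict
from py2neo import Graph, Node, Relationship

# Hypothetical connection details and a simplified UniProt-like layout;
# the real element paths depend on the export format.
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

with open("uniprot_entry.xml", "rb") as f:
    doc = xmltodict.parse(f)

entry = doc["uniprot"]["entry"]
protein = Node("Protein", accession=entry["accession"], name=entry["name"])
organism = Node("Organism", name=entry["organism"]["name"])

# merge() keeps reruns idempotent: nodes are updated, not duplicated.
graph.merge(protein, "Protein", "accession")
graph.merge(organism, "Organism", "name")
graph.create(Relationship(protein, "FOUND_IN", organism))
```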
Senior Data Engineer
ML project for robust model development and deployment to enhance predictive accuracy for the Phoenix team of the Brain project conducting Score analysis for legal entities of a Brazilian Bank.
- Developed OLAP cubes and deployed an Azure Machine Learning project, incorporating TensorFlow and Pandas for predictive modeling, and applied MLOps practices to make AI model deployment more efficient (see the sketch after this list);
- Focused on enhancing predictive accuracy for score analysis of legal entities at a Brazilian bank, reducing response time from 2 days to under 2 hours.
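The bank's data and model are confidential, so the following is only an illustrative Pandas + TensorFlow shape for tabular score modeling, with a made-up file and target column:

```python
import pandas as pd
import tensorflow as tf

# Entirely hypothetical dataset: tabular features with a binary target,
# standing in for the confidential score-analysis data.
df = pd.read_csv("legal_entities.csv")
X = df.drop(columns=["defaulted"]).to_numpy(dtype="float32")
y = df["defaulted"].to_numpy(dtype="float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC()],
)
model.fit(X, y, epochs=5, batch_size=256, validation_split=0.2)
```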
Senior Data Engineer
The project involved aggregating WiFi modem data from the customer's database. This data was then fed into a Qlik dashboard, enabling real-time KPI monitoring and significantly boosting operational efficiency for a leading telecommunications provider.
- Executed advanced T-SQL scripting to automate and optimize database tasks, reducing processing times by over 30%;
- Collaborated closely with cross-functional teams to deploy the SSIS package;
- Contributed to the integration of generative AI models into the data pipeline utilizing Databricks on Azure and AWS EMR, enhancing predictive analytics capabilities and supporting collaboration with AI engineers and data scientists;
- Integrated secure data access protocols with OAuth, employed Postman for robust API testing, and managed data security using Azure Identity, reducing data processing times by 30%;
- Developed ETL routines using PySpark, SQL, and Hadoop to streamline data processing and integration for the client's data engineering team, resulting in a 25% reduction in data processing time (sketched below).
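To illustrate the kind of PySpark routine this bullet describes, here is a sketch with invented paths and column names (the provider's actual telemetry schema is proprietary):

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical layout: one row per modem heartbeat, rolled up into the
# daily KPIs a dashboard such as Qlik could read.
spark = SparkSession.builder.appName("modem_kpis").getOrCreate()

modems = spark.read.parquet("s3://telco-raw/modem_telemetry/")
daily_kpis = (
    modems
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("day", "region")
    .agg(
        F.avg("signal_strength").alias("avg_signal"),
        F.countDistinct("modem_id").alias("active_modems"),
    )
)
daily_kpis.write.mode("overwrite").parquet("s3://telco-curated/daily_kpis/")
```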
Data Engineer
The product aims to enhance the learning experience and outcomes for students by providing real-time analytics on their learning progress, while also offering course creators and instructors actionable insights to improve course content and delivery.
- Redesigned the data storage strategy by implementing PostgreSQL and MongoDB to efficiently manage both structured and unstructured data;
- Cleaned, transformed, and prepared complex data for analysis, using Power BI for data exploration and storytelling; this improved data comprehension and decision-making for an edtech company's analytics project;
- Implemented high-throughput data processing solutions using Python's psycopg2 and PySpark for PostgreSQL databases, resulting in a 15% reduction in processing time (a batching sketch follows this list). This optimization enhanced data accessibility and strengthened the data science team's analytical capabilities;
- Refactored on-premises pipelines into Azure Cloud infrastructure, increasing scalability and reliability for the data engineering project. This initiative led to a 25% decrease in pipeline processing time.
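One common way to get this kind of throughput from psycopg2 is batching writes with execute_values; a minimal sketch with an invented DSN and table:

```python
import psycopg2
from psycopg2.extras import execute_values

# Hypothetical schema: batched inserts cut client-server round trips
# versus per-row INSERTs, a typical lever for throughput gains.
conn = psycopg2.connect("dbname=learning_analytics user=etl")
rows = [(1, "lesson_completed"), (2, "quiz_passed")]  # sample events

with conn, conn.cursor() as cur:  # commits on success, rolls back on error
    execute_values(
        cur,
        "INSERT INTO student_events (student_id, event_type) VALUES %s",
        rows,
    )
conn.close()
```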
Data Engineer/Analytic Engineer
The project aimed to develop a data-driven supply chain analytics platform. The platform was designed to integrate smoothly with existing supply chain management systems and offer predictive analytics on various aspects, such as inventory levels, demand forecasting, and supplier performance.
- Developed the data processing pipeline using Spark and Python to analyze vast amounts of supply chain data, enhancing the capability to derive real-time insights and predictive analytics (see the sketch after this list);
- Configured and managed PostgreSQL and MongoDB databases, ensuring efficient data storage and rapid retrieval for analytics purposes and facilitating real-time decision-making in supply chain operations;
- Utilized Azure Data Factory and SSIS packages to streamline ETL workflows, improving data accuracy and availability while reducing processing times by 25%;
- Created dynamic, interactive dashboards in Power BI, offering comprehensive visibility into inventory levels, supplier performance, and demand forecasts, enabling data-driven decision-making across the supply chain.
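A sketch of the demand-forecasting side, reduced to one representative feature (a 7-day moving average of demand per SKU) over an invented orders schema:

```python
from pyspark.sql import SparkSession, Window, functions as F

# Hypothetical orders table: one row per SKU per day. The rolling average
# is the kind of signal a downstream forecasting model might consume.
spark = SparkSession.builder.appName("demand_features").getOrCreate()

orders = spark.read.parquet("s3://supply-chain/orders/")
w = Window.partitionBy("sku").orderBy("order_date").rowsBetween(-6, 0)

features = orders.withColumn(
    "demand_7d_avg", F.avg("units_ordered").over(w)
)
features.write.mode("overwrite").parquet("s3://supply-chain/features/")
```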