
Raphael

From Brazil

Data Engineer | Senior

Raphael – Python, SQL, PySpark

Raphael is a senior data engineer with 8 years of experience, specializing in AWS-based data pipelines built with S3, Lambda, Glue, Athena, Redshift, and Databricks. He has hands-on expertise in batch and near-real-time ingestion, medallion-style data layering, and cost-efficient data lake design. Feedback highlights practical delivery skills, clear communication, and a client-first approach, though his architectural expertise leans more toward hands-on implementation than deep specialization.

8 years of commercial experience in
AI
Banking
Fintech
Healthcare
Retail
Telecommunications
Main technologies
  • Python: 6 years
  • SQL: 8 years
  • PySpark: 3 years
  • AWS: 5 years
  • Cloud Computing: 6 years
Additional skills
Apache Spark
Airflow
Unit testing
Claude LLM
GitHub Copilot
Terraform
CloudWatch
AWS CodeBuild
ETL
Databricks
Amazon ECS
Amazon S3
DynamoDB
Amazon EC2
AWS Lambda
PL/SQL
Microsoft SQL Server
Transact-SQL (T-SQL)
Data Modeling
Oracle
Direct hire
Possible

Experience Highlights

Senior Data Engineer
Jun 2025 - Ongoing · 9 months
Project Overview

A healthcare technology company that provides solutions to improve medication intelligence, compliance, and supply chain visibility across hospitals and pharmacies.

Raphael worked on data platforms powering 340B-related products used by hospitals and pharmacies across the U.S. These systems ingest data from multiple sources, including electronic health records (EHRs), pharmacy systems, and third-party APIs.

His role focused on building scalable AWS-based pipelines to process, standardize, and validate this data, enabling accurate tracking of drug transactions and compliance auditing.

Key features included data validation, compliance checks, and near-real-time processing, ensuring reliable, auditable data for healthcare providers operating under strict regulatory requirements.

Responsibilities:
  • Designed, developed, and maintained scalable AWS-based data pipelines for healthcare clients.
  • Provisioned and managed cloud infrastructure using Terraform (IaC) for secure, consistent deployments.
  • Developed and optimized ETL processes using Python and PySpark to process and transform large-scale datasets.
  • Orchestrated data pipelines using AWS Step Functions and Airflow.
  • Implemented CI/CD pipelines using GitHub Actions integrated with AWS CodeBuild.
  • Used GitHub Copilot and Claude LLM to accelerate ETL development and improve code quality.
  • Developed and expanded unit testing frameworks using pytest to increase test coverage.
  • Ensured data security, reliability, and compliance aligned with healthcare requirements.
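
The validation and compliance checks described above can be sketched as record-level rule evaluation. The following is an illustrative example only: the field names (ndc, quantity, facility_id) and the 11-digit NDC normalization rule are assumptions for the sketch, not the client's actual schema or logic.

```python
# Hypothetical sketch of record-level validation of the kind described above.
# Field names and rules are illustrative assumptions, not the real schema.

REQUIRED_FIELDS = ("ndc", "quantity", "facility_id")

def validate_transaction(record: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty list = valid)."""
    errors = []
    # Presence checks: every required field must be set and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    # Range check: dispensed quantity must be a positive number.
    qty = record.get("quantity")
    if qty is not None and (not isinstance(qty, (int, float)) or qty <= 0):
        errors.append("quantity must be a positive number")
    # Format check: NDCs are commonly normalized to an 11-digit code.
    ndc = record.get("ndc", "")
    if ndc and not (ndc.isdigit() and len(ndc) == 11):
        errors.append("ndc must be an 11-digit code")
    return errors
```

Returning a list of errors rather than raising on the first failure lets a pipeline quarantine a bad record with all of its problems attached, which suits the auditability requirement mentioned above.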
Project Tech stack:
AWS
Terraform
ETL
Python
PySpark
Airflow
GitHub Actions
AWS CodeBuild
GitHub Copilot
Claude LLM
Unit testing
CloudWatch
Senior Data Platform Engineer
Feb 2024 - Jun 2025 · 1 year 4 months
Project Overview

A Brazilian fintech platform that provides financial services and payment infrastructure for small and medium-sized businesses. The product enables companies to manage billing, payments, and cash flow through a centralized platform.

Raphael worked on the data platform that supports both operational and analytical use cases, serving internal teams such as finance, risk, and product, and enabling data-driven decision-making across the company.

The platform ingests data from multiple sources, including transactional systems, external integrations, and event streams. It processes and transforms this data using scalable pipelines built on AWS and Databricks, making it available for analytics, reporting, and downstream services.

Key features included reliable data ingestion, event-driven processing, data standardization, and orchestration of complex workflows using Airflow.

Responsibilities:
  • Designed and implemented scalable and resilient data architectures.
  • Built serverless data pipelines using AWS and Databricks.
  • Implemented data ingestion from different sources using Amazon DMS.
  • Orchestrated data pipelines with Airflow.
  • Implemented monitoring and management mechanisms for data pipelines.
  • Ensured data security and compliance with industry standards.
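
The medallion-style layering used across these platforms (raw "bronze" data refined into "silver" and aggregated "gold" layers) can be sketched as a path convention. The bucket name and prefix layout below are illustrative assumptions, not the platform's actual layout.

```python
# Minimal sketch of medallion-style lake layout (bronze -> silver -> gold).
# Bucket name and prefix convention are assumptions for illustration.

LAYERS = ("bronze", "silver", "gold")

def lake_path(layer: str, domain: str, table: str,
              bucket: str = "my-datalake") -> str:
    """Build an S3 prefix like s3://my-datalake/silver/payments/invoices/."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}, expected one of {LAYERS}")
    return f"s3://{bucket}/{layer}/{domain}/{table}/"
```

Centralizing the convention in one helper keeps ingestion jobs, Airflow DAGs, and Databricks notebooks pointed at consistent prefixes as tables move between layers.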
Project Tech stack:
AWS
Databricks
Airflow
PySpark
Python
Terraform
Data Modeling
Senior Data Engineer
Apr 2021 - Feb 2024 · 2 years 10 months
Project Overview

A technology platform focused on the retail supply chain, connecting industries, distributors, and small retailers into a single digital ecosystem. The product helps optimize commercial execution, improve sales performance, and enable better decision-making through data.

Raphael worked on building cloud-based data platforms that supported this ecosystem, ingesting and processing data from multiple sources, including sales systems, distributors, and APIs. These pipelines enabled both batch and near-real-time use cases across the platform.

The solutions provided visibility into sales, inventory, and market behavior, helping retailers and suppliers act more efficiently.

Key features included scalable data ingestion, event-driven processing, and reliable data pipelines designed to support real-time insights and operational decision-making across the supply chain.

Responsibilities:
  • Designed and implemented scalable and resilient data architectures.
  • Built AWS serverless data pipelines using AWS Lambda, Amazon ECS, Amazon EC2, Amazon S3, DynamoDB, and EventBridge.
  • Leveraged event-driven data processing for real-time solutions.
  • Collected, transformed, and moved data from diverse sources for integration.
  • Implemented monitoring and management mechanisms for data pipelines.
  • Ensured data security and compliance with industry standards.
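
The event-driven pattern above typically means a Lambda function triggered by S3 object-created notifications. A minimal sketch, not the project's actual code, of such a handler (the bucket contents and downstream step are placeholders):

```python
import json
from urllib.parse import unquote_plus

# Illustrative sketch of an event-driven AWS Lambda handler: S3 object-created
# notifications arrive as an event with a "Records" list, and the handler
# extracts bucket/key pairs for downstream processing. Keys in S3 events are
# URL-encoded, hence unquote_plus.

def handler(event: dict, context=None) -> dict:
    processed = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key:
            # Placeholder: here the real pipeline would read the object,
            # transform it, and write to DynamoDB or another S3 prefix.
            processed.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```

Because each invocation handles one notification independently, throughput scales with event volume without any servers to manage, which is the appeal of the serverless design mentioned above.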
Project Tech stack:
AWS
AWS Lambda
Amazon ECS
Amazon EC2
Amazon S3
DynamoDB
Senior Data Engineer
Feb 2019 - May 2021 · 2 years 3 months
Project Overview

A consultancy engagement for Claro, one of the largest telecommunications companies in Brazil, providing mobile, broadband, and digital services to millions of customers. The project focused on building and maintaining data solutions to support business intelligence and operational reporting across the company.

Raphael worked on developing and maintaining ETL pipelines using Informatica PowerCenter and Oracle (PL/SQL), processing large volumes of telecom data such as customer activity, billing, and service usage.

These pipelines enabled internal teams to generate reports and insights used for decision-making, performance monitoring, and operational analysis.

Key features included data integration across multiple systems, ETL performance optimization, and ensuring data consistency and reliability across environments, with a strong focus on production stability.

Responsibilities:
  • Performed analysis and data modeling for business intelligence.
  • Developed PL/SQL routines in Oracle.
  • Built ETL processes in Informatica PowerCenter.
  • Maintained shell scripts.
  • Monitored approval stages and production/post-production deployments.
  • Improved ETL process performance.
  • Managed files and folders in Unix environments.
Project Tech stack:
PL/SQL
Oracle
ETL
Informatica PowerCenter

Education

2022
Big Data and Data Science (MBA)
2017
Computer Engineering

Languages

English
Advanced

Hire Raphael or someone with similar qualifications in days
All developers are ready for interview and are just waiting for your request.
Copyright © 2026 lemon.io. All rights reserved.