Jakub – Snowflake, Apache Airflow, Python, experts in Lemon.io

Jakub

From Poland (UTC+2)

Data Engineer, Senior
Data Analyst, Senior
10 years of commercial experience
Advertising
AI
Analytics
Automotive
Banking
Business intelligence
Customer support
Data analytics
Edtech
Fintech
Food and beverages
Healthcare
Marketing
Sales
Scientific research
Data monetization
CRM
Enterprise software
NLP software
Lemon.io stats
2 projects done
2886 hours worked

Jakub – Snowflake, Apache Airflow, Python

Jakub is an experienced Data Engineer with a solid educational foundation in computer science and a comprehensive grasp of the AWS platform. He is proficient in SQL and handles complex data tasks with ease. He also demonstrates strength in project management, complemented by domain experience spanning fintech, marketing, and beyond.

Main technologies
Snowflake: 3.5 years
Apache Airflow: 4 years
Python: 8 years
SQL: 9 years
AWS: 4 years
Additional skills
Bash
Docker
Data Warehouse
Business intelligence
Data Modeling
GPT
ETL
Data analysis
Tableau
CI/CD
Terraform
AI
Data visualization
PySpark
Microsoft SQL Server
Looker
PowerBI
Databricks
PostgreSQL
GCP
PL/SQL
ElasticSearch
BigQuery
React
Testimonials
#17320100767: Data Engineer (with data migration experience), full-time/part-time, 1-2 months, US
"Jakub has been amazing! Want to give him the highest of reviews! Thanks so much."
Direct hire
Possible

Experience Highlights

Senior Data Analyst/Leading Engineer
Sep 2024 - Jun 2025 (9 months)
Project Overview

A comprehensive executive dashboard system designed to visualize and analyze revenue metrics for a platform. The dashboard provides real-time insights into platform revenue performance, customer retention, and business efficiency metrics through interactive visualizations and detailed data breakdowns.

Responsibilities:
  • Architected and implemented a modular dashboard system with multiple view types (Sales View vs Board View)
  • Developed interactive data visualizations for complex revenue metrics including YoY, QoQ, and MoM comparisons
  • Created dynamic filtering capabilities for multi-dimensional data analysis across different business sectors
  • Implemented near real-time data refresh mechanisms with different cadences for various metrics (weekly/monthly)
  • Built efficient SQL queries for complex data aggregations across multiple data sources
  • Designed responsive layouts optimized for executive-level presentations
  • Integrated various data visualization components including line charts, bar charts, and heat maps
  • Implemented robust error handling and data validation for critical business metrics
  • Created reusable component library for standardized metric displays and visualizations
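The period-over-period comparisons mentioned above can be illustrated with a minimal Python sketch. The project itself relied on SQL, Snowflake, and Looker, so this is not the actual implementation; the function names and the `{(year, month): amount}` input shape are assumptions chosen for illustration. QoQ would follow the same pattern using quarterly totals.

```python
def pct_change(current, previous):
    """Relative change from previous to current; None when no usable baseline exists."""
    if previous is None or previous == 0:
        return None
    return (current - previous) / previous


def shift_month(year, month, lag):
    """Return the (year, month) that lies `lag` months before the given one."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def monthly_comparisons(revenue, year, month):
    """MoM and YoY change for one month of a {(year, month): amount} mapping (hypothetical shape)."""
    current = revenue[(year, month)]
    return {
        "mom": pct_change(current, revenue.get(shift_month(year, month, 1))),
        "yoy": pct_change(current, revenue.get(shift_month(year, month, 12))),
    }


# Illustrative figures only, not data from the project.
revenue = {(2024, 1): 100.0, (2024, 12): 200.0, (2025, 1): 150.0}
result = monthly_comparisons(revenue, 2025, 1)
```

Returning `None` for a missing or zero baseline keeps a dashboard from showing a misleading percentage when the comparison period has no data.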
Project Tech stack:
Snowflake
Python
Looker
Data Warehouse
Jira
SQL
Tech Leading Data Engineer
Jan 2024 - Sep 2024 (8 months)
Project Overview

The project focused on creating a robust data management system for NXTWash's diverse client base. This involved migrating extensive datasets to a new environment using Python, SQL Server, AWS and Apache Airflow, optimizing ETL processes, and developing comprehensive data models and interactive dashboards. These enhancements were vital in providing clients with precise, actionable insights and improving the efficiency of their data operations, ultimately supporting NXTWash's mission to deliver superior automated car wash services.

Responsibilities:
  • Led the Data Migration and Analytics Enhancement project, ensuring seamless data integration and management for multiple clients, which is critical for the success of NXTWash's automated car wash solutions;
  • Directed the migration and validation processes, transitioning data to a more efficient environment using Python, SQL Server, AWS, and Apache Airflow, which significantly enhanced data processing capabilities;
  • Developed and implemented advanced data models and interactive dashboards using T-SQL and Sisense, which provided clients with deeper insights into their operations and facilitated better decision-making;
  • Achieved significant improvements in data processing workflows, enhancing overall efficiency and thereby contributing to the scalability and performance of NXTWash's services;
  • Provided ongoing support and optimization, ensuring continuous improvement and adaptation to evolving client needs and technologies, thus maintaining the high standard of NXTWash's data management practices.
Project Tech stack:
Python
Apache Airflow
Microsoft SQL Server
Data Modeling
Data Warehouse
Data visualization
Data analysis
Business intelligence
AWS
Scrum
SQL
Transact-SQL (T-SQL)
Tech Leading Data Engineer
Dec 2023 - Sep 2024 (9 months)
Project Overview

This project involved building a comprehensive suite of dashboards and reports to track and analyze various operational, customer, and financial metrics for a business. The goal was to provide real-time insights into key performance indicators (KPIs) across multiple dimensions, enabling data-driven decision-making for both business operations and customer management.

Responsibilities:
  • Tracked real-time revenue streams by service category, promotion, and customer type, providing granular insights into revenue per service and trends over time;
  • Focused on member behaviors with a Customer 360 view, monitoring member onboarding, retention, and churn, and providing projections to mitigate future churn;
  • Offered a holistic view of customer behavior through the Cx360 view, tracking service usage, identifying high-risk customers, and visualizing satisfaction metrics;
  • Monitored daily vehicle counts and segmented by membership type to support fleet management and operational insights;
  • Tracked employee labor hours and performance, optimizing scheduling and visualizing efficiency metrics like revenue per labor hour;
  • Analyzed service demand trends, transaction patterns, and payment types to identify popular services and reduce payment failures.
Project Tech stack:
Python
Microsoft SQL Server
Senior Data Engineer
Nov 2022 - Nov 2023 (1 year)
Project Overview

A complex batch processing system for a premier U.S. online food delivery company. This system was designed to handle and process millions of data events daily. It was crucial in managing vast volumes of transactional and customer data, ensuring seamless operation and service efficiency for the food delivery platform.

Responsibilities:
  • Led the development and optimization of the batch processing system, which is crucial for handling extensive data workloads;
  • Acted as a technical mentor, guiding the team in complex data processing tasks and fostering skill development;
  • Utilized Airflow and Python to manage and automate the processing of hundreds of millions of events each day;
  • Implemented Snowflake for data warehousing, enhancing the system's scalability and performance;
  • Played a vital role in the technical growth of the team, emphasizing skill enhancement in data processing and system optimization.
Project Tech stack:
Python
Apache Airflow
Snowflake
SQL
NoSQL
Docker
Bash
GitHub
ETL
API
Swagger
Grafana
Cassandra
Senior Data Engineer
Nov 2022 - Oct 2023 (11 months)
Project Overview

An analytics-driven system for a deep dive into consumer behavior to track and optimize the user journey. This system provided actionable insights into user journey mapping and optimizing core consumer funnels. Key aspects included creating a sequence analytics system for tracking unique devices, enhancing user identification, and integrating advanced bot recognition capabilities.

Responsibilities:
  • Led user engagement initiatives, employing user journey mapping and consumer funnel optimizations, resulting in an 18% increase in user engagement;
  • Built a system for sequence analytics, enabling precise tracking of unique devices and their associations with users and events collected by Segment and other third-party tools;
  • Implemented bot recognition techniques, significantly improving the accuracy of user tracking and identification;
  • Developed dashboards in Sigma and Tableau for visualizing core consumer funnels, enhancing data accessibility and decision-making processes;
  • Collaborated with marketing, IT, and data science teams to align user engagement strategies with overall business goals;
  • Continuously monitored and refined tracking methodologies to adapt to evolving user behavior and technological advancements.
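As a rough illustration of the device tracking and bot recognition described above, the sketch below links devices to the users seen on them and flags devices that fire events implausibly fast. The profile does not say which techniques were used, so the rate-based heuristic, the thresholds, and the event tuple layout are all assumptions.

```python
from collections import defaultdict


def summarize_devices(events, min_gap_seconds=0.5, min_fast_events=3):
    """Link devices to users and flag suspiciously fast devices.

    `events` is an iterable of (timestamp_seconds, device_id, user_id_or_None).
    A device is flagged when at least `min_fast_events` of its events arrive
    less than `min_gap_seconds` after its previous event (assumed heuristic).
    """
    users = defaultdict(set)      # device_id -> user ids observed on it
    fast_events = defaultdict(int)  # device_id -> count of too-fast events
    last_seen = {}                # device_id -> timestamp of previous event

    # Sort by timestamp only, so None user ids are never compared.
    for ts, device, user in sorted(events, key=lambda e: e[0]):
        if user is not None:
            users[device].add(user)
        if device in last_seen and ts - last_seen[device] < min_gap_seconds:
            fast_events[device] += 1
        last_seen[device] = ts

    return {
        device: {
            "users": users[device],
            "suspected_bot": fast_events[device] >= min_fast_events,
        }
        for device in last_seen
    }


# Illustrative events only.
events = [
    (0.0, "d1", "u1"), (10.0, "d1", None), (20.0, "d1", "u2"),
    (0.0, "d2", None), (0.1, "d2", None), (0.2, "d2", None), (0.3, "d2", None),
]
summary = summarize_devices(events)
```

A production system would likely combine several signals (user agent, IP reputation, navigation patterns); a single rate threshold is only the simplest starting point.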
Project Tech stack:
Python
SQL
NoSQL
Apache Airflow
Snowflake
Git
GitHub
Docker
Docker Compose
Bash
Data analysis
User-centered design
Tableau
Data visualization
Senior Data Engineer & AI Enthusiast
Jan 2023 - Oct 2023 (9 months)
Project Overview

This internal project within the company focused on leveraging advanced AI technologies to enhance software development processes, explicitly targeting performance and code quality improvements. A primary goal was to minimize the back-and-forth in pull requests during code reviews, thereby streamlining these processes for enhanced efficiency and effectiveness. Additionally, the team aimed to expedite the initiation of new feature additions and accelerate bug resolution in operation-critical domains.

Responsibilities:
  • Orchestrated the integration of OpenAI's GPT-4 and GitHub Copilot to revolutionize the software development lifecycle;
  • Pioneered the automation of code generation, drastically reducing manual coding and increasing efficiency;
  • Implemented AI tools to refine code reviews, reducing iterative communication in pull requests;
  • Improved pull request workflows and expedited bug fixes, mainly in operation-critical areas, enhancing the company's response capability and operational robustness.
Project Tech stack:
Python
API
OpenAI
GPT
GitHub
AWS
Algorithms and Data Structures
AI
Terraform
Design system
Software design
Agile
Jenkins
Visual Studio Code
Jira
Docker
Tech Leading Data Engineer
Mar 2023 - Sep 2023 (5 months)
Project Overview

The project primarily aimed to develop a system of comprehensive metrics across various customer groups, which was crucial for measuring the effectiveness of marketing campaigns. By transitioning ETL processes to a more efficient Data Lake environment using Databricks, Jakub achieved significant cost savings and enhanced the client's ability to analyze campaign data in a more nuanced and effective way.

Responsibilities:
  • Project for a Silicon Valley startup, focusing on optimizing ETL processes;
  • Spearheaded a pilot project to transition jobs from Snowflake to a Data Lake environment using Databricks for ETL processes;
  • Achieved annual cost savings of $140,000 and reduced compute costs by nearly 30%, shifting $460,000 in annual expenses;
  • Enhanced SLAs, making data available an hour earlier for daily runs and 12-18 hours earlier for weekly runs, providing quicker insights for marketing operators;
  • Demonstrated strong budget management and operational efficiency.
Project Tech stack:
Python
SQL
Apache Airflow
Data Warehouse
Apache Spark
Snowflake
ETL
Data Engineer
May 2022 - Nov 2022 (5 months)
Project Overview

Fintech company developing and operating IT solutions for 25% of the bank customers in Denmark. The owners are a strong group of Danish banks joining forces to make financial tech solutions more competitive. They enable their customers to invest online, take out loans and transfer money.

Responsibilities:

Second role at the company; previously ETL Developer.

  • Successfully transitioned mortgage data from a legacy system to a modern platform, ensuring data integrity and compatibility across systems in line with Data Vault 2.0 principles;
  • Implemented and managed data pipelines using Informatica PowerCenter, SQL, and Python, efficiently processing hundreds of thousands of data rows daily from a variety of sources;
  • Architected and implemented robust ETL processes within the Enterprise Data Warehouse, enhancing data management and overall system efficiency.
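Data Vault 2.0 commonly identifies hub records by a hash of the normalized business key. The sketch below shows that idea in Python. The profile does not describe how keys were built in this project, so the normalization rules, the delimiter, the MD5 choice, and the function name are assumptions.

```python
import hashlib


def hub_hash_key(*business_key_parts, delimiter="||"):
    """Build a deterministic hub hash key from one or more business key parts.

    Parts are trimmed and upper-cased before hashing, so formatting
    differences between source systems map to the same key.
    """
    normalized = [str(part).strip().upper() for part in business_key_parts]
    payload = delimiter.join(normalized).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


# Hypothetical mortgage identifiers from two source systems.
legacy_key = hub_hash_key(" loan-001 ")
modern_key = hub_hash_key("LOAN-001")
```

Because the key depends only on the business key, the legacy and modern platforms can compute it independently and still agree, which helps when validating a migration.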
Project Tech stack:
Python
SQL
Oracle
Git
Bash
Data Warehouse
Data Modeling
CI/CD
Senior Data Engineer
Nov 2021 - Mar 2022 (4 months)
Project Overview

The project entailed the enhancement of Growth Accounting Pipelines with a primary focus on engineering consumer growth strategies. The objective was to optimize the pipelines for tracking and boosting user growth metrics. A significant emphasis was placed on improving user retention rates, which is critical for long-term platform success and user engagement.

Responsibilities:
  • Led the engineering of consumer growth pipelines, focusing on optimizing user acquisition and retention strategies;
  • Successfully achieved a 16% increase in user retention through targeted pipeline enhancements;
  • Utilized advanced data analytics to inform and refine growth strategies, ensuring they were aligned with user behavior insights;
  • Developed a reporting dashboard to track growth metrics and pipeline efficacy.
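Growth accounting usually splits each period's users into new, retained, resurrected, and churned. The sketch below shows that classification in plain Python. How the actual pipelines defined these groups is not stated in the profile, so the function name, input shape, and retention-rate definition are assumptions.

```python
def growth_accounting(active_by_period):
    """Classify users per period.

    `active_by_period` is an ordered list of sets of active user ids,
    one set per period (hypothetical input shape).
    """
    seen = set()      # every user active in any earlier period
    previous = set()  # users active in the immediately preceding period
    rows = []
    for active in active_by_period:
        new = active - seen
        retained = active & previous
        resurrected = (active - previous) & seen  # returned after a gap
        churned = previous - active
        rows.append({
            "new": len(new),
            "retained": len(retained),
            "resurrected": len(resurrected),
            "churned": len(churned),
            # Share of last period's users who came back; None for the first period.
            "retention_rate": len(retained) / len(previous) if previous else None,
        })
        seen |= active
        previous = active
    return rows


# Illustrative user ids only.
rows = growth_accounting([{1, 2, 3}, {2, 3, 4}, {1, 4}])
```

In every period, new + retained + resurrected equals the number of active users, which is a handy consistency check for a dashboard built on these figures.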
Project Tech stack:
Python
Apache Airflow
Snowflake
SQL
NoSQL
Docker
Docker Compose
Git
GitHub
Tableau
Software Engineer
Oct 2019 - Nov 2020 (1 year)
Project Overview

This project addressed the crucial goal of monitoring investment portfolio performance to support informed decision-making, utilizing statistical measures and risk assessment tools. Key to the initiative was understanding the essential concepts of portfolio monitoring, which is pivotal in aligning investments with an investor's changing lifestyle and long-term goals. The focus was identifying underperforming assets for potential reallocation and enhancing investor knowledge for strategic portfolio adjustments.

Responsibilities:
  • Calculated statistical values for each asset and the overall portfolio, taking the investor's data into account;
  • Implemented a sliding time window analysis of 30 consecutive data samples, advancing one sample at a time;
  • Identified and reported any significant statistical deviations (downward overshoots by at least 1%) for assets and the portfolio within each time window;
  • Compiled a summary of statistical exceedances and their distributions;
  • Documented the algorithms for statistical analysis in sliding time windows, ensuring a robust monitoring framework.
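The sliding-window analysis described above can be sketched as follows. The window length of 30 samples, the one-sample step, and the 1% threshold come from the description. Measuring the deviation against the window mean is an assumption, since the profile does not name the reference statistic, and the sketch assumes positive values such as prices.

```python
from statistics import mean

WINDOW = 30        # samples per window, as stated in the description
THRESHOLD = 0.01   # 1% downward deviation; window mean as reference is an assumption


def downward_exceedances(values, window=WINDOW, threshold=THRESHOLD):
    """Slide a fixed-size window one sample at a time and report, per window,
    the indices of samples lying at least `threshold` below the window mean."""
    report = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        avg = mean(chunk)
        floor = avg * (1 - threshold)
        hits = [start + i for i, v in enumerate(chunk) if v <= floor]
        if hits:
            report.append({"window_start": start, "mean": avg, "hits": hits})
    return report


# Illustrative series: flat at 100 with a single dip to 90 at index 29.
series = [100.0] * 29 + [90.0] + [100.0] * 10
report = downward_exceedances(series)
```

The same function can be run per asset and on the aggregated portfolio series, and the per-window reports can then be counted to build the summary of exceedances.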
Project Tech stack:
Python
PySpark
Algorithms and Data Structures
R
Git

Education

2023
Computer Science, Intelligent Systems
Master's
2020
Electronics and Computer Engineering
Bachelor's

Languages

Polish
Advanced
English
Advanced

Copyright © 2025 lemon.io. All rights reserved.