
Krzysztof

From Switzerland (UTC+2)

Data Engineer | Senior

Krzysztof – SQL, Python, AWS

Krzysztof is a Senior Data Engineer with extensive experience designing and building production-grade data ecosystems and distributed data pipelines. He has deep expertise in Python, PySpark, SQL, Kubernetes, Terraform, and cloud platforms (AWS, OVH), complemented by strong DevOps skills that enable him to manage both infrastructure and deployment at scale. He brings a rare specialization in HTAP databases (TiDB) and understands their architectural trade-offs across performance, cost, and maintainability. He combines solid engineering discipline with a practical product focus. Ideal for senior data engineering or data architecture roles, Krzysztof can operate independently while mentoring and guiding a small technical team.

8 years of commercial experience in
Adtech
Advertising
AI
Aviation
Banking
Cloud computing
Data analytics
Energy
IoT
Life science
Logistics
Machine learning
Manufacturing
Marketing
Pharmaceutics
Scientific research
Transportation
Open source
AI software
Web development
Main technologies
SQL – 6 years
Python – 7 years
AWS – 3.5 years
Additional skills
Apache Spark
DevOps
Microsoft Azure
Distributed Systems
Airflow
TypeScript
React
PySpark
REST API
ETL
Direct hire: Possible

Experience Highlights

Senior Data Engineer
Oct 2024 - Sep 2025 (10 months)
Project Overview

A distributed lakehouse platform for ad targeting, combining batch and streaming pipelines to manage audience data.

Responsibilities:
  • Designed and implemented a scalable lakehouse architecture to handle audience data using batch and streaming pipelines (OVH, Kubernetes, PySpark, Terraform, Dagster, Python).
  • Built and maintained Rust and Python microservices using TiDB for audience management and natural-language audience definitions.
  • Developed an OpenAI-powered REST service for generating audience definitions from natural language.
  • Established CI/CD pipelines, testing frameworks, and modular libraries to ensure high-quality software.
  • Managed infrastructure with Kubernetes, Terraform, and Dagster, and used agentic workflows to accelerate feature delivery.
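
For illustration, a minimal PySpark structured-streaming sketch of the kind of ingestion such a lakehouse relies on; the Kafka source, event schema, and storage paths are hypothetical placeholders rather than project details:

```python
# Hypothetical sketch: ingest audience events from Kafka and append them
# to a date-partitioned lakehouse table. All names and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("audience-ingest").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("segment", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "audience-events")            # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.withColumn("dt", F.to_date("event_time"))
    .writeStream.format("parquet")
    .partitionBy("dt")
    .option("path", "s3a://lake/audience/events")              # placeholder path
    .option("checkpointLocation", "s3a://lake/_chk/audience")  # placeholder path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```
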
Project Tech stack:
Python
Rust
PySpark
Terraform
Kubernetes
Dagster
OpenAI
Cloud Computing
Cloud Architecture
SQL
MySQL
Ansible
Microservices
REST API
JSON API
AI
Machine learning
Data analysis
Requirement Analysis
Solution architecture
Docker
Algorithms and Data Structures
Big Data
Unit testing
CI
CD
DevOps
Senior Data Engineer
Oct 2022 - Oct 2024 (2 years)
Project Overview

The project involved designing the architecture and implementing a data lake on Azure Databricks, delivering both streaming and batch pipelines to process energy data.

Responsibilities:
  • Designed and built streaming and batch pipelines using Databricks to process large-scale energy data.
  • Managed infrastructure and deployments with Terraform, Kubernetes, and Azure DevOps.
  • Developed backend and frontend applications using C#, .NET 6, TypeScript, React, and GraphQL.
  • Created a C# microservice leveraging large language models to automate code reviews, using MLflow for experiment tracking.
  • Collaborated with cross-functional teams to ensure data quality, reliability, and performance of the lakehouse platform.
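
For illustration, a minimal MLflow experiment-tracking sketch of the kind used for the code-review service; Python is shown for brevity (the service itself was written in C#), and the experiment name, parameters, and metrics are hypothetical:

```python
# Hypothetical sketch: record one LLM code-review run in MLflow.
# Names and values are placeholders, not project data.
import mlflow

mlflow.set_experiment("llm-code-review")

with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_param("model", "gpt-4")         # placeholder model id
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("flagged_issues", 7)     # placeholder outcome
    mlflow.log_metric("review_latency_s", 3.4)
```
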
Project Tech stack:
Python
Azure SQL
Apache Spark
Terraform
Kubernetes
Microsoft Azure
React
TypeScript
GraphQL
C#
.NET
MLflow
Databricks
Microsoft SQL Server
SQL
ASP.NET
Big Data
Algorithms and Data Structures
DevOps
LLM
AI
Machine learning
MLOps
PySpark
Requirement Analysis
Microservices
Docker
Cloud Architecture
Unit testing
CI
CD
Senior Data Engineer (Freelancer)
Jun 2021 - Sep 2022 (1 year 3 months)
Project Overview

An IT data management system for a finance client. The project involved designing the backend with Python and Flask and integrating an Oracle database.

Responsibilities:
  • Developed and scaled an IT data management system using Python, Flask, and Oracle.
  • Built REST and GraphQL APIs and data pipelines with PySpark and Apache Airflow.
  • Migrated legacy workloads to Kubernetes and automated deployments with Jenkins and Octopus.
  • Enabled self-service analytics via Tableau dashboards for finance stakeholders.
  • Ensured data quality, scalability, and reliability while coordinating with cross-functional teams.
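
For illustration, a minimal Flask-plus-Oracle sketch of the kind of read-only REST endpoint such a backend exposes; the connection details, table, and route are hypothetical:

```python
# Hypothetical sketch: a read-only REST endpoint backed by Oracle.
# Credentials, DSN, table, and columns are placeholders.
import oracledb  # python-oracledb, the successor to cx_Oracle
from flask import Flask, jsonify

app = Flask(__name__)
pool = oracledb.create_pool(user="app", password="secret",
                            dsn="dbhost:1521/ORCLPDB1", min=1, max=4)

@app.route("/assets/<asset_id>")
def get_asset(asset_id):
    # Bind variables keep the query safe from SQL injection.
    with pool.acquire() as conn:
        cur = conn.cursor()
        cur.execute("SELECT name, owner, status FROM it_assets WHERE id = :1",
                    [asset_id])
        row = cur.fetchone()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(name=row[0], owner=row[1], status=row[2])

if __name__ == "__main__":
    app.run()
```
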

Project Tech stack:
Python
Flask
Oracle
GraphQL
PySpark
Apache Airflow
Kubernetes
Jenkins
Tableau
REST API
Apache Spark
Data visualization
Data analysis
SQL
Algorithms and Data Structures
Docker
Unit testing
CI
CD
DevOps
Data Engineer
Dec 2019 - Jun 2021 (1 year 5 months)
Project Overview

A big-data monitoring platform for a major bank, integrating over 50 data sources into a unified solution. Splunk and Palantir Foundry, along with Python, were leveraged to collect, process, and analyze logs and metrics. Dashboards and alerts were introduced for monitoring system performance and ensuring compliance.

Responsibilities:
  • Integrated over 50 data sources into a unified monitoring solution using Splunk and Palantir Foundry.
  • Developed Python modules to ingest, process, and analyze log data.
  • Designed dashboards and alerting systems to track critical business and system metrics.
  • Collaborated with data engineering and compliance teams to ensure data quality and regulatory compliance.
Project Tech stack:
Python
Splunk
Big Data
Data Engineer
Mar 2020 - Dec 2020 (9 months)
Project Overview

A pharmaceutical data analytics platform developed using Python, Pandas, and Scikit-learn on AWS SageMaker to process clinical and manufacturing data. Data pipelines were orchestrated with Amazon ECS and Step Functions, processed data was stored in Amazon S3 and Redshift, and interactive dashboards were delivered through Microsoft Power BI. Infrastructure and workflows were managed with AWS CloudFormation.

Responsibilities:
  • Developed machine learning and analytics pipelines using Python, Pandas, and Scikit-learn on AWS SageMaker to process clinical and manufacturing data.
  • Implemented ETL workflows and orchestrated them with Amazon ECS and AWS Step Functions for reliability and scalability.
  • Stored and served processed data using Amazon S3 and Redshift to support downstream analytics and reporting.
  • Built interactive dashboards with Microsoft Power BI to visualize insights for stakeholders.
  • Managed infrastructure as code with AWS CloudFormation and collaborated with cross-functional teams to ensure data quality and compliance.
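
For illustration, a minimal pandas and scikit-learn sketch of the kind of batch scoring step such a pipeline runs; the column names, paths, and model choice are hypothetical:

```python
# Hypothetical sketch: score manufacturing batches for out-of-spec risk.
# Paths and columns are placeholders, not project data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_parquet("s3://bucket/clinical/batches.parquet")  # placeholder path
features = df[["temp_c", "ph", "yield_pct"]]                  # placeholder columns
labels = df["out_of_spec"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")

df["oos_score"] = model.predict_proba(features)[:, 1]
df.to_parquet("s3://bucket/clinical/scored.parquet")          # placeholder path
```
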
Project Tech stack:
Python
Pandas
Scikit-learn
AWS SageMaker
Amazon S3
Redshift
Amazon ECS
AWS CloudFormation
Microsoft Power BI
AWS
AI
Machine learning
Data analysis
Data visualization
SQL
PostgreSQL
Data Engineer
Jun 2019 - Feb 2020 (8 months)
Project Overview

Legacy systems were integrated into a unified data lake, and Spark-based services were developed for an airline. Ingestion pipelines were built using Python, Spark, and SQL to handle data from multiple source systems. Data consistency and quality were ensured across the platform.

Responsibilities:
  • Integrated multiple legacy systems into a unified data lake using Python and Spark.
  • Developed Spark-based ingestion and transformation services to process batch and streaming data.
  • Implemented Python and SQL code to handle complex data pipelines and ensure data quality.
  • Improved data consistency across source systems and provided scalable data processing.
  • Collaborated with cross-functional teams to deliver data to downstream analytics and reporting platforms.
Project Tech stack:
Python
Apache Spark
PySpark
Big Data
Algorithms and Data Structures
Data analysis
Requirement Analysis
Data Engineer
Jun 2018 - Jun 2019 (1 year)
Project Overview

An entity resolution system for a major bank that integrated customer data from over 20 disparate sources. Python and Apache Spark were leveraged to ingest, transform, and match records in a Palantir Foundry data lake, implementing fuzzy matching and rule-based algorithms to resolve entities and improve data quality.

Responsibilities:
  • Integrated data from over 20 disparate sources into a unified data lake using Python, Spark, and SQL.
  • Designed and implemented fuzzy matching and rule-based algorithms to resolve duplicate entities across systems.
  • Built scalable data ingestion and transformation pipelines in Spark, ensuring data quality and consistency.
  • Collaborated with data analysts and compliance teams to define matching rules and improve customer data quality.
  • Optimized performance of Spark jobs and queries to handle large datasets efficiently.
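
For illustration, a minimal fuzzy-matching sketch in the spirit of the entity-resolution work, using Python's standard-library difflib; the normalization and threshold are hypothetical, and the production system combined such scores with rule-based keys:

```python
# Hypothetical sketch: decide whether two customer names likely refer to
# the same entity. Threshold and normalization are placeholders.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Lowercase and collapse whitespace before comparing.
    return " ".join(name.lower().split())

def is_same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold

print(is_same_entity("Jan  Kowalski", "jan kowalski"))  # True
print(is_same_entity("Jan Kowalski", "Anna Nowak"))     # False
```
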
Project Tech stack:
Python
Apache Spark
PySpark
Big Data
Algorithms and Data Structures
Data analysis
Requirement Analysis
Data Engineer
Jun 2018 - Jun 2019 (1 year)
Project Overview

A web application for monitoring industrial processes across a distributed chemical production network. The backend was built with Java microservices and PostgreSQL to manage streaming sensor data, and the front-end was built with React and TypeScript to provide real-time dashboards. The result was a scalable solution that gave plant operators and engineers actionable insights.

Responsibilities:
  • Developed Java microservices and PostgreSQL data models to ingest and manage streaming data from industrial sensors.
  • Designed and built a React and TypeScript front-end with real-time dashboards and alerts for plant operators.
  • Implemented event-driven architecture and data pipelines to ensure reliable and scalable monitoring of production processes.
  • Collaborated with chemical engineers and operations teams to gather requirements and ensure usability of the solution.
  • Ensured system reliability, security, and performance across the distributed production network.
Project Tech stack:
Java
PostgreSQL
React
TypeScript
Data Engineer
Oct 2016 - Feb 2017 (4 months)
Project Overview

Data pipelines for a B2B online advertising solution, combining lead generation and performance marketing. Scalable pipelines were developed in Scala, Java, and Spark to process campaign and user data. Infrastructure and deployments were managed on AWS using Ansible and Python for automation, ensuring high reliability and performance.

Responsibilities:
  • Developed and maintained data pipelines in Scala, Java, and Spark to process advertising data.
  • Automated infrastructure deployment and management using Ansible and Python on AWS.
  • Improved pipeline performance and reliability, enabling efficient lead generation and marketing analytics.
  • Collaborated with marketing teams to translate data requirements into scalable solutions.
  • Ensured data quality and compliance across the advertising platform.
Project Tech stack:
Scala
Java
Apache Spark
Python
Ansible
AWS

Education

2018
Software Engineering
Master's

Languages

German
Pre-intermediate
Polish
Advanced
English
Advanced
