
Krzysztof

From Switzerland (UTC+2)

Data Engineer | Senior

Krzysztof – SQL, Python, AWS

Krzysztof is a Senior Data Engineer with extensive experience designing and building production-grade data ecosystems and distributed data pipelines. He has deep expertise in Python, PySpark, SQL, Kubernetes, Terraform, and cloud platforms (AWS, OVH), complemented by strong DevOps skills that enable him to manage both infrastructure and deployment at scale. He brings a rare specialization in HTAP databases (TiDB) and understands their architectural trade-offs across performance, cost, and maintainability. He combines solid engineering discipline with a practical product focus. Ideal for senior data engineering or data architecture roles, Krzysztof can operate independently while mentoring and guiding a small technical team.

8 years of commercial experience in
Adtech
Advertising
AI
Aviation
Banking
Cloud computing
Data analytics
Energy
IoT
Life science
Logistics
Machine learning
Manufacturing
Marketing
Pharmaceutics
Scientific research
Transportation
Open source
AI software
Web development
Main technologies
SQL – 6 years
Python – 7 years
AWS – 3.5 years
Additional skills
Apache Spark
DevOps
Microsoft Azure
Distributed Systems
Airflow
TypeScript
React
PySpark
REST API
ETL
Direct hire: Possible

Experience Highlights

Senior Data Engineer
Oct 2024 - Sep 2025 (10 months)
Project Overview

A distributed lakehouse platform for ad targeting, combining batch and streaming pipelines to manage audience data.

Responsibilities:
  • Designed and implemented a scalable lakehouse architecture to handle audience data using batch and streaming pipelines (OVH, Kubernetes, PySpark, Terraform, Dagster, Python).
  • Built and maintained Rust and Python microservices using TiDB for audience management and natural-language audience definitions.
  • Developed an OpenAI-powered REST service for generating audience definitions from natural language.
  • Established CI/CD pipelines, testing frameworks, and modular libraries to ensure high-quality software.
  • Managed infrastructure with Kubernetes, Terraform, and Dagster, and used agentic workflows to accelerate feature delivery.
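
For illustration, a minimal PySpark structured-streaming sketch of the kind of ingestion such a lakehouse relies on; the Kafka source, event schema, and storage paths are hypothetical placeholders rather than project details:

```python
# Hypothetical sketch: ingest audience events from Kafka and append them
# to a date-partitioned lakehouse table. All names and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("audience-ingest").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("segment", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "audience-events")            # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.withColumn("dt", F.to_date("event_time"))
    .writeStream.format("parquet")
    .partitionBy("dt")
    .option("path", "s3a://lake/audience/events")              # placeholder path
    .option("checkpointLocation", "s3a://lake/_chk/audience")  # placeholder path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```
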
Project Tech stack:
Python
Rust
PySpark
Terraform
Kubernetes
Dagster
OpenAI
Cloud Computing
Cloud Architecture
SQL
MySQL
Ansible
Microservices
REST API
JSON API
AI
Machine learning
Data analysis
Requirement Analysis
Solution architecture
Docker
Algorithms and Data Structures
Big Data
Unit testing
CI
CD
DevOps
Senior Data Engineer
Oct 2022 - Oct 2024 (2 years)
Project Overview

The project involved designing the architecture and implementing a data lake on Azure Databricks, delivering both streaming and batch pipelines to process energy data.

Responsibilities:
  • Designed and built streaming and batch pipelines using Databricks to process large-scale energy data.
  • Managed infrastructure and deployments with Terraform, Kubernetes, and Azure DevOps.
  • Developed backend and frontend applications using C#, .NET 6, TypeScript, React, and GraphQL.
  • Created a C# microservice leveraging large language models to automate code reviews, using MLflow for experiment tracking.
  • Collaborated with cross-functional teams to ensure data quality, reliability, and performance of the lakehouse platform.
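
For illustration, a minimal MLflow experiment-tracking sketch of the kind used for the code-review service; Python is shown for brevity (the service itself was written in C#), and the experiment name, parameters, and metrics are hypothetical:

```python
# Hypothetical sketch: record one LLM code-review run in MLflow.
# Names and values are placeholders, not project data.
import mlflow

mlflow.set_experiment("llm-code-review")

with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_param("model", "gpt-4")         # placeholder model id
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("flagged_issues", 7)     # placeholder outcome
    mlflow.log_metric("review_latency_s", 3.4)
```
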
Project Tech stack:
Python
Azure SQL
Apache Spark
Terraform
Kubernetes
Microsoft Azure
React
TypeScript
GraphQL
C#
.NET
MLflow
Databricks
Microsoft SQL Server
SQL
ASP.NET
Big Data
Algorithms and Data Structures
DevOps
LLM
AI
Machine learning
MLOps
PySpark
Requirement Analysis
Microservices
Docker
Cloud Architecture
Unit testing
CI
CD
Senior Data Engineer (Freelancer)
Jun 2021 - Sep 2022 (1 year 3 months)
Project Overview

An IT data management system for a finance client. The project involved designing the backend with Python and Flask and integrating an Oracle database.

Responsibilities:
  • Developed and scaled an IT data management system using Python, Flask, and Oracle.
  • Built REST and GraphQL APIs and data pipelines with PySpark and Apache Airflow.
  • Migrated legacy workloads to Kubernetes and automated deployments with Jenkins and Octopus.
  • Enabled self-service analytics via Tableau dashboards for finance stakeholders.
  • Ensured data quality, scalability, and reliability while coordinating with cross-functional teams.
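
For illustration, a minimal Flask-plus-Oracle sketch of the kind of read-only REST endpoint such a backend exposes; the connection details, table, and route are hypothetical:

```python
# Hypothetical sketch: a read-only REST endpoint backed by Oracle.
# Credentials, DSN, table, and columns are placeholders.
import oracledb  # python-oracledb, the successor to cx_Oracle
from flask import Flask, jsonify

app = Flask(__name__)
pool = oracledb.create_pool(user="app", password="secret",
                            dsn="dbhost:1521/ORCLPDB1", min=1, max=4)

@app.route("/assets/<asset_id>")
def get_asset(asset_id):
    # Bind variables keep the query safe from SQL injection.
    with pool.acquire() as conn:
        cur = conn.cursor()
        cur.execute("SELECT name, owner, status FROM it_assets WHERE id = :1",
                    [asset_id])
        row = cur.fetchone()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(name=row[0], owner=row[1], status=row[2])

if __name__ == "__main__":
    app.run()
```
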

Project Tech stack:
Python
Flask
Oracle
GraphQL
PySpark
Apache Airflow
Kubernetes
Jenkins
Tableau
REST API
Apache Spark
Data visualization
Data analysis
SQL
Algorithms and Data Structures
Docker
Unit testing
CI
CD
DevOps
Data Engineer
Dec 2019 - Jun 2021 (1 year 5 months)
Project Overview

A big-data monitoring platform for a major bank, integrating over 50 data sources into a unified solution. Splunk and Palantir Foundry, along with Python, were leveraged to collect, process, and analyze logs and metrics. Dashboards and alerts were introduced for monitoring system performance and ensuring compliance.

Responsibilities:
  • Integrated over 50 data sources into a unified monitoring solution using Splunk and Palantir Foundry.
  • Developed Python modules to ingest, process, and analyze log data.
  • Designed dashboards and alerting systems to track critical business and system metrics.
  • Collaborated with data engineering and compliance teams to ensure data quality and regulatory compliance.
Project Tech stack:
Python
Splunk
Big Data
Data Engineer
Mar 2020 - Dec 2020 (9 months)
Project Overview

A pharmaceutical data analytics platform developed using Python, Pandas, and Scikit-learn on AWS SageMaker to process clinical and manufacturing data. Data pipelines were orchestrated with Amazon ECS and Step Functions, processed data was stored in Amazon S3 and Redshift, and interactive dashboards were delivered through Microsoft Power BI. Infrastructure and workflows were managed with AWS CloudFormation.

Responsibilities:
  • Developed machine learning and analytics pipelines using Python, Pandas, and Scikit-learn on AWS SageMaker to process clinical and manufacturing data.
  • Implemented ETL workflows and orchestrated them with Amazon ECS and AWS Step Functions for reliability and scalability.
  • Stored and served processed data using Amazon S3 and Redshift to support downstream analytics and reporting.
  • Built interactive dashboards with Microsoft Power BI to visualize insights for stakeholders.
  • Managed infrastructure as code with AWS CloudFormation and collaborated with cross-functional teams to ensure data quality and compliance.
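
For illustration, a minimal pandas and scikit-learn sketch of the kind of batch scoring step such a pipeline runs; the column names, paths, and model choice are hypothetical:

```python
# Hypothetical sketch: score manufacturing batches for out-of-spec risk.
# Paths and columns are placeholders, not project data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_parquet("s3://bucket/clinical/batches.parquet")  # placeholder path
features = df[["temp_c", "ph", "yield_pct"]]                  # placeholder columns
labels = df["out_of_spec"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")

df["oos_score"] = model.predict_proba(features)[:, 1]
df.to_parquet("s3://bucket/clinical/scored.parquet")          # placeholder path
```
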
Project Tech stack:
Python
Pandas
Scikit-learn
AWS SageMaker
Amazon S3
Redshift
Amazon ECS
AWS CloudFormation
Microsoft Power BI
AWS
AI
Machine learning
Data analysis
Data visualization
SQL
PostgreSQL
Data Engineer
Jun 2019 - Feb 2020 (8 months)
Project Overview

Legacy systems were integrated into a unified data lake, and Spark-based services were developed for an airline. Ingestion pipelines were built using Python, Spark, and SQL to handle data from multiple source systems. Data consistency and quality were ensured across the platform.

Responsibilities:
  • Integrated multiple legacy systems into a unified data lake using Python and Spark.
  • Developed Spark-based ingestion and transformation services to process batch and streaming data.
  • Implemented Python and SQL code to handle complex data pipelines and ensure data quality.
  • Improved data consistency across source systems and provided scalable data processing.
  • Collaborated with cross-functional teams to deliver data to downstream analytics and reporting platforms.
Project Tech stack:
Python
Apache Spark
PySpark
Big Data
Algorithms and Data Structures
Data analysis
Requirement Analysis
Data Engineer
Jun 2018 - Jun 2019 (1 year)
Project Overview

An entity resolution system for a major bank that integrated customer data from over 20 disparate sources. Python and Apache Spark were leveraged to ingest, transform, and match records in a Palantir Foundry data lake, implementing fuzzy matching and rule-based algorithms to resolve entities and improve data quality.

Responsibilities:
  • Integrated data from over 20 disparate sources into a unified data lake using Python, Spark, and SQL.
  • Designed and implemented fuzzy matching and rule-based algorithms to resolve duplicate entities across systems.
  • Built scalable data ingestion and transformation pipelines in Spark, ensuring data quality and consistency.
  • Collaborated with data analysts and compliance teams to define matching rules and improve customer data quality.
  • Optimized performance of Spark jobs and queries to handle large datasets efficiently.
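
For illustration, a minimal fuzzy-matching sketch in the spirit of the entity-resolution work, using Python's standard-library difflib; the normalization and threshold are hypothetical, and the production system combined such scores with rule-based keys:

```python
# Hypothetical sketch: decide whether two customer names likely refer to
# the same entity. Threshold and normalization are placeholders.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Lowercase and collapse whitespace before comparing.
    return " ".join(name.lower().split())

def is_same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold

print(is_same_entity("Jan  Kowalski", "jan kowalski"))  # True
print(is_same_entity("Jan Kowalski", "Anna Nowak"))     # False
```
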
Project Tech stack:
Python
Apache Spark
PySpark
Big Data
Algorithms and Data Structures
Data analysis
Requirement Analysis
Data Engineer
Jun 2018 - Jun 2019 (1 year)
Project Overview

A web application for monitoring industrial processes across a distributed chemical production network. The backend was built with Java microservices and PostgreSQL to manage streaming sensor data, and the front-end was built with React and TypeScript to provide real-time dashboards. The result was a scalable solution that gave plant operators and engineers actionable insights.

Responsibilities:
  • Developed Java microservices and PostgreSQL data models to ingest and manage streaming data from industrial sensors.
  • Designed and built a React and TypeScript front-end with real-time dashboards and alerts for plant operators.
  • Implemented event-driven architecture and data pipelines to ensure reliable and scalable monitoring of production processes.
  • Collaborated with chemical engineers and operations teams to gather requirements and ensure usability of the solution.
  • Ensured system reliability, security, and performance across the distributed production network.
Project Tech stack:
Java
PostgreSQL
React
TypeScript
Data Engineer
Oct 2016 - Feb 2017 (4 months)
Project Overview

Data pipelines for a B2B online advertising solution, combining lead generation and performance marketing. Scalable pipelines were developed in Scala, Java, and Spark to process campaign and user data. Infrastructure and deployments were managed on AWS using Ansible and Python for automation, ensuring high reliability and performance.

Responsibilities:
  • Developed and maintained data pipelines in Scala, Java, and Spark to process advertising data.
  • Automated infrastructure deployment and management using Ansible and Python on AWS.
  • Improved pipeline performance and reliability, enabling efficient lead generation and marketing analytics.
  • Collaborated with marketing teams to translate data requirements into scalable solutions.
  • Ensured data quality and compliance across the advertising platform.
Project Tech stack:
Scala
Java
Apache Spark
Python
Ansible
AWS

Education

2018
Software Engineering
Master's

Languages

German
Pre-intermediate
Polish
Advanced
English
Advanced
