Krzysztof – SQL, Python, AWS
Krzysztof is a Senior Data Engineer with extensive experience designing and building production-grade data ecosystems and distributed data pipelines. He has deep expertise in Python, PySpark, SQL, Kubernetes, Terraform, and cloud platforms (AWS, OVH), complemented by strong DevOps skills that enable him to manage both infrastructure and deployment at scale. He brings a rare specialization in HTAP databases (TiDB) and understands their architectural trade-offs across performance, cost, and maintainability. He combines solid engineering discipline with a practical product focus. Ideal for senior data engineering or data architecture roles, Krzysztof can operate independently while mentoring and guiding a small technical team.
8 years of commercial experience
Main technologies
Additional skills
Direct hire: Possible
Experience Highlights
Senior Data Engineer
A distributed lakehouse platform for ad targeting, combining batch and streaming pipelines to manage audience data.
- Designed and implemented a scalable lakehouse architecture to handle audience data using batch and streaming pipelines (OVH, Kubernetes, PySpark, Terraform, Dagster, Python).
- Built and maintained Rust and Python microservices using TiDB for audience management and natural-language audience definitions.
- Developed an OpenAI-powered REST service for generating audience definitions from natural language.
- Established CI/CD pipelines, testing frameworks, and modular libraries to ensure high-quality software.
- Managed infrastructure with Kubernetes, Terraform, and Dagster, and used agentic workflows to accelerate feature delivery.
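For illustration, a minimal PySpark Structured Streaming sketch of the kind of audience-event ingestion such a lakehouse relies on; the schema, paths, window size, and checkpoint location are hypothetical placeholders rather than the project's actual configuration.

```python
# Hypothetical sketch of a streaming audience-event ingestion job (PySpark).
# Paths, schema, and window settings are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("audience-ingestion").getOrCreate()

event_schema = StructType([
    StructField("audience_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Read newly landed JSON event files as a stream.
events = (
    spark.readStream
    .schema(event_schema)
    .json("s3a://landing/audience-events/")  # hypothetical landing path
)

# Aggregate activity per audience in 10-minute windows; the watermark lets
# append mode emit only finalized windows.
audience_counts = (
    events
    .withWatermark("event_time", "30 minutes")
    .groupBy(window(col("event_time"), "10 minutes"), col("audience_id"))
    .count()
)

# Append windowed aggregates to the lakehouse as Parquet.
query = (
    audience_counts.writeStream
    .format("parquet")
    .option("path", "s3a://lakehouse/audience_activity/")
    .option("checkpointLocation", "s3a://lakehouse/_checkpoints/audience_activity/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```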
Senior Data Engineer
The project involved designing the architecture and implementing a data lake using Databricks on Azure, delivering both streaming and batch pipelines to process energy data.
- Designed and built streaming and batch pipelines using Databricks to process large-scale energy data.
- Managed infrastructure and deployments with Terraform, Kubernetes, and Azure DevOps.
- Developed backend and frontend applications using C#, .NET 6, TypeScript, React, and GraphQL.
- Created a C# microservice leveraging large language models to automate code reviews, using MLflow for experiment tracking.
- Collaborated with cross-functional teams to ensure data quality, reliability, and performance of the lakehouse platform.
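The code-review service tracked its experiments in MLflow; below is a minimal, generic MLflow tracking sketch (shown in Python for brevity, whereas the project's service was C#). The experiment name, parameters, and metric are purely illustrative.

```python
# Illustrative MLflow experiment-tracking sketch. Experiment, run, parameter,
# and metric names are hypothetical, not taken from the project.
import mlflow

mlflow.set_experiment("code-review-llm")  # hypothetical experiment name

with mlflow.start_run(run_name="prompt-v2"):
    # Log the configuration of one review-generation run.
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_param("temperature", 0.2)

    # ... call the LLM and score the generated review here ...
    acceptance_rate = 0.87  # placeholder result

    mlflow.log_metric("review_acceptance_rate", acceptance_rate)
```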
Senior Data Engineer (Freelancer)
An IT data management system for a finance client. The project involved designing the backend with Python and Flask and integrating an Oracle database.
- Developed and scaled an IT data management system using Python, Flask, and Oracle.
- Built REST and GraphQL APIs and data pipelines with PySpark and Apache Airflow.
- Migrated legacy workloads to Kubernetes and automated deployments with Jenkins and Octopus.
- Enabled self-service analytics via Tableau dashboards for finance stakeholders.
- Ensured data quality, scalability, and reliability while coordinating with cross-functional teams.
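A minimal Apache Airflow sketch of the kind of scheduled pipeline used here; the DAG id, schedule, and task logic are hypothetical and the callables are placeholders for the Oracle extract and PySpark transform steps.

```python
# Hypothetical Airflow DAG sketch for a daily extract/transform pipeline.
# DAG id, schedule, and task bodies are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_oracle(**context):
    # Placeholder: pull the daily slice of records from the Oracle source.
    print("extracting records for", context["ds"])


def transform_with_pyspark(**context):
    # Placeholder: run a PySpark job that cleans and aggregates the slice.
    print("transforming records for", context["ds"])


with DAG(
    dag_id="it_asset_daily_load",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_oracle)
    transform = PythonOperator(task_id="transform", python_callable=transform_with_pyspark)

    extract >> transform
```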
Data Engineer
A big-data monitoring platform for a major bank, integrating over 50 data sources into a unified solution. Splunk and Palantir Foundry, along with Python, were leveraged to collect, process, and analyze logs and metrics. Dashboards and alerts were introduced for monitoring system performance and ensuring compliance.
- Integrated over 50 data sources into a unified monitoring solution using Splunk and Palantir Foundry.
- Developed Python modules to ingest, process, and analyze log data.
- Designed dashboards and alerting systems to track critical business and system metrics.
- Collaborated with data engineering and compliance teams to ensure data quality and regulatory compliance.
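A small, generic Python sketch of threshold-based alerting over parsed log metrics, illustrating the kind of ingestion-and-alerting module described above; the log format, field names, threshold, and input file are assumptions.

```python
# Generic log-metrics alerting sketch (standard library only).
# Log format, threshold value, and input file are hypothetical.
import json
from collections import Counter

ERROR_THRESHOLD = 100  # assumed alerting threshold per batch


def count_errors_by_source(log_lines):
    """Count ERROR-level entries per source system from JSON log lines."""
    errors = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the batch
        if record.get("level") == "ERROR":
            errors[record.get("source", "unknown")] += 1
    return errors


def build_alerts(error_counts):
    """Return alert messages for sources exceeding the error threshold."""
    return [
        f"ALERT: {source} produced {count} errors in the last batch"
        for source, count in error_counts.items()
        if count > ERROR_THRESHOLD
    ]


if __name__ == "__main__":
    with open("sample_logs.jsonl") as fh:  # hypothetical input file
        counts = count_errors_by_source(fh)
    for alert in build_alerts(counts):
        print(alert)
```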
Data Engineer
A pharmaceutical data analytics platform developed using Python, Pandas, and Scikit-learn on AWS SageMaker to process clinical and manufacturing data. Data pipelines were orchestrated with Amazon ECS and AWS Step Functions, processed data was stored in Amazon S3 and Redshift, and interactive dashboards were delivered through Microsoft Power BI. Infrastructure and workflows were managed with AWS CloudFormation.
- Developed machine learning and analytics pipelines using Python, Pandas, and Scikit‑learn on AWS SageMaker to process clinical and manufacturing data.
- Implemented ETL workflows and orchestrated them with Amazon ECS and AWS Step Functions for reliability and scalability.
- Stored and served processed data using Amazon S3 and Redshift to support downstream analytics and reporting.
- Built interactive dashboards with Microsoft Power BI to visualize insights for stakeholders.
- Managed infrastructure as code with AWS CloudFormation and collaborated with cross‑functional teams to ensure data quality and compliance.
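A minimal Pandas/scikit-learn sketch of the kind of analytics pipeline described above; the feature columns, target, toy data, and choice of estimator are illustrative assumptions, not the project's actual model.

```python
# Illustrative scikit-learn pipeline for tabular manufacturing data.
# Column names, target, sample data, and estimator are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical batch of processed records (in practice, read from S3/Redshift).
df = pd.DataFrame({
    "temperature": [70.1, 71.3, 69.8, 75.0, 74.2, 70.5, 76.1, 69.5],
    "pressure":    [1.01, 1.03, 0.99, 1.10, 1.08, 1.00, 1.12, 0.98],
    "batch_ok":    [1, 1, 1, 0, 0, 1, 0, 1],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["temperature", "pressure"]],
    df["batch_ok"],
    test_size=0.25,
    random_state=42,
    stratify=df["batch_ok"],
)

# Scale features, then fit a simple classifier for batch quality.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```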
Data Engineer
Legacy systems were integrated into a unified data lake, and Spark-based services were developed for an airline. Ingestion pipelines were built using Python, Spark, and SQL to handle data from multiple source systems, and data consistency and quality were ensured across the platform.
- Integrated multiple legacy systems into a unified data lake using Python and Spark.
- Developed Spark-based ingestion and transformation services to process batch and streaming data.
- Implemented Python and SQL code to handle complex data pipelines and ensure data quality.
- Improved data consistency across source systems and provided scalable data processing.
- Collaborated with cross-functional teams to deliver data to downstream analytics and reporting platforms.
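A small PySpark sketch of the deduplicate-and-validate step typical of such ingestion pipelines; the table paths, column names, and quality rules are hypothetical.

```python
# Hypothetical PySpark sketch of a batch cleanse step: deduplicate bookings
# and drop rows failing basic quality rules. Paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("booking-cleanse").getOrCreate()

raw = spark.read.parquet("s3a://datalake/raw/bookings/")  # hypothetical path

# Keep the latest record per booking_id, based on an updated_at timestamp.
latest = Window.partitionBy("booking_id").orderBy(F.col("updated_at").desc())
deduped = (
    raw.withColumn("rn", F.row_number().over(latest))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Basic quality rules: required keys present and a sane flight date.
valid = deduped.filter(
    F.col("booking_id").isNotNull()
    & F.col("passenger_id").isNotNull()
    & (F.col("flight_date") >= F.lit("2000-01-01"))
)

valid.write.mode("overwrite").parquet("s3a://datalake/curated/bookings/")
```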
Data Engineer
An entity resolution system for a major bank that integrated customer data from over 20 disparate sources. Python and Apache Spark were leveraged to ingest, transform, and match records in a Palantir Foundry data lake, implementing fuzzy matching and rule‑based algorithms to resolve entities and improve data quality.
- Integrated data from over 20 disparate sources into a unified data lake using Python, Spark, and SQL.
- Designed and implemented fuzzy matching and rule-based algorithms to resolve duplicate entities across systems.
- Built scalable data ingestion and transformation pipelines in Spark, ensuring data quality and consistency.
- Collaborated with data analysts and compliance teams to define matching rules and improve customer data quality.
- Optimized performance of Spark jobs and queries to handle large datasets efficiently.
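A simplified PySpark sketch of the fuzzy-matching idea: block candidate pairs on a shared key, score name similarity with edit distance, and accept pairs above a threshold. The column names, blocking key, and 0.85 threshold are assumptions, not the project's actual rules.

```python
# Simplified entity-resolution sketch in PySpark: block on postcode, score
# name similarity with Levenshtein distance, keep pairs above a threshold.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("entity-resolution").getOrCreate()

a = spark.read.parquet("s3a://lake/customers/source_a/").alias("a")  # hypothetical
b = spark.read.parquet("s3a://lake/customers/source_b/").alias("b")  # hypothetical

# Blocking: only compare records sharing a postcode to limit the pair space.
pairs = a.join(b, F.col("a.postcode") == F.col("b.postcode"))

# Normalised similarity: 1 - levenshtein / max(name lengths).
name_len = F.greatest(F.length("a.full_name"), F.length("b.full_name"))
similarity = 1 - F.levenshtein(F.col("a.full_name"), F.col("b.full_name")) / name_len

matches = (
    pairs
    .withColumn("name_similarity", similarity)
    .filter(F.col("name_similarity") >= 0.85)
    .select(
        F.col("a.customer_id").alias("customer_id_a"),
        F.col("b.customer_id").alias("customer_id_b"),
        "name_similarity",
    )
)

matches.write.mode("overwrite").parquet("s3a://lake/resolved/customer_matches/")
```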
Data Engineer
A web application for monitoring industrial processes across a distributed chemical production network. The backend was built as Java microservices with PostgreSQL to manage streaming sensor data, and the front-end with real-time dashboards was built using React and TypeScript. The project resulted in a scalable solution that provided plant operators and engineers with actionable insights.
- Developed Java microservices and PostgreSQL data models to ingest and manage streaming data from industrial sensors.
- Designed and built a React and TypeScript front-end with real-time dashboards and alerts for plant operators.
- Implemented event-driven architecture and data pipelines to ensure reliable and scalable monitoring of production processes.
- Collaborated with chemical engineers and operations teams to gather requirements and ensure usability of the solution.
- Ensured system reliability, security, and performance across the distributed production network.
Data Engineer
Data pipelines for a B2B online advertising solution, combining lead generation and performance marketing. Scalable pipelines were developed in Scala, Java, and Spark to process campaign and user data. Infrastructure and deployments were managed on AWS using Ansible and Python for automation, ensuring high reliability and performance.
- Developed and maintained data pipelines in Scala, Java, and Spark to process advertising data.
- Automated infrastructure deployment and management using Ansible and Python on AWS.
- Improved pipeline performance and reliability, enabling efficient lead generation and marketing analytics.
- Collaborated with marketing teams to translate data requirements into scalable solutions.
- Ensured data quality and compliance across the advertising platform.