Nialish – Microsoft Azure, Python, SQL
Nialish is a Senior Data Engineer with over six years of experience in data engineering, big data analytics, and the development of scalable, distributed data pipelines. He possesses hands-on expertise in Python, Apache Airflow, Snowflake, SQL, and ETL/Data Warehouse architecture, among other related technologies. Throughout his career, Nialish has worked across various industries, delivering reliable, secure, and high-performance data solutions in fast-paced environments. He excels at designing and debugging complex data pipelines while effectively balancing technical best practices with business requirements to provide actionable insights. He demonstrates strong problem-solving, critical thinking, and business acumen, with the ability to communicate and collaborate across technical and non-technical teams. Adaptable, eager to learn, and experienced in leading teams, he thrives in agile environments and embraces new challenges.
Experience Highlights
Team Lead & Principal Data Engineer
An intelligent customer support chatbot that delivers instant, accurate solutions by leveraging thousands of previously solved cases. The system consolidates knowledge scattered across diverse sources, including internal documentation, defect logs, help pages, data tables, and training materials, into a unified, intelligent platform. By doing so, it significantly reduces problem-resolution time, which previously could take weeks when handled manually by consultants.
- Served as the sole Principal Data Engineer in a 19-member team, mentoring colleagues and driving project development, delivery, and maintenance;
- Designed and built scalable Spark pipelines to process and integrate data from Azure Databricks Unity Catalog, transforming raw data into vector embeddings for semantic search and AI model relevance;
- Leveraged Azure AI Search as a vector store to persist embeddings, and engineered Spark workflows for efficient storage and retrieval, forming the foundation for LLM-powered applications (see the first sketch after this list);
- Developed end-to-end FastAPI services to enable real-time contextual data retrieval and dynamic response generation via LLMs (see the second sketch after this list);
- Migrated and implemented CI/CD pipelines with self-hosted Azure DevOps agents, ensuring compliance with internal security and regulatory requirements;
- Unified diverse knowledge sources (databases, logs, documentation, media) into vectorized representations, enabling instant retrieval of the most relevant context and integration with Azure OpenAI for precise solutions;
- Achieved dramatic improvements in time-to-resolution (from weeks to minutes), customer satisfaction, and scalability, with a solution designed to continuously learn from new cases.
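For illustration only, a minimal sketch of what such an embedding pipeline could look like: a Databricks job reads a curated Unity Catalog table, calls an Azure OpenAI embedding deployment, and upserts the vectors into an Azure AI Search index. The table name (support.curated.knowledge_base), index name (kb-index), deployment names, and credentials below are hypothetical placeholders, not the project's actual code.

```python
# Hypothetical sketch: Unity Catalog table -> embeddings -> Azure AI Search vector index.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Curated knowledge-base text from a Unity Catalog table (assumed schema: id, content).
docs = spark.table("support.curated.knowledge_base").select("id", "content")

def embed_and_upload(rows):
    """Embed one partition of documents and upsert them into the Azure AI Search index."""
    # Clients are created inside the function so nothing non-serializable is shipped to executors.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from openai import AzureOpenAI

    openai_client = AzureOpenAI(
        azure_endpoint="https://<aoai-resource>.openai.azure.com",
        api_key="<aoai-key>",
        api_version="2024-02-01",
    )
    search_client = SearchClient(
        endpoint="https://<search-resource>.search.windows.net",
        index_name="kb-index",
        credential=AzureKeyCredential("<search-admin-key>"),
    )
    batch = []
    for row in rows:
        embedding = openai_client.embeddings.create(
            model="text-embedding-ada-002",  # embedding deployment name (assumed)
            input=row["content"],
        ).data[0].embedding
        batch.append({"id": row["id"], "content": row["content"], "contentVector": embedding})
    if batch:
        search_client.merge_or_upload_documents(documents=batch)

docs.foreachPartition(embed_and_upload)
```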
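A companion sketch of the FastAPI retrieval service, again with placeholder index and deployment names: it embeds the incoming question, pulls the closest knowledge-base chunks from Azure AI Search, and asks an Azure OpenAI chat deployment to answer from that context. Authentication, error handling, and response streaming are omitted.

```python
# Hypothetical sketch: retrieval-augmented answer endpoint; all names and endpoints are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from fastapi import FastAPI
from openai import AzureOpenAI
from pydantic import BaseModel

app = FastAPI()
openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)
search_client = SearchClient(
    endpoint="https://<search-resource>.search.windows.net",
    index_name="kb-index",
    credential=AzureKeyCredential("<search-query-key>"),
)

class Question(BaseModel):
    text: str

@app.post("/answer")
def answer(question: Question) -> dict:
    # 1. Embed the incoming question.
    query_vector = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=question.text
    ).data[0].embedding

    # 2. Retrieve the most relevant knowledge-base chunks from the vector index.
    hits = search_client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="contentVector")
        ],
    )
    context = "\n\n".join(hit["content"] for hit in hits)

    # 3. Ask the chat model to answer strictly from the retrieved context.
    completion = openai_client.chat.completions.create(
        model="gpt-4o",  # chat deployment name (assumed)
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question.text}"},
        ],
    )
    return {"answer": completion.choices[0].message.content}
```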
Team Lead & Senior Data Engineer
A data-driven solution for a large care services marketplace (15M+ clients, 10K+ providers), built to overcome disconnected data silos and improve efficiency in matching clients with care providers. The platform introduced predictive intelligence and self-service dashboards for real-time monitoring and KPIs, reducing manual data processing by 60%, improving scalability, and generating over €2M in additional revenue.
- Successfully built and deployed a Snowflake-based data warehouse from scratch, centralizing 10K+ partner company records and 15M+ client interactions through Azure Data Factory ELT pipelines;
- Implemented real-time data streaming with Snowpipe and Apache Kafka, enabling live synchronization of transactional and customer data (see the first sketch after this list);
- Designed and developed a core ML engine on Databricks (Spark + PySpark) for data cleaning, feature engineering, predictive analytics, and automated appointment scheduling, generating over €2M in additional annual revenue (see the second sketch after this list);
- Created self-service BI dashboards with real-time KPIs, reducing dependency on IT/data teams, accelerating decision-making cycles, and improving business scalability;
- Established CI/CD pipelines for Databricks and Azure Data Factory, improving deployment reliability and shortening release cycles;
- Led and mentored a team of 3 data engineers within a larger group of 23, ensuring agile delivery, knowledge sharing, and alignment with project goals.
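A simplified sketch of the streaming ingestion path. The production setup relied on Snowpipe and Kafka; the snippet below only illustrates the shape of the flow, micro-batching Kafka events into a raw Snowflake table. Broker, topic, table, and credential names are placeholders.

```python
# Illustrative sketch only: micro-batching Kafka events into a raw Snowflake table.
import json

import snowflake.connector
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<broker>:9092",
    "group.id": "care-events-loader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["client-appointments"])

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="CARE_DWH", schema="RAW",
)
cursor = conn.cursor()

def flush(batch):
    """Insert raw event JSON as text; downstream models are built with ELT on top of RAW."""
    cursor.executemany(
        "INSERT INTO RAW.APPOINTMENT_EVENTS (EVENT_ID, PAYLOAD) VALUES (%s, %s)",
        batch,
    )

batch = []
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())          # assumes the payload carries an "event_id" field
    batch.append((event["event_id"], json.dumps(event)))
    if len(batch) >= 500:
        flush(batch)
        batch.clear()
```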
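And a hypothetical sketch of the kind of Spark ML pipeline such a scheduling engine could be built on: feature engineering over historical appointments plus a gradient-boosted model that scores how likely a provider is to accept a request. Table names, columns, and the 0/1 "accepted" label are illustrative assumptions.

```python
# Hypothetical sketch: feature engineering plus an acceptance model in Spark ML.
from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Historical appointment requests with a 0/1 "accepted" outcome (assumed schema).
history = spark.table("care_dwh.marts.appointment_history").na.drop(
    subset=["service_type", "distance_km", "provider_rating", "lead_time_hours", "accepted"]
)

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="service_type", outputCol="service_type_idx", handleInvalid="keep"),
    VectorAssembler(
        inputCols=["service_type_idx", "distance_km", "provider_rating", "lead_time_hours"],
        outputCol="features",
    ),
    GBTClassifier(labelCol="accepted", featuresCol="features"),
])

train, test = history.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Score open requests so the scheduler can propose the providers most likely to accept first.
open_requests = spark.table("care_dwh.marts.open_requests")  # same feature columns assumed
scored = model.transform(open_requests).select("request_id", "provider_id", "probability")
scored.show(truncate=False)
```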
Senior Data Engineer
Developed a data-driven recruitment platform designed to optimize candidate-job matching at scale. The main challenge was that the existing matching process required up to 14 hours per day, causing delays and a poor user experience. By introducing advanced automation and optimized data pipelines, the solution significantly reduced processing time, improved accuracy in recommendations, and enhanced overall platform efficiency for both candidates and employers.
- Built a Big Data and Machine Learning platform from scratch using Apache Spark, enabling scalable candidate-job matching for 200K+ candidates and 1K+ daily logins;
- Designed and implemented ETL and ML pipelines for automated candidate ranking and matching, significantly reducing manual intervention and errors (see the first sketch after this list);
- Orchestrated workflows with Apache Airflow and integrated AWS SQS + SNS for reliable cross-server communication and recruiter notifications (see the second sketch after this list);
- Reduced candidate-job matching processing time from 14 hours per day to just 6 seconds, enabling near real-time recommendations and faster placements;
- Increased candidate shortlisting efficiency by 60%, improving recruiter productivity and decision-making speed;
- Contributed to 25% YoY growth in customer onboarding by optimizing recruitment workflows and improving overall user experience;
- Scaled the platform to support growing activity levels without delays, ensuring long-term performance and reliability.
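A minimal sketch of how large-scale candidate-job matching can be made fast in Spark, using hashed skill features and locality-sensitive hashing instead of exhaustive pairwise comparison. Table names, columns, and the similarity threshold are assumptions, not the project's actual implementation.

```python
# Hypothetical sketch: approximate candidate/job matching with Spark ML LSH.
from pyspark.ml.feature import HashingTF, MinHashLSH, Tokenizer
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

candidates = spark.table("recruiting.candidates").select("candidate_id", "skills_text")
jobs = spark.table("recruiting.jobs").select("job_id", "requirements_text")

def featurize(df, text_col):
    """Tokenize free-text skills/requirements and hash them into sparse feature vectors."""
    tokens = Tokenizer(inputCol=text_col, outputCol="tokens").transform(df)
    return HashingTF(inputCol="tokens", outputCol="features", numFeatures=1 << 14).transform(tokens)

cand_feats = featurize(candidates, "skills_text")
job_feats = featurize(jobs, "requirements_text")

# Locality-sensitive hashing lets us join 200K+ candidates against open jobs
# without computing every pairwise similarity.
lsh = MinHashLSH(inputCol="features", outputCol="hashes", numHashTables=5).fit(cand_feats)
matches = (
    lsh.approxSimilarityJoin(cand_feats, job_feats, threshold=0.6, distCol="jaccard_distance")
    .select(
        F.col("datasetA.candidate_id"),
        F.col("datasetB.job_id"),
        (1 - F.col("jaccard_distance")).alias("match_score"),
    )
    .orderBy(F.desc("match_score"))
)
matches.show(truncate=False)
```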
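A sketch of the orchestration layer, assuming an Airflow 2.4+ deployment with the Amazon provider installed: the DAG waits on an SQS queue, triggers the matching run, and notifies recruiters via SNS. Queue URLs, ARNs, and the callable are placeholders.

```python
# Hypothetical sketch: Airflow DAG coordinating the matching run with SQS and SNS.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.sns import SnsPublishOperator
from airflow.providers.amazon.aws.sensors.sqs import SqsSensor

def run_matching_job(**_):
    # In the real pipeline this step would submit the Spark matching job.
    print("submitting candidate-job matching job")

with DAG(
    dag_id="candidate_job_matching",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    wait_for_new_profiles = SqsSensor(
        task_id="wait_for_new_profiles",
        sqs_queue="https://sqs.eu-west-1.amazonaws.com/<account-id>/new-profiles",
        max_messages=10,
    )
    match = PythonOperator(task_id="run_matching", python_callable=run_matching_job)
    notify_recruiters = SnsPublishOperator(
        task_id="notify_recruiters",
        target_arn="arn:aws:sns:eu-west-1:<account-id>:recruiter-alerts",
        subject="New candidate matches ready",
        message="Fresh candidate-job matches have been published to the dashboard.",
    )

    wait_for_new_profiles >> match >> notify_recruiters
```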
Senior Data Engineer
A predictive maintenance solution for industrial equipment that streams IoT sensor data (temperature, vibration, pressure) from 5K+ machines worldwide and applies machine learning to forecast failures before they occur. The platform cut unexpected machine failures by 40%, saved an estimated $5M annually, and gave engineers real-time visibility into equipment health through live dashboards and automated alerts.
- Achieved a 40% reduction in unexpected machine failures across 5K+ machines worldwide, saving an estimated $5M annually in maintenance and operational costs;
- Built a streaming data ingestion pipeline using Apache Kafka and Spark to process IoT sensor data (temperature, vibration, pressure) in real time (see the sketch after this list);
- Stored and processed data on Azure Databricks, applying feature engineering for anomaly detection;
- Developed predictive ML models using PySpark and MLlib to forecast machine failures, enabling maintenance teams to act proactively;
- Deployed live dashboards in Power BI to provide engineers with immediate insights into equipment health and enable automated alerts and proactive maintenance scheduling;
- Improved engineer productivity and reduced downtime by integrating predictive insights directly into operational workflows.
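A condensed sketch of the streaming scoring path, assuming a Spark Structured Streaming job on Databricks that parses Kafka telemetry, applies a failure-risk model trained offline, and lands high-risk events in a Delta table behind the Power BI dashboards. Topic, schema, model path, and storage paths are placeholders.

```python
# Hypothetical sketch: scoring streaming sensor readings with a pre-trained Spark ML model.
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("machine_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("temperature", DoubleType()),
    StructField("vibration", DoubleType()),
    StructField("pressure", DoubleType()),
])

readings = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<broker>:9092")
    .option("subscribe", "machine-telemetry")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Failure-risk model trained offline on the same sensor features and loaded here for scoring.
model = PipelineModel.load("/mnt/models/failure_risk")
scored = model.transform(readings)

# Persist high-risk events to a Delta table that feeds the Power BI dashboards and alerting.
alerts = scored.filter(F.col("prediction") == 1.0)
query = (
    alerts.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/failure_alerts")
    .outputMode("append")
    .start("/mnt/delta/failure_alerts")
)
query.awaitTermination()
```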