
Tarik
From Turkey (UTC+3)
11 years of commercial experience
Lemon.io stats
1 project done
0 hours worked
Tarik – AWS, Python, Docker
Tarik Altuncu specializes in data engineering, machine learning, and LLMs, with a PhD in Graph Theory and expertise in Python, Pandas, and Scikit-learn. He has delivered high-impact NLP and AI solutions, including retrieval-augmented generation (RAG) systems. Strong in ML concepts and problem-solving, he is a valuable asset for AI-driven and data-intensive projects.
Ready to start: To be verified
Direct hire: Potentially possible
Experience Highlights
Freelance Consultant
An intelligent news retrieval system built for a global newsroom, leveraging Retrieval-Augmented Generation (RAG) to answer user queries with accurate, editorially compliant information from the newsroom's vast content repository. The system combines NLP techniques to classify intent, retrieve relevant documents from knowledge bases, summarize content, and apply editorial guardrails, all exposed through a flexible API that supports real-time streaming responses.
The solution addresses the challenge of information overload by providing precise, contextually relevant answers while maintaining editorial standards. The modular architecture supports scalability, handling 100k+ documents while keeping latency low.
- Designed and implemented the complete RAG pipeline using LangChain and LangGraph, creating a modular system with distinct components for intent classification, document retrieval, summarization, and editorial guardrailing;
- Developed a flexible configuration system that enables dynamic switching between cloud-based and local LLMs (via Ollama), supporting various deployment scenarios (production, degradation, A/B testing);
- Engineered a sophisticated guardrailing system that retrieves and applies the editorial guidelines based on topic context, ensuring all generated content maintains brand integrity;
- Integrated multiple knowledge bases including Elasticsearch and Typesense, implementing recency-based reranking to prioritize the latest news;
- Built a prompt management system using PromptLayer for version control and flexible prompt design, allowing for seamless adaptation to evolving business requirements;
- Created comprehensive evaluation frameworks for measuring intent classification accuracy, retrieval precision, summarization quality, and adherence to the editorial guidelines;
- Implemented LangSmith integration for debugging, monitoring, and continuous improvement of the RAG pipeline.
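The recency-based reranking mentioned in the list above can be illustrated with a minimal sketch. It assumes an exponential decay blended with the backend's relevance score; the `Hit` type, the half-life, and the blend weight are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Hit:
    doc_id: str
    relevance: float        # score from the search backend, assumed to lie in [0, 1]
    published_at: datetime  # publication time (timezone-aware)


def recency_rerank(hits, now, half_life_hours=24.0, recency_weight=0.3):
    """Blend relevance with an exponential recency decay (hypothetical parameters)."""
    def blended(hit):
        age_hours = max((now - hit.published_at).total_seconds() / 3600.0, 0.0)
        # 1.0 for a brand-new document, 0.5 after one half-life, and so on.
        recency = 0.5 ** (age_hours / half_life_hours)
        return (1.0 - recency_weight) * hit.relevance + recency_weight * recency
    return sorted(hits, key=blended, reverse=True)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
hits = [
    Hit("old-but-relevant", 0.80, now - timedelta(days=30)),
    Hit("fresh", 0.70, now - timedelta(hours=1)),
]
ranked = recency_rerank(hits, now)
```

With these toy values the fresher article moves ahead of the slightly more relevant but month-old one; setting `recency_weight=0.0` falls back to pure relevance ordering.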
AI Engineer
The app is an innovative AI-powered storytelling application that creates personalized, immersive stories where children become the heroes of their own adventures. The platform combines advanced language models, image generation, and voice synthesis to craft unique tales based on a child's name, interests, and preferences.
The technical solution leverages a sophisticated AI workflow that processes user inputs to generate cohesive narratives with matching visuals and engaging voiceovers—all delivered through an intuitive mobile experience. The system's architecture ensures low-latency content generation while maintaining high-quality outputs across text, image, and audio modalities.
By addressing the challenge of creating meaningful, personalized content for children, the app transforms standard bedtime routines into interactive experiences that foster imagination and connection between parents and children.
- Architected and implemented the core AI generation pipeline that coordinates LLM-based story creation, image generation, and voice synthesis into a seamless workflow;
- Developed structured output schemas and constraints to ensure generated stories follow appropriate narrative arcs and maintain child-friendly content;
- Designed and implemented the Firebase Cloud Functions backend using Flask, creating scalable endpoints for story generation, user preferences, and content management;
- Built a sophisticated prompt management system using PromptLayer, enabling version control, A/B testing, and optimization of prompts for different story themes and age groups;
- Integrated multiple AI services including OpenRouter for accessing state-of-the-art language models and various diffusion-based image generation endpoints.
Senior Data Scientist
A hyper-personalized content tagging system. It uses small language models such as Microsoft's Phi, fine-tuned on user profile descriptions and content metadata, to identify relevant and irrelevant content for individual users. This project significantly enhanced the content recommendation engine's accuracy, leading to increased user retention.
- Implemented a fine-tuning pipeline for small LLMs to adapt them for personalized content relevance classification;
- Created a scalable system to process and tag large volumes of content in batch inference mode;
- Integrated the tagging system with the existing content recommendation engine, reducing false positives by 79%.
Senior Data Scientist
An AI-powered virtual assistant chatbot for the enterprise communication platform, enabling contextual searches and actions through natural language conversations. This assistant enhances user productivity by providing quick access to information and automating routine tasks within the organization's digital workspace.
- Implemented the initial POC to showcase the contextual conversational capabilities of LLMs;
- Presented progress and potential directions to the CTO, taking into account the current state of the art in LLMs, to set the development strategy until the product management team took over;
- Architected and implemented a conversational AI system using state-of-the-art language models and natural language understanding techniques;
- Integrated the assistant with various internal systems to enable actions like scheduling meetings, retrieving documents, and answering company-specific queries;
- Implemented context-aware conversation handling to maintain coherent multi-turn dialogues;
- Developed a robust intent classification and parameter extraction system to accurately route user requests to appropriate handlers.
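The routing idea in the last item above (classify the intent, extract parameters, dispatch to a handler) can be sketched as follows. The keyword rules, the regular expression, and the handler names are hypothetical stand-ins; the source does not describe the actual classifier, which would more plausibly be model-based.

```python
import re


def classify_intent(text):
    """Keyword stand-in for an intent classifier (hypothetical rules)."""
    lowered = text.lower()
    if "schedule" in lowered or "meeting" in lowered:
        return "schedule_meeting"
    if "find" in lowered or "document" in lowered:
        return "retrieve_document"
    return "answer_question"


def extract_params(intent, text):
    """Pull out the parameters the chosen handler needs (toy regex extraction)."""
    if intent == "schedule_meeting":
        match = re.search(r"\bwith (\w+)", text)
        return {"attendee": match.group(1)} if match else {}
    return {}


# Hypothetical handlers; real ones would call internal systems.
HANDLERS = {
    "schedule_meeting": lambda p: f"scheduling meeting with {p.get('attendee', 'unknown')}",
    "retrieve_document": lambda p: "searching documents",
    "answer_question": lambda p: "answering from the knowledge base",
}


def route(text):
    """Classify, extract parameters, then dispatch to the matching handler."""
    intent = classify_intent(text)
    return intent, HANDLERS[intent](extract_params(intent, text))


result = route("Schedule a meeting with Ayse tomorrow")
```

Keeping classification, extraction, and dispatch as separate functions means the keyword rules could be swapped for a model without touching the handlers.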
Senior Data Scientist
A customer churn prediction system. The system analyzes customer behavior, product usage patterns, and engagement metrics to identify at-risk accounts and enable proactive retention strategies. Tarik developed a machine learning model to predict the churn risk of tenants for the SaaS platform, potentially saving $5M in Annual Recurring Revenue (ARR).
- Communicated with the customer success, sales, and product management teams to understand their needs, identify important features, and set the project goals;
- Developed the churn prediction model, end to end, from data preparation to model deployment;
- Engineered features from various data sources, including product usage logs, customer support tickets, and financial data;
- Implemented and compared multiple machine learning algorithms, including basic regression models, RNNs, and LSTMs, ultimately selecting LightGBM for its performance and interpretability;
- Developed an automated ML pipeline using MLflow for model training, validation, and deployment;
- Created a dashboard for the customer success team to visualize churn risk and key factors contributing to potential churn;
- Achieved an 87% recall in predicting churn 60 days in advance, allowing for timely interventions.
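For readers less familiar with the metric, recall is the share of accounts that actually churned which the model flagged in advance. A minimal sketch of the calculation follows; the counts are illustrative only and are not project data.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual churners that the model flagged."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual churners")
    return true_positives / actual_positives


# Illustrative counts only (not project data): 87 of 100 churned accounts were flagged.
example = recall(true_positives=87, false_negatives=13)
```

Recall says nothing about false alarms, so in practice it is usually read alongside precision.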
Senior Data Scientist
A Retrieval-Augmented Generation (RAG) solution to generate accurate answers based on search results for user queries. This system combines the power of large language models with a company's specific knowledge base to provide contextually relevant and up-to-date responses.
- Designed and implemented a RAG pipeline that efficiently retrieves relevant documents and generates coherent answers;
- Optimized the document indexing and retrieval process using Milvus and Elasticsearch;
- Implemented a mechanism to qualify the generated answer to show or hide it on top of the search results page.
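The source does not say how generated answers were qualified, so the following is only one possible shape for such a gate. It assumes a hypothetical retrieval score in [0, 1], a hypothetical threshold, and a simple refusal check.

```python
def should_show_answer(answer: str, top_retrieval_score: float, min_score: float = 0.6) -> bool:
    """Hypothetical gate: hide empty answers, refusals, and answers backed by weak retrieval."""
    text = answer.strip().lower()
    if not text:
        return False
    if text.startswith(("i don't know", "i do not know")):
        return False
    return top_retrieval_score >= min_score


show = should_show_answer("The policy was updated in March.", top_retrieval_score=0.82)
```

A gate like this errs toward hiding the answer, which leaves the ordinary search results as the fallback.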
Senior Data Scientist
A scalable, multi-tenant content recommendation system for a modern enterprise intranet SaaS platform serving over 700 tenants with 700K+ users. The system provides personalized content suggestions and related content features, enhancing user engagement and information discovery within organization intranets.
- Architected and implemented a scalable recommendation engine using collaborative filtering techniques;
- Created an auto-modeling method that optimizes the training process for each tenant based on its usage;
- Designed the model for multi-tenant scenarios, ensuring data isolation and personalized recommendations for each client;
- Optimized the training process to update each model with only the new data per tenant;
- Provided endpoints that serve real-time recommendations from Redis indices of user and item embeddings, which reduces memory usage at the inference stage;
- Deployed the solution using Snowflake, Airflow, MLflow, Redis, and Kubernetes, enabling easy scaling to 100K+ recommendations per day.
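Serving recommendations from user and item embeddings, as described in the list above, typically comes down to scoring items against the user vector and keeping the top few. The sketch below assumes dot-product scoring and uses an in-memory dict as a stand-in for the Redis indices; the item ids and vectors are toy values.

```python
import heapq


def recommend(user_vec, item_vecs, k=2, exclude=()):
    """Return the k item ids with the highest dot product against the user embedding."""
    def score(item_id):
        return sum(u * v for u, v in zip(user_vec, item_vecs[item_id]))
    candidates = [item_id for item_id in item_vecs if item_id not in exclude]
    return heapq.nlargest(k, candidates, key=score)


# Toy embeddings for a single tenant; the dict stands in for a per-tenant index.
item_vecs = {
    "post-a": [0.9, 0.1],
    "post-b": [0.2, 0.8],
    "post-c": [0.7, 0.3],
}
user_vec = [1.0, 0.0]
top = recommend(user_vec, item_vecs, k=2, exclude={"post-a"})
```

The `exclude` argument is a hypothetical way to skip content the user has already seen; keeping one index per tenant is one simple way to keep tenants' data apart.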
Machine Learning Team Lead
An automated system to detect duplicate or near-duplicate questions in a large-scale trivia game database. This project aimed to maintain the quality and uniqueness of the question set as third-party providers continuously added new questions.
- Analyzed the existing question database to understand the scope and nature of duplication issues;
- Created a representative dataset of question-answer pairs that were considered near-duplicates based on predefined criteria;
- Fine-tuned the BERTurk model to detect semantic similarities between questions;
- Implemented a batch processing system to efficiently check new questions against the existing database;
- Developed a user interface for content managers to review and act on potential duplicates;
- Established a continuous monitoring process to ensure ongoing question set quality.
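Checking a new question against the existing database can be sketched as a similarity search over sentence embeddings. The example assumes cosine similarity and a hypothetical threshold of 0.9; the vectors are toy values standing in for embeddings that a fine-tuned model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_near_duplicates(new_vec, existing, threshold=0.9):
    """Return (question_id, similarity) pairs at or above the threshold, most similar first."""
    matches = [(qid, cosine(new_vec, vec)) for qid, vec in existing.items()]
    return sorted((m for m in matches if m[1] >= threshold), key=lambda m: m[1], reverse=True)


# Toy embeddings; real ones would come from the fine-tuned language model.
existing = {"q1": [1.0, 0.0, 0.0], "q2": [0.0, 1.0, 0.0]}
flagged = find_near_duplicates([0.99, 0.05, 0.0], existing)
```

Flagged pairs would then go to a human reviewer rather than being removed automatically, which matches the review interface described above.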
Machine Learning Team Lead
An in-depth market research project to analyze audience sentiment towards TRT's flagship TV drama on social media. This project combined advanced NLP techniques with traditional market research methods to inform strategic decisions for the upcoming season.
- Collected and processed large-scale Twitter data related to the TV drama;
- Fine-tuned BERTurk, a BERT-based model pre-trained on Turkish content, for multi-label emotion classification;
- Collaborated with a third-party company to create an annotated dataset for emotion tags such as anger, jealousy, love, and hate;
- Developed a detailed emotional landscape analysis for audience reactions towards each actor and actress;
- Integrated findings with offline market research conducted by Ipsos to provide comprehensive insights;
- Presented results to decision-makers, directly influencing contract negotiations for the next season.
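Multi-label emotion classification differs from ordinary classification in that each tag is decided independently, so one tweet can carry several emotions or none. A minimal sketch of that decision step follows, assuming a sigmoid per label and a hypothetical 0.5 threshold; the logits are made up and would normally come from the fine-tuned model.

```python
import math

# Emotion tags named in the text; the full tag set is not given in the source.
LABELS = ["anger", "jealousy", "love", "hate"]


def predict_labels(logits, threshold=0.5):
    """Apply an independent sigmoid per label and keep labels at or above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


# Hypothetical logits for one tweet.
predicted = predict_labels([2.0, -1.5, 0.3, -3.0])
```

Because there is no softmax across labels, raising or lowering the threshold for one tag does not affect the others.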