Tarik
From Turkey (GMT+3)
11 years of commercial experience
Tarik – AWS, Python, Docker
A seasoned data scientist, Tarik has over a decade of commercial experience and a PhD in mathematics (NLP and Graph ML). His strongest suits are Machine Learning, Data Engineering, LLMs, and Neural Networks. As a cherry on top, Tarik can bring value both as an individual contributor and as a manager, so do not hesitate to make him part of your team.
Ready to start
September 9th
Direct hire
Potentially possible
Experience Highlights
Senior Data Scientist
A hyper-personalized content tagging system. It uses small language models, such as Microsoft's Phi, fine-tuned on user profile descriptions and content metadata to identify relevant and irrelevant content for individual users. This project significantly improved the accuracy of the content recommendation engine, leading to increased user retention.
- Implemented a fine-tuning pipeline for small LLMs to adapt them for personalized content relevance classification;
- Created a scalable system to process and tag large volumes of content in batch inference mode;
- Integrated the tagging system with the existing content recommendation engine, reducing false positives by 79%.
Senior Data Scientist
An AI-powered virtual assistant chatbot for the enterprise communication platform, enabling contextual searches and actions through natural language conversations. This assistant enhances user productivity by providing quick access to information and automating routine tasks within the organization's digital workspace.
- Implemented the initial POC to showcase the contextual conversational capabilities of LLMs;
- Presented progress and potential directions to the CTO, in light of the current state of the art (SOTA) in LLMs, to shape the development strategy until the product management team took over the project;
- Architected and implemented a conversational AI system using state-of-the-art language models and natural language understanding techniques;
- Integrated the assistant with various internal systems to enable actions like scheduling meetings, retrieving documents, and answering company-specific queries;
- Implemented context-aware conversation handling to maintain coherent multi-turn dialogues;
- Developed a robust intent classification and parameter extraction system to accurately route user requests to the appropriate handlers.
Senior Data Scientist
A customer churn prediction system. The system analyzes customer behavior, product usage patterns, and engagement metrics to identify at-risk accounts and enable proactive retention strategies. Tarik developed a machine learning model to predict the churn risk of tenants for the SaaS platform, potentially saving $5M in Annual Recurring Revenue (ARR).
- Communicated with the customer success, sales, and product management teams to understand their needs, identify important features, and set the project goals;
- Developed the churn prediction model, end to end, from data preparation to model deployment;
- Engineered features from various data sources, including product usage logs, customer support tickets, and financial data;
- Implemented and compared multiple machine learning algorithms, including basic regression models, RNNs, and LSTMs, ultimately selecting LightGBM for its performance and interpretability;
- Developed an automated ML pipeline using MLflow for model training, validation, and deployment;
- Created a dashboard for the customer success team to visualize churn risk and key factors contributing to potential churn;
- Achieved an 87% recall in predicting churn 60 days in advance, allowing for timely interventions.
Senior Data Scientist
A Retrieval-Augmented Generation (RAG) solution to generate accurate answers based on search results for user queries. This system combines the power of large language models with a company's specific knowledge base to provide contextually relevant and up-to-date responses.
- Designed and implemented a RAG pipeline that efficiently retrieves relevant documents and generates coherent answers;
- Optimized the document indexing and retrieval process using Milvus and Elasticsearch;
- Implemented a mechanism that assesses the quality of each generated answer to decide whether to show or hide it at the top of the search results page.
Senior Data Scientist
A scalable, multi-tenant content recommendation system for a modern enterprise intranet SaaS platform serving over 700 tenants with 700K+ users. The system provides personalized content suggestions and related-content features, enhancing user engagement and information discovery within organizations' intranets.
- Architected and implemented a scalable recommendation engine using collaborative filtering techniques;
- Created an auto-modeling method that tailors the training process to each tenant's specific usage;
- Designed the model for multi-tenant scenarios, ensuring data isolation and personalized recommendations for each client;
- Optimized the training process to update each model with only the new data per tenant;
- Provided endpoints that serve real-time recommendations to users from Redis indices of user and item embeddings, reducing memory usage at the inference stage;
- Deployed the solution using Snowflake, Airflow, MLflow, Redis, and Kubernetes, enabling easy scaling to 100K+ recommendations per day.
Machine Learning Team Lead
An automated system to detect duplicate or near-duplicate questions in a large-scale trivia game database. This project aimed to maintain the quality and uniqueness of the question set as third-party providers continuously added new questions.
- Analyzed the existing question database to understand the scope and nature of duplication issues;
- Created a representative dataset of question-answer pairs that were considered near-duplicates based on predefined criteria;
- Fine-tuned the BERTurk model to detect semantic similarities between questions;
- Implemented a batch processing system to efficiently check new questions against the existing database;
- Developed a user interface for content managers to review and act on potential duplicates;
- Established a continuous monitoring process to ensure ongoing question set quality.
Machine Learning Team Lead
An in-depth market research project to analyze audience sentiment towards TRT's flagship TV drama on social media. This project combined advanced NLP techniques with traditional market research methods to inform strategic decisions for the upcoming season.
- Collected and processed large-scale Twitter data related to the TV drama;
- Fine-tuned BERTurk, a BERT-based model pre-trained on Turkish content, for multi-label emotion classification;
- Collaborated with a third-party company to create an annotated dataset with emotion tags such as anger, jealousy, love, and hate;
- Developed a detailed emotional landscape analysis for audience reactions towards each actor and actress;
- Integrated the findings with offline market research conducted by Ipsos to provide comprehensive insights;
- Presented results to decision-makers, directly influencing contract negotiations for the next season.