Hemed
From Turkey (GMT+3)
10 years of commercial experience
Lemon.io stats
3 projects done
800 hours worked
Open to new offers
Hemed – Python, Flask, Machine learning
Senior Data Scientist with solid theoretical foundations (Ph.D.), hands-on technical experience, and proven people skills. Hemed has achieved remarkable results working on a diverse range of machine learning models, including non-linear optimization, time series forecasting, recommendation and ranking systems, text-to-speech conversion, and machine translation. Undoubtedly, this knowledgeable engineer with a bright personality will be a brilliant addition to any team.
Main technologies
Additional skills
Ready to start
ASAP
Direct hire
Potentially possible
Experience Highlights
AI scientist
A burgeoning platform specializing in text-to-speech conversion. In this endeavor, the team focused on enhancing the core product of the company: a sophisticated text-to-speech engine.
Hemed's work was centered on the analysis of various elements within a given movie or play script in PDF format. Particularly, he worked on:
- Extraction of Text: He employed OCR technologies to seamlessly extract textual content from the script PDF, ensuring accuracy and fidelity.
- Parsing and Segmentation: Leveraging cutting-edge parsing algorithms, he delineated distinct sections within the extracted text, including scenes, dialogues, and speaker attributions.
- Voice Allocation: Employing state-of-the-art techniques, his work discerned the unique voices of each character involved in the conversation and allocated distinct vocal characteristics accordingly, enhancing the immersive experience for the audience.
- Soundscape Integration: Recognizing the importance of ambiance and setting, Hemed curated background soundscapes and location effects tailored to each scene, seamlessly embedding them into the conversation to enrich the auditory experience.
- Release: Finally, he presented a fully realized spoken rendition of the original written play, adapted to preserve its essence while elevating it to an immersive audio format.
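The parsing and segmentation step listed above can be illustrated with a minimal sketch. It assumes a conventional screenplay layout (scene headings that start with INT. or EXT., and speaker names on their own upper-case line). The function name `segment_script` and these layout rules are assumptions for illustration only, not the project's actual parser.

```python
import re

SCENE_RE = re.compile(r"^(INT\.|EXT\.)\s+.+")   # assumed scene-heading format
SPEAKER_RE = re.compile(r"^[A-Z][A-Z .'-]*$")    # assumed: speaker cue is an all-caps line

def segment_script(text):
    """Split extracted script text into scenes with (speaker, line) dialogue pairs."""
    scenes = []
    speaker = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            speaker = None                      # a blank line ends the current speech
            continue
        if SCENE_RE.match(line):
            scenes.append({"heading": line, "dialogue": []})
            speaker = None
        elif SPEAKER_RE.match(line) and scenes:
            speaker = line                      # next lines belong to this speaker
        elif speaker and scenes:
            scenes[-1]["dialogue"].append((speaker, line))
    return scenes
```

Each returned scene keeps its heading and an ordered list of speaker/line pairs, which is the shape a later voice-allocation step would need. Lines with no active speaker (for example stage directions) are ignored in this sketch.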
Head of AI
An up-and-coming healthcare startup that launched a new product to make the work of medical coders easier by automatically assigning relevant medical codes to handwritten discharge summaries.
- Studied design, requirement elicitation, and data collection;
- Trained four code-prediction models (information-retrieval-based, dictionary-based, sequence-to-sequence, and multilabel classification models);
- Evaluated the code prediction models on several datasets;
- Created the back-end code prediction API leveraging the best-performing model.
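Of the four model families listed above, the dictionary-based one is the simplest to sketch. The example below is a toy illustration, not the startup's model: the term list, the ICD-10-style codes, and the function name `predict_codes` are all assumptions, since the source does not name the code system or the dictionary used.

```python
import re

# Toy term-to-code dictionary (assumption: ICD-10-style codes; the code system is not named in the source).
TERM_TO_CODE = {
    "hypertension": "I10",
    "type 2 diabetes": "E11.9",
    "pneumonia": "J18.9",
}

def predict_codes(summary):
    """Return the sorted codes whose dictionary term appears in the summary text."""
    text = re.sub(r"\s+", " ", summary.lower())   # normalize case and whitespace
    return sorted(code for term, code in TERM_TO_CODE.items() if term in text)
```

Because one summary can mention several conditions, the function returns a list of codes rather than a single label, which mirrors the multilabel nature of the task.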
Tech lead
A lawsuit is a long (5-100 pages) document detailing a legal case. The goal of this project was to train a summarization model and expose an API endpoint that, when given a lawsuit, returns a short summary (fewer than 500 words). The summary should contain all important case details and be written in language that a person without legal training can understand.
- Created front-end and back-end architecture;
- Conducted data studies to find the best datasets for training and evaluating the summarization task;
- Evaluated different pre-trained summarization models to see which one best fits our needs;
- Debugged and resolved the application issues.
Data Scientist / Full-stack Developer
A tool for indexing Vimeo videos. The tool makes it possible to search Vimeo videos not only by their visible text content (i.e., video title, tags, description, and creation date) but also by the words and phrases spoken in the video itself. It was originally written for VIZ Media (the American manga publisher).
- Implemented the API integration for fetching and updating video metadata to and from Vimeo;
- Developed a speech-to-text model to automatically generate subtitles for videos that did not have any;
- Implemented a Whoosh index covering over 5000 videos uploaded between 2012 and 2022. The videos ranged from 3 minutes to 1 hour in length;
- Created a front-end web interface to interact with the indexer. The indexer allowed for automatic updating of the index with newly uploaded videos and searching for videos containing desired search phrases;
- Hosted the index on the AWS-EBS instance.
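The idea of searching videos by the phrases spoken in them can be sketched with a small inverted index over transcripts. This is a pure-Python stand-in written for illustration; it does not use Whoosh, and the class name `TranscriptIndex`, the tokenizer, and the sample transcripts are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class TranscriptIndex:
    """Toy inverted index: token -> {video_id: [token positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, video_id, transcript):
        for pos, tok in enumerate(tokenize(transcript)):
            self.postings[tok].setdefault(video_id, []).append(pos)

    def search_phrase(self, phrase):
        """Return sorted ids of videos whose transcript contains the exact phrase."""
        toks = tokenize(phrase)
        if not toks:
            return []
        hits = []
        for vid, starts in self.postings.get(toks[0], {}).items():
            # The phrase matches if every following token sits at the next position.
            if any(all(p + i in self.postings.get(t, {}).get(vid, [])
                       for i, t in enumerate(toks))
                   for p in starts):
                hits.append(vid)
        return sorted(hits)
```

Storing token positions is what makes exact phrase search possible; a plain token-to-video mapping would only support keyword matching.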
Tech Lead
A title block is an information box usually found in the bottom right-hand corner of an architectural drawing. The block indicates drawing details such as the title, author name, scale, version, and date of the drawing. In this project, Hemed developed a desktop application that automates the extraction of title block information from PDFs/images of the drawings and automatically populates the corresponding fields in a cloud-based database. The main challenge was that different architecture and engineering firms orient and place the title block differently on their drawing sheets. The solution had to identify the location of the title block, its orientation, and the right attributes in it. The project was for a client from the Architectural Engineering and Construction (AEC) industry.
- Trained a YOLO model to identify the location of the title block in any given architectural drawing;
- Implemented a Tesseract backend to re-orient the PDF and extract the title block text correctly;
- Implemented the Tkinter-based desktop application as an interface for the process.
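Once the title block text has been extracted, it still has to be mapped to database fields. The sketch below shows one way this could look, assuming each field appears on its own line as a label followed by a colon or dash. The label spellings, the `FIELD_LABELS` table, and the function name `parse_title_block` are hypothetical; real title blocks vary by firm, as noted above.

```python
import re

# Assumed label spellings for the fields named in the project description.
FIELD_LABELS = {
    "title": r"title",
    "author": r"(?:author|drawn by)",
    "scale": r"scale",
    "version": r"(?:version|rev(?:ision)?)",
    "date": r"date",
}

def parse_title_block(ocr_text):
    """Map each known field to the value that follows its label, if present."""
    fields = {}
    for name, label in FIELD_LABELS.items():
        pattern = rf"^\s*{label}\s*[:\-]\s*(.+?)\s*$"
        m = re.search(pattern, ocr_text, re.IGNORECASE | re.MULTILINE)
        if m:
            fields[name] = m.group(1)
    return fields
```

Fields whose label is not found are simply left out of the result, so the caller can decide whether to flag the drawing for manual review.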
Senior Data Scientist
An Australia-based travel agency needed a way to scrape the web for travel destinations. The goal was to build a mobile application that, when a user searches for places to visit in a particular location, presents an exhaustive list of such destinations. The list would include the location name, address, images of the place, heading, and description.
- Developed a dynamic web scraper that handles arbitrary web pages and exposed it as a Flask API endpoint hosted on AWS Lambda;
- Ran an entity recognition model to identify location names in the scraped text;
- Implemented the Flutter-based mobile application to interact with the API.
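The scraping step can be sketched with the standard library alone. The example below pulls a heading, a description, and image URLs out of a page. It assumes the heading is the first `<h1>` and the description lives in a `<meta name="description">` tag, which will not hold for every site; the class and function names are hypothetical, and the Flask endpoint and entity recognition step are left out.

```python
from html.parser import HTMLParser

class DestinationParser(HTMLParser):
    """Collect the first <h1>, the meta description, and all image URLs."""

    def __init__(self):
        super().__init__()
        self.heading = ""
        self.description = ""
        self.images = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and not self.heading:
            self._in_h1 = True                  # only the first heading is kept
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.heading += data.strip()

def extract_destination(html):
    """Return the heading, description, and image URLs found in an HTML page."""
    parser = DestinationParser()
    parser.feed(html)
    parser.close()
    return {"heading": parser.heading,
            "description": parser.description,
            "images": parser.images}
```

A scraper meant to work across many sites would need fallbacks (for example Open Graph tags or the page title) when these assumed tags are missing.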
Lead Machine Learning Engineer
The tool extracts Frequently Asked Questions (FAQs) from a given email service. A client company receives hundreds of support emails every week. Each email thread contains back-and-forth conversations between the developers' team, the customer support team, and the customer. The tool sifts through millions of email threads to identify the commonly asked questions. The questions are then condensed and presented on the FAQ page of the organization's website.
- Managed the API connections between Outlook and the local development environment;
- Trained a question identification/classification model by using transfer learning;
- Ran the clustering algorithm based on a sentence transformer to identify similar questions;
- Designed and implemented an ETL system to continuously fetch new emails and identify whether and where they fit in the FAQ database.
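The grouping of similar questions can be illustrated with a small similarity-based clustering sketch. To stay self-contained it swaps the sentence transformer for bag-of-words vectors, so it only captures word overlap, not meaning. The greedy strategy, the 0.6 threshold, and all function names are assumptions rather than the project's actual algorithm.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: bag-of-words counts (the real system used a sentence transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_questions(questions, threshold=0.6):
    """Greedy clustering: join the first cluster whose seed question is similar enough."""
    clusters = []                               # list of (seed_vector, [questions])
    for q in questions:
        vec = embed(q)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((vec, [q]))         # no match: start a new cluster
    return [members for _, members in clusters]
```

The same loop also suggests how a new email could be checked against an existing FAQ database: compare it with each cluster seed and either attach it to a match or open a new candidate question.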