
Catherine
From United Kingdom (UTC+1)
9 years of commercial experience
Lemon.io stats
Catherine – Python, GCP, AWS
Catherine is a Senior Data Scientist with over 6 years of experience in the field. She excels at simplifying complex concepts and communicating her findings to both technical and non-technical audiences, and her ability to manage projects independently and her commitment to continuous learning bring both expertise and adaptability to any team.
Main technologies
Additional skills
Ready to start
To be verified
Direct hire
Potentially possible
Experience Highlights
CTO
Developed a patent-pending content moderation algorithm capable of identifying various categories of abusive content in speech as well as in images, video, and audio, using fine-tuned Transformers. Additionally, built the entire cloud infrastructure and Python REST APIs from the ground up.
- Created the patent-pending Emotion AI algorithm in Python, using Transformers for the NLP component;
- Created various computer vision models to detect nudity, weapons, and racist symbols, using ResNet for nudity detection and YOLO (via Roboflow) for object detection;
- Migrated a legacy PoC from Microsoft Azure to GCP;
- Created a NoSQL database using MongoDB and connected it securely to the cloud infrastructure;
- Built a containerized FastAPI back end to serve the models and deployed it on Kubernetes using Docker;
- Hired front-end and infrastructure engineers during the scaling phase.
Senior Data Engineer
Catherine automated a previously labor-intensive process: the weekly generation of device crash reports in Google Data Studio for the CTO of SKY Technologies. The project involved handling KPI data from AWS S3 buckets and BigQuery. She ingested the data into BigQuery, partitioned the tables, and crafted views for diverse stakeholders. The final result was a fully automated end-to-end report in Data Studio.
- Was the first GCP-certified Data Engineer on the team and built the data pipeline on GCP from scratch, including GCP networking (VPCs, security groups, database access, etc.);
- Built a comprehensive report in Data Studio that automatically rendered and visualized real-time KPI data weekly;
- Avoided expensive third-party tools by writing the equivalent code herself, and wrote an article about this project on Medium.
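The partitioned-table-plus-views pattern described here might look like the following BigQuery DDL; the dataset, table, and column names are hypothetical placeholders, not the actual SKY schema:

```python
# Hypothetical BigQuery DDL for a date-partitioned crash table and a
# stakeholder-facing weekly view. All names are illustrative assumptions.
DEVICE_CRASH_DDL = """
CREATE TABLE IF NOT EXISTS kpi_dataset.device_crashes (
    device_id  STRING,
    crash_ts   TIMESTAMP,
    crash_type STRING
)
PARTITION BY DATE(crash_ts)
"""

WEEKLY_VIEW_DDL = """
CREATE VIEW IF NOT EXISTS kpi_dataset.weekly_crashes AS
SELECT DATE_TRUNC(DATE(crash_ts), WEEK) AS week,
       crash_type,
       COUNT(*) AS crashes
FROM kpi_dataset.device_crashes
GROUP BY week, crash_type
"""
```

Partitioning by ingestion date keeps the weekly report queries cheap, since Data Studio only scans the partitions in its date range.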
Cloud Engineer
Moved an on-premises system to Red Hat OpenShift (Kubernetes) using Terraform, driven by a desire to deepen her understanding of networking and security. The effort broadened her knowledge in these areas and revealed a true passion for building solutions. She developed a Golang tool that works with Terratest for deployment on IBM Cloud, and conducted penetration testing with Gobuster during the infrastructure testing phase.
- Migrated a large retail bank's on-premises payment-processing system to IBM Cloud / Red Hat OpenShift using Terraform;
- Built a Flask REST API service that detects severe weather warnings in RSS news feeds (data cleaning and wrangling, ML classification, and deployment of the Flask service using Docker and Kubernetes);
- Participated successfully in Capture the Flag events during Cyber Security Awareness Month;
- Mentored junior engineers pursuing Master's degrees in AI-related fields, helping with their assignments;
- Developed a penetration-testing tool in Golang that integrates with Terratest and supports IBM Cloud;
- Was part of a team that built corporate chatbot applications using IBM Watson.
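The severe-weather detection step could be sketched as below, using only the standard library. The keyword list is an illustrative assumption; the actual service used an ML classifier rather than keyword matching:

```python
# Sketch: flag severe-weather warnings in RSS item titles.
# Keyword matching stands in for the ML classification step.
import xml.etree.ElementTree as ET

SEVERE_KEYWORDS = {"storm", "flood", "hurricane", "warning"}

def severe_items(rss_xml: str) -> list[str]:
    # Parse the feed and return titles that mention a severe-weather term.
    root = ET.fromstring(rss_xml)
    titles = [t.text or "" for t in root.iter("title")]
    return [t for t in titles
            if any(k in t.lower() for k in SEVERE_KEYWORDS)]
```

In the real service, a function like this would sit behind a Flask route and be containerized for Kubernetes.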
Lead Data Engineer/Data Scientist
Built various web scrapers for real estate websites and AI models to predict prices based on the features of new properties coming onto the market.
- Built a bespoke web crawling tool that scraped real estate prices and features from various well-known websites;
- Cleaned and wrangled the data and ingested it into AWS DynamoDB (a document database);
- Performed exploratory data analysis;
- Trained SKlearn RandomForest to predict prices of new properties based on this data;
- Automated the data ingestion and periodic re-training of the model;
- Monitored model performance.
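The training step above can be sketched with scikit-learn; the feature columns and price figures here are illustrative assumptions, not the real scraped data:

```python
# Minimal sketch of the price-prediction step: a scikit-learn
# RandomForest trained on scraped property features.
# Columns and values are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_price_model(features: np.ndarray,
                      prices: np.ndarray) -> RandomForestRegressor:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(features, prices)
    return model

# Hypothetical columns: [bedrooms, bathrooms, floor_area_sqm]
X = np.array([[2, 1, 60.0], [3, 2, 95.0], [4, 3, 140.0]])
y = np.array([200_000, 320_000, 450_000])
model = train_price_model(X, y)
pred = model.predict([[3, 2, 100.0]])
```

In the automated pipeline, re-running this fit on freshly ingested data is what the periodic re-training step amounts to.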
Senior Data Scientist
Designed, implemented, and managed a business intelligence tool for data mining. Built a framework for scraping used-car prices that developed into a full-fledged data pipeline, including training scikit-learn prediction models to serve prices to insurance customers via a Flask back end, with a Plotly Dash dashboard for analytics.
- Wrote web scrapers and API consumers in Python that automatically scraped automotive sales websites and ingested the data into a MySQL database;
- Designed schemas and set up the database on DigitalOcean;
- Did exploratory data analysis to find the best model for predicting used-car prices from the data obtained through the ingestion pipeline;
- Wrote a Python module that iteratively tried different ML models and generated charts of each model's accuracy and precision;
- Did full-stack software/data engineering, including a bespoke scikit-learn NLP classifier to precisely identify vehicles in automotive adverts, geocoding with the Google Location API, and interactive map visualizations of vehicle locations in Plotly Dash (built on top of Flask with auth, all on the free version of Plotly);
- Wrote reports and forecasts (Python code that automatically generated the relevant matplotlib charts and assembled them into PowerPoint presentations for business stakeholders).
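A model-comparison module of the kind described above can be sketched as follows; the candidate set and scoring scheme are illustrative assumptions (the real module also generated per-model charts):

```python
# Sketch of the model-comparison loop: fit several candidate
# scikit-learn models and record a score for each.
# The candidate list is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def compare_models(X: np.ndarray, y: np.ndarray) -> dict[str, float]:
    candidates = {
        "linear": LinearRegression(),
        "tree": DecisionTreeRegressor(random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X, y)
        scores[name] = model.score(X, y)  # R^2 on the training data
    return scores
```

In practice each model would be scored on a held-out set rather than the training data, and the resulting scores fed into the chart-generation step.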