Cindy – Python, Pandas, Data Science, experts in Lemon.io

Cindy

From United States (UTC-4)

Data Scientist | Senior

Cindy is a senior Data Scientist with strong applied experience, particularly within advertising and marketing domains. She has delivered automated ML pipelines, robust feature engineering, and business-aligned modeling solutions. Cindy mentors engineers, actively shares knowledge, and uses AI tools like Copilot for code reviews. Feedback highlights her initiative and measurable business impact, though her expertise is most pronounced in ad-tech contexts.

6 years of commercial experience in
Accounting
Advertising
Analytics
Banking
Consulting services
Customer support
Data analytics
Machine learning
Marketing
Product management
Project management
Marketplace
Main technologies
Python: 8 years
Pandas: 8 years
Data Science: 4 years
NumPy: 8 years
Additional skills
Looker
Scikit-learn
AWS
Machine learning
Docker
CI/CD
SQL
Redshift
XGBoost
Fine-tuning
MLOps
Amazon S3
CloudWatch
AWS Lambda
API
Airflow
Matplotlib
SciPy
NLP
RegExp
PostgreSQL
Direct hire: Possible

Experience Highlights

Data Scientist
Jul 2025 - Mar 2026 (8 months)
Project Overview

It's a rebuild of a degraded scoring model for a major social platform's ad inventory. The model had collapsed into producing near-identical predictions across all input combinations.

Responsibilities:
  • Reframed the problem from classification to regression for meaningful score differentiation across inputs.
  • Designed split-first architecture to prevent training/inference data leakage.
  • Encoded domain constraints into preprocessing, blocking structurally impossible input combinations.
  • Validated non-linear feature relationships with mutual information, justifying tree-based models.
  • Delivered the first differentiated scores in 4 years across device/format combinations.
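
The split-first idea and the domain-constraint filter listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the blocked device/format pair, and the helper functions are all hypothetical, and simple standardization stands in for whatever preprocessing was really used. The point it shows is that rows are split before any statistic is computed, so nothing from the holdout set leaks into training.

```python
import random
import statistics

# Hypothetical rule: this device/format pair cannot occur, so it is blocked up front.
IMPOSSIBLE_COMBOS = {("smart_tv", "vertical_story")}

def drop_impossible(rows):
    """Remove rows whose device/format pair is structurally impossible."""
    return [r for r in rows if (r["device"], r["format"]) not in IMPOSSIBLE_COMBOS]

def split_first(rows, holdout_fraction=0.25, seed=0):
    """Shuffle and split BEFORE any preprocessing statistics are computed."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_scaler(train_rows, column):
    """Learn mean and standard deviation from the training rows only."""
    values = [r[column] for r in train_rows]
    return statistics.mean(values), statistics.pstdev(values) or 1.0

def apply_scaler(rows, column, mean, std):
    """Apply training-derived statistics to any split."""
    return [{**r, column: (r[column] - mean) / std} for r in rows]

# Illustrative data: eight valid rows plus one impossible combination.
raw = [{"device": "mobile", "format": "vertical_story", "spend": float(s)} for s in range(1, 9)]
raw.append({"device": "smart_tv", "format": "vertical_story", "spend": 99.0})

clean = drop_impossible(raw)
train, holdout = split_first(clean)
mean, std = fit_scaler(train, "spend")
train_scaled = apply_scaler(train, "spend", mean, std)
holdout_scaled = apply_scaler(holdout, "spend", mean, std)
```

The holdout rows are scaled with the training mean and standard deviation, which is the same treatment inference-time data would receive.
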
Project Tech stack:
Python
SQL
MLOps
XGBoost
Redshift
Data Science
Scikit-learn
Fine-tuning
Data Scientist & ML Engineer
Jun 2025 - Mar 2026 (9 months)
Project Overview

It's a rebuild of a manual, analyst-dependent scoring process into a fully automated event-driven ML pipeline on AWS.

Responsibilities:
  • Designed an S3 → Lambda → Airflow → Docker architecture that eliminated all manual steps.
  • Built multi-response ensemble scoring with row-level keys for safe prediction join-back.
  • Implemented global score distribution tracker with automated drift detection alerting.
  • Created a graceful market size fallback chain across three data sources for global coverage.
  • Enforced destination schema validation pre-write, replacing silent failures with clear errors.
  • Reduced turnaround time from 1.5 weeks to under 2 minutes, enabling 5 repeat global clients.
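
Two of the safeguards listed above, row-level keys for join-back and schema validation before writing, might look something like the sketch below. The schema, the field names (`row_key`, `score`, `market`), and both helper functions are assumptions made for illustration; the actual pipeline ran on AWS and its code is not shown in the source.

```python
# Hypothetical destination schema: column name -> required Python type.
EXPECTED_SCHEMA = {"row_key": str, "score": float}

def validate_schema(records, schema=EXPECTED_SCHEMA):
    """Fail loudly before writing instead of letting bad rows through silently."""
    for i, rec in enumerate(records):
        if set(rec) != set(schema):
            raise ValueError(f"record {i}: columns {sorted(rec)} != {sorted(schema)}")
        for column, expected_type in schema.items():
            if not isinstance(rec[column], expected_type):
                raise TypeError(
                    f"record {i}: {column!r} should be {expected_type.__name__}, "
                    f"got {type(rec[column]).__name__}"
                )
    return records

def join_back(inputs, predictions):
    """Attach each prediction to its input by row_key, never by position."""
    score_by_key = {p["row_key"]: p["score"] for p in predictions}
    missing = [r["row_key"] for r in inputs if r["row_key"] not in score_by_key]
    if missing:
        raise KeyError(f"no prediction for row keys: {missing}")
    return [{**r, "score": score_by_key[r["row_key"]]} for r in inputs]

inputs = [{"row_key": "a1", "market": "US"}, {"row_key": "b2", "market": "DE"}]
# Predictions arrive in a different order; the key keeps the join safe.
predictions = [{"row_key": "b2", "score": 0.7}, {"row_key": "a1", "score": 0.4}]
joined = join_back(inputs, validate_schema(predictions))
```

Joining on a key rather than on row order means a reordered or partially failed scoring batch raises an error instead of silently misaligning scores.
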
Project Tech stack:
Airflow
Python
AWS
AWS Lambda
CloudWatch
Redshift
Amazon S3
Scikit-learn
API
Data Scientist
Jul 2025 - Mar 2026 (8 months)
Project Overview

It covers two production ML issues: designing an interpolation approach for a new, unseen product variant without retraining, and ongoing model governance and drift monitoring tied to data sources.

Responsibilities:
  • Traced data creep to a single client rollout and analyzed score distributions across variants.
  • Verified new variant blend formula maintained monotonicity between existing bounds.
  • Corroborated internal distributions against third-party benchmarks for relative impact.
  • Rejected unsupervised clustering due to insufficient data and traceability needs.
  • Secured senior data scientist, customer team, and VP alignment before shipping.
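
One way to read the blend-formula check above is as a convex combination of two existing variants' scores, verified to stay between them for every input. The sketch below assumes that reading; the weight, the score pairs, and the function names are hypothetical, and the source does not state the actual formula.

```python
def blend(score_low_variant, score_high_variant, weight):
    """Convex blend of two existing variants' scores (weight in [0, 1])."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] for the blend to stay bounded")
    return weight * score_low_variant + (1.0 - weight) * score_high_variant

def stays_within_bounds(score_pairs, weight, tol=1e-12):
    """Check that the blended score never leaves the range set by the two variants."""
    return all(
        min(a, b) - tol <= blend(a, b, weight) <= max(a, b) + tol
        for a, b in score_pairs
    )

# Illustrative scores for the two existing variants on three inputs.
pairs = [(0.2, 0.6), (0.5, 0.5), (0.9, 0.3)]
bounded = stays_within_bounds(pairs, weight=0.25)
```

Because the weight is restricted to [0, 1], the new variant's score can be explained directly from the two existing scores, which fits the traceability requirement mentioned in the list.
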
Project Tech stack:
Redshift
SQL
Python
NumPy
Pandas
Data Scientist
May 2025 - Jul 2025 (2 months)
Project Overview

It's a reusable statistical testing framework for measuring outcome lift across binary and ordinal survey responses in a controlled exposure study comparing two treatment conditions among thousands of respondents.

Responsibilities:
  • Implemented Chi-square, Fisher's Exact, and Z-tests for binary outcomes, validating convergence across methods.
  • Added Mann-Whitney U for ordinal Likert-scale data, selecting appropriate non-parametric tests.
  • Built null imputation sensitivity analysis as a conservative lower bound for robust significance.
  • Documented test selection rationale for non-technical stakeholders.
  • Designed a reusable statistical framework for future studies across response columns and grouping variables.
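
The cross-method check for binary outcomes described above can be illustrated with a two-proportion Z-test and a two-sided Fisher's Exact test run on the same counts. The project stack lists SciPy; this sketch deliberately uses only the Python standard library so it stays dependency-free, and the counts and function names are invented for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions, using the pooled variance."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def fisher_exact_two_sided(success_a, n_a, success_b, n_b):
    """Two-sided Fisher's Exact p-value: sum of tables no more likely than the observed one."""
    total_success = success_a + success_b
    denominator = comb(n_a + n_b, total_success)

    def table_probability(k):
        # Hypergeometric probability of k successes in group A.
        return comb(n_a, k) * comb(n_b, total_success - k) / denominator

    observed = table_probability(success_a)
    lowest = max(0, total_success - n_b)
    highest = min(n_a, total_success)
    p_value = sum(
        p for p in (table_probability(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Hypothetical counts: positive responses out of respondents per condition.
z_stat, z_p = two_proportion_z_test(120, 400, 80, 400)
fisher_p = fisher_exact_two_sided(120, 400, 80, 400)
methods_agree = (z_p < 0.05) == (fisher_p < 0.05)
```

If the two p-values land on opposite sides of the chosen significance level, that disagreement is itself a signal to look at sample size or sparse cells before reporting a lift.
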
Project Tech stack:
Python
SciPy
Matplotlib
NumPy
Tech Lead, Research Analyst / Data Scientist
Apr 2022 - May 2023 (1 year 1 month)
Project Overview

It's a research project at a major university combining NLP-based signal extraction from SEC 8-K filings with event study methodology to identify language patterns associated with accounting fraud disclosures.

Responsibilities:
  • Built a keyword extraction pipeline on SEC 8-K filings to detect fraud signals like blame-shifting and regulatory mentions.
  • Combined SEC enforcement actions and restatements as additional fraud indicator categories.
  • Applied event study methodology with CRSP-Compustat returns to measure abnormal returns around flagged disclosures.
  • Identified market reaction patterns consistent with material adverse disclosure events.
  • Sustained year-long independent research from data acquisition through analysis and documentation.
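
As a rough illustration of the two pieces above, the sketch below counts keyword signals in filing text with regular expressions and computes a market-adjusted cumulative abnormal return over an event window. The patterns, category names, sample sentence, and return figures are all hypothetical, and market adjustment is a simplified stand-in for whatever expected-return model the study actually used.

```python
import re

# Hypothetical signal categories and patterns, for illustration only.
SIGNAL_PATTERNS = {
    "regulatory_mention": re.compile(r"\b(?:SEC|subpoena|investigation)\b"),
    "restatement": re.compile(r"\brestat(?:e|ed|ement|ements)\b", re.IGNORECASE),
}

def extract_signals(filing_text):
    """Count matches per signal category in one filing's text."""
    return {name: len(pattern.findall(filing_text)) for name, pattern in SIGNAL_PATTERNS.items()}

def cumulative_abnormal_return(stock_returns, market_returns):
    """Market-adjusted CAR: sum of (stock return - market return) over the event window."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must cover the same event window")
    return sum(s - m for s, m in zip(stock_returns, market_returns))

sample_text = (
    "The Company received a subpoena from the SEC and will restate "
    "prior financial statements."
)
signals = extract_signals(sample_text)

# Illustrative daily returns around a flagged disclosure.
car = cumulative_abnormal_return([-0.05, -0.02, 0.01], [0.00, 0.01, 0.01])
```

A negative cumulative abnormal return around a flagged filing is the kind of market reaction pattern the last responsibilities describe.
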
Project Tech stack:
PostgreSQL
SQL
Python
NLP
RegExp

Education

2020
Operations Research
Master of Science

Languages

English
Advanced

Copyright © 2026 lemon.io. All rights reserved.