Cindy – Python, Pandas, Data Science
Cindy is a senior Data Scientist with strong applied experience, particularly within advertising and marketing domains. She has delivered automated ML pipelines, robust feature engineering, and business-aligned modeling solutions. Cindy mentors engineers, actively shares knowledge, and uses AI tools like Copilot for code reviews. Feedback highlights her initiative and measurable business impact, though her expertise is most pronounced in ad-tech contexts.
6 years of commercial experience in
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Data Scientist
It's a rebuild of a degraded scoring model for a major social platform's ad inventory. The model had collapsed into producing near-identical predictions across all input combinations.
- Reframed the problem from classification to regression for meaningful score differentiation across inputs.
- Designed split-first architecture to prevent training/inference data leakage.
- Encoded domain constraints into preprocessing, blocking structurally impossible input combinations.
- Validated non-linear feature relationships with mutual information, justifying tree-based models.
- Delivered first differentiated scores across device/format combinations in 4 years.
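The split-first idea in the list above can be sketched as follows. This is a minimal illustration, not the project's code: the data is split before any preprocessing statistic is computed, so nothing learned from held-out rows reaches training. The function names, the standard-scaling step, and the sample values are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def split_first(rows, test_fraction=0.25, seed=0):
    """Shuffle and split BEFORE any preprocessing statistics are computed."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Learn scaling parameters from the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant column
    return mu, sigma


def apply_scaler(values, params):
    """Apply already-fitted parameters; never refit on held-out data."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, used only to make the sketch runnable.
values = [float(v) for v in range(1, 41)]
train, test = split_first(values)
params = fit_scaler(train)                 # fitted on train only
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # reuses the train parameters
```

The same ordering applies to any fitted step (encoders, imputers, feature selection): fit on the training split, then apply unchanged at inference.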
Data Scientist & ML Engineer
It's a rebuild of a manual, analyst-dependent scoring process as a fully automated, event-driven ML pipeline on AWS.
- Designed S3 → Lambda → Airflow → Docker architecture eliminating all manual steps.
- Built multi-response ensemble scoring with row-level keys for safe prediction join-back.
- Implemented global score distribution tracker with automated drift detection alerting.
- Created a graceful market size fallback chain across three data sources for global coverage.
- Enforced destination schema validation pre-write, replacing silent failures with clear errors.
- Reduced turnaround time from 1.5 weeks to under 2 minutes, enabling 5 repeat global clients.
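A minimal sketch of the row-level key join-back mentioned above, assuming each response is scored by averaging a small ensemble. The response names, the lambda "models", and the `row_key` field are hypothetical. The point is that predictions are matched to inputs by key instead of by position, so a reordered or incomplete prediction set cannot be silently misaligned.

```python
def attach_row_keys(rows):
    """Give every input row a stable key before it leaves the caller."""
    return [{"row_key": i, **row} for i, row in enumerate(rows)]


def score_rows(keyed_rows, ensembles):
    """Score each response with its ensemble; return predictions by row_key."""
    predictions = {}
    # Iterate in reverse to show that output order does not matter.
    for row in reversed(keyed_rows):
        predictions[row["row_key"]] = {
            response: sum(model(row) for model in members) / len(members)
            for response, members in ensembles.items()
        }
    return predictions


def join_back(keyed_rows, predictions):
    """Join by key and fail loudly if any row has no prediction."""
    missing = [r["row_key"] for r in keyed_rows if r["row_key"] not in predictions]
    if missing:
        raise KeyError(f"no prediction for row keys: {missing}")
    return [{**r, **predictions[r["row_key"]]} for r in keyed_rows]


# Hypothetical inputs and stand-in models, for illustration only.
rows = [{"spend": 10.0}, {"spend": 20.0}, {"spend": 30.0}]
ensembles = {
    "response_a": [lambda r: r["spend"] * 0.25, lambda r: r["spend"] * 0.75],
    "response_b": [lambda r: r["spend"] + 1.0],
}
keyed = attach_row_keys(rows)
scored = join_back(keyed, score_rows(keyed, ensembles))
```

Raising on a missing key mirrors the list's preference for clear errors over silent failures.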
Data Scientist
It's a pair of production ML issues: designing an interpolation approach for a new, unseen product variant without retraining, and maintaining ongoing model governance and drift monitoring tied to data sources.
- Traced data creep to a single client rollout and analyzed score distributions across variants.
- Verified new variant blend formula maintained monotonicity between existing bounds.
- Corroborated internal distributions against third-party benchmarks for relative impact.
- Rejected unsupervised clustering due to insufficient data and traceability needs.
- Secured senior data scientist, customer team, and VP alignment before shipping.
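The monotonicity check on the blend formula can be illustrated as below. The profile does not give the actual formula, so this sketch assumes a simple convex blend between two existing variants; the weights and scores are made up. It checks that blended scores stay between the existing bounds and move in one direction as the weight grows.

```python
def blend_scores(low, high, weight):
    """Convex blend of two existing variants' scores (assumed formula)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [(1 - weight) * l + weight * h for l, h in zip(low, high)]


def stays_within_bounds(low, blended, high, tol=1e-12):
    """Every blended score lies between the two existing variants' scores."""
    return all(
        min(l, h) - tol <= b <= max(l, h) + tol
        for l, b, h in zip(low, blended, high)
    )


def is_monotone_in_weight(low, high, weights, tol=1e-12):
    """Each score moves in a single direction as the weight increases."""
    grid = [blend_scores(low, high, w) for w in sorted(weights)]
    for i, (l, h) in enumerate(zip(low, high)):
        series = [scores[i] for scores in grid]
        direction = 1 if h >= l else -1
        steps = [b - a for a, b in zip(series, series[1:])]
        if any(direction * step < -tol for step in steps):
            return False
    return True


# Hypothetical scores for two existing variants.
variant_low = [0.2, 0.4, 0.9]
variant_high = [0.6, 0.8, 0.5]
new_variant = blend_scores(variant_low, variant_high, 0.5)
within = stays_within_bounds(variant_low, new_variant, variant_high)
monotone = is_monotone_in_weight(variant_low, variant_high, [0.0, 0.25, 0.5, 0.75, 1.0])
```

A convex blend passes both checks by construction, which is what makes it traceable; a more complex formula would need the checks run against real score distributions.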
Data Scientist
It's a reusable statistical testing framework for measuring outcome lift across binary and ordinal survey responses in a controlled exposure study comparing two treatment conditions among thousands of respondents.
- Implemented Chi-square, Fisher's Exact, and Z-tests for binary outcomes, validating convergence across methods.
- Added Mann-Whitney U for ordinal Likert-scale data, selecting appropriate non-parametric tests.
- Built null imputation sensitivity analysis as a conservative lower bound for robust significance.
- Documented test selection rationale for non-technical stakeholders.
- Designed a reusable statistical framework for future studies across response columns and grouping variables.
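As a small illustration of validating convergence across methods for binary outcomes, the sketch below computes a pooled two-proportion Z-test and a Pearson chi-square statistic (no continuity correction) for the same 2x2 table. For such a table the squared z statistic equals the chi-square statistic, so the two tests must agree. The counts are invented, and the function names are assumptions, not the framework's API.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def chi_square_2x2(success_a, n_a, success_b, n_b):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    observed = [
        [success_a, n_a - success_a],
        [success_b, n_b - success_b],
    ]
    total = n_a + n_b
    col_totals = [success_a + success_b, total - success_a - success_b]
    row_totals = [n_a, n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: positive responses out of respondents per condition.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
chi2 = chi_square_2x2(60, 100, 45, 100)
```

With small cell counts the normal approximation weakens, which is the usual reason to also report Fisher's Exact test.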
Tech Lead, Research Analyst / Data Scientist
It's a research project at a major university combining NLP-based signal extraction from SEC 8-K filings with event study methodology to identify language patterns associated with accounting fraud disclosures.
- Built a keyword extraction pipeline on SEC 8-K filings to detect fraud signals like blame-shifting and regulatory mentions.
- Combined SEC enforcement actions and restatements as additional fraud indicator categories.
- Applied event study methodology with CRSP-Compustat returns to measure abnormal returns around flagged disclosures.
- Identified market reaction patterns consistent with material adverse disclosure events.
- Sustained year-long independent research from data acquisition through analysis and documentation.
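The event study step can be sketched with a market model: estimate alpha and beta on a pre-event window, then treat the gap between actual and expected returns in the event window as abnormal. This is a generic textbook version under stated assumptions, not the study's code. The returns below are synthetic stand-ins for CRSP-Compustat data, and the window lengths are arbitrary.

```python
def fit_market_model(stock_returns, market_returns):
    """Ordinary least squares fit of stock = alpha + beta * market."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum(
        (m - mean_m) * (s - mean_s)
        for s, m in zip(stock_returns, market_returns)
    )
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def abnormal_returns(stock_returns, market_returns, alpha, beta):
    """Actual return minus the market-model expected return, per day."""
    return [s - (alpha + beta * m) for s, m in zip(stock_returns, market_returns)]


# Synthetic estimation window: the stock follows the market model exactly.
est_market = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
est_stock = [0.001 + 1.5 * m for m in est_market]

# Synthetic event window with an invented negative shock after a flagged filing.
event_market = [0.0, 0.01, -0.005]
shocks = [0.0, -0.08, -0.02]
event_stock = [0.001 + 1.5 * m + s for m, s in zip(event_market, shocks)]

alpha, beta = fit_market_model(est_stock, est_market)
ar = abnormal_returns(event_stock, event_market, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the event window
```

In practice the estimation window is much longer and ends before the event window starts, so the event itself does not contaminate the alpha and beta estimates.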