Crystal Zang

Ph.D. Candidate, University of Pittsburgh School of Public Health · Pittsburgh, PA 15232 · (413) 409-9940 · crystalzangzzw@gmail.com

As an aspiring biostatistician, I am deeply passionate about leveraging statistical methodologies to address complex challenges in healthcare. My research focuses on integrating topic modeling techniques into estimation problems, aiming to extract valuable insights from large datasets. With a strong background in statistical modeling and Natural Language Processing (NLP), particularly within Electronic Health Records (EHR) and medical notes, I bring a blend of theoretical knowledge and practical experience to the field.


Education

University of Pittsburgh
Ph.D. candidate in Biostatistics
August 2020 - Current
Smith College
B.A. Mathematics, Statistical & Data Sciences

Phi Beta Kappa; Cum Laude with High Honors in Statistical and Data Sciences

August 2016 - May 2020

Skills

Programming Languages & Tools
  • R
  • Python
  • SQL
  • SAS
Version Control
Package Development
  • snmfITR: Python package for estimating individualized treatment rules by leveraging quantitative data and textual documents.

Current Research

  • Supervised Matrix Factorization for Estimating Individualized Treatment Rule: Incorporating Medical Notes in Decision Making

    Developed an integrative D-learner for estimating the individualized treatment rule (ITR) by leveraging both quantitative data and unstructured textual documents. Incorporated Natural Language Processing (NLP) and identified outcome-driven underlying topics in clinical notes using Supervised Nonnegative Matrix Factorization (SNMF).

  • Penalized Decomposition for Feature Extraction in the Presence of Nuisance Variables using Principal Coordinate Analysis

    Designed penalized decomposition models for feature extraction by incorporating residuals and relaxed distance measures to address confounding variables. Analyzed microbiome data from the American Gut Project, identifying disease-associated taxa while accounting for patients’ lifestyle and physiological characteristics.

Work Experience

Clinical Biostatistician

University of Pittsburgh Medical Center, Pittsburgh, PA

Collaborated with physicians to design clinical trials, including sample size determination and statistical analysis, contributing to NIH-funded grant applications. Analyzed clinical data to evaluate the effectiveness of novel treatments for chronic pain. ClinicalTrials.gov ID NCT04747314

January 2024 - Current

Graduate Research Assistant

Department of Health Policy, University of Pittsburgh School of Public Health, Pittsburgh, PA

Partnered with the Allegheny County Department of Human Services and Health Policy Management to estimate opioid overdose prevalence using capture-recapture methods and Negative Binomial regression. Developed data visualizations and web scraping tools to evaluate the performance of the Behavioral Health Network.

August 2021 - January 2024

Research Graduate Fellow

Biocomplexity Institute and Initiative, University of Virginia, Arlington, VA

Led two projects in support of the department of Social and Decision Analytics at the University of Virginia. Collaborated with research scientists, and supervised and instructed undergraduate interns. Built an R Shiny app and presented it to stakeholders.

In the R&D Text Corpora Filtering and Data Mining project, performed natural language processing, including Sentence-BERT embeddings, to retrieve articles about artificial intelligence (AI) from Federal RePORTER abstracts. Performed non-negative matrix factorization topic modeling on AI abstracts and identified emerging topics.

In the Defining and Measuring the Universe of Open Source Software Innovation project, scraped GitHub repository information and classified repositories into various software types using term matching and sentence embeddings, which allowed the National Center for Science and Engineering Statistics to understand how different types of software are used within and across economic sectors.

June 2021 - August 2021

Research Assistant

Department of Computational Oncology, Memorial Sloan Kettering Cancer Center, New York City

Applied machine learning to investigate metabolomics data from various cancers and predict unidentified metabolites. Implemented Lasso for its ability to reduce dimensionality, its high model performance, and its ease of interpretation. Studied the underlying structure of the metabolites via subpathways. Applied Bayesian Lasso and achieved more stable parameter estimates.

June 2019 - August 2019

Teaching Experience

  • Teaching Fellow

    Course instructor for BIOST 2081: Mathematical Methods for Statistics.

  • Teaching Assistant

    • CLRES 2020: Statistical Approaches in Clinical Research.
    • BIOST 2067: Applied Meta-Analysis.
    • BIOST 2041: Introduction to Statistical Methods.
    • BIOST 2037: Foundations of Statistical Theory.

Interests

Apart from my research, I enjoy dancing Argentine tango, salsa, and bachata. I'm also an NTRP 3.5 tennis player.