Crystal Zang

Ph.D. Candidate, University of Pittsburgh School of Public Health · Pittsburgh, PA 15232 · (413) 409-9940

As an aspiring biostatistician, I am passionate about leveraging statistical methods to address complex challenges in healthcare. My research focuses on integrating topic modeling techniques into estimation problems to extract meaningful insights from large datasets. With a strong background in statistical modeling and Natural Language Processing (NLP), particularly as applied to Electronic Health Records (EHR) and medical notes, I bring a blend of theoretical knowledge and practical experience to the field.


Graduate Research Assistant

Department of Health Policy, University of Pittsburgh School of Public Health, Pittsburgh, PA

Provided statistical support for projects evaluating the performance, access, capacity, and diversity of the Behavioral Health Network serving Medicaid enrollees in Allegheny County, including web scraping, trend identification, and data visualization.

Applied capture-recapture methods with negative binomial regression to estimate the prevalence of opioid overdose in Allegheny County, and used the bootstrap to construct confidence intervals for the combined estimates.

August 2021 - January 2024

Research Graduate Fellow

Biocomplexity Institute and Initiative, University of Virginia, Arlington, VA

Led two projects in support of the Department of Social and Decision Analytics at the University of Virginia. Collaborated with research scientists, and supervised and trained undergraduate interns in a data science framework. Built R Shiny apps and presented them to stakeholders.

In the R&D Text Corpora Filtering and Data Mining project, performed natural language processing, including Sentence-BERT embeddings, to retrieve abstracts about artificial intelligence (AI) from Federal RePORTER. Applied non-negative matrix factorization (NMF) topic modeling to the AI abstracts to identify emerging topics.

In the Defining and Measuring the Universe of Open Source Software Innovation project, scraped GitHub repository information and classified repositories into software types using term matching and sentence embeddings, helping the National Center for Science and Engineering Statistics understand how different types of software are used within and across economic sectors.

June 2021 - August 2021

Research Assistant

Department of Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY

Applied machine learning to metabolomics data from various cancers to predict unidentified metabolites. Implemented the Lasso for its dimension reduction, strong predictive performance, and interpretability. Studied the underlying structure of the metabolites via subpathways. Applied the Bayesian Lasso to obtain more stable parameter estimates.

June 2019 - August 2019


University of Pittsburgh

Ph.D. candidate in Biostatistics
August 2020 - Present

Smith College

B.A. Mathematics, Statistical & Data Sciences

Cum Laude with High Honors in Statistical and Data Sciences

August 2016 - May 2020


Programming Languages & Tools
  • R
  • Python
  • SQL
  • Version Control

Interests

Apart from my research, I enjoy dancing Argentine tango, salsa, and bachata. I'm also a 3.5 NTRP-level tennis player.

Current Research

  • Incorporating Medical Notes in Estimating Individualized Treatment Rules

    Developed an integrative D-learner for estimating the individualized treatment rule (ITR) by leveraging both quantitative data and unstructured text. Incorporated Natural Language Processing (NLP), using Supervised Nonnegative Matrix Factorization (SNMF) to identify outcome-driven latent topics in clinical notes.