Personal Information


Education Background

Columbia University, Mailman School of Public Health New York, NY

  • Master of Science, Major in Biostatistics, Public Health Data Science Track, GPA 3.97 Aug 2020 – May 2022
  • Data Science Institute Scholar

Peking University, National School of Development (Ivy League in China) Beijing, China

  • Bachelor of Economics, Major in Economics Sep 2011 – Jul 2013
  • Excellent Interviewer Award of Institute of Social Science Survey of Peking University

Xi’an Jiaotong University, School of Life Science and Technology (Ivy League in China) Xi’an, China

  • Bachelor of Engineering, Major in Bioengineering Sep 2004 – Jul 2008
  • Gold Award of the Sixth Business Plan Competition of Xi’an Jiaotong University
  • Admitted with exemption from the National College Entrance Examination as a 1st Prize winner of the Science Olympiad (~0.05% qualify)


Skills & Awards

Programming

  • Python (Pandas, NumPy, Scikit-Learn, TensorFlow) | SQL | R (Markdown) | SAS (Macros) | Git | Hadoop | Spark

Statistics

  • Machine Learning | Deep Learning | Spatial Analysis | Natural Language Processing | Experimental Design | A/B Testing

National Science Olympiad

  • 1st Prize of China High School Biology Olympiad, 3rd Prize of National Olympiad in Informatics in Provinces
  • 2nd Prize of China Middle School Physics Olympiad, 3rd Prize of China Middle School Mathematical Olympiad
  • 2nd Prize of China Elementary School Mathematical Olympiad

Languages

  • Chinese: Native Language;
  • English: Test of English as a Foreign Language (TOEFL): iBT Test Score 105;
  • Spanish: Diploma of Spanish as a Foreign Language (Diploma de Español como Lengua Extranjera, DELE): Level B1 of Common European Framework of Reference (CEFR);
  • Portuguese: Elementary Diploma of Portuguese as a Foreign Language (Diploma Elementar de Português Língua Estrangeira, DEPLE): Level B1 of Common European Framework of Reference (CEFR);
  • Italian: Certification of Italian as a Foreign Language (Certificazione di Italiano come Lingua Straniera, CILS): Level B1 of Common European Framework of Reference (CEFR).


Professional Experience

Columbia University New York, NY | Feb 2021 – Present

Data Science Institute Scholar (Biostatistician), Center for Precision Medicine and Genomics | Feb 2021 – Present
  • Conducted exploratory data analysis of ~6.7M gene data records via Python & R and created training and test sets for statistical modeling
  • Built GLM (Logistic Regression) and Machine Learning Models (Ridge, LASSO, Elastic Net, Random Forests) in Python & R
  • Performed Cross-Validation to select optimal model and identified key features for predicting true gene variants of kidney disease
Data Science Institute Scholar (Data Analyst), Department of Epidemiology | Nov 2021 – Present
  • Created NYC COVID-19 database after exploratory data analysis of ~23.5M electronic health records in SAS, SQL & R
  • Plotted choropleth maps of socioeconomic and health factors using R & QGIS to evaluate geospatial distribution in NYC
  • Conducted Survival Analyses for effects of demographics and air pollution on COVID-19 hospitalization and mortality via R
Data Science Institute Scholar (Statistician), Center for International Earth Science Information Network | Mar 2022 – Present
  • Visualized geographic clustering of natural disasters and federal funding in Python & R using hazards data and flood insurance data
  • Constructed Poisson Regression in Python & R to assess effects of socioeconomic factors on disaster risks and government funding
  • Wrote project report and provided research findings for press publication: How we found communities in harm’s way
Teaching Assistant of Introduction to Biostatistical Methods, Department of Biostatistics | Aug 2021 – Dec 2021
  • Held R labs and weekly office hours to advise ~60 graduate students on R programming and statistical analysis


Institute of Social Science Survey, Peking University Beijing, China | Jun 2013 – Aug 2020

Senior Data Analyst, China Family Panel Studies (CFPS) | Aug 2014 – Aug 2020
  • Trained 900+ interviewers, detected patterns of fieldwork issues, and recommended solutions via analysis of performance metrics
  • Wrote data wrangling scripts using SQL & SAS Macros to detect and correct errors in ~152K survey records for quality optimization
  • Harmonized data from 5 biennial surveys via SQL & SAS, discovered survey deficiencies, and produced improvement recommendations
  • Developed and implemented data de-identification algorithms for data privacy and provided data analysis consultation for customers
  • Built client data pipeline and visualized customer growth and demographic statistics in SQL & SAS to communicate analytical results
Data Collection Intern, China Health and Retirement Longitudinal Study (CHARLS) | Jun 2013 – Aug 2014
  • Led a team of 11 interviewers, identified sampled households, and selected respondents by simple random sampling
  • Collected socioeconomic and vital signs data with computer-assisted personal interviews and recognized sources of data anomalies
  • Supervised fieldwork to optimize data quality, achieved ~93% survey response rate, and received Excellent Interviewer Award


Dihon Pharmaceutical Group Co., Ltd. Kunming, China | Sep 2008 – Nov 2010

Regional Manager | Sep 2008 – Jan 2009 & Aug 2010 – Nov 2010, Product Specialist | Jan 2009 – Jan 2010
  • Evaluated competitors by collecting market data from ~400 retail stores and designed incentive programs with new pricing strategy
  • Assessed geographic distribution of sales data and added ~25 new wholesale and retail partners to increase distribution coverage
  • Analyzed sales, profit, and marketing cost data to identify key stores and revitalized brand image by building ~10 flagship stores
  • Cleared ¥500,000 of stagnant inventory via diverse promotional activities and drove ~430% growth in local monthly sales revenue