Education Background
Columbia University, Mailman School of Public Health New York, NY
- Master of Science, Major in Biostatistics, Public Health Data Science Track, GPA 3.97 Aug 2020 – May 2022
- Data Science Institute Scholar
Peking University, National School of Development (Ivy League in China) Beijing, China
- Bachelor of Economics, Major in Economics Sep 2011 – Jul 2013
- Excellent Interviewer Award of Institute of Social Science Survey of Peking University
Xi’an Jiaotong University, School of Life Science and Technology (Ivy League in China) Xi’an, China
- Bachelor of Engineering, Major in Bioengineering Sep 2004 – Jul 2008
- Gold Award of the Sixth Business Plan Competition of Xi’an Jiaotong University
- Admitted with exemption from the National College Entrance Examination as a 1st Prize winner of the Science Olympiad (~0.05% qualified)
Skills & Awards
Programming
- Python (Pandas, NumPy, Scikit-Learn, TensorFlow) | SQL | R (Markdown) | SAS (Macros) | Git | Hadoop | Spark
Statistics
- Machine Learning | Deep Learning | Spatial Analysis | Natural Language Processing | Experimental Design | A/B Testing
National Science Olympiad
- 1st Prize of China High School Biology Olympiad, 3rd Prize of National Olympiad in Informatics (Provincial Level)
- 2nd Prize of China Middle School Physics Olympiad, 3rd Prize of China Middle School Mathematical Olympiad
- 2nd Prize of China Elementary School Mathematical Olympiad
Languages
- Chinese: Native Language
- English: Test of English as a Foreign Language (TOEFL): iBT Test Score 105
- Spanish: Diploma of Spanish as a Foreign Language (Diploma de Español como Lengua Extranjera, DELE): Level B1 of Common European Framework of Reference (CEFR)
- Portuguese: Elementary Diploma of Portuguese as a Foreign Language (Diploma Elementar de Português Língua Estrangeira, DEPLE): Level B1 of Common European Framework of Reference (CEFR)
- Italian: Certification of Italian as a Foreign Language (Certificazione di Italiano come Lingua Straniera, CILS): Level B1 of Common European Framework of Reference (CEFR)
Professional Experience
Columbia University New York, NY | Feb 2021 – Present
Data Science Institute Scholar (Biostatistician), Center for Precision Medicine and Genomics | Feb 2021 – Present
- Conducted exploratory data analysis of ~6.7M gene data records via Python & R and created training and test data for statistical modeling
- Built GLM (Logistic Regression) and Machine Learning Models (Ridge, LASSO, Elastic Net, Random Forests) in Python & R
- Performed Cross-Validation to select optimal model and identified key features for predicting true gene variants of kidney disease
Data Science Institute Scholar (Data Analyst), Department of Epidemiology | Nov 2021 – Present
- Created NYC COVID-19 database after exploratory data analysis of ~23.5M electronic health records in SAS, SQL & R
- Plotted choropleth maps of socioeconomic and health factors using R & QGIS to evaluate geospatial distribution in NYC
- Conducted Survival Analyses for effects of demographics and air pollution on COVID-19 hospitalization and mortality via R
Teaching Assistant of Introduction to Biostatistical Methods, Department of Biostatistics | Aug 2021 – Dec 2021
- Held R labs and weekly office hours to advise ~60 graduate students on R programming and statistical analysis
Institute of Social Science Survey, Peking University Beijing, China | Jun 2013 – Aug 2020
Senior Data Analyst, China Family Panel Studies (CFPS) | Aug 2014 – Aug 2020
- Trained 900+ interviewers, detected patterns of fieldwork issues, and recommended solutions via analysis of performance metrics
- Wrote data wrangling scripts using SQL & SAS Macros to detect and correct errors in ~152K survey records for quality optimization
- Harmonized data from 5 biennial surveys via SQL & SAS, discovered survey deficiencies, and produced improvement recommendations
- Developed and implemented data de-identification algorithms for data privacy and provided data analysis consultation for customers
- Built client data pipeline and visualized customer growth and demographic statistics in SQL & SAS to communicate analytical results
Data Collection Intern, China Health and Retirement Longitudinal Study (CHARLS) | Jun 2013 – Aug 2014
- Led a team of 11 interviewers, identified sampled households, and selected respondents with simple random sampling
- Collected socioeconomic and vital signs data with computer-assisted personal interviews and recognized sources of data anomalies
- Supervised fieldwork to optimize data quality, achieved ~93% survey response rate, and received Excellent Interviewer Award
Dihon Pharmaceutical Group Co., Ltd. Kunming, China | Sep 2008 – Nov 2010
Regional Manager | Sep 2008 – Jan 2009 & Aug 2010 – Nov 2010, Product Specialist | Jan 2009 – Jan 2010
- Evaluated competitors by collecting market data from ~400 retail stores and designed incentive programs with new pricing strategy
- Assessed geographic distribution of sales data and added ~25 new wholesale and retail partners to increase distribution coverage
- Analyzed sales, profit, and marketing cost data to identify key stores and revitalized brand image by building ~10 flagship stores
- Cleared ¥500,000 of stagnant inventory via diverse promotion activities and drove ~430% growth in local monthly sales revenue