CV
Education
University of California, San Diego, Ph.D. in Data Science, 2025 - Present (Expected June 2030)
- GPA: 4.0/4.0
- Research focus: Genomics and Machine Learning
University of California, San Diego, B.S. in Data Science, Mathematics and Economics, 2021 - 2025
- Provost Honors
Research Experience
Graduate Researcher, UCSD Halicioglu Data Science Institute, 2025 - Present
- Developing EvoLen: evolution-guided tokenization for DNA language models
- Building integrated genetic and transcriptomic models for neonatal asthma risk prediction
Senior Student Researcher, UC Institute on Global Conflict and Cooperation, 2023 - 2025
- Developed NLP models and SQL databases to analyze China's innovation ecosystem
Data Engineer, Power Transformation Lab / San Diego Supercomputer Center, 2023 - 2025
- Designed SQL databases and automation scripts for electricity data
Professional Experience
Data Management Intern, Office of the City Treasurer, San Diego, 2024
- Built Python-based automation pipelines and web scrapers, improving data workflow efficiency by 70%
Quantitative Analysis Intern, Soochow Securities, 2023
- Conducted asset allocation research using mean-variance and Black-Litterman optimization models
Data Assistant, Oliver Wyman, 2022
- Developed NLP models for keyword analysis and automated API data transfers
Selected Publications & Presentations
- EvoLen: Evolution-Guided Tokenization for DNA Language Models (Under review, COLM 2026)
- Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis (Under review, COLM 2026)
- Integrated Genetic and Transcriptomic Risk Prediction for Neonatal Asthma (In preparation)
- ASHG 2025 Poster: Enhancing Childhood Asthma Risk Prediction with Biologically Informed Polygenic Transcriptomic Models
- UCSD PhD Open House 2026 Poster: Identifying Causal Genomic Factors and Optimizing Structural Representations for Sequence Modeling
Skills
- Programming: Python (Expert), R (Expert), Java, SQL, MATLAB, JavaScript
- Data Tools: SAP, Stata, HTML, LaTeX
- Languages: Mandarin (Native), English (Advanced), Spanish (Fluent)