CV

Education

University of California, San Diego, Ph.D. in Data Science, 2025 - Present (Expected June 2030)

  • GPA: 4.0/4.0
  • Research focus: Genomics and Machine Learning

University of California, San Diego, B.S. in Data Science, Mathematics and Economics, 2021 - 2025

  • Provost Honors

Research Experience

Graduate Researcher, UCSD Halicioglu Data Science Institute, 2025 - Present

  • Developing EvoLen: evolution-guided tokenization for DNA language models
  • Building integrated genetic and transcriptomic models for neonatal asthma risk prediction

Senior Student Researcher, UC Institute on Global Conflict and Cooperation, 2023 - 2025

  • Developed NLP models and SQL databases to analyze China's innovation ecosystem

Data Engineer, Power Transformation Lab / San Diego Supercomputer Center, 2023 - 2025

  • Designed SQL databases and automation scripts for electricity data

Professional Experience

Data Management Intern, Office of the City Treasurer, San Diego, 2024

  • Built Python-based automation pipelines and web scrapers, improving data workflow efficiency by 70%

Quantitative Analysis Intern, Soochow Securities, 2023

  • Conducted asset allocation research using Mean-Variance and Black-Litterman optimization models

Data Assistant, Oliver Wyman, 2022

  • Developed NLP models for keyword analysis and automated API data transfers

Selected Publications & Presentations

  • EvoLen: Evolution-Guided Tokenization for DNA Language Models (Under review, COLM 2026)
  • Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis (Under review, COLM 2026)
  • Integrated Genetic and Transcriptomic Risk Prediction for Neonatal Asthma (In preparation)
  • ASHG 2025 Poster: Enhancing Childhood Asthma Risk Prediction with Biologically Informed Polygenic Transcriptomic Models
  • UCSD PhD Open House 2026 Poster: Identifying Causal Genomic Factors and Optimizing Structural Representations for Sequence Modeling

Skills

  • Programming: Python (Expert), R (Expert), Java, SQL, MATLAB, JavaScript
  • Data Tools: SAP, Stata, HTML, LaTeX
  • Languages: Mandarin (Native), English (Advanced), Spanish (Fluent)