Nan (Nancy) Huang

PhD Student in Data Science

University of California, San Diego

About

I am a Data Science PhD student at the Halıcıoğlu Data Science Institute (HDSI), University of California, San Diego. My research sits at the intersection of machine learning and genomics, with a focus on building biologically informed models that connect modern computational methods to human genomics.

Research Interests:

  • Genomic Sequence Modeling: I develop DNA models that leverage biological priors such as evolutionary constraints to improve performance on genomic tasks, including cross-species analysis and structural representation.
  • Precision Medicine & Health Equity: I build multi-ancestry genetic risk prediction frameworks that integrate SNP-based polygenic risk scores with transcriptomic data.

Prior to my PhD, I earned a B.S. in Data Science, Mathematics, and Economics from UC San Diego in March 2025.

Selected Publications

EvoLen: Evolution-Guided Tokenization for DNA Language Models

Nan Huang, Xingyu Zhou, Jiaqi Cui, Michelle Tapia-Pacheco, Tiffany Amariuta, Yang E. Li, Jingbo Shang

Under review at COLM 2026

A novel evolution-guided tokenization approach for DNA language models that captures evolutionary constraints in genomic sequences.

Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis

Xinhao Zou, Yifei Huang, Zijian Wu, Jiarui Sha, Nan Huang, Lingfeng Yun, Jingbo Shang, Liangcai Peng

Under review at COLM 2026

A new framework and benchmark for simulating and analyzing organized group behavior.

Integrated Genetic and Transcriptomic Risk Prediction for Neonatal Asthma

Nan Huang, Matthew F. Ragsac, Brian K. Pham, Kelan G. Tantisira, Tiffany Amariuta

In Preparation

Integrating polygenic risk scores and transcriptomic data for biologically informed neonatal asthma risk prediction.

News

2026-03

Presented poster *Identifying Causal Genomic Factors and Optimizing Structural Representations for Sequence Modeling* at HDSI PhD Open House, La Jolla, CA

2025-10

Presented poster *Enhancing Childhood Asthma Risk Prediction with Biologically Informed Polygenic Transcriptomic Models* at ASHG 2025, Boston, MA

2025-09

Started my PhD in Data Science at UC San Diego

2025-07

Gave an oral presentation and presented poster *Childhood Asthma Risk Prediction: A Deeper Look at Causal Variants and Genes* at the Human Genetics and Genomics Gordon Research Seminar and Conference (GRS and GRC), Portland, ME

2025-03

Graduated with a B.S. in Data Science, Mathematics and Economics from UC San Diego, with Provost Honors

2025-03

Presented poster *Incorporate Deep Learning Model to Better Predict Individual Gene Expression* at HDSI Capstone Showcase, La Jolla, CA