About
I am a Data Science PhD student at the Halıcıoğlu Data Science Institute (HDSI), University of California, San Diego. My research sits at the intersection of AI for genomics and machine learning, with a focus on building biologically informed models that bridge the gap between computational power and human genomics.
Research Interests:
- Genomic Sequence Modeling: I develop DNA models that leverage biological priors such as evolutionary constraints to improve performance on genomic tasks, including cross-species analysis and structural representation.
- Precision Medicine & Health Equity: My work includes building multi-ancestry genetic risk prediction frameworks that integrate SNP-based polygenic risk scores with transcriptomic data.
Prior to my PhD, I earned a B.S. in Data Science, Mathematics, and Economics from UC San Diego in March 2025.
Selected Publications
EvoLen: Evolution-Guided Tokenization for DNA Language Models
Nan Huang, Xingyu Zhou, Jiaqi Cui, Michelle Tapia-Pacheco, Tiffany Amariuta, Yang E. Li, Jingbo Shang
Under review at COLM 2026
A novel evolution-guided tokenization approach for DNA language models that captures evolutionary constraints in genomic sequences.
Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis
Xinhao Zou, Yifei Huang, Zijian Wu, Jiarui Sha, Nan Huang, Lingfeng Yun, Jingbo Shang, Liangcai Peng
Under review at COLM 2026
A new framework and benchmark for simulating and analyzing organized group behavior.
Integrated Genetic and Transcriptomic Risk Prediction for Neonatal Asthma
Nan Huang, Matthew F. Ragsac, Brian K. Pham, Kelan G. Tantisira, Tiffany Amariuta
In Preparation
Integrating polygenic risk scores and transcriptomic data for biologically informed neonatal asthma risk prediction.
News
Presented poster *Identifying Causal Genomic Factors and Optimizing Structural Representations for Sequence Modeling* at HDSI PhD Open House, La Jolla, CA
Presented poster *Enhancing Childhood Asthma Risk Prediction with Biologically Informed Polygenic Transcriptomic Models* at ASHG 2025, Boston, MA
Started my PhD in Data Science at UC San Diego
Gave an oral presentation and presented poster *Childhood Asthma Risk Prediction: A Deeper Look at Causal Variants and Genes* at Human Genetics and Genomics Gordon Research Seminar and Conference (GRS and GRC), Portland, ME
Graduated with a B.S. in Data Science, Mathematics and Economics from UCSD with Provost Honors
Presented poster *Incorporate Deep Learning Model to Better Predict Individual Gene Expression* at HDSI Capstone Showcase, La Jolla, CA
