It’s my third year as a PhD student at UW-Madison in the Department of Computer Science! I am fortunate to work with Fred Sala. I study data-driven methods for compute-efficient learning. My work has been applied to several kinds of architectures, including multimodal models and diffusion LLMs.

In 2025 I spent a wonderful summer with the Deep Learning Group at Microsoft Research, mentored by Xiaodong Liu and Lucas Liu.

Before starting my PhD, I obtained a master’s degree from SEAS at Harvard University. Before that, I was a full-stack software engineer at Academia.edu and at AbbVie Stemcentrx. I earned my bachelor’s degree in computer science from Caltech.

Recent

[Jul 2025] Traveled to ICML in Vancouver to present our data mixing work at DIG-BUGS and at DataWorld.

[Jun 2025] Started my internship at MSR Redmond in the Deep Learning Group with Xiaodong Liu and Lucas Liu.

[May 2025] Released our new paper on efficient data mixing!

[Jan 2025] Our paper on novel in-context learning behaviors was accepted to ICML as a spotlight!

Research

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training. Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala. preprint.

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition. Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos. ICML 2025 (spotlight).

Workshops and Talks

ICML 2025 DIG-BUGS, DataWorld - R&B: Breaking the Data Mixing Bottleneck with Just 0.01% Overhead

MMLS 2024 - Ingredients for Transformer Length Generalization