
I'm currently a research engineer at Extropic. Previously, I was a Master's student in Mathematics at the University of Toronto, supervised by Vardan Papyan and Adrian Nachman.
I'm broadly interested in representation learning and mathematical methods for understanding empirical deep learning. I like to think about all things applied math, probability, and machine learning.
An augmentation of optimizer updates that applies window smoothing across network depth.
We empirically find that the Jacobians of transformer blocks in pre-trained LLMs have highly similar singular vectors.
A JAX/Equinox implementation of Joint-Embedding Predictive Architecture (JEPA) models and related self-supervised learning methods. Features extensible code, data parallelism, and gradient checkpointing.