Alexander Detkov

01

About

I'm a PhD student in Computation & Neural Systems at Caltech, advised by Prof. Matt Thomson. My research is on how deep learning builds world models. I'm drawn to the topological, geometric, and algebraic structure a model takes on as it learns, and to what that structure reveals about how networks generalize.

Before Caltech I studied Engineering Physics and Mathematics at the University of Alberta. Earlier work spanned neural architecture search at Huawei, graph neural networks, computational nanofluidics, and experimental particle physics with the IceCube Neutrino Observatory.

I believe reasoning and mathematical structure — not pure empiricism — are the key to generalization and interpretability, and I want to build a rigorous, interpretable theory of how deep learning works. Long term, I'm aiming for a career in research and teaching.

02

Research

I work where deep learning meets pure mathematics.

World models in deep learning

How a network turns raw experience into a usable internal model.

Structure in representations

The shape of what a model learns — and how it mirrors the world it learned from.

Generalization

When and why models generalize instead of memorizing — and what a real theory of it would take.

Interpretable, principled ML

Why structure, not scale alone, leads to models we can understand and trust.

03

Selected publications

arXiv 2024

Prompt Baking

Alexander Detkov*, Aman Bhargava*, Cameron Witkowski*, Matt Thomson (* equal contribution)

Preprint, arXiv:2409.13697

Converts a natural-language prompt into a targeted weight update, training a model to match the prompted distribution — giving durable behavioral control, stronger zero-shot reasoning from baked chain-of-thought, and continual "knowledge baking."

ICLR 2023 · Poster

Reparameterization through Spatial Gradient Scaling

Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Wei Lui, Shangling Jui, Di Niu

International Conference on Learning Representations (ICLR), 2023

Shows that structural reparameterization is equivalent to a spatial scaling of gradients, explaining why it accelerates CNN training — and turns that into an analytical method for finding high-performing reparameterizations, with ~2× faster training.

CSME 2022 · Oral

Imbibition of a Single Polymer into a Nanocapillary

Alexander Detkov, Wylie Stroberg

Canadian Society for Mechanical Engineering (CSME), 2022

A molecular- and Langevin-dynamics study of how a single polymer is drawn into a nanocapillary, identifying a maximum mean first-passage time with practical engineering implications.


Full list on Google Scholar.

04

Research experience

2024 – present

World Models in Transformers

California Institute of Technology · Prof. Matt Thomson

Studying world-model learning in transformers, and the structure that emerges in their representations.

2023 – 2024

Explainable Graph Neural Networks

University of Alberta · Prof. Di Niu

Developed methods for robust, global-level GNN explanations that stay faithful under distribution shift, including a closed loop between learned models and interpretable symbolic expressions.

2022

Neural Architecture Search

Huawei Research · Prof. Di Niu & Dr. Mohammad Salameh

Led a research direction on the learning dynamics of CNNs under reparameterization (published at ICLR 2023), and worked on hardware-aware NAS and graph-level optimizers for NPUs.

2021

Computational Nanofluidics

University of Alberta · Prof. Wylie Stroberg

Molecular- and Langevin-dynamics simulations of polymer imbibition into nanotubes (LAMMPS, Julia), using stochastic-process theory to find a maximum mean first-passage time.

2021

Experimental Particle Physics

University of Alberta & IceCube Observatory · Prof. Juan Pablo Yáñez

Calibrated digital optical modules using Cherenkov radiation from atmospheric muons, and built a Python library for particle-track filtering.

05

News

Sep 2024 · Started my PhD in Computation & Neural Systems at Caltech, joining the Thomson Lab.
Sep 2024 · Prompt Baking posted to arXiv.
May 2023 · Joined Prof. Di Niu's lab to work on explainable graph neural networks.
Jan 2023 · Reparameterization through Spatial Gradient Scaling accepted to ICLR 2023.
Jun 2022 · Presented our molecular-dynamics work on polymer imbibition at CSME 2022.
Jan 2022 · Joined Huawei Research to work on Neural Architecture Search.
