PhD Student · Computation & Neural Systems · Caltech

Alexander Detkov

I study how neural networks build world models — how local observations are glued into a coherent global picture of the world.

I work in the Thomson Lab.

01

About

I'm a PhD student in Computation & Neural Systems at Caltech, advised by Prof. Matt Thomson. My research is on how deep learning builds world models. I'm drawn to the topological, geometric, and algebraic structure a model takes on as it learns, and what it reveals about how networks generalize.

Before Caltech I studied Engineering Physics and Mathematics at the University of Alberta. Earlier work spanned neural architecture search at Huawei, graph neural networks, computational nanofluidics, and experimental particle physics with the IceCube Neutrino Observatory.

I believe reasoning and mathematical structure — not pure empiricism — are the key to generalization and interpretability, and I want to build a rigorous, interpretable theory of how deep learning works. Long term, I'm aiming for a career in research and teaching.

02

Research

I work where deep learning meets pure mathematics.

World models in deep learning

How a network turns raw experience into a usable internal model.

Structure in representations

The shape of what a model learns — and how it mirrors the world it learned from.

Generalization

When and why models generalize instead of memorizing — and what a real theory of it would take.

Interpretable, principled ML

Why structure, not scale alone, leads to models we can understand and trust.

03

Selected publications

  1. arXiv 2024

    Prompt Baking

    Alexander Detkov*, Aman Bhargava*, Cameron Witkowski*, Matt Thomson · * equal contribution

    Preprint, arXiv:2409.13697

    Converts a natural-language prompt into a targeted weight update, training a model to match the prompted distribution — giving durable behavioral control, stronger zero-shot reasoning from baked chain-of-thought, and continual "knowledge baking."

  2. ICLR 2023 · Poster

    Reparameterization through Spatial Gradient Scaling

    Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Wei Lui, Shangling Jui, Di Niu

    International Conference on Learning Representations (ICLR), 2023

    Shows that structural reparameterization is equivalent to a spatial scaling of gradients, explaining why it accelerates CNN training — and turns that into an analytical method for finding high-performing reparameterizations, with ~2× faster training.

  3. CSME 2022 · Oral

    Imbibition of a Single Polymer into a Nanocapillary

    Alexander Detkov, Wylie Stroberg

    Canadian Society for Mechanical Engineering (CSME), 2022

    A molecular- and Langevin-dynamics study of how a single polymer is drawn into a nanocapillary, identifying a maximum mean first-passage time with practical engineering implications.

Full list on Google Scholar.

04

Research experience

  1. 2024 — present

    World Models in Transformers

    California Institute of Technology · Prof. Matt Thomson

    Studying world-model learning in transformers, and the structure that emerges in their representations.

  2. 2023 — 2024

    Explainable Graph Neural Networks

    University of Alberta · Prof. Di Niu

    Developed methods for robust, global-level GNN explanations that stay faithful under distribution shift — including a closed loop between learned models and interpretable symbolic expressions.

  3. 2022

    Neural Architecture Search

    Huawei Research · Prof. Di Niu & Dr. Mohammad Salameh

    Led a research direction on the learning dynamics of CNNs under reparameterization, which resulted in the ICLR 2023 paper, and worked on hardware-aware NAS and graph-level optimizers for NPUs.

  4. 2021

    Computational Nanofluidics

    University of Alberta · Prof. Wylie Stroberg

    Molecular- and Langevin-dynamics simulations of polymer imbibition into nanotubes (LAMMPS, Julia), using stochastic-process theory to find a maximum mean first-passage time.

  5. 2021

    Experimental Particle Physics

    University of Alberta & IceCube Observatory · Prof. Juan Pablo Yáñez

    Calibrated digital optical modules using Cherenkov radiation from atmospheric muons, and built a Python library for particle-track filtering.

05

News

  • Sep 2024 · Started my PhD in Computation & Neural Systems at Caltech, joining the Thomson Lab.
  • Sep 2024 · Prompt Baking posted to arXiv.
  • May 2023 · Joined Prof. Di Niu's lab to work on explainable graph neural networks.
  • Jan 2023 · Reparameterization through Spatial Gradient Scaling accepted to ICLR 2023.
  • Jun 2022 · Presented our molecular-dynamics work on polymer imbibition at CSME 2022.
  • Jan 2022 · Joined Huawei Research to work on Neural Architecture Search.