Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

Massachusetts Institute of Technology. Preprint 2026.



TLDR

We propose Supervised Memory Training (SMT), a replacement for backpropagation through time (BPTT) for training nonlinear RNNs. SMT trains a time-parallel encoder to produce 'optimal' memory states: compressed representations of the past that are predictive of the future. The RNN is then trained with one-step supervised learning to mimic the transitions between these optimal memory states.
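The one-step supervised stage can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The encoder here (`encode_parallel`, a fixed causal running mean) is a hypothetical stand-in for the trained, predictive encoder, and the tanh cell, mean-squared-error loss, sizes, and learning rate are all assumptions chosen for illustration. The point it shows is that, once target memory states exist for every time step, the RNN cell can be fit on all transitions at once, with no unrolling through time during training.

```python
import numpy as np

def encode_parallel(inputs, proj):
    """Hypothetical stand-in encoder (assumption, not the SMT encoder).

    Computes a causal running mean of projected inputs for all time steps
    at once, so memories[t] depends only on inputs[0..t].
    """
    steps = np.arange(1, inputs.shape[0] + 1)[:, None]
    return np.tanh(np.cumsum(inputs @ proj, axis=0) / steps)

def rnn_cell(params, m_prev, x):
    """Assumed nonlinear RNN cell: m_t = tanh(m_{t-1} W_m + x_t W_x + b)."""
    W_m, W_x, b = params
    return np.tanh(m_prev @ W_m + x @ W_x + b)

def one_step_loss_and_grads(params, memories, inputs):
    """MSE between predicted and target next memory, over all steps in parallel."""
    m_prev, target = memories[:-1], memories[1:]
    x = inputs[1:]  # x_t is the input consumed when moving from m_{t-1} to m_t
    pred = rnn_cell(params, m_prev, x)
    err = pred - target
    loss = np.mean(err ** 2)
    # Gradient of the mean squared error through the tanh nonlinearity.
    d_pre = 2.0 * err / err.size * (1.0 - pred ** 2)
    grads = (m_prev.T @ d_pre, x.T @ d_pre, d_pre.sum(axis=0))
    return loss, grads

def train(inputs, memories, steps=300, lr=0.05, seed=0):
    """Plain gradient descent on the one-step loss (no backprop through time)."""
    rng = np.random.default_rng(seed)
    d_m, d_x = memories.shape[1], inputs.shape[1]
    params = (0.1 * rng.standard_normal((d_m, d_m)),
              0.1 * rng.standard_normal((d_x, d_m)),
              np.zeros(d_m))
    losses = []
    for _ in range(steps):
        loss, grads = one_step_loss_and_grads(params, memories, inputs)
        losses.append(loss)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    return params, losses

def rollout(params, m0, inputs):
    """At inference the cell runs recurrently, feeding back its own memory."""
    m, out = m0, []
    for x in inputs:
        m = rnn_cell(params, m, x)
        out.append(m)
    return np.stack(out)

rng = np.random.default_rng(0)
inputs = rng.standard_normal((64, 3))                             # toy sequence
memories = encode_parallel(inputs, rng.standard_normal((3, 4)))   # target states
params, losses = train(inputs, memories)
free_run = rollout(params, memories[0], inputs[1:])
print(f"one-step loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Training touches each transition independently, so the whole sequence is processed in one batched call to `rnn_cell`. Recurrence appears only in `rollout`, which is how the trained cell would be used at inference time.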

Key Results

SMT achieves:

Applications

Citation

@article{kumar2026smt,
  title     = {Pretraining Recurrent Networks without Recurrence},
  author    = {Akarsh Kumar and Phillip Isola},
  year      = {2026},
  url       = {https://akarshkumar.com/smt},
}
