Imagine constructing a machine with a billion moving parts, only to discover that many of those components perform exactly the same function, just configured differently. In the age of deep learning, where neural networks grow ever larger and more intricate, a quietly radical idea is emerging: the sheer number of parameters in a model might not reflect its true complexity. This is the core insight behind the 2024 ICML paper “Exploring the Complexity of Deep Neural Networks through Functional Equivalence” by Guohao Shen of The Hong Kong Polytechnic University.
This work addresses one of the most counterintuitive phenomena in machine learning: despite their enormous size and potential for overfitting, overparameterized deep networks frequently perform better, not worse. They generalize to unseen data and are often easier to train than their smaller counterparts. The answer to this paradox, as Shen reveals, may lie not in what these networks can do, but in how many ways they can do the same thing.
A Fundamental Shift: From Parameters to Functions
At the heart of this research lies the notion of functional equivalence. While a network’s parameters — its weights and biases — define how it processes input data, it turns out that many different parameter configurations can realize exactly the same input-output function.
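To make this concrete, here is a minimal NumPy sketch (not taken from the paper) of two well-known weight-space symmetries that produce functionally equivalent ReLU networks: permuting hidden units, and rescaling a unit’s incoming and outgoing weights by a positive constant. The tiny network, the variable names, and the particular permutation are illustrative choices, but the printed outputs match exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1 = rng.normal(size=(4, 3))   # hidden-layer weights
b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))   # output-layer weights
b2 = rng.normal(size=2)

def forward(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

x = rng.normal(size=3)

# Symmetry 1: permute the hidden units. Reordering the rows of W1/b1 and the
# columns of W2 consistently changes the parameters but not the function.
perm = [2, 0, 3, 1]
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

print(forward(x, W1, b1, W2, b2))
print(forward(x, W1_p, b1_p, W2_p, b2))   # identical output, different parameters

# Symmetry 2: ReLU's positive homogeneity. Scale one unit's incoming weights
# by c > 0 and its outgoing weights by 1/c; the function is unchanged.
c = 2.5
W1_s, b1_s, W2_s = W1.copy(), b1.copy(), W2.copy()
W1_s[0] *= c
b1_s[0] *= c
W2_s[:, 0] /= c

print(forward(x, W1_s, b1_s, W2_s, b2))   # same output again
```

Running the script prints the same two-dimensional output vector three times: three distinct points in parameter space, one and the same function.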