Exact learning dynamics of deep linear networks with prior knowledge
Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998 1E-03). We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning, replacing slow non-linear dynamics with fast exponential trajectories, while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.
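To make the contrast between initialisation regimes concrete, here is a minimal numerical sketch (not the paper's closed-form Riccati solution): gradient descent on a two-layer linear network, comparing a small random initialisation with a larger task-independent one. All dimensions, learning rates, and scales below are illustrative assumptions.

```python
# Minimal sketch, not the paper's analytical solution: gradient descent on a
# two-layer linear network, contrasting small random weights (slow, stage-like
# dynamics) with a larger task-independent initialisation (fast decay).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 8, 8
target = rng.standard_normal((d_out, d_in))      # target linear map

def train(init_scale, lr=0.01, steps=2000):
    W1 = init_scale * rng.standard_normal((d_hidden, d_in))
    W2 = init_scale * rng.standard_normal((d_out, d_hidden))
    losses = []
    for _ in range(steps):
        err = W2 @ W1 - target                   # residual of the end-to-end map
        losses.append(0.5 * np.sum(err ** 2))
        grad_W2 = err @ W1.T
        grad_W1 = W2.T @ err
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1
    return losses

small = train(init_scale=0.01)   # slow, plateau-riddled trajectory
large = train(init_scale=1.0)    # much faster, closer-to-exponential decay
print(f"loss at step 500: small init {small[500]:.4f}, large init {large[500]:.4f}")
```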
An analytical theory of curriculum learning in teacher-student networks
In animals and humans, curriculum learning (presenting data in a curated order) is critical to rapid learning and effective pedagogy. A long history of experiments has demonstrated the impact of curricula in a variety of animals but, despite its ubiquity, a theoretical understanding of the phenomenon is still lacking. Surprisingly, in contrast to animal learning, curriculum strategies are not widely used in machine learning, and recent simulation studies conclude that curricula are only moderately effective, or even ineffective, in most cases. This stark difference in the importance of curricula raises a fundamental theoretical question: when and why does curriculum learning help? In this work, we analyse a prototypical neural network model of curriculum learning in the high-dimensional limit, employing statistical physics methods. We study a task in which a sparse set of informative features is embedded amidst a large set of noisy features. We analytically derive average learning trajectories for simple neural networks on this task, which establish a clear speed benefit for curriculum learning in the online setting. However, when training experiences can be stored and replayed (for instance, during sleep), the advantage of curriculum in standard neural networks disappears, in line with observations from the deep learning literature. Inspired by synaptic consolidation techniques developed to combat catastrophic forgetting, we propose curriculum-aware algorithms that consolidate synapses at curriculum change points and investigate whether this can boost the benefits of curricula. We derive generalisation performance as a function of consolidation strength (implemented as an L2 regularisation/elastic coupling connecting learning phases), and show that curriculum-aware algorithms can yield a large improvement in test performance. Our reduced analytical descriptions help reconcile apparently conflicting empirical results, trace regimes where curriculum learning yields the largest gains, and provide experimentally accessible predictions for the impact of task parameters on curriculum benefits. More broadly, our results suggest that fully exploiting a curriculum may require explicit adjustments to the loss.
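The consolidation idea lends itself to a short sketch. Below, a single-layer student learns a task with sparse informative features, with an L2 elastic coupling tying the second (hard, high-noise) phase to the weights found in the first (easy, low-noise) phase. The network, noise levels, and coupling strength are placeholder assumptions, not the paper's exact setup.

```python
# Hedged sketch of the curriculum-aware idea: after the easy phase, couple the
# weights to their phase-1 values with an L2 "elastic" penalty. Task, network,
# and all hyperparameters are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
d, k = 100, 5                                  # total and informative features
w_teacher = np.zeros(d); w_teacher[:k] = 1.0   # sparse target direction

def sgd_phase(w, noise_std, w_anchor=None, coupling=0.0, lr=0.05, steps=5000):
    for _ in range(steps):
        x = rng.standard_normal(d)
        x[k:] *= noise_std                     # noise level on nuisance features
        y = np.tanh(w_teacher @ x)
        err = np.tanh(w @ x) - y
        grad = err * (1 - np.tanh(w @ x) ** 2) * x
        if w_anchor is not None:
            grad += coupling * (w - w_anchor)  # elastic coupling between phases
        w -= lr * grad
    return w

w = sgd_phase(np.zeros(d), noise_std=0.1)                        # easy phase
w = sgd_phase(w, noise_std=1.0, w_anchor=w.copy(), coupling=0.01)  # hard phase
overlap = w @ w_teacher / (np.linalg.norm(w) * np.linalg.norm(w_teacher))
print(f"overlap with teacher: {overlap:.3f}")
```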
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set.
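A minimal numerical companion to the ODE description (not the analysis itself): online SGD for an over-parameterised two-layer student trained on a teacher's outputs, with only the first layer trained and all widths, rates, and the activation chosen for illustration.

```python
# Sketch of the teacher-student setup with online SGD; the ODE description
# becomes exact only as the input dimension d grows large.
import numpy as np

rng = np.random.default_rng(2)
d = 500                       # input dimension
m_teacher, m_student = 2, 4   # over-parameterised student

W_t = rng.standard_normal((m_teacher, d)) / np.sqrt(d)   # teacher weights
W_s = 1e-2 * rng.standard_normal((m_student, d))         # small initial weights

def phi(z):
    return np.maximum(z, 0.0)   # ReLU; the activation matters for the final error

def gen_error(W, n_test=2000):
    X = rng.standard_normal((n_test, d))
    return 0.5 * np.mean((phi(X @ W.T).sum(1) - phi(X @ W_t.T).sum(1)) ** 2)

lr = 0.5 / d
for _ in range(100_000):        # one fresh example per step (online SGD)
    x = rng.standard_normal(d)
    err = phi(W_s @ x).sum() - phi(W_t @ x).sum()
    W_s -= lr * err * (W_s @ x > 0)[:, None] * x[None, :]  # first layer only
print(f"final generalisation error: {gen_error(W_s):.5f}")
```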
Statistical physics of community ecology: a cavity solution to MacArthur's consumer resource model
A central question in ecology is to understand the ecological processes that shape community structure. Niche-based theories have emphasized the important role played by competition in maintaining species diversity. Many of these insights have been derived using MacArthur's consumer resource model (MCRM) or its generalizations. Most theoretical work on the MCRM has focused on small ecosystems with a few species and resources. However, theoretical insights derived from small ecosystems may not scale up to large ecosystems with many resources and species, because large systems with many interacting components often display new emergent behaviors that cannot be understood or deduced from analyzing smaller systems. To address these shortcomings, we develop a statistical-physics-inspired cavity method to analyze the MCRM when both the number of species and the number of resources are large. Unlike previous work in this limit, our theory addresses resource dynamics and resource depletion and demonstrates that species generically and consistently perturb their environments and significantly modify available ecological niches. We show how our cavity approach naturally generalizes niche theory to large ecosystems by accounting for the effect of collective phenomena on species invasion and ecological stability. Our theory suggests that such phenomena are a generic feature of large, natural ecosystems and must be taken into account when analyzing and interpreting community structure. It also highlights the important role that statistical-physics-inspired approaches can play in furthering our understanding of ecology.
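For concreteness, a small numerical integration of the MCRM with self-renewing resources is sketched below; the cavity analysis concerns the limit of many species and resources, which this toy example does not approach. All parameter ranges are illustrative.

```python
# Toy integration of MacArthur's consumer resource model:
#   dN_i/dt = N_i (sum_a c_ia R_a - m_i)
#   dR_a/dt = R_a (K_a - R_a) - R_a sum_i N_i c_ia
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(3)
S, M = 20, 15                                  # species and resources
c = rng.uniform(0.0, 1.0, size=(S, M))         # consumption preferences
m = rng.uniform(0.5, 1.0, size=S)              # maintenance costs
K = rng.uniform(5.0, 10.0, size=M)             # resource carrying capacities

def rhs(t, z):
    N, R = z[:S], z[S:]
    dN = N * (c @ R - m)                       # growth minus maintenance
    dR = R * (K - R) - R * (c.T @ N)           # logistic supply minus depletion
    return np.concatenate([dN, dR])

z0 = np.concatenate([np.full(S, 0.1), K.copy()])
sol = solve_ivp(rhs, (0.0, 200.0), z0, rtol=1e-8)
N_final = sol.y[:S, -1]
print("surviving species:", int(np.sum(N_final > 1e-6)), "of", S)
```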
Coil-helix transition of polypeptide at water-lipid interface
We present the exact solution of a microscopic statistical mechanical model for the transformation of a long polypeptide between an unstructured coil conformation and an α-helix conformation. The polypeptide is assumed to be adsorbed to the interface between a polar and a non-polar environment, such as that realized by water and the lipid bilayer of a membrane. The interfacial coil-helix transformation is the first stage in the folding process of helical membrane proteins. Depending on the values of the model parameters, the conformation changes as a crossover, a discontinuous transition, or a continuous transition, with helicity in the role of order parameter. Our model is constructed as a system of statistically interacting quasiparticles that are activated from the helix pseudo-vacuum. The particles represent links between adjacent residues in coil conformation that form a self-avoiding random walk in two dimensions. Explicit results are presented for helicity, entropy, heat capacity, and the average numbers and sizes of both coil and helix segments.
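As a point of reference (and explicitly not the paper's quasiparticle construction), a classic Zimm-Bragg-style transfer matrix illustrates how helicity serves as an order parameter for a helix-coil chain; the propagation weight s and nucleation penalty sigma below are illustrative values.

```python
# Zimm-Bragg-style transfer matrix for a helix-coil chain, shown only to make
# "helicity as an order parameter" concrete; this is a standard textbook model,
# not the paper's interfacial quasiparticle model.
import numpy as np

def log_z(s, sigma, n):
    # transfer matrix over (coil, helix) states; the largest eigenvalue dominates
    T = np.array([[1.0, sigma * s],
                  [1.0, s]])
    return n * np.log(np.max(np.abs(np.linalg.eigvals(T))))

def helix_fraction(s, sigma, n=1000, eps=1e-6):
    # order parameter: (1/n) d(log Z)/d(log s), by a finite difference
    return (log_z(s * (1 + eps), sigma, n) - log_z(s, sigma, n)) / (n * np.log(1 + eps))

for s in (0.8, 1.0, 1.2):
    print(f"s = {s}: helix fraction ~ {helix_fraction(s, sigma=1e-3):.3f}")
```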
Model selection for degree-corrected block models
The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for undertaking this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its overall degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction and point to a general approach to model selection in network analysis.
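The quantity under comparison can be sketched directly: the standard (Karrer-Newman) profile log-likelihoods of the ordinary and degree-corrected block models for a fixed partition. The paper's contribution, the asymptotic null distribution of this ratio on sparse graphs, is not reproduced here; the partition and data set below are stand-ins.

```python
# Unnormalised profile log-likelihoods for the ordinary (SBM) and
# degree-corrected (DC-SBM) block models, given a fixed node partition.
import numpy as np
import networkx as nx

def block_loglikelihoods(G, blocks, k):
    """blocks: array of block labels in 0..k-1, in G.nodes() order."""
    idx = {u: i for i, u in enumerate(G.nodes())}
    m = np.zeros((k, k))                        # inter-block edge counts
    for u, v in G.edges():
        r, s = blocks[idx[u]], blocks[idx[v]]
        m[r, s] += 1
        m[s, r] += 1                            # diagonal counts within-edges twice
    n_r = np.bincount(blocks, minlength=k).astype(float)      # block sizes
    kappa = np.array([sum(d for u, d in G.degree() if blocks[idx[u]] == r)
                      for r in range(k)], dtype=float)        # block degree sums
    def ll(denom):
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.nansum(m * np.log(m / denom))           # 0 log 0 -> 0
    return ll(np.outer(n_r, n_r)), ll(np.outer(kappa, kappa))

G = nx.karate_club_graph()
blocks = np.array([0 if G.nodes[u]["club"] == "Mr. Hi" else 1 for u in G])
ll_sbm, ll_dc = block_loglikelihoods(G, blocks, k=2)
print(f"log-likelihood ratio (degree-corrected minus ordinary): {ll_dc - ll_sbm:.2f}")
```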
Beyond mean field theory: statistical field theory for neural networks
Mean field theories have been a stalwart for studying the dynamics of networks of coupled neurons. They are convenient because they are relatively simple and amenable to analysis. However, classical mean field theory neglects the effects of fluctuations and correlations arising from single-neuron effects. Here, we consider various possible approaches for going beyond mean field theory and incorporating correlation effects. Statistical field theory methods, in particular the Doi-Peliti-Janssen formalism, are particularly useful in this regard.
Mathematical modeling of escape of HIV from cytotoxic T lymphocyte responses
Human immunodeficiency virus (HIV-1 or simply HIV) induces a persistent infection, which in the absence of treatment leads to AIDS and death in almost all infected individuals. HIV infection elicits a vigorous immune response starting about 2-3 weeks post infection that can lower the amount of virus in the body, but which cannot eradicate the virus. How HIV establishes a chronic infection in the face of a strong immune response remains poorly understood. It has been shown that HIV is able to rapidly change its proteins via mutation to evade recognition by virus-specific cytotoxic T lymphocytes (CTLs). Typically, an HIV-infected patient will generate 4-12 CTL responses specific for parts of viral proteins called epitopes. Such CTL responses lead to strong selective pressure to change the viral sequences encoding these epitopes so as to avoid CTL recognition. Indeed, the viral population "escapes" from about half of the CTL responses by mutation in the first year. Here we review experimental data on HIV evolution in response to CTL pressure, mathematical models developed to explain this evolution, and highlight problems associated with the data and previous modeling efforts. We show that estimates of the strength of the epitope-specific CTL response depend on the method used to fit models to experimental data and on the assumptions made regarding how mutants are generated during infection. We illustrate that allowing CTL responses to decay over time may improve the fit to experimental data and provides higher estimates of the killing efficacy of HIV-specific CTLs. We also propose a novel method for simultaneously estimating the killing efficacy of multiple CTL populations specific for different epitopes of HIV using stochastic simulations. Lastly, we show that current estimates of the efficacy at which HIV-specific CTLs clear virus-infected cells can be improved by more frequent sampling of viral sequences and by combining data on sequence evolution with experimentally measured CTL dynamics.
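A hedged sketch of the standard escape-rate fit used in this literature: the frequency of an escape variant follows logistic growth within the viral population, and the escape rate is estimated from longitudinal sequence samples. The data points below are synthetic placeholders, not measurements.

```python
# Standard logistic escape model: f(t) = f0 e^(eps t) / (1 - f0 + f0 e^(eps t)),
# with escape rate eps estimated by least-squares fitting. Synthetic data only.
import numpy as np
from scipy.optimize import curve_fit

def escape_frequency(t, eps, f0):
    # logistic growth of the escape variant's frequency in the population
    return f0 * np.exp(eps * t) / (1.0 - f0 + f0 * np.exp(eps * t))

t_days = np.array([0.0, 30.0, 60.0, 90.0, 120.0])
f_obs = np.array([0.02, 0.10, 0.45, 0.80, 0.95])     # synthetic frequencies

(eps_hat, f0_hat), _ = curve_fit(escape_frequency, t_days, f_obs,
                                 p0=(0.05, 0.01), bounds=([0, 0], [1, 1]))
print(f"estimated escape rate: {eps_hat:.3f} per day")
```

Sparser sampling widens the confidence interval on eps, which is one way to see why the abstract calls for more frequent sequence sampling.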
Comparison of Pause Predictions of Two Sequence-Dependent Transcription Models
Two recent theoretical models, Bai et al. (2004, 2007) and Tadigotla et al. (2006), formulated thermodynamic explanations of sequence-dependent transcription pausing by RNA polymerase (RNAP). The two models differ in some basic assumptions and therefore make different yet overlapping predictions for pause locations, and different predictions on pause kinetics and mechanisms. Here we present a comprehensive comparison of the two models. We show that while they have comparable predictive power of pause locations at low NTP concentrations, the Bai et al. model is more accurate than Tadigotla et al. at higher NTP concentrations. Pausing kinetics predicted by Bai et al. is also consistent with time-course transcription reactions, while Tadigotla et al. is unsuited for this type of kinetic prediction. More importantly, the two models in general predict different pausing mechanisms even for the same pausing sites, and the Bai et al. model provides an explanation more consistent with recent single molecule observations.
Intrinsic dynamics of heart regulatory systems on short time-scales: from experiment to modelling
We discuss open problems related to the stochastic modelling of cardiac function. The work is based on an experimental investigation of the dynamics of heart rate variability (HRV) in the absence of respiratory perturbations. We first consider the cardiac control system on short time scales via an analysis of HRV within the framework of a random walk approach. Our experiments show that HRV on timescales of less than a minute takes the form of free diffusion, close to Brownian motion, which can be described as a non-stationary process with stationary increments. Secondly, we consider the inverse problem of modelling the state of the control system so as to reproduce the experimentally observed HRV statistics. We discuss some simple toy models and identify open problems for the modelling of heart dynamics.
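The diffusion analysis can be illustrated with a short script: compute the mean-squared displacement of an RR-interval series versus lag and read off the scaling exponent. The series below is a synthetic random walk standing in for real HRV data.

```python
# Diffusion check for an RR-interval series: MSD(lag) ~ lag^1 indicates free,
# Brownian-like diffusion. The series here is synthetic, not recorded HRV.
import numpy as np

rng = np.random.default_rng(4)
rr = 0.8 + np.cumsum(0.005 * rng.standard_normal(3000))  # synthetic RR series (s)

def msd(x, max_lag=100):
    lags = np.arange(1, max_lag + 1)
    return lags, np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])

lags, m = msd(rr)
slope = np.polyfit(np.log(lags), np.log(m), 1)[0]   # slope on a log-log plot
print(f"MSD scaling exponent: {slope:.2f}  (1.0 = Brownian motion)")
```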
Molecular Spiders in One Dimension
Molecular spiders are synthetic bio-molecular systems which have "legs" made of short single-stranded segments of DNA. Spiders move on a surface covered with single-stranded DNA segments complementary to the legs. Different mappings are established between various models of spiders and simple exclusion processes. For spiders with a simple gait and a varying number of legs, we compute the diffusion coefficient; when the hopping is biased, we also compute their velocity.
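A minimal Monte Carlo sketch of one such model: a two-legged spider on a one-dimensional lattice whose legs exclude each other and stay within a maximum span (both constraints are illustrative assumptions, not taken from the paper's specific mappings), with the diffusion coefficient estimated from the leg midpoint.

```python
# Two-legged spider on a 1D lattice: each leg hops to a neighbouring site at
# unit rate; moves violating exclusion or the maximum span are rejected as
# null events, which preserves the correct continuous-time rates.
import numpy as np

rng = np.random.default_rng(5)
T_MAX = 2000.0

def simulate(span=2):
    legs, t = np.array([0, 1]), 0.0
    while t < T_MAX:
        t += rng.exponential(1.0 / 4.0)        # 2 legs x 2 directions, rate 1 each
        i = rng.integers(2)
        step = 1 if rng.random() < 0.5 else -1
        new = legs.copy(); new[i] += step
        if new[0] < new[1] and new[1] - new[0] <= span:
            legs = new
    return legs.mean() - 0.5                   # displacement of the leg midpoint

disp = np.array([simulate() for _ in range(100)])
D = np.mean(disp ** 2) / (2 * T_MAX)           # <x^2> = 2 D t in one dimension
print(f"estimated diffusion coefficient: {D:.4f}")
```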
Dynamics of Microtubule Instabilities
We investigate an idealized model of microtubule dynamics that involves: (i) attachment of guanosine triphosphate (GTP) at rate λ, (ii) conversion of GTP to guanosine diphosphate (GDP) at rate 1, and (iii) detachment of GDP at rate μ. As a function of these rates, a microtubule can grow steadily or its length can fluctuate wildly. For μ = 0, we find the exact tubule and GTP cap length distributions, and power-law length distributions of GTP and GDP islands. For μ = ∞, we argue that the time between catastrophes, where the microtubule shrinks to zero length, scales as e^λ. We also discuss the nature of the phase boundary between a growing and shrinking microtubule.
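The three-rate model is simple enough to simulate directly; a Gillespie-style sketch with illustrative values of λ and μ follows.

```python
# Gillespie simulation of the idealized microtubule model: GTP attachment at
# rate lam, GTP -> GDP conversion at rate 1 per GTP unit, and detachment of a
# GDP tip at rate mu. Rates and run length are illustrative.
import numpy as np

rng = np.random.default_rng(6)

def simulate(lam=2.0, mu=1.0, t_max=500.0):
    # tubule as a list of units: 1 = GTP, 0 = GDP; index -1 is the growing tip
    tubule, t, lengths = [], 0.0, []
    while t < t_max:
        rates = np.array([
            lam,                                        # attach a GTP unit at the tip
            float(sum(tubule)),                         # each GTP converts at rate 1
            mu if tubule and tubule[-1] == 0 else 0.0,  # detach the tip if it is GDP
        ])
        t += rng.exponential(1.0 / rates.sum())
        event = rng.choice(3, p=rates / rates.sum())
        if event == 0:
            tubule.append(1)
        elif event == 1:
            gtp_sites = [i for i, u in enumerate(tubule) if u == 1]
            tubule[rng.choice(gtp_sites)] = 0
        else:
            tubule.pop()
        lengths.append(len(tubule))
    return np.array(lengths)

lengths = simulate()
print(f"mean length: {lengths.mean():.1f}, final length: {lengths[-1]}")
```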
Finding mesoscopic communities in sparse networks
We suggest a fast method for finding possibly overlapping network communities of a desired size and link density. Our method is a natural generalization of the finite-T superparamagnetic Potts clustering introduced by Blatt et al (1996 Phys. Rev. Lett. 76 3251) and the annealing of the Potts model with a global antiferromagnetic term recently suggested by Reichardt and Bornholdt (2004 Phys. Rev. Lett. 93 218701). As in both cited works, the proposed generalization is based on the ordering of the ferromagnetic Potts model; the novelty of the proposed approach lies in the adjustable dependence of the antiferromagnetic term on the population of each Potts state, which interpolates between the two previously considered cases. This adjustability allows one to empirically tune the algorithm to detect the maximum number of communities of the given size and link density. We illustrate the method by detecting protein complexes in high-throughput protein binding networks.
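Schematically, the approach might look like the following Metropolis annealing of a Potts energy with a population-dependent antiferromagnetic term; the power-law dependence on state populations is an assumed stand-in for the paper's adjustable term, and the graph, couplings, and schedule are illustrative.

```python
# Schematic annealing of a ferromagnetic Potts model with a global term that
# depends on the state populations n_s**alpha; the functional form is an
# assumption made here for illustration, not the paper's exact term.
import numpy as np
import networkx as nx

rng = np.random.default_rng(7)
G = nx.planted_partition_graph(4, 25, 0.3, 0.02, seed=7)   # 4 hidden groups of 25
n, q = G.number_of_nodes(), 10                             # more states than groups
spins = rng.integers(q, size=n)

def energy_delta(u, new_s, gamma=0.5, alpha=2.0):
    old_s = spins[u]
    if new_s == old_s:
        return 0.0
    # ferromagnetic part: same-state edges lower the energy
    d_ferro = sum(int(spins[v] == old_s) - int(spins[v] == new_s) for v in G[u])
    # antiferromagnetic part: change in the population penalty sum_s n_s**alpha
    pops = np.bincount(spins, minlength=q).astype(float)
    d_anti = ((pops[new_s] + 1) ** alpha - pops[new_s] ** alpha
              + (pops[old_s] - 1) ** alpha - pops[old_s] ** alpha)
    return d_ferro + gamma * d_anti / n

for T in np.linspace(1.0, 0.05, 40):            # simple annealing schedule
    for _ in range(2000):
        u, new_s = rng.integers(n), rng.integers(q)
        dE = energy_delta(u, new_s)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[u] = new_s
print("community sizes:", sorted(np.bincount(spins, minlength=q), reverse=True))
```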
Cartography of complex networks: modules and universal roles
Integrative approaches to the study of complex systems demand that one knows the manner in which the parts comprising the system are connected. The structure of the complex network defining the interactions provides insight into the function and evolution of the components of the system. Unfortunately, the large size and intricacy of these networks implies that such insight is usually difficult to extract. Here, we propose a method that allows one to systematically extract and display information contained in complex networks. Specifically, we demonstrate that one can (i) find modules in complex networks and (ii) classify nodes into universal roles according to their pattern of within- and between-module connections. The method thus yields a 'cartographic representation' of complex networks.
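The two role coordinates can be computed directly for a given partition: the within-module degree z-score and the participation coefficient P_i = 1 - Σ_s (k_is/k_i)². In the sketch below the partition comes from a stand-in community detector rather than the paper's module-finding procedure.

```python
# Role coordinates for each node given a module partition: within-module
# degree z-score and participation coefficient across modules.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
# stand-in partition; any module-detection method could supply this
modules = {u: i for i, c in enumerate(nx.community.louvain_communities(G, seed=0))
           for u in c}

def role_coordinates(G, modules):
    z, P = {}, {}
    within = {u: sum(1 for v in G[u] if modules[v] == modules[u]) for u in G}
    for u in G:
        k = G.degree(u)
        # participation coefficient: spread of u's links across modules
        k_to = {}
        for v in G[u]:
            k_to[modules[v]] = k_to.get(modules[v], 0) + 1
        P[u] = 1.0 - sum((ks / k) ** 2 for ks in k_to.values())
        # within-module degree z-score, relative to u's own module
        peers = [within[w] for w in G if modules[w] == modules[u]]
        mu, sd = np.mean(peers), np.std(peers)
        z[u] = (within[u] - mu) / sd if sd > 0 else 0.0
    return z, P

z, P = role_coordinates(G, modules)
print("module hubs (z > 2.5):", [u for u in G if z[u] > 2.5])
```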
Viscous instabilities in flowing foams: a Cellular Potts Model approach
The Cellular Potts Model (CPM) successfully simulates drainage and shear in foams. Here we use the CPM to investigate instabilities due to the flow of a single large bubble in a dry, monodisperse two-dimensional flowing foam. As in experiments in a Hele-Shaw cell, above a threshold velocity the large bubble moves faster than the mean flow. Our simulations reproduce analytical and experimental predictions for the velocity threshold and the relative velocity of the large bubble, demonstrating the utility of the CPM in foam rheology studies.
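For reference, a minimal sketch of the energy that drives CPM simulations: a boundary term between unlike bubbles plus a quadratic area constraint. Grid size, couplings, and target areas are illustrative, and the imposed flow and the single large bubble are omitted.

```python
# Minimal Cellular Potts Model energy: H = J * (unlike neighbour pairs)
#            + lam * sum_cells (area - target_area)^2. Illustrative values only.
import numpy as np

rng = np.random.default_rng(8)
L, n_cells = 32, 8
sigma = rng.integers(1, n_cells + 1, size=(L, L))   # bubble id at each lattice site
target_area = L * L / n_cells
J, lam = 1.0, 0.05                                  # surface energy, area stiffness

def cpm_energy(sigma):
    # boundary term: unlike neighbouring pairs (periodic boundaries, two axes)
    e = J * np.sum(sigma != np.roll(sigma, 1, axis=0))
    e += J * np.sum(sigma != np.roll(sigma, 1, axis=1))
    # area constraint: quadratic penalty on deviations from the target area
    areas = np.bincount(sigma.ravel(), minlength=n_cells + 1)[1:]
    return e + lam * np.sum((areas - target_area) ** 2)

print(f"initial CPM energy: {cpm_energy(sigma):.1f}")
```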