Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning
A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalize, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedaling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalize, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalize task components (transition and reward functions), but a satisfactory account of how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorized representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalization and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we discuss how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.
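To give a flavor of the non-parametric clustering machinery, here is a minimal sketch of a Chinese restaurant process posterior that such an agent could use to assign a new task context either to an existing cluster of task components or to a brand-new one. The counts, likelihoods, and concentration parameter are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def crp_posterior(counts, likelihoods, alpha=1.0):
    """Posterior over cluster assignments for a new context under a
    Chinese restaurant process prior with concentration alpha.

    counts      -- contexts already assigned to each existing cluster
    likelihoods -- likelihood of the new context under each existing
                   cluster, with the LAST entry being the prior
                   predictive for a brand-new cluster
    """
    prior = np.append(np.asarray(counts, dtype=float), alpha)  # rich-get-richer + new table
    post = prior * np.asarray(likelihoods, dtype=float)
    return post / post.sum()

# Two existing clusters; in the spirit of the abstract, a factorized agent
# would run one such inference per component (rewards and transitions
# separately), while a joint agent runs it once over whole structures.
print(crp_posterior(counts=[5, 2], likelihoods=[0.3, 0.6, 0.1]))
```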
Rule-Enhanced Active Learning for Semi-Automated Weak Supervision
A major bottleneck preventing the extension of deep learning systems to new domains is the prohibitive cost of acquiring sufficient training labels. Alternatives such as weak supervision, active learning, and fine-tuning of pretrained models reduce this burden but require substantial human input to select a highly informative subset of instances or to curate labeling functions. REGAL (Rule-Enhanced Generative Active Learning) is an improved framework for weakly supervised text classification that performs active learning over labeling functions rather than individual instances. REGAL interactively creates high-quality labeling patterns from raw text, enabling a single annotator to accurately label an entire dataset after initialization with just three keywords per class. Experiments demonstrate that REGAL extracts up to three times as many high-accuracy labeling functions from text as current state-of-the-art methods for interactive weak supervision, dramatically reducing the annotation burden of writing labeling functions for weak supervision. Statistical analysis reveals that REGAL performs as well as or significantly better than interactive weak supervision on five of six commonly used natural language processing (NLP) baseline datasets.
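As a rough illustration of active learning over labeling functions rather than instances, the sketch below encodes keyword rules as labeling functions and scores candidates against a small annotated set. The class labels, keywords, and scoring heuristic are invented for illustration and are not REGAL's actual pattern-generation procedure.

```python
from dataclasses import dataclass

ABSTAIN = -1

@dataclass
class KeywordLF:
    """A labeling function that votes `label` when its keyword appears."""
    keyword: str
    label: int
    def __call__(self, text: str) -> int:
        return self.label if self.keyword in text.lower() else ABSTAIN

def empirical_accuracy(lf, labeled_texts):
    """Score an LF on a small annotated set; abstentions don't count."""
    votes = [(lf(t), y) for t, y in labeled_texts if lf(t) != ABSTAIN]
    return sum(v == y for v, y in votes) / len(votes) if votes else 0.0

# Seed with three keywords per class, mine further candidate keywords from
# raw text, then show the annotator only the highest-scoring candidates.
seeds = [KeywordLF(k, 0) for k in ("refund", "broken", "shipping")] + \
        [KeywordLF(k, 1) for k in ("great", "love", "perfect")]
candidates = [KeywordLF("disappointed", 0), KeywordLF("excellent", 1)]
dev = [("the item arrived broken", 0), ("excellent value, love it", 1)]
ranked = sorted(candidates, key=lambda lf: empirical_accuracy(lf, dev), reverse=True)
```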
Learning in the Machine: Random Backpropagation and the Deep Learning Channel
Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study how few bits are required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
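For concreteness, here is a minimal numpy sketch of the core idea for a one-hidden-layer network: the backward pass routes the error through a fixed random matrix B instead of the transpose of the forward weights, while still multiplying by the derivative of the activation. Sizes, data, and learning rate are toy choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 8, 16, 4, 0.1

W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B  = rng.normal(0, 0.1, (n_hid, n_out))   # fixed random feedback, never trained

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.normal(size=(n_in, 32))           # toy input batch
t = rng.random((n_out, 32))               # toy targets

for step in range(500):
    h = sigmoid(W1 @ x)                   # forward pass
    y = sigmoid(W2 @ h)
    e = y - t                             # output error
    # RBP: propagate error through B instead of W2.T; the derivative of the
    # activation is still applied (the simulations find this matters)
    dh = (B @ e) * h * (1 - h)
    W2 -= lr * (e @ h.T) / x.shape[1]
    W1 -= lr * (dh @ x.T) / x.shape[1]
```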
Methods for solving reasoning problems in abstract argumentation - A survey
Within the last decade, abstract argumentation has emerged as a central field in Artificial Intelligence. Besides providing a core formalism for many advanced argumentation systems, abstract argumentation has also served to capture several non-monotonic logics and other AI-related principles. Although the idea of abstract argumentation is appealingly simple, several reasoning problems in this formalism exhibit high computational complexity. This calls for advanced techniques when it comes to implementation issues, a challenge which has recently been faced from different angles. In this survey, we give an overview of different methods for solving reasoning problems in abstract argumentation and compare their particular features. Moreover, we highlight available state-of-the-art systems for abstract argumentation, which put these methods into practice.
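As one concrete example of the reasoning problems surveyed, the sketch below computes the grounded extension of an abstract argumentation framework (A, R) as the least fixed point of the characteristic function F(S) = {a : S defends a}. This naive fixpoint iteration illustrates the semantics, not any particular state-of-the-art solver.

```python
def grounded_extension(args, attacks):
    """args: iterable of arguments; attacks: set of (attacker, target) pairs."""
    def defends(S, a):
        attackers = {b for (b, t) in attacks if t == a}
        # every attacker of a must itself be attacked by some member of S
        return all(any((c, b) in attacks for c in S) for b in attackers)
    S = set()
    while True:
        S_next = {a for a in args if defends(S, a)}
        if S_next == S:
            return S
        S = S_next

# Example: a attacks b, b attacks c  ->  grounded extension {a, c}
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```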
Modeling the Complex Dynamics and Changing Correlations of Epileptic Events
Patients with epilepsy can manifest short, sub-clinical epileptic "bursts" in addition to full-blown clinical seizures. We believe the relationship between these two classes of events, something not previously studied quantitatively, could yield important insights into the nature and intrinsic dynamics of seizures. A goal of our work is to parse these complex epileptic events into distinct dynamic regimes. A challenge posed by the intracranial EEG (iEEG) data we study is that the number and placement of electrodes can vary between patients. We develop a Bayesian nonparametric Markov switching process that allows for (i) shared dynamic regimes between a variable number of channels, (ii) asynchronous regime-switching, and (iii) an unknown dictionary of dynamic regimes. We encode a sparse and changing set of dependencies between the channels using a Markov-switching Gaussian graphical model for the innovations process driving the channel dynamics, and we demonstrate the importance of this model in parsing and out-of-sample prediction of iEEG data. We show that our model produces intuitive state assignments that can help automate clinical analysis of seizures and enable the comparison of sub-clinical bursts and full clinical seizures.
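A toy illustration of the basic building block, a Markov-switching autoregressive process: the series follows AR(1) dynamics whose coefficient depends on a hidden regime that evolves as a Markov chain. The real model is Bayesian nonparametric (unknown number of regimes, per-channel asynchronous switching, coupled innovations); the fixed two-regime, single-channel version here is only for intuition.

```python
import numpy as np

rng = np.random.default_rng(1)
ar_coef = {0: 0.3, 1: 0.95}            # regime 0: fast-mixing; regime 1: near-unit-root
trans = np.array([[0.99, 0.01],        # sticky transitions keep regimes persistent
                  [0.02, 0.98]])

z, y = 0, 0.0
states, series = [], []
for t in range(2000):
    z = rng.choice(2, p=trans[z])      # hidden regime switch
    y = ar_coef[z] * y + rng.normal(scale=0.5)
    states.append(z)
    series.append(y)
```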
The Dropout Learning Algorithm
Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between normalized geometric means of logistic functions and the logistic of the means, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
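Equations (1)-(2) can be checked numerically. The sketch below compares, for a single logistic unit with Bernoulli-gated inputs, the true ensemble expectation, the normalized geometric mean (NWGM) of the ensemble outputs, and the logistic of the expected input (a single test-time pass with the retain probability folded in); the weights, inputs, and rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w = rng.normal(size=10)    # weights of a single logistic unit
x = rng.normal(size=10)    # its inputs
p = 0.5                    # retain probability of each Bernoulli gate

masks = rng.random((100_000, 10)) < p       # one dropout sub-network per row
outs = sigmoid(masks @ (w * x))             # ensemble of sub-network outputs

expectation = outs.mean()                   # true ensemble average E[O]
g  = np.exp(np.log(outs).mean())            # geometric mean of O
g1 = np.exp(np.log(1 - outs).mean())        # geometric mean of 1 - O
nwgm = g / (g + g1)                         # normalized geometric mean
test_pass = sigmoid(p * np.dot(w, x))       # logistic of the expected input

print(expectation, nwgm, test_pass)  # nwgm matches test_pass (up to MC noise)
```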
Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments
In this paper we show that a corpus of a few thousand Wikipedia articles about concrete or visualizable concepts can be used to produce a low-dimensional semantic feature representation of those concepts. The purpose of such a representation is to serve as a model of the mental context of a subject during functional magnetic resonance imaging (fMRI) experiments. A recent study [19] showed that it was possible to predict fMRI data acquired while subjects thought about concrete concepts, given a representation of those concepts in terms of semantic features obtained with human supervision. We use topic models on our corpus to learn semantic features from text in an unsupervised manner, and show that those features can outperform those of [19] in demanding 12-way and 60-way classification tasks. We also show that these features can be used to uncover similarity relations in brain activation for different concepts which parallel those relations in behavioral data from human subjects.
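A minimal sketch of such an unsupervised pipeline using scikit-learn: fit a topic model to one Wikipedia article per concept and take the topic mixtures as the semantic features. The corpus snippets, topic count, and preprocessing here are placeholders, not the paper's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One (truncated placeholder) article per concrete concept.
corpus = ["An apartment is a self-contained housing unit ...",
          "A hammer is a tool consisting of a weighted head ..."]

counts = CountVectorizer(stop_words="english", max_features=5000).fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=40, random_state=0)
features = lda.fit_transform(counts)   # rows: concepts, cols: topic proportions
# `features` would then serve as the regressors for predicting fMRI activation.
```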
The Local Geometry of Multiattribute Tradeoff Preferences
Existing representations for multiattribute ceteris paribus preference statements have provided useful treatments and clear semantics for qualitative comparisons, but have not provided similarly clear representations or semantics for comparisons involving quantitative tradeoffs. We use directional derivatives and other concepts from elementary differential geometry to interpret conditional multiattribute ceteris paribus preference comparisons that state bounds on quantitative tradeoff ratios. This semantics extends the familiar economic notion of marginal rate of substitution to multiple continuous or discrete attributes. The same geometric concepts also provide means for interpreting statements about the relative importance of different attributes.
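The economic notion being generalized can be stated compactly: for a differentiable utility function u over attributes, the marginal rate of substitution between attributes x_i and x_j at a point p is the ratio of marginal utilities, and a quantitative tradeoff statement bounds this ratio. The notation below is generic, not the paper's:

```latex
\[
\mathrm{MRS}_{ij}(p) \;=\; \left.\frac{\partial u/\partial x_i}{\partial u/\partial x_j}\right|_{p},
\qquad
\text{``}x_i \text{ is worth at least } k \text{ units of } x_j \text{ at } p\text{''}
\;\Longleftrightarrow\;
\mathrm{MRS}_{ij}(p) \ge k .
\]
```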
Updating action domain descriptions
Incorporating new information into a knowledge base is an important problem which has been widely investigated. In this paper, we study this problem in a formal framework for reasoning about actions and change. In this framework, action domains are described in an action language whose semantics is based on the notion of causality. Unlike the formalisms considered in related work, this language allows straightforward representation of non-deterministic effects and indirect effects of (possibly concurrent) actions, as well as state constraints; therefore, the updates can be more general than elementary statements. The expressivity of this formalism allows us to study the update of an action domain description in a more general setting than in related work. First, we consider the update of an action description with respect to further criteria, for instance, by ensuring that the updated description entails some observations, assertions, or general domain properties that constitute further constraints not expressible in an action description in general. Moreover, our framework allows us to discriminate amongst alternative updates of action domain descriptions and to single out a most preferable one, based on a given preference relation possibly dependent on the specified criteria. We study semantic and computational aspects of the update problem, and establish basic properties of updates as well as a decomposition theorem that gives rise to a divide-and-conquer approach to updating action descriptions under certain conditions. Furthermore, we study the computational complexity of decision problems associated with computing solutions, both for the generic setting and for two particular preference relations, viz. set-inclusion and weight-based preference. While deciding the existence of solutions and recognizing solutions are PSPACE-complete problems in general, the problems fall back into the polynomial hierarchy under restrictions on the additional constraints. We finally discuss methods to compute solutions and approximate solutions (which disregard preference). Our results provide a semantic and computational basis for developing systems that incorporate new information into action domain descriptions in an action language, in the presence of additional constraints.
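To make the search for preferred updates concrete, here is an abstract Python sketch: given placeholder consistency and entailment checks (standing in for the action-language machinery), it enumerates candidate updates that retain parts of the old description together with the new statements, and picks a most preferred one by weight. This generically illustrates the set-inclusion and weight-based preference relations; it is not the paper's algorithm and ignores its complexity-aware methods.

```python
from itertools import combinations

def candidate_updates(D, I, constraints, consistent, entails):
    """Yield updated descriptions D' = (retained part of D) + I that are
    consistent and entail every constraint, trying larger retained subsets
    first (so the first hits are maximal under set-inclusion)."""
    for r in range(len(D), -1, -1):
        for keep in combinations(sorted(D), r):
            Dp = set(keep) | set(I)
            if consistent(Dp) and all(entails(Dp, c) for c in constraints):
                yield Dp

def best_update(D, I, constraints, consistent, entails, weight):
    """Among candidates, pick one maximizing the total weight of retained
    old statements (weight-based preference)."""
    cands = candidate_updates(D, I, constraints, consistent, entails)
    return max(cands, key=lambda Dp: sum(weight[s] for s in Dp & set(D)),
               default=None)
```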
A comparative runtime analysis of heuristic algorithms for satisfiability problems
The satisfiability problem is a basic core NP-complete problem. In recent years, many heuristic algorithms have been developed to solve this problem, and many experiments have evaluated and compared the performance of different heuristic algorithms. However, rigorous theoretical analysis and comparison are rare. This paper analyzes and compares the expected runtime of three basic heuristic algorithms: RandomWalk, the (1+1) EA, and a hybrid algorithm. The runtime analysis of these heuristic algorithms on two 2-SAT instances shows that their expected runtime can be exponential or polynomial. Furthermore, these heuristic algorithms have their own advantages and disadvantages in solving different SAT instances. The paper also demonstrates that the expected runtime of RandomWalk on arbitrary k-SAT (k ≥ 3) is bounded above by O((k-1)^n), and presents a k-SAT instance with a Θ((k-1)^n) expected runtime bound.
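A minimal sketch of the RandomWalk heuristic analyzed here: starting from a random assignment, repeatedly pick an unsatisfied clause and flip a variable chosen uniformly from it. The instance encoding and step budget below are illustrative choices, not the paper's setup.

```python
import random

def random_walk_sat(clauses, n_vars, max_flips=100_000, seed=0):
    """clauses: list of lists of nonzero ints, DIMACS-style (literal v refers
    to variable |v|, positive = True). Returns a satisfying assignment or None."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign[1:]
        lit = rng.choice(rng.choice(unsat))  # random unsatisfied clause, random literal in it
        assign[abs(lit)] = not assign[abs(lit)]
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(random_walk_sat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))
```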