NEURAL NETWORKS

Recovering Permuted Sequential Features for effective Reinforcement Learning
Jiang Y, Feng M, Zhou W and Li H
When applying Reinforcement Learning (RL) to real-world visual tasks, two major challenges must be considered: sample inefficiency and limited generalization. To address these two challenges, previous works focus primarily on learning semantic information from the visual state to improve sample efficiency, but they do not explicitly learn other valuable aspects, such as spatial information. Moreover, they improve generalization by learning representations that are invariant to alterations of task-irrelevant variables, without considering task-relevant variables. To enhance the sample efficiency and generalization of the base RL algorithm in visual tasks, we propose an auxiliary task called Recovering Permuted Sequential Features (RPSF). Our method enhances generalization by learning the spatial structure information of the agent, which mitigates the effects of changes in both task-relevant and task-irrelevant variables. Moreover, it explicitly learns both semantic and spatial information from the visual state by disordering and subsequently recovering a sequence of features to generate more holistic representations, thereby improving sample efficiency. Extensive experiments demonstrate that our method significantly improves the sample efficiency and generalization of the base RL algorithm and outperforms various state-of-the-art baselines across diverse tasks in unseen environments. Furthermore, our method is compatible with both CNN and Transformer architectures.
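The permute-and-recover idea at the heart of RPSF can be sketched as a small auxiliary head on top of a sequence of encoder features. The snippet below is a hedged PyTorch reconstruction under assumed shapes and a linear position classifier, not the authors' implementation:

```python
# Illustrative sketch of a permute-and-recover auxiliary objective, assuming the
# visual encoder yields a sequence of feature vectors (e.g., flattened patches).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecoverOrderHead(nn.Module):
    """Predicts the original position of each feature in a permuted sequence."""
    def __init__(self, feat_dim: int, seq_len: int):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, seq_len)  # one position logit per slot

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, feat_dim), already permuted
        return self.classifier(feats)  # (batch, seq_len, seq_len) position logits

def permute_and_recover_loss(feats: torch.Tensor, head: RecoverOrderHead) -> torch.Tensor:
    """Auxiliary loss: shuffle the feature sequence and recover its original order."""
    b, n, d = feats.shape
    perm = torch.stack([torch.randperm(n) for _ in range(b)]).to(feats.device)  # (b, n)
    permuted = torch.gather(feats, 1, perm.unsqueeze(-1).expand(b, n, d))
    logits = head(permuted)                                   # (b, n, n)
    return F.cross_entropy(logits.reshape(b * n, n), perm.reshape(b * n))
```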
ALR-HT: A fast and efficient Lasso regression without hyperparameter tuning
Wang Y, Zou B, Xu J, Xu C and Tang YY
Lasso regression, known for its efficacy in high-dimensional data analysis and feature selection, stands as a cornerstone of supervised learning for regression estimation. However, hyperparameter tuning for Lasso regression is often time-consuming and susceptible to noisy data in big data scenarios. In this paper, we introduce a new Additive Lasso Regression without Hyperparameter Tuning (ALR-HT) by integrating Markov resampling with additive models. We estimate the generalization bounds of the proposed ALR-HT and establish its fast learning rate. Experimental results on benchmark datasets confirm that the proposed ALR-HT algorithm outperforms other algorithms in terms of total sampling and training time and mean squared error (MSE). We present some discussions on the ALR-HT algorithm and apply it to Ridge regression to show its versatility and effectiveness in regularized regression scenarios.
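For context, the baseline that ALR-HT aims to avoid is the usual cross-validated search over the Lasso penalty, which dominates runtime on large datasets. A minimal sketch with hypothetical data and the scikit-learn API (not the authors' code):

```python
# Conventional cross-validated Lasso tuning; its tuning cost is what ALR-HT is
# designed to sidestep. The data here is synthetic and purely illustrative.
import time
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 200))
y = X[:, :10].sum(axis=1) + 0.1 * rng.standard_normal(5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

start = time.time()
model = LassoCV(cv=5).fit(X_tr, y_tr)   # penalty strength alpha chosen by 5-fold CV
print(f"tuning + training time: {time.time() - start:.2f}s")
print(f"test MSE: {mean_squared_error(y_te, model.predict(X_te)):.4f}")
```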
TSOM: Small object motion detection neural network inspired by avian visual circuit
Hu P, Zhang X, Li M, Zhu Y and Shi L
Detecting small moving objects in complex backgrounds from an overhead perspective is a highly challenging task for machine vision systems. As an inspiration from nature, the avian visual system is capable of processing motion information in various complex aerial scenes, and the Retina-OT-Rt visual circuit of birds is highly sensitive to the motion of small objects viewed from high altitudes. However, small object motion detection algorithms based on the avian visual system remain underexplored. In this paper, we developed a mathematical description of the Retina-OT-Rt visual circuit based on extensive studies of its biological mechanisms. Based on this, we proposed a novel tectum small object motion detection neural network (TSOM). The TSOM neural network includes the retina, SGC dendrite, SGC soma, and Rt layers, each corresponding to neurons in the visual pathway responsible for precise topographic projection, spatial-temporal encoding, motion feature selection, and multi-directional motion integration. Extensive evaluations on pigeon neurophysiological data and image sequences showed that the TSOM is biologically interpretable and effective in extracting reliable small object motion features from complex high-altitude backgrounds.
A protocol for trustworthy EEG decoding with neural networks
Borra D, Magosso E and Ravanelli M
Deep learning solutions have rapidly emerged for EEG decoding, achieving state-of-the-art performance on a variety of decoding tasks. Despite their high performance, existing solutions do not fully address the challenge posed by the introduction of many hyperparameters, which define data pre-processing, network architecture, network training, and data augmentation. Automatic hyperparameter search is rarely performed and is limited to network-related hyperparameters. Moreover, pipelines are highly sensitive to performance fluctuations due to random initialization, hindering their reliability. Here, we design a comprehensive protocol for EEG decoding that explores the hyperparameters characterizing the entire pipeline and that includes multi-seed initialization for providing robust performance estimates. Our protocol is validated on 9 datasets covering motor imagery, P300, and SSVEP paradigms, including 204 participants and 26 recording sessions, and on different deep learning models. We accompany our protocol with extensive experiments on the main aspects influencing it, such as the number of participants used for hyperparameter search, the split into sequential simpler searches (multi-step search), the use of informed vs. non-informed search algorithms, and the number of random seeds needed to obtain stable performance. The best protocol included a 2-step hyperparameter search via an informed search algorithm, with the final training and evaluation performed using 10 random initializations. The optimal trade-off between performance and computational time was achieved by using a subset of 3-5 participants for hyperparameter search. Our protocol consistently outperformed baseline state-of-the-art pipelines across datasets and models, and could represent a standard approach for neuroscientists to decode EEG in a trustworthy and reliable way.
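The two ingredients highlighted above, an informed search algorithm and multi-seed final evaluation, could be wired together roughly as follows. The TPE sampler, the searched hyperparameters, and the train_and_eval placeholder are assumptions for illustration, not the authors' pipeline:

```python
# Hedged sketch: informed (TPE) hyperparameter search followed by multi-seed
# evaluation. train_and_eval is a stand-in for an actual EEG decoder training run.
import numpy as np
import optuna

def train_and_eval(lr: float, dropout: float, seed: int) -> float:
    """Placeholder: train the decoder with the given config/seed, return accuracy."""
    rng = np.random.default_rng(seed)
    return float(rng.uniform(0.6, 0.9))  # replace with a real training/evaluation run

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    return train_and_eval(lr, dropout, seed=0)

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50)

# Final training/evaluation repeated over several random seeds for stable estimates.
best = study.best_params
scores = [train_and_eval(best["lr"], best["dropout"], seed=s) for s in range(10)]
print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```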
GradToken: Decoupling tokens with class-aware gradient for visual explanation of Transformer network
Cheng L, Liang Y, Lu Y and Cheung YM
Transformer networks have been widely used in the fields of computer vision, natural language processing, graph-structured data analysis, etc. Consequently, explanations of Transformers play a key role in helping humans understand and analyze their decision-making and working mechanisms, thereby improving trustworthiness in real-world applications. However, it is difficult to apply existing explanation methods for convolutional neural networks to Transformer networks, due to the significant differences between their structures. How to design a specific and effective explanation method for Transformers poses a challenge in the explanation area. To address this challenge, we first analyze the semantic coupling problem of attention weight matrices in Transformers, which hinders providing distinctive explanations for different target categories. Then, we propose a gradient-decoupling-based token relevance method (i.e., GradToken) for the visual explanation of Transformer predictions. GradToken exploits the class-aware gradient to decouple the tangled semantics in the class token into the semantics corresponding to each category. GradToken further leverages the relations between the class token and spatial tokens to generate relevance maps. As a result, the visual explanation results generated by GradToken can effectively focus on the regions of selected targets. Extensive quantitative and qualitative experiments are conducted to verify the validity and reliability of the proposed method.
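A generic class-gradient-weighted token relevance, in the spirit of (but not identical to) GradToken, can be sketched as below; the forward_features/head split follows a timm-style ViT interface, which is an assumption:

```python
# Illustrative only: gradient-times-activation relevance on spatial tokens for a
# chosen class. This is a generic scheme, not the authors' exact GradToken method.
import torch

def class_token_relevance(model, image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Returns per-token relevance in [0, 1] for the selected class."""
    tokens = model.forward_features(image)        # (1, 1 + N, D), class token first
    tokens.retain_grad()
    logits = model.head(tokens[:, 0])             # classify from the class token
    logits[0, target_class].backward()
    grads = tokens.grad[:, 1:]                    # class-aware gradients on spatial tokens
    relevance = (grads * tokens[:, 1:]).sum(dim=-1).relu()   # gradient x activation
    return relevance / (relevance.max() + 1e-8)   # (1, N); reshape into a map as needed
```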
Binary classification from N-Tuple Comparisons data
Li J, Huang S, Hua C and Yang Y
Pairwise comparison classification (Pcomp) is a recently thriving weakly supervised method that learns a binary classifier from feedback information on comparisons between unlabeled data pairs (one of which is more likely to be positive than the other). However, this approach becomes challenging in more complex scenarios involving comparisons among more than two instances. To overcome this problem, this paper starts with a comprehensive exploration of triplet comparisons data (the first instance is more likely to be positive than the second, and the second is more likely to be positive than the third). The problem is then extended to N-Tuple comparisons learning (NT-Comp: the confidence of belonging to the positive class is in descending order from the first instance to the last, with the first instance having the highest confidence). This generalized model accommodates not only pairwise comparisons data but also comparisons among more than two instances. This paper derives an unbiased risk estimator for N-Tuple comparisons learning. The estimation error bound is also established theoretically. Finally, experiments are conducted to validate the effectiveness of the proposed method.
Weakly supervised label learning flows
Lu Y, Song W, Arachie C and Huang B
Supervised learning usually requires a large amount of labeled data. However, obtaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some of the data. Many existing weakly supervised learning methods learn a deterministic function that estimates labels given the input data and weak signals. In this paper, we develop label learning flows (LLF), a general framework for weakly supervised learning problems. Our method is a generative model based on normalizing flows. The main idea of LLF is to optimize the conditional likelihoods of all possible labelings of the data within a constrained space defined by the weak signals. We develop a training method for LLF that trains the conditional flow inversely and avoids estimating the labels. Once a model is trained, we can make predictions with a sampling algorithm. We apply LLF to three weakly supervised learning problems. Experimental results show that our method outperforms many of the baselines we compare against.
Barrier-critic-disturbance approximate optimal control of nonzero-sum differential games for modular robot manipulators
Dong B, Zhu X, An T, Jiang H and Ma B
In this paper, to address the safe control problem of modular robot manipulator (MRM) systems with uncertain disturbances, an approximate optimal control scheme for nonzero-sum (NZS) differential games is proposed based on the control barrier function (CBF). The dynamic model of the manipulator system integrates joint subsystems through the joint torque feedback (JTF) technique, incorporating interconnected dynamic coupling (IDC) effects. By integrating the cost functions relevant to each player with the CBF, the evolution of the system states is ensured to remain within the safe region. Subsequently, the optimal tracking control problem for the MRM system is reformulated as an NZS game involving multiple joint subsystems. Based on the adaptive dynamic programming (ADP) algorithm, a cost function approximator for solving the Hamilton-Jacobi (HJ) equation using only critic neural networks (NNs) is established, which enables the feasible derivation of the approximate optimal control strategy. Lyapunov theory is utilized to demonstrate that the tracking error is uniformly ultimately bounded (UUB). The CBF's state constraint mechanism prevents the robot from deviating from the safe region, and the NZS game approach ensures that the subsystems of the MRM reach a Nash equilibrium. The proposed control method effectively addresses the problem of safe and approximately optimal control of MRM systems under uncertain disturbances. Finally, the effectiveness and superiority of the proposed method are verified through simulations and experiments.
A unified noise and watermark removal from information bottleneck-based modeling
Huang H and Pao HK
Both image denoising and watermark removal aim to restore a clean image from an observed noisy or watermarked one. Past research consists of non-learning methods with limited effectiveness or learning-based methods with limited interpretability. To address both issues simultaneously, we propose a method that handles the image denoising and watermark removal tasks in a unified approach. Noises and watermarks are both considered nuisance patterns distinct from the original image content, and therefore should be detected by robust image analysis. The unified detection method is based on the well-known information bottleneck (IB) theory and the proposed SIB-GAN, where image content and nuisance patterns are well separated by a supervised approach. The IB theory guides us to keep the valuable content, such as the original image, through a controlled compression of the input (the noisy or watermarked image), so that only the content without the nuisances can pass through the network for effective noise or watermark removal. Additionally, we adjust the compression parameter in the IB objective to learn a representation that approaches the minimal sufficient representation of the image content. In particular, for non-blind noises, an appropriate amount of compression can be estimated from a solid theoretical foundation. Performance on the denoising task with unseen data and blind noises also shows the model's generalization power. All of the above demonstrates the interpretability of the proposed method. Overall, the proposed method achieves promising results across three tasks: image denoising, watermark removal, and mixed noise and watermark removal, producing images very close to the original content and outperforming almost all state-of-the-art approaches on the same tasks.
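The controlled-compression idea referenced above can be made concrete with a generic variational information-bottleneck objective; this is a textbook form for illustration, where β plays the role of the compression control, not the paper's SIB-GAN formulation:

```python
# Generic variational IB-style objective: keep the content (reconstruction term)
# while compressing the encoding (beta-weighted KL term). Illustrative only.
import torch
import torch.nn.functional as F

def ib_objective(recon: torch.Tensor, target: torch.Tensor,
                 mu: torch.Tensor, logvar: torch.Tensor, beta: float = 1e-3) -> torch.Tensor:
    recon_term = F.mse_loss(recon, target)                              # preserve image content
    kl_term = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # controlled compression
    return recon_term + beta * kl_term
```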
A bio-inspired visual collision detection network integrated with dynamic temporal variance feedback regulated by scalable functional countering jitter streaming
Chang Z, Chen H, Hua M, Fu Q and Peng J
In pursuing artificial intelligence for efficient collision avoidance in robots, researchers draw inspiration from the locust's looming-sensitive visual neural circuit to establish efficient neural networks for collision detection. However, existing bio-inspired collision detection neural networks encounter challenges posed by jitter streaming, a phenomenon commonly experienced, for example, when a ground robot moves across uneven terrain. Visual inputs under jitter streaming induce significant fluctuations in grey values, distracting existing bio-inspired networks from extracting visual looming features. To overcome this limitation, we draw inspiration from the potential of feedback loops to enable the brain to generate a coherent visual perception. We introduce a novel dynamic temporal variance feedback loop, regulated by a scalable functional, into the traditional bio-inspired collision detection neural network. This feedback mechanism extracts dynamic temporal variance information from the output of higher-order neurons in the conventional network to assess the fluctuation level of local neural responses, and regulates it with a scalable functional to differentiate variance induced by incoherent visual input. The regulated signal is then reintegrated into the input through a negative feedback loop to reduce the incoherence of the signal within the network. Numerical experiments substantiate the effectiveness of the proposed feedback loop in supporting collision detection under jitter streaming. This study extends the capability of bio-inspired collision detection neural networks to address jitter streaming, offering novel insight into the potential of feedback mechanisms for enhancing visual neural abilities.
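A minimal numerical sketch of the negative-feedback idea, damping responses whose recent temporal variance is high; the tanh regulation and window length are assumptions, not the authors' scalable functional:

```python
# Toy sketch: suppress jitter-induced fluctuations by feeding back a regulated
# estimate of local temporal variance. Not the authors' network or parameters.
import numpy as np

def variance_feedback(frames: np.ndarray, window: int = 5, gain: float = 0.5) -> np.ndarray:
    """frames: (T, H, W) grey-value sequence; returns a jitter-suppressed copy."""
    out = frames.astype(float).copy()
    for t in range(window, len(out)):
        local_var = out[t - window:t].var(axis=0)        # temporal variance per pixel
        regulation = np.tanh(gain * local_var)           # assumed regulating functional
        out[t] -= regulation * (out[t] - out[t - 1])     # negative feedback on change
    return out
```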
Dopamine-induced relaxation of spike synchrony diversifies burst patterns in cultured hippocampal networks
Hoang H, Matsumoto N, Miyano M, Ikegaya Y and Cortese A
The intricate interplay of neurotransmitters orchestrates a symphony of neural activity in the hippocampus, with dopamine emerging as a key conductor in this complex ensemble. Despite numerous studies uncovering the cellular mechanisms of dopamine, its influence on hippocampal neural networks remains elusive. Combining in vitro electrophysiological recordings of rat embryonic hippocampal neurons, pharmacological interventions, and computational analyses of spike trains, we found that dopamine induces a relaxation in network synchrony. This relaxation expands the repertoire of burst dynamics within these hippocampal networks, a phenomenon notably absent under the administration of dopamine antagonists. Our study provides a thorough understanding of how dopamine signaling influences the formation of functional networks among hippocampal neurons.
Corrigendum to "Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning" [Neural Networks Volume 178, October (2024), 1-11/106414]
Kim S, Yang H, Kim Y, Hong Y and Park E
Coordinating Multi-Agent Reinforcement Learning via Dual Collaborative Constraints
Li C, Dong S, Yang S, Hu Y, Li W and Gao Y
Many real-world multi-agent tasks exhibit a nearly decomposable structure, where interactions among agents within the same interaction set are strong while interactions between different sets are relatively weak. Efficiently modeling this nearly decomposable structure and leveraging it to coordinate agents can enhance the learning efficiency of multi-agent reinforcement learning algorithms for cooperative tasks, yet existing works typically fail to do so. To overcome this limitation, this paper proposes a novel algorithm named Dual Collaborative Constraints (DCC) that identifies the interaction sets as subtasks and achieves both intra-subtask and inter-subtask coordination. Specifically, DCC employs a bi-level structure to periodically distribute agents into multiple subtasks, and proposes both local and global collaborative constraints based on mutual information to facilitate intra-subtask and inter-subtask coordination among agents. These two constraints ensure that agents within the same subtask reach a consensus on their local action selections and that all of them select superior joint actions that maximize the overall task performance. Experimentally, we evaluate DCC on various cooperative multi-agent tasks, and its superior performance against multiple state-of-the-art baselines demonstrates its effectiveness.
UMS-ODNet: Unified-scale domain adaptation mechanism driven object detection network with multi-scale attention
Li Y, Zhang Y, Yang C and Chen Y
Unsupervised domain adaptation techniques improve the generalization capability and performance of detectors, especially when the source and target domains have different distributions. Compared with two-stage detectors, one-stage detectors (especially the YOLO series) provide better real-time capability and have become the primary choice in industrial fields. In this paper, to improve cross-domain object detection performance, we propose a Unified-Scale Domain Adaptation Mechanism Driven Object Detection Network with Multi-Scale Attention (UMS-ODNet). UMS-ODNet adopts YOLOv6 as its basic framework for its balance between efficiency and accuracy. UMS-ODNet considers the adaptation consistency across feature maps of different scales, which tends to be ignored by existing methods. A unified-scale domain adaptation mechanism is designed to fully utilize and unify the discriminative information from different scales. A multi-scale attention module is constructed to further improve the multi-scale representation ability of features. A novel loss function is created to maintain the consistency of multi-scale information by considering the homology of descriptions from the same latent feature. Multiple experiments are conducted on four widely used datasets. Our proposed method outperforms other state-of-the-art techniques, illustrating the feasibility and effectiveness of the proposed UMS-ODNet.
Partially multi-view clustering via re-alignment
Yan W, Zhu J, Chen J, Cheng H, Bai S, Duan L and Zheng Q
Multi-view clustering learns consistent information from multi-view data, aiming to achieve better clustering performance. However, data in real-world scenarios often exhibit temporal or spatial asynchrony, leading to views with unaligned instances. Existing methods primarily address this issue by learning transformation matrices to align unaligned instances, but learning differentiable transformation matrices is cumbersome. To address the challenge of partially unaligned instances, we propose Partially Multi-view Clustering via Re-alignment (PMVCR). Our approach integrates representation learning and data alignment through a two-stage training and a re-alignment process. Specifically, our training process consists of three stages: (i) in the coarse-grained alignment stage, we construct negative instance pairs for unaligned instances and utilize contrastive learning to preliminarily learn the view representations of the instances; (ii) in the re-alignment stage, we match unaligned instances based on the similarity of their view representations, aligning them with the primary view; (iii) in the fine-grained alignment stage, we further enhance the discriminative power of the view representations and the model's ability to differentiate between clusters. Compared to existing models, our method effectively leverages information between unaligned samples and enhances model generalization by constructing negative instance pairs. Clustering experiments on several popular multi-view datasets demonstrate the effectiveness and superiority of our method. Our code is publicly available at https://github.com/WenB777/PMVCR.git.
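Stage (ii), re-aligning instances to the primary view by representation similarity, can be illustrated with a generic assignment step; the cosine-similarity matching and Hungarian solver below are illustrative choices, not necessarily the authors' implementation:

```python
# Hedged sketch: re-align an unaligned view to the primary view by maximizing
# total representation similarity with an optimal one-to-one assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def realign_to_primary(primary: np.ndarray, other: np.ndarray) -> np.ndarray:
    """primary, other: (n, d) L2-normalized view representations; returns re-ordered `other`."""
    sim = primary @ other.T                  # (n, n) cosine similarities
    _, col = linear_sum_assignment(-sim)     # maximize total similarity
    return other[col]                        # other[i] now corresponds to primary[i]
```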
Outer synchronization and outer H∞ synchronization for coupled fractional-order reaction-diffusion neural networks with multiweights
Wang JL, Wang SY, Zhu YR and Huang T
This paper introduces multiple state or spatial-diffusion coupled fractional-order reaction-diffusion neural networks, and discusses the outer synchronization and outer H∞ synchronization problems for these coupled fractional-order reaction-diffusion neural networks (CFRNNs). The Lyapunov functional method, the Laplace transform, and inequality techniques are utilized to obtain outer synchronization conditions for CFRNNs. Moreover, some criteria are also provided to ensure the outer H∞ synchronization of CFRNNs. Finally, the derived outer and outer H∞ synchronization conditions are validated through two numerical examples.
Complexities of feature-based learning systems, with application to reservoir computing
Yasumoto H and Tanaka T
This paper studies complexity measures of reservoir systems. For this purpose, we study a more general model that we call a feature-based learning system, which is the composition of a feature map and a final estimator. We study complexity measures such as the growth function, VC-dimension, pseudo-dimension, and Rademacher complexity. On the basis of the results, we discuss how the unadjustability of reservoirs and the linearity of readouts affect complexity measures of reservoir systems. Furthermore, some of the results generalize or improve existing results.
Evolutionary architecture search for generative adversarial networks using an aging mechanism-based strategy
Man W, Xu L and He C
Generative Adversarial Networks (GANs) have emerged as a key technology in artificial intelligence, especially in image generation. However, traditional hand-designed GAN architectures often face significant training stability challenges. We address these challenges with an Evolutionary Neural Architecture Search (ENAS) algorithm for GANs, named EAMGAN. This one-shot model automates the design of GAN architectures and employs an Operation Importance Metric (OIM) to enhance training stability. It also incorporates an aging mechanism to optimize the selection process during architecture search. Additionally, the use of a non-dominated sorting algorithm ensures the generation of Pareto-optimal solutions, promoting diversity and preventing premature convergence. We evaluated our method on benchmark datasets, and the results demonstrate that EAMGAN is highly competitive in terms of efficiency and performance. Our method identified an architecture achieving an Inception Score (IS) of 8.83±0.13 and a Fréchet Inception Distance (FID) of 9.55 on CIFAR-10 with only 0.66 GPU days. Results on the STL-10, CIFAR-100, and ImageNet32 datasets further demonstrate the robust portability of our architecture.
MIU-Net: Advanced multi-scale feature extraction and imbalance mitigation for optic disc segmentation
Xiao Y, Shao Y, Chen Z, Zhang R, Ding X, Zhao J, Liu S, Fukuyama T, Zhao Y, Peng X, Tian G, Wen S and Zhou X
Pathological myopia is a severe eye condition that can cause serious complications like retinal detachment and macular degeneration, posing a threat to vision. Optic disc segmentation helps measure changes in the optic disc and observe the surrounding retina, aiding early detection of pathological myopia. However, these changes make segmentation difficult, resulting in accuracy levels that are not suitable for clinical use. To address this, we propose a new model called MIU-Net, which improves segmentation performance through several innovations. First, we introduce a multi-scale feature extraction (MFE) module to capture features at different scales, helping the model better identify optic disc boundaries in complex images. Second, we design a dual attention module that combines channel and spatial attention to focus on important features and improve feature use. To tackle the imbalance between optic disc and background pixels, we use focal loss to enhance the model's ability to detect minority optic disc pixels. We also apply data augmentation techniques to increase data diversity and address the lack of training data. Our model was tested on the iChallenge-PM and iChallenge-AMD datasets, showing clear improvements in accuracy and robustness compared to existing methods. The experimental results demonstrate the effectiveness and potential of our model in diagnosing pathological myopia and other medical image processing tasks.
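To illustrate the class-imbalance handling mentioned above, here is a sketch of a standard binary focal loss for segmentation; the form is generic, and the α and γ values are common defaults rather than the paper's settings:

```python
# Standard binary focal loss for foreground/background imbalance in segmentation.
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits, targets: (batch, H, W) float tensors, targets in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                    # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()     # down-weight easy pixels
```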
FedART: A neural model integrating federated learning and adaptive resonance theory
Pateria S, Subagdja B and Tan AH
Federated Learning (FL) has emerged as a promising paradigm for collaborative model training across distributed clients while preserving data privacy. However, prevailing FL approaches aggregate the clients' local models into a global model through multi-round iterative parameter averaging. This leads to the undesirable bias of the aggregated model towards certain clients in the presence of heterogeneous data distributions among the clients. Moreover, such approaches are restricted to supervised classification tasks and do not support unsupervised clustering. To address these limitations, we propose a novel one-shot FL approach called Federated Adaptive Resonance Theory (FedART) which leverages self-organizing Adaptive Resonance Theory (ART) models to learn category codes, where each code represents a cluster of similar data samples. In FedART, the clients learn to associate their private data with various local category codes. Under heterogeneity, the local codes across different clients represent heterogeneous data. In turn, a global model takes these local codes as inputs and aggregates them into global category codes, wherein heterogeneous client data is indirectly represented by distinctly encoded global codes, in contrast to the averaging out of parameters in the existing approaches. This enables the learned global model to handle heterogeneous data. In addition, FedART employs a universal learning mechanism to support both federated classification and clustering tasks. Our experiments conducted on various federated classification and clustering tasks show that FedART consistently outperforms state-of-the-art FL methods on data with heterogeneous distribution across clients.
FairDRO: Group fairness regularization via classwise robust optimization
Park T, Jung S, Chun S and Moon T
Existing group fairness-aware training methods fall into two categories: re-weighting underrepresented groups according to certain rules, or using regularization terms such as smoothed approximations of fairness metrics or surrogate statistical quantities. While each category has its own strengths in applicability or performance, their success is typically limited to specific cases. To address this, we propose a new approach called FairDRO, which takes advantage of both categories through a classwise group distributionally robust optimization (DRO) framework. Our method unifies re-weighting and regularization by incorporating a well-justified group fairness metric into the objective as a regularizer, but solving it through a principled re-weighting strategy. To optimize the resulting objective efficiently, we adopt an iterative algorithm and consequently develop two variants of the FairDRO algorithm depending on the choice of surrogate loss. For in-depth understanding, we derive three theoretical results: (i) a closed-form solution for the correct re-weights; (ii) justifications for using the surrogate losses; and (iii) a convergence analysis of our method. Experimental results show that our algorithms consistently achieve state-of-the-art performance in accuracy-fairness trade-offs across multiple benchmarks, demonstrating scalability and broad applicability compared to existing methods.
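The re-weighting side of such a framework can be illustrated with a generic exponentiated-gradient group-DRO update; this is a standard scheme shown for intuition, not the paper's exact classwise FairDRO algorithm:

```python
# Generic group-DRO step: up-weight groups with higher loss, then optimize the
# weighted objective. Illustrative of the re-weighting idea only.
import torch

def group_dro_step(group_losses: torch.Tensor, weights: torch.Tensor,
                   step_size: float = 0.01):
    """group_losses: (G,) mean loss per group; weights: (G,) simplex weights."""
    weights = weights * torch.exp(step_size * group_losses.detach())
    weights = weights / weights.sum()                  # stay on the probability simplex
    weighted_loss = (weights * group_losses).sum()     # objective to backpropagate through
    return weighted_loss, weights
```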