Protein structure prediction with energy minimization and deep learning approaches
In this paper we discuss the advantages and problems of two alternative approaches to ab initio protein structure prediction. On the one hand, we discuss recent approaches based on deep learning, which have significantly improved prediction results for a wide variety of proteins. On the other hand, we analyze methods based on minimization of the protein conformational energy using different search strategies. In the latter case, we include our methods based on a memetic combination of differential evolution and the fragment replacement technique, which also incorporate the possibility of niching in the evolutionary search. Different proteins have been used to analyze the pros and cons of both approaches, and possibilities for integrating the two alternatives are proposed.
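As an illustration of the evolutionary-search component only, the sketch below shows a generic differential evolution step (DE/rand/1/bin) minimizing a placeholder energy function; the fragment replacement and niching machinery of the memetic method described in the paper are not modelled, and `toy_energy` is an assumed stand-in for a conformational energy model.

```python
import random

def toy_energy(x):
    # Placeholder "conformational energy": sum of squares (an assumption, not the paper's model).
    return sum(xi * xi for xi in x)

def de_step(population, f=0.5, cr=0.9, energy=toy_energy):
    """One generation of DE/rand/1/bin: mutation, binomial crossover, greedy selection."""
    new_pop = []
    for i, target in enumerate(population):
        # Pick three distinct individuals different from the target.
        a, b, c = random.sample([p for j, p in enumerate(population) if j != i], 3)
        mutant = [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        # Binomial crossover between target and mutant vectors.
        j_rand = random.randrange(len(target))
        trial = [m if (random.random() < cr or j == j_rand) else t
                 for j, (t, m) in enumerate(zip(target, mutant))]
        # Keep whichever of target/trial has the lower energy.
        new_pop.append(trial if energy(trial) <= energy(target) else target)
    return new_pop

pop = [[random.uniform(-5, 5) for _ in range(10)] for _ in range(20)]
for _ in range(100):
    pop = de_step(pop)
print(min(toy_energy(p) for p in pop))
```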
Computational graph pangenomics: a tutorial on data structures and their applications
Computational pangenomics is an emerging research field that is changing the way computer scientists face challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential to the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes, as well as the specificity of an individual genome in a personalized approach to medicine, is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial for facing the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of graph pangenomes and of the variability they encode, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
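A minimal sketch of the graph pangenome idea, assuming a toy sequence graph in which nodes carry sequence labels and each haplotype is stored as a path of node identifiers; the tutorial's actual data structures are far more compact and index-based, and all labels below are hypothetical.

```python
# Toy graph pangenome: nodes carry sequence labels, haplotypes are paths of node ids.
nodes = {1: "ACGT", 2: "G", 3: "T", 4: "TTAC"}   # hypothetical labels
edges = {1: [2, 3], 2: [4], 3: [4]}              # nodes 2/3 encode a SNP (G vs T)
haplotypes = {
    "sampleA": [1, 2, 4],   # carries the G allele
    "sampleB": [1, 3, 4],   # carries the T allele
}

def spell(path):
    """Reconstruct the linear sequence of one haplotype by walking its path."""
    return "".join(nodes[v] for v in path)

for name, path in haplotypes.items():
    print(name, spell(path))
```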
Counter machines and crystallographic structures
One way to depict a crystallographic structure is by a periodic (di)graph, i.e., a graph whose group of automorphisms has a translational subgroup of finite index acting freely on the structure. We establish a relationship between periodic graphs representing crystallographic structures and an infinite hierarchy of intersection languages 𝒟𝒞ℒ_d, d = 0, 1, 2, …, within the intersection classes of deterministic context-free languages. We introduce a class of counter machines that accept these languages, where the machines with d counters recognize the class 𝒟𝒞ℒ_d. An intersection of d languages in 𝒟𝒞ℒ_1 defines 𝒟𝒞ℒ_d. We prove that there is a one-to-one correspondence between sets of walks starting and ending in the same unit of a d-dimensional periodic (di)graph and the class of languages in 𝒟𝒞ℒ_d. The proof uses the following result: given a digraph Δ and a group G, there is a unique digraph Γ such that G ≤ Aut Γ, G acts freely on the structure, and Γ/G ≅ Δ.
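For intuition only, the sketch below shows a generic one-counter acceptor for the language { a^n b^n : n ≥ 0 }, the kind of counting that lets a single counter check that a walk returns to its starting coordinate along one axis; it is not the specific machine class defined in the paper.

```python
def accepts_anbn(word):
    """Generic one-counter acceptor for { a^n b^n : n >= 0 }:
    'a' increments the counter, 'b' decrements it, and the word is accepted
    if no 'a' follows a 'b', the counter never goes negative, and it ends at zero."""
    counter = 0
    seen_b = False
    for ch in word:
        if ch == "a":
            if seen_b:
                return False
            counter += 1
        elif ch == "b":
            seen_b = True
            counter -= 1
            if counter < 0:
                return False
        else:
            return False
    return counter == 0

print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))  # True False
```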
The importance of thermodynamics for molecular systems, and the importance of molecular systems for thermodynamics
Improved understanding of molecular systems has only emphasised the sophistication of networks within the cell. Simultaneously, the advance of nucleic acid nanotechnology, a platform within which reactions can be exquisitely controlled, has made the development of artificial architectures and devices possible. Vital to this progress has been a solid foundation in the thermodynamics of molecular systems. In this pedagogical review and perspective, we discuss how thermodynamics determines both the overall potential of molecular networks, and the minute details of design. We then argue that, in turn, the need to understand molecular systems is helping to drive the development of theories of thermodynamics at the microscopic scale.
Chemical reaction network designs for asynchronous logic circuits
Chemical reaction networks (CRNs) are a versatile language for describing the dynamical behaviour of chemical kinetics, capable of modelling a variety of digital and analogue processes. While CRN designs for synchronous sequential logic circuits have been proposed and their implementation in DNA demonstrated, a physical realisation of these devices is difficult because of their reliance on a clock. Asynchronous sequential logic, on the other hand, does not require a clock, and instead relies on handshaking protocols to ensure the temporal ordering of different phases of the computation. This paper provides novel CRN designs for the construction of asynchronous logic, arithmetic and control flow elements based on a bi-molecular reaction motif with catalytic reactions and uniform reaction rates. We model and validate the designs for the deterministic and stochastic semantics using Microsoft's GEC tool and the probabilistic model checker PRISM, demonstrating their ability to emulate the function of asynchronous components under low molecular count.
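To give a flavour of simulating such designs under the stochastic semantics (the paper itself uses Microsoft's GEC tool and PRISM rather than hand-rolled code), here is a minimal Gillespie-style simulation of a hypothetical catalytic bimolecular motif X + C -> Y + C, in which catalyst C transfers signal X to Y; species names and rates are illustrative assumptions.

```python
import random

def gillespie(state, reactions, t_end):
    """Minimal stochastic simulation algorithm.
    state: dict species -> count; reactions: list of (reactants, products, rate)."""
    t = 0.0
    while t < t_end:
        # Mass-action propensities (counts multiplied by the rate constant).
        props = []
        for reactants, products, rate in reactions:
            a = rate
            for sp in reactants:
                a *= state[sp]
            props.append(a)
        total = sum(props)
        if total == 0:
            break
        t += random.expovariate(total)
        r = random.uniform(0, total)
        cumulative = 0.0
        for (reactants, products, rate), a in zip(reactions, props):
            cumulative += a
            if r <= cumulative:
                for sp in reactants:
                    state[sp] -= 1
                for sp in products:
                    state[sp] += 1
                break
    return state

# Hypothetical catalytic motif: X + C -> Y + C (the catalyst C is conserved).
state = {"X": 20, "Y": 0, "C": 2}
reactions = [(("X", "C"), ("Y", "C"), 0.01)]
print(gillespie(state, reactions, t_end=1000.0))
```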
Programming discrete distributions with chemical reaction networks
We explore the range of probabilistic behaviours that can be engineered with Chemical Reaction Networks (CRNs). We give methods to "program" CRNs so that their steady state is chosen from some desired target distribution that has finite support in ℕ^m, with m ≥ 1. Moreover, any distribution with countably infinite support can be approximated with arbitrarily small error under the L1 norm. We also give optimized schemes for special distributions, including the uniform distribution. Finally, we formulate a calculus to compute on distributions that is complete for finite support distributions, and can be compiled to a restricted class of CRNs that at steady state realize those distributions.
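As a small worked example of the kind of construction involved (not one of the paper's optimized schemes), a single molecule of a hypothetical species X consumed by two competing reactions, X -> A at rate p and X -> B at rate q, ends up as A with probability p / (p + q), so the pair of absorbing species realizes a Bernoulli distribution at steady state:

```python
import random

def sample_bernoulli_crn(p, q, trials=100_000):
    """Monte Carlo estimate of P(A) for the competing reactions X->A (rate p) and X->B (rate q),
    starting from a single X molecule. The race between two exponential clocks is won by the
    first to fire, so analytically P(A) = p / (p + q)."""
    wins_a = 0
    for _ in range(trials):
        t_a = random.expovariate(p)
        t_b = random.expovariate(q)
        if t_a < t_b:
            wins_a += 1
    return wins_a / trials

p, q = 3.0, 1.0
print(sample_bernoulli_crn(p, q), "expected:", p / (p + q))  # ~0.75
```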
Petri-net-based 2D design of DNA walker circuits
We consider localised DNA computation, where a DNA strand walks along a binary decision graph to compute a binary function. One of the challenges in the design of reliable walker circuits is leakage transitions, which occur when a walker jumps into another branch of the decision graph. We automatically identify leakage transitions, which allows for a detailed qualitative and quantitative assessment of circuit designs, design comparison, and design optimisation. The ability to identify leakage transitions is an important step in the process of optimising DNA circuit layouts, where the aim is to minimise the computational error inherent in a circuit while minimising its area. Our 2D modelling approach for DNA walker circuits relies on coloured stochastic Petri nets, which enable functionality, topology and dimensionality all to be integrated in one two-dimensional model. Our modelling and analysis approach can be easily extended to three-dimensional walker systems.
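A minimal sketch of the modelling idea, assuming a tiny uncoloured stochastic Petri net in which a walker token leaves place `start` either via the intended transition or via a slower, hypothetical leakage transition into the wrong branch; the paper's coloured 2D nets additionally encode circuit topology and geometry.

```python
import random

# Places hold token counts; a transition consumes from 'pre' and produces into 'post'.
marking = {"start": 1, "correct_branch": 0, "wrong_branch": 0}
transitions = [
    {"name": "intended_step", "pre": ["start"], "post": ["correct_branch"], "rate": 1.0},
    {"name": "leakage",       "pre": ["start"], "post": ["wrong_branch"],   "rate": 0.05},
]

def fire_until_dead(marking, transitions):
    """Race the enabled transitions with exponential delays and fire the fastest one."""
    while True:
        enabled = [t for t in transitions
                   if all(marking[p] >= 1 for p in t["pre"])]
        if not enabled:
            return marking
        winner = min(enabled, key=lambda t: random.expovariate(t["rate"]))
        for p in winner["pre"]:
            marking[p] -= 1
        for p in winner["post"]:
            marking[p] += 1

runs = [fire_until_dead(dict(marking), transitions)["wrong_branch"] for _ in range(10_000)]
print("estimated leakage probability:", sum(runs) / len(runs))  # about 0.05 / 1.05
```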
Modeling biological gradient formation: combining partial differential equations and Petri nets
Both Petri nets and differential equations are important modeling tools for biological processes. In this paper we demonstrate how these two modeling techniques can be combined to describe biological gradient formation. Parameters derived from a partial differential equation describing the process of gradient formation are incorporated in an abstract Petri net model. The quantitative aspects of the resulting model are validated through a case study of gradient formation in the fruit fly.
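A minimal sketch of the PDE side, assuming a 1D reaction-diffusion equation dc/dt = D d²c/dx² − k c with a constant source at the left boundary, solved with an explicit finite-difference scheme; the resulting steady-state gradient (decay length sqrt(D/k)) is the kind of quantity that could parameterize an abstract Petri net model, and all values below are illustrative.

```python
# Explicit finite-difference solution of dc/dt = D*d2c/dx2 - k*c with a fixed source at x = 0.
D, k = 1.0, 0.1          # diffusion coefficient and degradation rate (illustrative values)
dx, dt = 0.5, 0.05       # chosen so that D*dt/dx**2 < 0.5 for numerical stability
n_cells, n_steps = 100, 20_000

c = [0.0] * n_cells
for _ in range(n_steps):
    c[0] = 1.0                       # constant source at the left boundary
    new_c = c[:]
    for i in range(1, n_cells - 1):
        laplacian = (c[i - 1] - 2 * c[i] + c[i + 1]) / dx**2
        new_c[i] = c[i] + dt * (D * laplacian - k * c[i])
    new_c[-1] = new_c[-2]            # zero-flux right boundary
    c = new_c

print([round(v, 3) for v in c[:10]])  # concentration decays away from the source
```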
Computing with biological switches and clocks
The complex dynamics of biological systems is primarily driven by molecular interactions that underpin the regulatory networks of cells. These networks typically contain positive and negative feedback loops, which are responsible for switch-like and oscillatory dynamics, respectively. Many computing systems rely on switches and clocks as computational modules. While the combination of such modules in biological systems leads to a variety of dynamical behaviours, it is also driving the development of new computing algorithms. Here we present a historical perspective on computation by biological systems, with a focus on switches and clocks, and discuss parallels between biology and computing. We also outline our vision for the future of biological computing.
Scaling up genetic circuit design for cellular computing: advances and prospects
Synthetic biology aims to engineer and redesign biological systems for useful real-world applications in biomanufacturing, biosensing and biotherapy, following a typical design-build-test cycle. Inspired by computer science and electronics, synthetic gene circuits have been designed to exert control over the flow of information in biological systems. Two types are Boolean-logic-inspired digital logic (TRUE or FALSE) and graded analog computation. Key principles for gene circuit engineering include modularity, orthogonality, predictability and reliability. Initial circuits in the field were small and hampered by a lack of modular and orthogonal components; in recent years, however, the library of available parts has increased vastly. New tools for high-throughput DNA assembly and characterization have been developed, enabling rapid prototyping, systematic in situ characterization, as well as automated design and assembly of circuits. Recently implemented computing paradigms in circuit memory and distributed computing using cell consortia will also be discussed. Finally, we will examine existing challenges in building predictable large-scale circuits, including modularity, context dependency and metabolic burden, as well as the tools and methods used to resolve them. These new trends and techniques have the potential to accelerate the design of larger gene circuits and to increase our basic understanding of circuit and host behaviour.
Rough sets: past, present, and future
It has been 35 years since Professor Zdzisław Pawlak introduced rough sets. The theory has attracted the attention of many researchers and practitioners all over the world, who have contributed essentially to its development. Methods developed on the basis of rough set theory, alone or in combination with other approaches, have found applications in many areas. In this article, we outline some selected past and present research directions of rough sets. In particular, we emphasize the importance of search strategies for relevant approximation spaces as the basic tools in achieving the computational building blocks (granules or patterns) required for the approximation of complex vague concepts. We also discuss new challenges related to problem solving by intelligent systems (IS) or complex adaptive systems (CAS). The concern is to control problem solving using interactive granular computing, an extension of the rough set approach, for the effective realization of computations in IS or CAS. These challenges are also important for the development of natural computing.
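As a small reminder of the basic construct, the sketch below computes the lower and upper approximations of a concept from the indiscernibility relation induced by a set of attributes; the decision table is a made-up example, not data from the article.

```python
from collections import defaultdict

# Made-up information table: objects described by attributes, plus a target concept.
table = {
    "o1": {"colour": "red",  "size": "big"},
    "o2": {"colour": "red",  "size": "big"},
    "o3": {"colour": "blue", "size": "big"},
    "o4": {"colour": "blue", "size": "small"},
}
concept = {"o1", "o3"}          # the vague concept to approximate

def approximations(table, attributes, concept):
    """Group objects into indiscernibility classes, then build the rough approximations."""
    classes = defaultdict(set)
    for obj, desc in table.items():
        key = tuple(desc[a] for a in attributes)
        classes[key].add(obj)
    lower = set().union(*[c for c in classes.values() if c <= concept])
    upper = set().union(*[c for c in classes.values() if c & concept])
    return lower, upper

lower, upper = approximations(table, ["colour", "size"], concept)
print("lower:", lower)   # objects certainly in the concept -> {'o3'}
print("upper:", upper)   # objects possibly in the concept  -> {'o1', 'o2', 'o3'}
```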
A GIS-aided cellular automata system for monitoring and estimating graph-based spread of epidemics
In this study, we introduce an application of a Cellular Automata (CA)-based system for monitoring and estimating the spread of epidemics in the real world, considering the example of a Greek city. The proposed system combines cellular structure and graph representation to approach the connections among the area's parts more realistically. The original design of the model is based on a classical SIR (Susceptible-Infected-Recovered) mathematical model. Aiming to upgrade the application's effectiveness, we have enriched the model with parameters that advance its functionality, making it self-adjusting and more effective at approaching real situations. Thus, disease-related parameters have been introduced, while human interventions such as vaccination have been represented in an algorithmic manner. The model incorporates actual geographical data (GIS, geographic information system) to improve its response. A methodology has been developed that allows any area with a given population distribution and geographical data to be represented as a graph associated with the corresponding CA model for epidemic simulation. To validate the efficient operation of the proposed model and the data-display methodology, the city of Eleftheroupoli (Kavala), in Eastern Macedonia-Thrace, Greece, was selected as a testing platform. Tests have been performed at both macroscopic and microscopic levels, and the results confirmed the successful operation of the system and verified the correctness of the proposed methodology.
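A minimal sketch of a cellular automaton SIR step on a grid with a Moore neighbourhood, ignoring the GIS layer, graph connections and self-adjusting parameters of the proposed system; the infection and recovery probabilities are illustrative assumptions.

```python
import random

S, I, R = 0, 1, 2
P_INFECT, P_RECOVER = 0.2, 0.1   # illustrative per-step probabilities

def step(grid):
    """One synchronous update: susceptibles may be infected by Moore neighbours,
    infected cells may recover; the grid wraps around at the edges."""
    n = len(grid)
    new = [row[:] for row in grid]
    for y in range(n):
        for x in range(n):
            if grid[y][x] == S:
                neighbours = [grid[(y + dy) % n][(x + dx) % n]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (dy, dx) != (0, 0)]
                infected = sum(1 for v in neighbours if v == I)
                if any(random.random() < P_INFECT for _ in range(infected)):
                    new[y][x] = I
            elif grid[y][x] == I and random.random() < P_RECOVER:
                new[y][x] = R
    return new

n = 30
grid = [[S] * n for _ in range(n)]
grid[n // 2][n // 2] = I                       # patient zero in the centre
for _ in range(100):
    grid = step(grid)
print("infected:", sum(row.count(I) for row in grid),
      "recovered:", sum(row.count(R) for row in grid))
```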
Estimates of the collective immunity to COVID-19 derived from a stochastic cellular automaton based framework
In the context of the propagation of infectious diseases, when a sufficient degree of immunisation is achieved within a population, the spread of the disease ends or significantly decreases, leading to collective immunity, i.e., the indirect protection given by immune individuals to susceptible individuals. Here we describe estimates of the collective immunity to COVID-19 obtained from a stochastic cellular automaton based model designed to emulate the spread of SARS-CoV-2 in a population of static individuals interacting only via a Moore neighbourhood of radius one, with a view to analysing the impact of initially immune individuals on the dynamics of COVID-19. This impact was measured by varying the initial immunity ratio (the percentage of immunised individuals before patient zero starts infecting its neighbourhood) from 0 to 95% of the initial population, and comparing the number of susceptible individuals not contaminated, the peak value of active cases, the total number of deaths and the emulated pandemic duration in days. The influence of this range of immunities on the model was tested with different parameterisations covering the uncertainties involved in the model, such as the durations of the cellular automaton states, the contamination contributions of each state and the state transition probabilities. An average collective immunity threshold was obtained from this procedure under four distinct parameterisations, and it is in line with the estimates in the currently available medical literature, even when the uncertainty of the input parameters is increased.
Design of nucleic acid strands with long low-barrier folding pathways
A major goal of natural computing is to design biomolecules, such as nucleic acid sequences, that can be used to perform computations. We design sequences of nucleic acids that are "guaranteed" to have long folding pathways relative to their length. These particular sequences, with high probability, follow low-barrier folding pathways that visit a large number of distinct structures. Long folding pathways are interesting because they demonstrate that natural computing can potentially support long and complex computations. Formally, we provide the first scalable designs of molecules whose low-barrier folding pathways, with respect to a simple, stacked pair energy model, grow superlinearly with the molecule length, but for which all significantly shorter alternative folding pathways have an energy barrier that is [Formula: see text] times that of the low-barrier pathway for any [Formula: see text] and a sufficiently long sequence.
A tutorial on multiobjective optimization: fundamentals and evolutionary methods
In almost no other field of computer science has the idea of using bio-inspired search paradigms been as useful as in solving multiobjective optimization problems. The idea of using a population of search agents that collectively approximate the Pareto front resonates well with processes in natural evolution, immune systems, and swarm intelligence. Methods such as NSGA-II, SPEA2, SMS-EMOA, MOPSO, and MOEA/D have become standard solvers for multiobjective optimization problems. This tutorial will review some of the most important fundamentals of multiobjective optimization and then introduce representative algorithms, illustrate their working principles, and discuss their application scope. In addition, the tutorial will discuss statistical performance assessment. Finally, it highlights recent important trends and closely related research fields. The tutorial is intended for readers who want to acquire basic knowledge of the mathematical foundations of multiobjective optimization and state-of-the-art methods in evolutionary multiobjective optimization. The aim is to provide a starting point for research in this active area, and it should also help the advanced reader to identify open research topics.
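As a small companion to the fundamentals covered in the tutorial, the sketch below implements Pareto dominance (for minimization) and extracts the non-dominated subset of a set of objective vectors; it is a naive O(n²) filter, not the non-dominated sorting used inside NSGA-II, and the sample points are made up.

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): a is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Naive O(n^2) filter returning the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 2.5)]
print(non_dominated(points))  # the current approximation of the Pareto front
```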
Dynamic vehicle routing with time windows in theory and practice
The vehicle routing problem is a classical combinatorial optimization problem. This work is about a variant of the vehicle routing problem with dynamically changing orders and time windows. In real-world applications, the demands often change during operation time: new orders occur and others are canceled, so new schedules need to be generated on the fly. Online optimization algorithms for dynamic vehicle routing address this problem, but so far they do not consider time windows. Moreover, to match the scenarios found in real-world problems, adaptations of benchmarks are required. In this paper, a practical problem is modeled based on the procedure of daily routing of a delivery company. New orders by customers are introduced dynamically during the working day and need to be integrated into the schedule. A multiple ant colony system (MACS) algorithm combined with powerful local search procedures is proposed to solve the dynamic vehicle routing problem with time windows. The performance is tested on a new benchmark based on simulations of a working day. The problems are taken from Solomon's benchmarks, but a certain percentage of the orders are only revealed to the algorithm during operation time. Different versions of the MACS algorithm are tested and a high-performing variant is identified. Finally, the algorithm is tested in situ: in a field study, the algorithm schedules a fleet of cars for a surveillance company. We compare the performance of the algorithm to that of the procedure used by the company and summarize insights gained from the implementation of the real-world study. The results show that the multiple ant colony algorithm obtains much better solutions on the academic benchmark problems and can also be integrated into a real-world environment.
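To illustrate the kind of on-the-fly decision the dynamic setting requires (independent of the MACS algorithm itself), the sketch below inserts a newly revealed order into an existing route at the cheapest position that keeps all time windows feasible; travel times are Euclidean at unit speed and all data are made up.

```python
import math

def dist(a, b):
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def feasible(route, depot):
    """Check time-window feasibility along a route, waiting if the vehicle arrives early."""
    t, prev = 0.0, depot
    for stop in route:
        t = max(stop["ready"], t + dist(prev, stop))
        if t > stop["due"]:
            return False
        prev = stop
    return True

def cheapest_insertion(route, order, depot):
    """Try every insertion position and return the feasible route with the least travel length."""
    best, best_cost = None, float("inf")
    for i in range(len(route) + 1):
        candidate = route[:i] + [order] + route[i:]
        if feasible(candidate, depot):
            length = sum(dist(a, b) for a, b in zip([depot] + candidate, candidate))
            if length < best_cost:
                best, best_cost = candidate, length
    return best  # None means this vehicle cannot serve the new order

depot = {"x": 0, "y": 0, "ready": 0, "due": 1000}
route = [{"x": 10, "y": 0, "ready": 0, "due": 50}, {"x": 10, "y": 10, "ready": 0, "due": 100}]
new_order = {"x": 5, "y": 5, "ready": 20, "due": 60}
print(cheapest_insertion(route, new_order, depot))
```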
Extended spiking neural P systems with white hole rules and their red-green variants
We consider extended spiking neural P systems with the additional possibility of so-called "white hole rules", which send the complete contents of a neuron to other neurons, and we prove that this extension of the original model can easily simulate register machines. Based on this proof, we then define red-green variants of these extended spiking neural P systems with white hole rules and show how to go beyond Turing with these red-green systems. We also discuss the number of actor neurons needed, and the relation of this model to some special variants of Lindenmayer systems.
Interactive computations: toward risk management in interactive intelligent systems
Understanding the nature of interactions is regarded as one of the biggest challenges in projects related to complex adaptive systems. We discuss foundations for interactive computations in interactive intelligent systems (IIS), developed in the Wistech program and used for modeling complex systems. We emphasize the key role of risk management in problem solving by IIS. The considerations are based on experience gained in real-life projects concerning, e.g., medical diagnosis and therapy support, control of an unmanned helicopter, fraud detection, algorithmic trading, and fire commander decision support.
An all-optical soliton FFT computational arrangement in the 3NLSE-domain
In this paper an all-optical soliton method for computing the Fast Fourier Transform (FFT) algorithm is presented. The method is an extension of the calculation methods (soliton gates) made possible in the cubic nonlinear Schrödinger equation (3NLSE) domain, and provides further proof of the computational abilities of the scheme. The method involves collisions entirely between first-order solitons in optical fibers whose propagation evolution is described by the 3NLSE. The main building block of the arrangement is the half-adder processor. Expanding around the half-adder processor, the "butterfly" calculation process is demonstrated using first-order solitons, leading eventually to the realisation of an equivalent of a full radix-2 FFT calculation algorithm.
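For reference, the butterfly operation realised optically in the paper is the same combining step used by the standard radix-2 decimation-in-time FFT, sketched below in software with a check against a direct DFT; this is a generic software reference, not the optical arrangement itself.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT (input length must be a power of two).
    The final loop is the 'butterfly': each output pair is (even + w*odd, even - w*odd)."""
    n = len(x)
    if n == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

x = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))  # ~1e-15
```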
Knowledge representation of motor activity of patients with Parkinson's disease
An approach to extracting knowledge representations from the analysis of biomedical signals concerning the motor activity of Parkinson's disease patients is proposed in this paper. This is done using accelerometers attached to the patients' bodies as well as video images of their hand movements. Experiments are carried out employing artificial neural networks and support vector machines for the recognition of characteristic motor activity disorders in patients. The obtained results indicate that it is possible to interpret selected patient body movements with sufficiently high effectiveness.
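A minimal sketch of the classification step, assuming synthetic accelerometer-like feature vectors and scikit-learn's support vector machine; the real study uses features extracted from body-worn accelerometers and video of hand movements, and the feature names and class means below are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in features (e.g. tremor amplitude, dominant frequency, jerk) for two classes.
healthy = rng.normal(loc=[0.2, 4.0, 1.0], scale=0.3, size=(100, 3))
parkinsonian = rng.normal(loc=[0.8, 5.5, 2.0], scale=0.3, size=(100, 3))
X = np.vstack([healthy, parkinsonian])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```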
Turning machines: a simple algorithmic model for molecular robotics
Molecular robotics is challenging, so it seems best to keep it simple. We consider an abstract molecular robotics model based on simple folding instructions that execute asynchronously. Turning Machines are a simple 1D to 2D folding model, also easily generalisable to 2D to 3D folding. A Turning Machine starts out as a line of connected monomers in the discrete plane, each with an associated turning number. A monomer turns relative to its neighbours, executing a unit-distance translation that drags other monomers along with it, and through collective motion the initial set of monomers eventually folds into a programmed shape. We provide a suite of tools for reasoning about Turning Machines by fully characterising their ability to execute line rotations: executing an almost-full line rotation of 5π/3 radians is possible, yet a full 2π rotation is impossible. Furthermore, line rotations up to 5π/3 are executed efficiently, in logarithmic expected time in our continuous time Markov chain time model. We then show that such line rotations represent a fundamental primitive in the model, by using them to efficiently and asynchronously fold shapes. In particular, arbitrarily large zig-zag-rastered squares and zig-zag paths are foldable, as are y-monotone shapes, albeit with error (bounded by the perimeter length). Finally, we give shapes that, despite having paths that traverse all their points, are in fact impossible to fold, as well as techniques for folding certain classes of (scaled) shapes without error. Our approach relies on careful geometric-based analyses of the feats possible and impossible by a very simple robotic system, and pushes conceptual hardness towards mathematical analysis and away from molecular implementation.