We reconsider a recently published algorithm (Dalkilic et al.) for merging lists by way of the perfect shuffle. The original publication gave only experimental results which, although consistent with linear execution time on the samples tested, provided no analysis. Here we prove that the time complexity, in the average case, is indeed linear, although there is an Omega(n^2) worst case. This is then the first provably linear time merge algorithm based on the use of the perfect shuffle. We provide a proof of correctness, extend the algorithm to the general case where the lists are of unequal length and show how it can be made stable, all aspects not included in the original presentation and we give a much more concise definition of the algorithm.
Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model has a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms. Here, by transforming the perceptron learning rule, we present an online learning rule for a recurrent neural network that achieves near-maximal storage capacity without an explicit supervisory error signal, relying only upon locally accessible information. The fully-connected network consists of excitatory binary neurons with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the memory patterns are presented online as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs. Synapses corresponding to active inputs are modified as a function of the value of the local fields with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. We simulated and analyzed a network of binary neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction. The storage capacity obtained through numerical simulations is shown to be close to the value predicted by analytical calculations. We also measured the dependence of capacity on the strength of external inputs. Finally, we quantified the statistics of the resulting synaptic connectivity matrix, and found that both the fraction of zero weight synapses and the degree of symmetry of the weight matrix increase with the number of stored patterns.
In this study, a spectral graph-theoretic grouping strategy for weakly supervised classification is introduced, where a limited number of labelled samples and a larger set of unlabelled samples are used to construct a larger annotated training set composed of strongly labelled and weakly labelled samples. The inherent relationship between the set of strongly labelled samples and the set of unlabelled samples is established via spectral grouping, with the unlabelled samples subsequently weakly annotated based on the strongly labelled samples within the associated spectral groups. A number of similarity graph models for spectral grouping, including two new similarity graph models introduced in this study, are explored to investigate their performance in the context of weakly supervised classification in handling different types of data. Experimental results using benchmark datasets as well as real EMG datasets demonstrate that the proposed approach to weakly supervised classification can provide noticeable improvements in classification performance, and that the proposed similarity graph models can lead to ultimate learning results that are either better than or on a par with existing similarity graph models in the context of spectral grouping for weakly supervised classification.
Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we propose ‘Class Vectors’ – a framework for learning a vector per class in the same embedding space as the word and paragraph embeddings. Similarity between these class vectors and word vectors are used as features to classify a document to a class. In experiment on several sentiment analysis tasks such as Yelp reviews and Amazon electronic product reviews, class vectors have shown better or comparable results in classification while learning very meaningful class embeddings.
Since genetic algorithm was proposed by John Holland (Holland J. H., 1975) in the early 1970s, the study of evolutionary algorithm has emerged as a popular research field (Civicioglu & Besdok, 2013). Researchers from various scientific and engineering disciplines have been digging into this field, exploring the unique power of evolutionary algorithms (Hadka & Reed, 2013). Many applications have been successfully proposed in the past twenty years. For example, mechanical design (Lampinen & Zelinka, 1999), electromagnetic optimization (Rahmat-Samii & Michielssen, 1999), environmental protection (Bertini, Felice, Moretti, & Pizzuti, 2010), finance (Larkin & Ryan, 2010), musical orchestration (Esling, Carpentier, & Agon, 2010), pipe routing (Furuholmen, Glette, Hovin, & Torresen, 2010), and nuclear reactor core design (Sacco, Henderson, Rios-Coelho, Ali, & Pereira, 2009). In particular, its function optimization capability was highlighted (Goldberg & Richardson, 1987) because of its high adaptability to different function landscapes, to which we cannot apply traditional optimization techniques (Wong, Leung, & Wong, 2009). Here we review the applications of evolutionary algorithms in bioinformatics.
Real world problems always have different multiple solutions. For instance, optical engineers need to tune the recording parameters to get as many optimal solutions as possible for multiple trials in the varied-line-spacing holographic grating design problem. Unfortunately, most traditional optimization techniques focus on solving for a single optimal solution. They need to be applied several times; yet all solutions are not guaranteed to be found. Thus the multimodal optimization problem was proposed. In that problem, we are interested in not only a single optimal point, but also the others. With strong parallel search capability, evolutionary algorithms are shown to be particularly effective in solving this type of problem. In particular, the evolutionary algorithms for multimodal optimization usually not only locate multiple optima in a single run, but also preserve their population diversity throughout a run, resulting in their global optimization ability on multimodal functions. In addition, the techniques for multimodal optimization are borrowed as diversity maintenance techniques to other problems. In this chapter, we describe and review the state-of-the-arts evolutionary algorithms for multimodal optimization in terms of methodology, benchmarking, and application.
Logistic models for random graphs are commonly used to study binary networks when covariate information is available. After estimating the logistic parameters, one of the main questions which arises in practice is to assess the goodness of fit of the corresponding model. To address this problem, we add a general term, related to the graphon function of W-graph models, to the logistic function. Such an extra term aims at characterizing the residual structure of the network, that is not explained by the covariates. We approximate this new generic logistic model using a class of models with blockwise constant residual structure. This framework allows to derive a Bayesian procedure from a model based selection context using goodness-of-fit criteria. All these criteria depend on marginal likelihood terms for which we do provide estimates relying on two series of variational approximations. Experiments on toy data are carried out to assess the inference procedure. Finally, two real networks from social sciences and ecology are studied to illustrate the proposed methodology.
Tackling problems in areas such as computer vision, bioinformatics, speech or text recognition is often done best by taking into account task-specific statistical relations between output variables. In structured prediction, this internal structure is levered to predict multiple outputs simultaneously, leading to more accurate and coherent predictions. Structural support vector machines (SSVMs) are nonprobabilistic models that optimize a joint input-output function through margin-based learning. Because SSVMs generally disregard the interplay between unary and interaction factors during the training phase, final parameters are suboptimal. Moreover, its factors are often restricted to linear combinations of input features, limiting its generalization power. To improve prediction accuracy, this paper proposes: (i) Joint inference and learning by integration of back-propagation and loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM factors to neural networks that form highly nonlinear functions of input features. Image segmentation benchmark results demonstrate improvements over conventional SSVM training methods in terms of accuracy, highlighting the feasibility of end-to-end SSVM training with neural factors.
Intelligence can be understood as an agent’s ability to predict its environment’s dynamic by a level of precision which allows it to effectively foresee opportunities and threats. Under the assumption that such intelligence relies on a knowledge space any effective reasoning would benefit from a maximum portion of useful and a minimum portion of misleading knowledge fragments. It begs the question of how the quality of such knowledge space can be kept high as the amount of knowledge keeps growing. This article proposes a mathematical model to describe general principles of how quality of a growing knowledge space evolves depending on error rate, error propagation and countermeasures. There is also shown to which extend the quality of a knowledge space collapses as removal of low quality knowledge fragments occurs too slowly for a given knowledge space’s growth rate.
A multimodal system with Poisson, Gaussian, and multinomial observations is considered. A generative graphical model that combines multiple modalities through common factor loadings is proposed. In this model, latent factors are like summary objects that has latent factor scores in each modality, and the observed objects are represented in terms of such summary objects. This potentially brings about a significant dimensionality reduction. It also naturally enables a powerful means of clustering based on a diverse set of observations. An expectation-maximization (EM) algorithm to find the model parameters is provided. The algorithm is tested on a Twitter dataset which consists of the counts and geographical coordinates of hashtag occurrences, together with the bag of words for each hashtag. The resultant factors successfully localizes the hashtags in all dimensions: counts, coordinates, topics. The algorithm is also extended to accommodate von Mises-Fisher distribution, which is used to model the spherical coordinates.
Accurate estimation such as cost estimation, quality estimation and risk analysis is a major issue in management. We propose a patent pending soft computing framework to tackle this challenging problem. Our generic framework is independent of the nature and type of estimation. It consists of neural network, fuzzy logic, and an algorithmic estimation model. We made use of the Constructive Cost Model (COCOMO), Analysis of Variance (ANOVA), and Function Point Analysis as the algorithmic models and validated the accuracy of the Neuro-Fuzzy Algorithmic (NFA) Model in software cost estimation using industrial project data. Our model produces more accurate estimation than using an algorithmic model alone. We also discuss the prototypes of our tools that implement the NFA Model. We conclude with our roadmap and direction to enrich the model in tackling different estimation challenges.
We provide a complete description of possible correlation matrices consistent with a Gaussian latent tree model for any tree. This is expressed in the form of algebraic and semi-algebraic constraints imposed on the correlation matrix. We present techniques for utilising these constraints to assess whether observed data is compatible with a Gaussian latent tree model. We demonstrate the usefulness of the inverse-Wishart distribution to perform preliminary assessments of tree compatibility using the semi-algebraic constraints. Further, we note the correspondence of these algebraic constraints with vanishing tetrads. Using Drton et al. (2008) we provide the appropriate moments required for test statistics for assessing adherence to these equality constraints. We illustrate our exploratory tetrad analysis using a linguistic application and our confirmatory tetrad analysis using a biological application.
We present a new convolutional neural network-based time-series model. Typical convolutional neural network (CNN) architectures rely on the use of max-pooling operators in between layers, which leads to reduced resolution at the top layers. Instead, in this work we consider a fully convolutional network (FCN) architecture that uses causal filtering operations, and allows for the rate of the output signal to be the same as that of the input signal. We furthermore propose an undecimated version of the FCN, which we refer to as the undecimated fully convolutional neural network (UFCNN), and is motivated by the undecimated wavelet transform. Our experimental results verify that using the undecimated version of the FCN is necessary in order to allow for effective time-series modeling. The UFCNN has several advantages compared to other time-series models such as the recurrent neural network (RNN) and long short-term memory (LSTM), since it does not suffer from either the vanishing or exploding gradients problems, and is therefore easier to train. Convolution operations can also be implemented more efficiently compared to the recursion that is involved in RNN-based models. We evaluate the performance of our model in a synthetic target tracking task using bearing only measurements generated from a state-space model, a probabilistic modeling of polyphonic music sequences problem, and a high frequency trading task using a time-series of ask/bid quotes and their corresponding volumes. Our experimental results using synthetic and real datasets verify the significant advantages of the UFCNN compared to the RNN and LSTM baselines.
To evaluate the performance of diagnostic tests, it is important to control both TPR (True Positive Rate) and FPR (False Positive Rate). In literature, most researchers proposed to control the partial area under the ROC curve (pAUC) with restrictions on FPR, which is called FPR-pAUC. FPR-pAUC is designed to artificially measure the area controlled by TPR and FPR, but is often misleading conceptually and practically. In this paper, we provide a new and intuitive method, named two-way pAUC, which is the partial area under the ROC curve with both horizontal and vertical restrictions. We also propose an estimator of two-way pAUC and obtain its theoretical properties. Compared with other existing tools, the two-way pAUC approach is more efficient to distinguish diagnostic tests. We also demonstrate two-way pAUC as a new and efficient tool to analyze the Wisconsin Breast Cancer Data, where TPR need to be limited in high level due to diagnosing a lethal disease.
The quality of high-level AI of non-player characters (NPCs) in commercial open-world games (OWGs) has been increasing during the past years. However, due to constraints specific to the game industry, this increase has been slow and it has been driven by larger budgets rather than adoption of new complex AI techniques. Most of the contemporary AI is still expressed as hard-coded scripts. The complexity and manageability of the script codebase is one of the key limiting factors for further AI improvements. In this paper we address this issue. We present behavior objects – a general approach to development of NPC behaviors for large OWGs. Behavior objects are inspired by object-oriented programming and extend the concept of smart objects. Our approach promotes encapsulation of data and code for multiple related behaviors in one place, hiding internal details and embedding intelligence in the environment. Behavior objects are a natural abstraction of five different techniques that we have implemented to manage AI complexity in an upcoming AAA OWG. We report the details of the implementations in the context of behavior trees and the lessons learned during development. Our work should serve as inspiration for AI architecture designers from both the academia and the industry.
Dirichlet Process Mixture (DPM) models have been increasingly employed to specify random partition models that take into account possible patterns within the covariates. Furthermore, in response to large numbers of covariates, methods for selecting the most important covariates have been proposed. Commonly, the covariates are chosen either for their importance in determining the clustering of the observations or for their effect on the level of a response variable (in case a regression model is specified). Typically both strategies involve the specification of latent indicators that regulate the inclusion of the covariates in the model. Common examples involve the use of spike and slab prior distributions. In this work we review the most relevant DPM models that include the covariate information in the induced partition of the observations and we focus extensively on available variable selection techniques for these models. We highlight the main features of each model and demonstrate them in simulations.