When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.
Clustering is essential to many tasks in pattern recognition and computer vision. With the advent of deep learning, there is an increasing interest in learning deep unsupervised representations for clustering analysis. Many works on this domain rely on variants of auto-encoders and use the encoder outputs as representations/features for clustering. In this paper, we show that an l2 normalization constraint on these representations during auto-encoder training, makes the representations more separable and compact in the Euclidean space after training. This greatly improves the clustering accuracy when k-means clustering is employed on the representations. We also propose a clustering based unsupervised anomaly detection method using l2 normalized deep auto-encoder representations. We show the effect of l2 normalization on anomaly detection accuracy. We further show that the proposed anomaly detection method greatly improves accuracy compared to previously proposed deep methods such as reconstruction error based anomaly detection.
Agent-based modeling has been criticized for its apparent lack of establishing causality of social phenomena. However, we demonstrate that when coupled with evolutionary computation techniques, agent-based models can be used to evolve plausible agent behaviors that are able to recreate patterns observed in real-world data, from which valuable insights into candidate explanations of the macro-phenomenon can be drawn. Existing methodologies have suggested the manual assembly and comparison or automated selection of pre-built models on their ability to fit patterns in data. We discuss the cons of existing manual approaches and how evolutionary model discovery, an evolutionary approach to explore the space of agent behaviors for plausible rule-sets, can overcome these issues. We couple evolutionary model discovery with concepts from the Agent\_Zero framework, ensuring social connectivity, emotional theory components and rational mechanisms. In this study, we revisit the farm-seeking strategy of the Artificial Anasazi model, originally designed to simply select the closest potential farm plot as their next farming location. We use evolutionary model discovery to explore plausible farm seeking strategies, extending our previous study by testing four social connectivity strategies, four emotional theory components and five rational mechanisms for a more complex human-like approach towards farm plot selection. Our results confirm that, plot quality, dryness and community presence were more important in the farm selection process of the Anasazi than distance, and discover farm selection strategies that generate simulations that produce a closer fit to the archaeological data.
We present Henge, a system to support intent-based multi-tenancy in modern stream processing applications. Henge supports multi-tenancy as a first-class citizen: everyone inside an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Additionally, Henge allows each tenant (job) to specify its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs’ SLOs in spite of limited cluster resources, and under dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction, and when jobs miss their SLOs, it wisely navigates the state space to perform resource allocations in real time, maximizing total system utility achieved by all jobs in the system. Henge is integrated in Apache Storm and we present experimental results using both production topologies and real datasets.
This paper presents a method to learn a decision tree to quantitatively explain the logic of each prediction of a pre-trained convolutional neural networks (CNNs). Our method boosts the following two aspects of network interpretability. 1) In the CNN, each filter in a high conv-layer must represent a specific object part, instead of describing mixed patterns without clear meanings. 2) People can explain each specific prediction made by the CNN at the semantic level using a decision tree, i.e., which filters (or object parts) are used for prediction and how much they contribute in the prediction. To conduct such a quantitative explanation of a CNN, our method learns explicit representations of object parts in high conv-layers of the CNN and mines potential decision modes memorized in fully-connected layers. The decision tree organizes these potential decision modes in a coarse-to-fine manner. Experiments have demonstrated the effectiveness of the proposed method.
Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this paper, we focus on situations where the model is distributedly stored, and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions, and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as the memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. In compared with stochastic gradient methods, it is more robust and may give better test accuracy.
In recent years, cloud database storage has become an inexpensive and convenient option for businesses and individuals to store information. While its positive aspects make the cloud extremely attractive for data storage, it is a relatively new area of service, making it vulnerable to cyber-attacks and security breaches. Storing data in a foreign location also requires the owner to relinquish control of their information to system administrators of these online database services. This opens the possibility for malicious, internal attacks on the data that may involve the manipulation, omission, or addition of data. The retention of the data as it was intended to be stored is referred to as the database’s integrity. Our research tests a potential solution for maintaining the integrity of these cloud-storage databases by converting the original databases to Integrity Coded Databases (ICDB). ICDBs utilize Integrity Codes: cryptographic codes created alongside the data by a private key that only the data owner has access to. When the database is queried, an integrity code is returned along with the queried information. The owner is then able to verify that the information is correct, complete, and fresh. Consequently, ICDBs also incur performance and memory penalties. In our research, we explore, test, and benchmark ICDBs to determine the costs and benefits of maintaining an ICDB versus a standard database.
The present work shows the application of transfer learning for a pre-trained deep neural network (DNN), using a small image dataset ($\approx$ 12,000) on a single workstation with enabled NVIDIA GPU card that takes up to 1 hour to complete the training task and archive an overall average accuracy of $94.7\%$. The DNN presents a $20\%$ score of misclassification for an external test dataset. The accuracy of the proposed methodology is equivalent to ones using HSI methodology $(81\%-91\%)$ used for the same task, but with the advantage of being independent on special equipment to classify wheat kernel for FHB symptoms.
With the rise of data driven deep neural networks as a realization of universal function approximators, most research on computer vision problems has moved away from hand crafted classical image processing algorithms. This paper shows that with a well designed algorithm, we are capable of outperforming neural network based methods on the task of depth completion. The proposed algorithm is simple and fast, runs on the CPU, and relies only on basic image processing operations to perform depth completion of sparse LIDAR depth data. We evaluate our algorithm on the challenging KITTI depth completion benchmark, and at the time of submission, our method ranks first on the KITTI test server among all published methods. Furthermore, our algorithm is data independent, requiring no training data to perform the task at hand. The code written in Python will be made publicly available at https://…/ip_basic.
Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nystr\’om approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nystr\’om approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained.
Gaussian process (GP) models provide a powerful tool for prediction but are computationally prohibitive using large data sets. In such scenarios, one has to resort to approximate methods. We derive an approximation based on a composite likelihood approach using a general belief updating framework, which leads to a recursive computation of the predictor as well as of learning the hyper-parameters. We then provide an analysis of the derived composite GP model in predictive and information-theoretic terms. Finally, we evaluate the approximation with both synthetic data and a real-world application.
We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures such as the F-measure and the Kullback-Leibler divergence that are structured and non-decomposable. This presents a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields much faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations have several novel features including (i) convergence to first order stationary points despite optimizing complex objective functions; (ii) use of fewer training samples to achieve a desired level of convergence, (iii) a substantial reduction in training time, and (iv) a seamless integration of our implementation into existing symbolic gradient frameworks. We implement our techniques on a variety of deep architectures including multi-layer perceptrons and recurrent neural networks and show that on a variety of benchmark and real data sets, our algorithms outperform traditional approaches to training deep networks, as well as some recent approaches to task-specific training of neural networks.
In this paper, we introduce ‘Power Linear Unit’ (PoLU) which increases the nonlinearity capacity of a neural network and thus helps improving its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Thirdly, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification for comparing our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include widely-used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks.
We study a logistic model-based active learning procedure for binary classification problems, in which we adopt a batch subject selection strategy with a modified sequential experimental design method. Moreover, accompanying the proposed subject selection scheme, we simultaneously conduct a greedy variable selection procedure such that we can update the classification model with all labeled training subjects. The proposed algorithm repeatedly performs both subject and variable selection steps until a prefixed stopping criterion is reached. Our numerical results show that the proposed procedure has competitive performance, with smaller training size and a more compact model, comparing with that of the classifier trained with all variables and a full data set. We also apply the proposed procedure to a well-known wave data set (Breiman et al., 1984) to confirm the performance of our method.
Probit regression was first proposed by Bliss in 1934 to study mortality rates of insects. Since then, an extensive body of work has analyzed and used probit or related binary regression methods (such as logistic regression) in numerous applications and fields. This paper provides a fresh angle to such well-established binary regression methods. Concretely, we demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum aposteriori estimation, for a broad range of real-world regression problems. We derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators, which clearly separates them from nonlinear regression methods that are typically difficult to analyze. We showcase the efficacy of our methods and results for a number of synthetic and real-world datasets, which demonstrates that linearized binary regression finds potential use in a variety of inference, estimation, signal processing, and machine learning applications that deal with binary-valued observations or measurements.
Phase retrieval deals with the recovery of complex- or real-valued signals from magnitude measurements. As shown recently, the method PhaseMax enables phase retrieval via convex optimization and without lifting the problem to a higher dimension. To succeed, PhaseMax requires an initial guess of the solution, which can be calculated via spectral initializers. In this paper, we show that with the availability of an initial guess, phase retrieval can be carried out with an ever simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. The linear nature of PhaseLin enables an exact and nonasymptotic MSE analysis for arbitrary measurement matrices. We furthermore demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and nonconvex methods on synthetic and real-world data.