Recursive Feature Generation for Knowledge-based Learning

When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.
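A minimal, single-level sketch of the idea, assuming a toy knowledge base. The values of one feature become a new labeled learning task, a trivial classifier solves that task using knowledge-base attributes, and its output becomes a new feature for the original problem. KB, value_task, fit_on_kb and generate_feature are hypothetical names. The recursion (treating knowledge-base attributes as features of further tasks) and the learners used in the paper are not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base (entity -> attributes); not taken from the paper.
KB = {
    "Paris": {"country": "France"},
    "Lyon": {"country": "France"},
    "Berlin": {"country": "Germany"},
    "Munich": {"country": "Germany"},
}

def value_task(examples, feature):
    """Build the new learning task: label each value of `feature` with the
    majority class of the training examples that carry that value."""
    votes = defaultdict(Counter)
    for x, y in examples:
        votes[x[feature]][y] += 1
    return {value: c.most_common(1)[0][0] for value, c in votes.items()}

def fit_on_kb(labeled_values, attribute):
    """Solve the value task using the KB: majority label per KB attribute value."""
    votes = defaultdict(Counter)
    for value, label in labeled_values.items():
        votes[KB[value][attribute]][label] += 1
    rule = {a: c.most_common(1)[0][0] for a, c in votes.items()}
    # Unknown entities or attribute values map to None.
    return lambda value: rule.get(KB.get(value, {}).get(attribute))

def generate_feature(examples, feature, attribute):
    """Return a new feature: the value-level classifier applied to x[feature]."""
    clf = fit_on_kb(value_task(examples, feature), attribute)
    return lambda x: clf(x[feature])

examples = [({"city": "Paris"}, "fr"), ({"city": "Berlin"}, "de")]
new_feature = generate_feature(examples, "city", "country")
print(new_feature({"city": "Lyon"}), new_feature({"city": "Munich"}))  # fr de
```

Because the value-level classifier works on knowledge-base attributes, it also produces a value for "Lyon" and "Munich", which never appear in the training examples.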

Clustering and Unsupervised Anomaly Detection with L2 Normalized Deep Auto-Encoder Representations

Clustering is essential to many tasks in pattern recognition and computer vision. With the advent of deep learning, there is an increasing interest in learning deep unsupervised representations for clustering analysis. Many works in this domain rely on variants of auto-encoders and use the encoder outputs as representations/features for clustering. In this paper, we show that an l2 normalization constraint on these representations during auto-encoder training makes the representations more separable and compact in the Euclidean space after training. This greatly improves the clustering accuracy when k-means clustering is employed on the representations. We also propose a clustering-based unsupervised anomaly detection method using l2 normalized deep auto-encoder representations. We show the effect of l2 normalization on anomaly detection accuracy. We further show that the proposed anomaly detection method greatly improves accuracy compared to previously proposed deep methods such as reconstruction-error-based anomaly detection.
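The downstream part of the method can be sketched as follows, assuming encoder outputs are already available as a NumPy array. The paper's key step, applying the l2 constraint during auto-encoder training, is not shown: here the representations are only normalized before k-means. Using the distance to the nearest centroid as the anomaly score is an assumed choice, not necessarily the paper's exact scoring rule.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Project each representation (row) onto the unit hypersphere."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def kmeans(z, k, iters=50, seed=0):
    """Plain Lloyd's k-means with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        d = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = z[labels == j].mean(axis=0)
    return centers, labels

def anomaly_scores(z_train, z_test, k):
    """Distance of each normalized test representation to its nearest centroid."""
    centers, _ = kmeans(l2_normalize(z_train), k)
    zt = l2_normalize(z_test)
    return np.linalg.norm(zt[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
```

After normalization only the direction of a representation matters, so a test point pointing the same way as a training cluster scores low regardless of its norm.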

Evolutionary Model Discovery of Factors for Farm Selection by the Artificial Anasazi

Agent-based modeling has been criticized for its apparent lack of establishing causality of social phenomena. However, we demonstrate that when coupled with evolutionary computation techniques, agent-based models can be used to evolve plausible agent behaviors that are able to recreate patterns observed in real-world data, from which valuable insights into candidate explanations of the macro-phenomenon can be drawn. Existing methodologies have suggested the manual assembly and comparison, or the automated selection, of pre-built models based on their ability to fit patterns in data. We discuss the drawbacks of existing manual approaches and how evolutionary model discovery, an evolutionary approach that explores the space of agent behaviors for plausible rule-sets, can overcome these issues. We couple evolutionary model discovery with concepts from the Agent_Zero framework, ensuring social connectivity, emotional theory components and rational mechanisms. In this study, we revisit the farm-seeking strategy of the Artificial Anasazi model, in which agents were originally designed to simply select the closest potential farm plot as their next farming location. We use evolutionary model discovery to explore plausible farm-seeking strategies, extending our previous study by testing four social connectivity strategies, four emotional theory components and five rational mechanisms for a more complex, human-like approach to farm plot selection. Our results confirm that plot quality, dryness and community presence were more important than distance in the farm selection process of the Anasazi, and we discover farm selection strategies that generate simulations with a closer fit to the archaeological data.

Henge: Intent-driven Multi-Tenant Stream Processing

We present Henge, a system to support intent-based multi-tenancy in modern stream processing applications. Henge supports multi-tenancy as a first-class citizen: everyone inside an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Additionally, Henge allows each tenant (job) to specify its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs’ SLOs in spite of limited cluster resources and under dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction, and when jobs miss their SLOs, it navigates the state space to perform resource allocations in real time, maximizing the total system utility achieved by all jobs in the system. Henge is integrated into Apache Storm, and we present experimental results using both production topologies and real datasets.
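As an illustration of soft, utility-based SLOs, the sketch below assumes a hypothetical latency utility that stays at the job's priority while the SLO is met and decays in proportion to how far latency exceeds it. Henge's actual utility functions and scheduling policy are not reproduced; the Job fields and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: float        # hypothetical weight of the job in total utility
    latency_slo_ms: float  # the job's intent (latency target)
    latency_ms: float      # currently observed latency

def utility(job: Job) -> float:
    """Soft SLO: full utility when met, proportionally less when missed."""
    if job.latency_ms <= job.latency_slo_ms:
        return job.priority
    return job.priority * job.latency_slo_ms / job.latency_ms

def total_utility(jobs) -> float:
    """Quantity the scheduler tries to maximize across all jobs."""
    return sum(utility(j) for j in jobs)

def jobs_missing_slo(jobs):
    """Jobs a scheduler would consider for more resources, highest priority first."""
    return sorted((j for j in jobs if j.latency_ms > j.latency_slo_ms),
                  key=lambda j: j.priority, reverse=True)
```

With job a (priority 10, meeting its 100 ms SLO at 50 ms) and job b (priority 5, at 200 ms against a 100 ms SLO), the total utility is 10 + 2.5 = 12.5, and only b is a candidate for reallocation.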

Interpreting CNNs via Decision Trees

This paper presents a method to learn a decision tree that quantitatively explains the logic of each prediction of a pre-trained convolutional neural network (CNN). Our method boosts the following two aspects of network interpretability. 1) In the CNN, each filter in a high conv-layer must represent a specific object part, instead of describing mixed patterns without clear meanings. 2) People can explain each specific prediction made by the CNN at the semantic level using a decision tree, i.e., which filters (or object parts) are used for the prediction and how much they contribute to it. To conduct such a quantitative explanation of a CNN, our method learns explicit representations of object parts in high conv-layers of the CNN and mines potential decision modes memorized in fully-connected layers. The decision tree organizes these potential decision modes in a coarse-to-fine manner. Experiments have demonstrated the effectiveness of the proposed method.

Distributed Newton Methods for Deep Neural Networks

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but calculating the function value, gradient, and Hessian is expensive. In particular, the communication and synchronization costs may become a bottleneck. In this paper, we focus on situations where the model is stored in a distributed manner, and propose a novel distributed Newton method for training deep neural networks. Through variable-wise and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Several techniques are incorporated to reduce the running time as well as the memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices to reduce the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.
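The sketch below illustrates two of the ingredients on a single machine, assuming a linear least-squares model so that the Jacobian is simply the data matrix: a Gauss-Newton matrix built from a random subsample, and a conjugate-gradient solve that uses only matrix-vector products and is cut off after a fixed number of iterations as a loose stand-in for stopping the inner solve early. The model partitioning, diagonalization and inter-machine communication from the paper are not shown, and all names are hypothetical.

```python
import numpy as np

def cg(matvec, b, max_iters, tol=1e-10):
    """Conjugate gradient for an SPD operator available only as a matvec.

    Stopping after max_iters gives an approximate (truncated) solution.
    """
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def newton_direction(X, y, w, sample_size, lam=1e-3, max_cg_iters=20, seed=0):
    """Approximate Newton direction for 0.5 * mean((X w - y)^2)."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n                  # full gradient
    rng = np.random.default_rng(seed)
    S = rng.choice(n, size=sample_size, replace=False)
    J = X[S]                                      # Jacobian of the subsample (linear model)

    # Subsampled Gauss-Newton product (J^T J / |S| + lam I) v, never formed explicitly.
    def matvec(v):
        return J.T @ (J @ v) / len(S) + lam * v

    return cg(matvec, -grad, max_cg_iters)
```

With the full sample, a tiny damping term and enough CG iterations, one step from w = 0 recovers the least-squares solution; smaller subsamples or fewer iterations trade accuracy of the direction for cheaper products.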

Integrity Coded Databases: An Evaluation of Performance, Efficiency, and Practicality

In recent years, cloud database storage has become an inexpensive and convenient option for businesses and individuals to store information. While its positive aspects make the cloud extremely attractive for data storage, it is a relatively new area of service, making it vulnerable to cyber-attacks and security breaches. Storing data in a foreign location also requires the owner to relinquish control of their information to system administrators of these online database services. This opens the possibility for malicious, internal attacks on the data that may involve the manipulation, omission, or addition of data. The retention of the data as it was intended to be stored is referred to as the database’s integrity. Our research tests a potential solution for maintaining the integrity of these cloud-storage databases by converting the original databases to Integrity Coded Databases (ICDB). ICDBs utilize Integrity Codes: cryptographic codes created alongside the data by a private key that only the data owner has access to. When the database is queried, an integrity code is returned along with the queried information. The owner is then able to verify that the information is correct, complete, and fresh. However, ICDBs also incur performance and memory penalties. In our research, we explore, test, and benchmark ICDBs to determine the costs and benefits of maintaining an ICDB versus a standard database.
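A minimal sketch of per-row integrity codes, assuming HMAC-SHA256 with an owner-held secret key as a stand-in for whatever cryptographic code ICDB actually uses. Binding each value to its table, row id and a version counter lets the owner detect modified or stale rows; detecting omitted rows (completeness) would need additional structure not shown here. The function names are hypothetical.

```python
import hashlib
import hmac
import json

def integrity_code(key: bytes, table: str, row_id: int, value: str, version: int) -> str:
    """HMAC-SHA256 over the value bound to its table, row and version."""
    # json.dumps gives an unambiguous encoding of the fields (no delimiter clashes).
    msg = json.dumps([table, row_id, value, version]).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify(key: bytes, table: str, row_id: int, value: str, version: int, code: str) -> bool:
    """Recompute the code the owner expects and compare in constant time."""
    expected = integrity_code(key, table, row_id, value, version)
    return hmac.compare_digest(expected, code)
```

The server stores and returns the code alongside the row; only the owner, who holds the key and knows the current version, can check it.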

Fusarium Damaged Kernels Detection Using Transfer Learning on Deep Neural Network Architecture

The present work shows the application of transfer learning to a pre-trained deep neural network (DNN) using a small image dataset (≈12,000 images) on a single workstation with an NVIDIA GPU card; training takes up to 1 hour and achieves an overall average accuracy of 94.7%. The DNN has a misclassification rate of 20% on an external test dataset. The accuracy of the proposed methodology is equivalent to that of hyperspectral imaging (HSI) methods (81%-91%) used for the same task, with the advantage of not depending on special equipment to classify wheat kernels for Fusarium head blight (FHB) symptoms.

In Defense of Classical Image Processing: Fast Depth Completion on the CPU

With the rise of data-driven deep neural networks as a realization of universal function approximators, most research on computer vision problems has moved away from hand-crafted classical image processing algorithms. This paper shows that with a well-designed algorithm, we are capable of outperforming neural-network-based methods on the task of depth completion. The proposed algorithm is simple and fast, runs on the CPU, and relies only on basic image processing operations to perform depth completion of sparse LIDAR depth data. We evaluate our algorithm on the challenging KITTI depth completion benchmark, and at the time of submission, our method ranks first on the KITTI test server among all published methods. Furthermore, our algorithm is data independent, requiring no training data to perform the task at hand. The code, written in Python, will be made publicly available at https://…/ip_basic.
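The core trick of filling sparse depth with basic morphology can be sketched in NumPy as below, assuming depths are positive, strictly below max_depth, and stored as 0 where missing. Depth is inverted so that a grey-level dilation (a max filter) lets nearer points win when two measurements compete for an empty pixel. The kernel shapes, hole filling and smoothing steps of the actual pipeline are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def dilate_max(img, radius=1):
    """Grey-level dilation with a square kernel, via maxima of shifted copies."""
    padded = np.pad(img, radius, mode="constant", constant_values=0)
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def complete_depth(sparse, max_depth=100.0, iters=10):
    """Fill empty (zero) pixels of a sparse depth map by repeated dilation."""
    valid = sparse > 0
    # Invert depth so that nearer points have larger values and win the max.
    inv = np.where(valid, max_depth - sparse, 0.0)
    for _ in range(iters):
        empty = inv == 0
        if not empty.any():
            break
        # Only empty pixels are written; measured pixels keep their values.
        inv = np.where(empty, dilate_max(inv), inv)
    return np.where(inv > 0, max_depth - inv, 0.0)
```

For example, a pixel equidistant from a 5 m and a 20 m measurement is filled with 5 m, which matches the intuition that the nearer surface occludes the farther one.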

Incremental kernel PCA and the Nyström method

Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nyström approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nyström approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained.
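For reference, the standard batch Nyström approximation builds K ≈ C W⁺ Cᵀ from a subset of landmark points, where C holds kernel values between all points and the landmarks and W holds kernel values among the landmarks. The sketch below simply recomputes it as landmarks are added, assuming an RBF kernel; it does not implement the paper's incremental rank-one eigendecomposition updates. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom(X, landmark_idx, gamma=1.0):
    """Nyström approximation K ≈ C W^+ C^T using the given landmark points."""
    L = X[landmark_idx]
    C = rbf_kernel(X, L, gamma)   # n x m cross-kernel
    W = rbf_kernel(L, L, gamma)   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T
```

For nested landmark sets, the trace of the residual K − K̃ cannot increase as landmarks are added, so tracking it is one simple way to judge when the subset is large enough.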

Composite Gaussian Processes: Scalable Computation and Performance Analysis

Gaussian process (GP) models provide a powerful tool for prediction but are computationally prohibitive for large data sets. In such scenarios, one has to resort to approximate methods. We derive an approximation based on a composite likelihood approach using a general belief updating framework, which leads to a recursive computation of the predictor as well as of learning the hyper-parameters. We then provide an analysis of the derived composite GP model in predictive and information-theoretic terms. Finally, we evaluate the approximation with both synthetic data and a real-world application.
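To illustrate recursive belief updating over blocks of data, the sketch below fuses exact GP predictions from each block with a Bayesian committee machine (BCM) style rule, assuming a zero-mean GP with a unit-variance RBF kernel and fixed noise. This is a swapped-in, related technique rather than the paper's composite-likelihood derivation, and hyper-parameter learning is omitted. The function names are hypothetical.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * sq / ell ** 2)

def gp_predict(X, y, Xs, noise=0.1):
    """Exact GP posterior mean and variance of the latent function at Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf(Xs, Xs)) - (v ** 2).sum(0)
    return mean, var

def composite_predict(blocks, Xs, noise=0.1):
    """Recursively fuse per-block predictions, starting from the prior belief."""
    prior_var = np.diag(rbf(Xs, Xs))
    prec = 1.0 / prior_var
    weighted = np.zeros(len(Xs))
    for X, y in blocks:
        m, v = gp_predict(X, y, Xs, noise)
        # Add this block's information and remove the double-counted prior.
        prec += 1.0 / v - 1.0 / prior_var
        weighted += m / v
    var = 1.0 / prec
    return weighted * var, var
```

Each block only needs a Cholesky factorization of its own kernel matrix, so the cost grows with the block size rather than with the full data set; with a single block the update reduces to the exact GP prediction.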

Optimizing Non-decomposable Measures with Deep Networks

We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures, such as the F-measure and the Kullback-Leibler divergence, that are structured and non-decomposable. This presents a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (which are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields much faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations have several novel features, including (i) convergence to first-order stationary points despite optimizing complex objective functions; (ii) use of fewer training samples to achieve a desired level of convergence; (iii) a substantial reduction in training time; and (iv) a seamless integration of our implementation into existing symbolic gradient frameworks. We implement our techniques on a variety of deep architectures, including multi-layer perceptrons and recurrent neural networks, and show that on a variety of benchmark and real data sets, our algorithms outperform traditional approaches to training deep networks, as well as some recent approaches to task-specific training of neural networks.
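Non-decomposability can be seen with a common soft-F1 surrogate, computed from predicted probabilities so that it stays differentiable. This sketch is an assumed illustration and not one of the paper's algorithms: the point is that the loss depends on batch-level counts, so the loss of a batch is not the average of per-example losses.

```python
import numpy as np

def soft_f1_loss(probs, labels, eps=1e-8):
    """1 - F1 computed from predicted probabilities instead of hard decisions.

    The true/false positive and false negative counts are summed over the
    whole batch, which couples all examples in a single ratio.
    """
    tp = np.sum(probs * labels)
    fp = np.sum(probs * (1 - labels))
    fn = np.sum((1 - probs) * labels)
    return 1.0 - 2 * tp / (2 * tp + fp + fn + eps)
```

For probs = [1, 1] and labels = [1, 0], the batch loss is 1/3, while the two per-example losses are 0 and 1 (mean 0.5), so the measure cannot be optimized as a sum of independent per-example terms.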

Training Neural Networks by Using Power Linear Units (PoLUs)

In this paper, we introduce the ‘Power Linear Unit’ (PoLU), which increases the nonlinearity capacity of a neural network and thus helps improve its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be the identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Third, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification to compare our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include the widely used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks.
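A NumPy sketch of the activation, assuming the form PoLU(x) = x for x ≥ 0 and (1 − x)^(−n) − 1 for x < 0 with a power n > 0 (our reading of the paper; check the exact definition there). It exhibits the properties listed above: the identity on positive inputs, non-zero outputs for negative inputs, and saturation towards −1 for large negative inputs.

```python
import numpy as np

def polu(x, n=1.0):
    """Assumed PoLU form: identity for x >= 0, (1 - x)^(-n) - 1 for x < 0."""
    x = np.asarray(x, dtype=float)
    # Clamp to the negative branch's domain so the power is evaluated safely
    # everywhere (np.where computes both branches).
    neg = np.minimum(x, 0.0)
    return np.where(x >= 0, x, (1.0 - neg) ** (-n) - 1.0)
```

The slope of the negative branch at the origin is n, so with n = 1 the function is continuously differentiable at 0; larger n makes the negative part approach its −1 saturation faster.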

Greedy Active Learning Algorithm for Logistic Regression Models

We study a logistic model-based active learning procedure for binary classification problems, in which we adopt a batch subject selection strategy with a modified sequential experimental design method. Moreover, accompanying the proposed subject selection scheme, we simultaneously conduct a greedy variable selection procedure such that we can update the classification model with all labeled training subjects. The proposed algorithm repeatedly performs both subject and variable selection steps until a prespecified stopping criterion is reached. Our numerical results show that the proposed procedure has competitive performance, with a smaller training size and a more compact model, compared with the classifier trained with all variables on the full data set. We also apply the proposed procedure to a well-known wave data set (Breiman et al., 1984) to confirm the performance of our method.
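A sketch of greedy batch subject selection, assuming a D-optimality criterion as the sequential experimental design rule: each step adds the pool subject that most increases the log-determinant of the logistic-regression Fisher information at the current coefficients w, using the matrix determinant lemma. The paper's exact criterion and its variable-selection step are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def select_batch(X_pool, X_labeled, w, batch_size, ridge=1e-6):
    """Greedily pick `batch_size` pool rows maximizing the log det of the Fisher information."""
    def info(rows):
        p = sigmoid(rows @ w)
        # Fisher information of logistic regression: sum_i p_i (1 - p_i) x_i x_i^T
        return (rows * (p * (1 - p))[:, None]).T @ rows

    M = info(X_labeled) + ridge * np.eye(X_pool.shape[1])
    chosen = []
    for _ in range(batch_size):
        Minv = np.linalg.inv(M)
        p = sigmoid(X_pool @ w)
        # Matrix determinant lemma: det(M + c x x^T) = det(M) * (1 + c x^T M^{-1} x),
        # so ranking by c x^T M^{-1} x ranks candidates by log-det gain.
        gain = p * (1 - p) * np.einsum("ij,jk,ik->i", X_pool, Minv, X_pool)
        gain[chosen] = -np.inf          # never pick the same subject twice
        best = int(np.argmax(gain))
        chosen.append(best)
        M = M + info(X_pool[best:best + 1])
    return chosen
```

Because the information matrix is updated after each pick, a batch chosen this way favors subjects that add information in directions not yet covered, rather than several near-duplicates.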

Linearized Binary Regression

Probit regression was first proposed by Bliss in 1934 to study mortality rates of insects. Since then, an extensive body of work has analyzed and used probit or related binary regression methods (such as logistic regression) in numerous applications and fields. This paper provides a fresh angle on such well-established binary regression methods. Concretely, we demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum a posteriori estimation, for a broad range of real-world regression problems. We derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators, which clearly distinguishes them from nonlinear regression methods that are typically difficult to analyze. We showcase the efficacy of our methods and results for a number of synthetic and real-world datasets, which demonstrates that linearized binary regression finds potential use in a variety of inference, estimation, signal processing, and machine learning applications that deal with binary-valued observations or measurements.
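To make "linearizing the probit model" concrete, the sketch below computes the linear MMSE estimate from binary observations y = sign(Dx + w), assuming a Gaussian prior x ~ N(0, Cx) and Gaussian noise w ~ N(0, σ²I), which is a probit-type model. It relies on two standard Gaussian facts: the Bussgang identity for the cross-covariance between x and sign(Dx + w), and the arcsine law for the covariance of the signs. The function name is hypothetical and the paper's estimators may differ in detail; the returned MSE is the exact closed-form MSE of this linear estimator under the assumed model.

```python
import numpy as np

def bussgang_lmmse(D, y, Cx, noise_var):
    """Linear MMSE estimate of x from y = sign(D x + w); returns (estimate, exact MSE)."""
    Cz = D @ Cx @ D.T + noise_var * np.eye(D.shape[0])   # covariance of z = D x + w
    d = 1.0 / np.sqrt(np.diag(Cz))
    # Bussgang: E[x sign(z)^T] = sqrt(2/pi) * Cx D^T diag(Cz)^(-1/2)
    Cxy = np.sqrt(2 / np.pi) * (Cx @ D.T) * d[None, :]
    # Arcsine law: E[sign(z) sign(z)^T] = (2/pi) * arcsin(correlation matrix of z)
    R = np.clip(Cz * d[:, None] * d[None, :], -1.0, 1.0)
    Cy = (2 / np.pi) * np.arcsin(R)
    # W = Cxy Cy^{-1}, computed with a solve (Cy is symmetric).
    W = np.linalg.solve(Cy, Cxy.T).T
    mse = np.trace(Cx - W @ Cxy.T)
    return W @ y, mse
```

Because everything is a fixed matrix applied to y, the MSE follows in closed form from second-order statistics, which is what makes this kind of estimator easy to analyze compared with posterior-mean or MAP estimation.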

PhaseLin: Linear Phase Retrieval

Phase retrieval deals with the recovery of complex- or real-valued signals from magnitude measurements. As shown recently, the method PhaseMax enables phase retrieval via convex optimization and without lifting the problem to a higher dimension. To succeed, PhaseMax requires an initial guess of the solution, which can be calculated via spectral initializers. In this paper, we show that with the availability of an initial guess, phase retrieval can be carried out with an even simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. The linear nature of PhaseLin enables an exact and nonasymptotic MSE analysis for arbitrary measurement matrices. We furthermore demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and nonconvex methods on synthetic and real-world data.

On the Topic of Jets
Optimal Calibration for Computer Model Prediction with Finite Samples
Cluster-based Approach to Improve Affect Recognition from Passively Sensed Data
Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness
Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization
Matrix completion with deterministic pattern – a geometric perspective
Deceptive Games
Minimally toughness in special graph classes
First order theory on $G(n, c n^{-1})$
Dynamics of Driver’s Gaze: Explorations in Behavior Modeling & Maneuver Prediction
On the separability of unitarily invariant random quantum states – the unbalanced regime
Stochastic Differential Equations with Critical Drifts
Graphon games
NC Algorithms for Perfect Matching and Maximum Flow in One-Crossing-Minor-Free Graphs
Improved Image Segmentation via Cost Minimization of Multiple Hypotheses
Cross-domain CNN for Hyperspectral Image Classification
Single Image Reflection Removal Using Deep Encoder-Decoder Network
On the Achievability Region of Regenerating Codes for Multiple Erasures
Predicting Wireless Channel Features using Neural Networks
Frequency Domain Properties and Fundamental Limits of Buffer-Feedback Regulation in Biochemical Systems
The time geography of segregation during working hours
A Modified Sigma-Pi-Sigma Neural Network with Adaptive Choice of Multinomials
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Redundancy of Markov Family with Unbounded Memory
Algebraic formulas for the structure constants in symmetric functions
How many weights can a linear code have?
Alternating Multi-bit Quantization for Recurrent Neural Networks
Macros to Conduct Tests of Multimodality in SAS
Semantic White Balance: Semantic Color Constancy Using Convolutional Neural Network
Bootstrapping and Multiple Imputation Ensemble Approaches for Missing Data
Optimal LRC codes for all lengths n <= q
Bipartite discrimination of independently prepared quantum states: revival of distinguishability with entanglement
Consensus-based Distributed Quickest Detection of Attacks with Unknown Parameters
Deep Learning with Data Dependent Implicit Activation Function
Augmented Space Linear Model
Perceptual Compressive Sensing
Full Image Recover for Block-Based Compressive Sensing
Machine learning and evolutionary techniques in interplanetary trajectory design
Spatio-temporal transfer function conditions of positive realness for translation invariant lattice networks of interacting linear systems
Tradeoff between Delay and Physical Layer Security in Wireless Networks
Dual Recurrent Attention Units for Visual Question Answering
Hoeffding’s lemma for Markov Chains and its applications to statistical learning
Inequivalent Berry phases for the bulk polarization
The Hackbusch conjecture on tensor formats – part two
Interference Mitigation Methods for Unmanned Aerial Vehicles Served by Cellular Networks
Strength of forensic evidence for composite hypotheses: An empirical Bayes view with a fixed prior quantile
Adapting predominant and novel sense discovery algorithms for identifying corpus-specific sense differences
On Polynomial time Constructions of Minimum Height Decision Tree
Face Aging with Contextual Generative Adversarial Nets
Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System
Risk-sensitive performance criteria and robustness of quantum systems with a relative entropy description of state uncertainty
On indicated coloring of some classes of graphs
Energy Harvesting Fairness in AN-aided Secure MU-MIMO SWIPT Systems with Cooperative Jammer
Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription
A Nonparametric Delayed Feedback Model for Conversion Rate Prediction
Individual Resource Games and Resource Redistributions
Anomaly Detection in Log Data using Graph Databases and Machine Learning to Defend Advanced Persistent Threats
Robust Sequential Detection in Distributed Sensor Networks
VR Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control
The condition of a function relative to a polytope
Emerging Language Spaces Learned From Massively Multilingual Corpora
HoloFace: Augmenting Human-to-Human Interactions on HoloLens
Online optimal exact identification of a quantum change point
Virtual-to-Real: Learning to Control in Visual Semantic Segmentation
$S$-Leaping: An adaptive, accelerated stochastic simulation algorithm, bridging $\tau$-leaping and $R$-leaping
Complex Network Geometry and Frustrated Synchronization
Exploring the diluted ferromagnetic p-spin model with a Cavity Master Equation
Fluctuations of random semi-linear advection equations
Distributed Clustering Algorithm for Spatial Data Mining
State-Adaptive Coded Caching for Symmetric Broadcast Channels
Correlation and Prediction of Evaluation Metrics in Information Retrieval
Common factors in automatic and Sturmian sequences
Elements of Effective Deep Reinforcement Learning towards Tactical Driving Decision Making
Mobility-aware, adaptive algorithms for wireless power transfer in ad hoc networks
An $L^2$-identity and pinned distance problem
A Dynamic Game Approach for Demand-Side Management: Scheduling Energy Storage with Forecasting Errors
Dimensionless $L^p$ estimates for the Riesz vector on manifolds
Moment Analysis of Stochastic Hybrid Systems Using Semidefinite Programming
Signal-plus-noise matrix models: eigenvector deviations and fluctuations
Classifying medical notes into standard disease codes using Machine Learning
Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters
A Unified Deep Learning Architecture for Abuse Detection
Crowd Flow Prediction by Deep Spatio-Temporal Transfer Learning
Limit theorems for symmetric $U$-statistics using contractions
Disunited Nations? A Multiplex Network Approach to Detecting Preference Affinity Blocs using Texts and Votes
A Comparison of Word Embeddings for the Biomedical Natural Language Processing
A Unifying Theory of Exactness of Linear Penalty Functions
3D Object Dense Reconstruction from a Single Depth View
Distributed Computing with Heterogeneous Communication Constraints: The Worst-Case Computation Load and Proof by Contradiction
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
A Fusion of Appearance based CNNs and Temporal evolution of Skeleton with LSTM for Daily Living Action Recognition
Coded Status Updates in an Energy Harvesting Erasure Channel
How many randomly colored edges make a randomly colored dense graph rainbow hamiltonian or rainbow connected?
DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild