Balancing Interpretability and Predictive Accuracy for Unsupervised Tensor Mining

The PARAFAC tensor decomposition has enjoyed an increasing success in exploratory multi-aspect data mining scenarios. A major challenge remains the estimation of the number of latent factors (i.e., the rank) of the decomposition, which yields high-quality, interpretable results. Previously, we have proposed an automated tensor mining method which leverages a well-known quality heuristic from the field of Chemometrics, the Core Consistency Diagnostic (CORCONDIA), in order to automatically determine the rank for the PARAFAC decomposition. In this work we set out to explore the trade-off between 1) the interpretability/quality of the results (as expressed by CORCONDIA), and 2) the predictive accuracy of the results, in order to further improve the rank estimation quality. Our preliminary results indicate that striking a good balance in that trade-off benefits rank estimation.


Pairing heaps: the forward variant

The pairing heap is a classical heap data structure introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. It is remarkable both for its simplicity and for its excellent performance in practice. The ‘magic’ of pairing heaps lies in the restructuring that happens after the deletion of the smallest element. The resulting collection of trees is consolidated in two rounds: a left-to-right pairing round, followed by a right-to-left accumulation round. Fredman et al. showed, via an elegant correspondence to splay trees, that all operations take logarithmic amortized time. They also proposed an arguably more natural variant, where both pairing and accumulation are performed in a combined left-to-right round (we call this the forward variant). The analogy to splaying breaks down for this variant, and its analysis was left open. Since then, no progress was made on this question. In this paper we show that the amortized costs of inserting an element and of deleting the minimum in the forward variant of pairing heaps are both O(\log{n} \cdot 4^{\sqrt{\log{n}}} ). Our analysis relies on a new potential function that captures parent-child rank-differences in the heap, and may have further applications.


Continuous-Time Flows for Deep Generative Models

Normalizing flows have been developed recently as a method for drawing samples from an arbitrary distribution. This method is attractive due to its intrinsic ability to approximate a target distribution arbitrarily well. In practice, however, normalizing flows only consist of a finite number of deterministic transformations, and thus there is no guarantees on the approximation accuracy. In this paper we study the problem of learning deep generative models with {\em continuous-time} flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. We discretize the CTF to make training feasible, and develop theory on the approximation error. A framework is then adopted to distill knowledge from a CTF to an efficient inference network. We apply the technique to deep generative models, including a CTF-based variational autoencoder and an adversarial-network-like density estimator. Experiments on various tasks demonstrate the superiority of the proposed CTF framework compared to existing techniques.


Parallel Statistical Computing with R: An Illustration on Two Architectures

To harness the full benefit of new computing platforms, it is necessary to develop software with parallel computing capabilities. This is no less true for statisticians than for astrophysicists. The R programming language, which is perhaps the most popular software environment for statisticians today, has many packages available for parallel computing. Their diversity in approach can be difficult to navigate. Some have attempted to alleviate this problem by designing common interfaces. However, these approaches offer limited flexibility to the user; additionally, they often serve as poor abstractions to the reality of modern hardware, leading to poor performance. We give a short introduction to two basic parallel computing approaches that closely align with hardware reality, allow the user to understand its performance, and provide sufficient capability to fully utilize multicore and multinode environments. We illustrate both approaches by working through a simple example fitting a random forest model. Beginning with a serial algorithm, we derive two parallel versions. Our objective is to illustrate the use of multiple cores on a single processor and the use of multiple processors in a cluster computer. We discuss the differences between the two versions and how the underlying hardware is used in each case.


Towards Understanding Adversarial Learning for Joint Distribution Matching

We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.


Discriminative Similarity for Clustering and Semi-Supervised Learning

Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns classifier from each hypothetical labeling, and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labeling. Kernel classifier is employed in our framework. By generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from hypothetical labeling is expressed as the sum of pairwise similarity between the data from different classes, parameterized by the weights of the kernel classifier. Such pairwise similarity serves as the discriminative similarity for the purpose of clustering and semi-supervised learning, and discriminative similarity with similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods.


Semantic Document Distance Measures and Unsupervised Document Revision Detection

In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large scale corpus and implemented in Apache Spark. We demonstrate that our system can more precisely detect revisions than state-of-the-art methods by utilizing the Wikipedia revision dumps https://…/wiki-meta.html and simulated data sets.


Hamiltonian Flow Simulation of Rare Events

Hamiltonian Flow Monte Carlo(HFMC) methods have been implemented in engineering, biology and chemistry. HFMC makes large gradient based steps to rapidly explore the state space. The application of the Hamiltonian dynamics allows to estimate rare events and sample from target distributions defined as the change of measures. The estimates demonstrated a variance reduction of the presented algorithm and its efficiency with respect to a standard Monte Carlo and interacting particle based system(IPS). We tested the algorithm on the case of the barrier option pricing.


Knowledge Sharing for Reinforcement Learning: Writing a BOOK

This paper proposes a novel deep reinforcement learning (RL) method integrating the neural-network-based RL and the classical RL based on dynamic programming. In comparison to the conventional deep RL methods, our method enhances the convergence speed and the performance by delving into the following two characteristic features in the training of conventional RL: (1) Having many credible experiences is important in training RL algorithms, (2) Input states can be semantically clustered into a relatively small number of core clusters, and the states belonging to the same cluster tend to share similar Q-values given an action. By following the two observations, we propose a dictionary-type memory that accumulates the Q-value for each cluster of states as well as the corresponding action, in terms of priority. Then, we iteratively update each Q-value in the memory from the Q-value acquired from the network trained by the experiences stored in the memory. We demonstrate the effectiveness of our method through training RL algorithms on widely used game environments from OpenAI.


Visualizing and Improving Scattering Networks

Scattering Transforms (or ScatterNets) introduced by Mallat are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) – the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.


Predicting Visual Features from Text for Image and Video Caption Retrieval

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute \emph{Word2VisualVec}, a deep neural network architecture that learns to predict a visual feature representation from textual input. Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron. We further generalize Word2VisualVec for video caption retrieval, by predicting from text both 3-D convolutional neural network features as well as a visual-audio representation. Experiments on Flickr8k, Flickr30k, the Microsoft Video Description dataset and the very recent NIST TrecVid challenge for video caption retrieval detail Word2VisualVec’s properties, its benefit over textual embeddings, the potential for multimodal query composition and its state-of-the-art results.


Resource Elasticity for Distributed Data Stream Processing: A Survey and Future Directions

Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructures, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. This paper surveys state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows for an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges on achieving elastic systems that can make efficient resource management decisions based on current load. This work examines some of these challenges and discusses solutions proposed in the literature to address them.


Deep learning: Technical introduction

This note presents in a technical though hopefully pedagogical way the three most common forms of neural network architectures: Feedforward, Convolutional and Recurrent. For each network, their fundamental building blocks are detailed. The forward pass and the update rules for the backpropagation algorithm are then derived in full.


The Calculus of M-estimation in R with geex

M-estimation, or estimating equation, methods are widely applicable for point estimation and asymptotic inference. In this paper, we present an R package that can find roots and compute the empirical sandwich variance estimator for any set of user- specified, unbiased estimating equations. Examples from the M-estimation primer by Stefanski and Boos (2002) demonstrate use of the software. The package also includes a framework for finite sample variance corrections and a website with an extensive collection of tutorials.


A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms

Probabilistic mixture models have been widely used for different machine learning and pattern recognition tasks such as clustering, dimensionality reduction, and classification. In this paper, we focus on trying to solve the most common challenges related to supervised learning algorithms by using mixture probability distribution functions. With this modeling strategy, we identify sub-labels and generate synthetic data in order to reach better classification accuracy. It means we focus on increasing the training data synthetically to increase the classification accuracy.


Visualization in Bayesian workflow

Bayesian data analysis is not only about computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Rather, practical Bayesian data analysis is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is not only helpful in each of these stages of the Bayesian workflow, it is indispensable for making inferences from the intricate, high-dimensional models of interest to applied researchers, and essential for understanding and diagnosing the increasingly complex algorithms required to fit such models.


Linking Generative Adversarial Learning and Binary Classification

In this note, we point out a basic link between generative adversarial (GA) training and binary classification — any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.


Degree correlations in scale-free null models
Two-Dimensional Holstein Model: Critical Temperature, Ising Universality, and Bipolaron Liquid
Good Code Sets from Complementary Pairs via Discrete Frequency Chips
Optimal deep neural networks for sparse recovery via Laplace techniques
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Learning to parse from a semantic objective: It works. Is it syntax?
Exact Inference for Relational Graphical Models with Interpreted Functions: Lifted Probabilistic Inference Modulo Theories
To Be Connected, or Not to Be Connected: That is the Minimum Inefficiency Subgraph Problem
MIP Formulations for the Steiner Forest Problem
Liu-type Shrinkage Estimations in Linear Models
WRPN: Wide Reduced-Precision Networks
Log-ratio Lasso: Scalable, Sparse Estimation for Log-ratio Models
A Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras
Link the head to the ‘beak’: Zero Shot Learning from Noisy Text Description at Part Precision
Abstraction of Linear Consensus Networks with Guaranteed Systemic Performance Measures
Local dependencies in random fields via a Bonferroni-type inequality
Tight paths and matchings in convex geometric hypergraphs
Higher Cycle Operations and a Unipotent Torelli Theorem for Graphs
Random Subspace with Trees for Feature Selection Under Memory Constraints
A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC
High-temperature scaling limit for directed polymers on a hierarchical lattice with bond disorder
Is human face processing a feature- or pattern-based task? Evidence using a unified computational method driven by eye movements
Learning Neural Word Salience Scores
Storytelling Agents with Personality and Adaptivity
Satirical News Detection and Analysis using Attention Mechanism and Linguistic Features
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
Compositional Approaches for Representing Relations Between Words: A Comparative Study
Using $k$-way Co-occurrences for Learning Word Embeddings
Enumeration of N-rooted maps using quantum field theory
Stochastic Nonlinear Model Predictive Control with State Estimation by Incorporation of the Unscented Kalman Filter
Truncated fractional moments of stable laws
Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Machine Learning Spatial Geometry from Entanglement Features
On the maximum value of conflict-free verex-connection number of graphs
Electrical networks and hyperplane arrangements
On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation
Linear Optimal Low Rank Projection Provably Outperforms Principal Components Analysis in High-Dimensional Multi-class Data
Order Preserving Maps of Posets
Newton-type Methods for Inference in Higher-Order Markov Random Fields
Push-Pull Block Puzzles are Hard
Simple algorithms to compute $k$-tuple total dominaing sets and $k$-tuple dominating sets
1324-avoiding permutations revisited
Inhomogoenous Hypergraph Clustering with Applications
Random walk on random walks: higher dimensions
Random walk on random walks: low densities
Tensor Representation in High-Frequency Financial Data for Price Change Prediction
Supervisory observer for parameter and state estimation of nonlinear systems using the DIRECT algorithm
Passivity based design of sliding modes for optimal Load Frequency Control
Location of hot spots in thin curved strips
Implicit Cooperative Positioning in Vehicular Networks
Polynomial Ensembles and Recurrence Coefficients
Higher order corrections to the effective potential close to the jamming transition in the perceptron model
SketchParse : Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks
Spectral Mixture Kernels for Multi-Output Gaussian Processes
Boosting the kernelized shapelets: Theory and algorithms for local features
Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild
Distributed second order methods with variable number of working nodes
Hybrid simulation scheme for volatility modulated moving average fields
A Unification, Generalization, and Acceleration of Exact Distributed First Order Methods
Optimal Power Allocation and Secrecy Sum Rate in Two-Way Untrusted Relaying
Optimal Power Allocation by Imperfect Hardware Analysis in Untrusted Relaying Networks
Joint Relay Selection and Power Allocation in Large-Scale MIMO Systems with Untrusted Relays and Passive Eavesdroppers
Random Pilot and Data Access in Massive MIMO for Machine-type Communications
Learning Non-Metric Visual Similarity for Image Retrieval
A vertex and edge deletion game on graphs
Photometric stereo for strong specular highlights
On the local homology of Artin groups of finite and affine type
Speeding-up the decision making of a learning agent using an ion trap quantum processor
Second Order Optimality Conditions and Improved Convergence Results for a Scholtes-type Regularization for a Continuous Reformulation of Cardinality Constrained Optimization Problems
Local limits of lozenge tilings are stable under bounded boundary height perturbations
The magnetization ripple: a nonlocal stochastic PDE perspective
SLO-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements
A Mathematical Framework for Resilience: Dynamics, Strategies, Shocks and Acceptable Paths
On the Lagrangian branched transport model and the equivalence with its Eulerian formulation
A fractal shape optimization problem in branched transport
Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
Towards social pattern characterization in egocentric photo-streams
Stochastic Gradient Descent: Going As Fast As Possible But Not Faster
Coalitional game with opinion exchange
An Exact Approach for the Balanced k-Way Partitioning Problem with Weight Constraints and its Application to Sports Team Realignment
A Generic Approach for Escaping Saddle points
A general class of mosaic random fields
Dense Face Alignment
Common factors, trends, and cycles in large datasets
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
The Devil is in the Tails: Fine-grained Classification in the Wild
ML and Near-ML Decoding of LDPC Codes Over the BEC: Bounds and Decoding Algorithms
6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality
Disordered Quantum Spin Chains with Long-Range Antiferromagnetic Interactions
Parking cars after a trailer
Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries
Learning the PE Header, Malware Detection with Minimal Domain Knowledge
Leveraging multiple datasets for deep leaf counting
Sparsity-Aware Joint Frame Synchronization and Channel Estimation: Algorithm and USRP Implementation
Fine-tuning deep CNN models on specific MS COCO categories
Active Exploration for Learning Symbolic Representations
The traffic distribution of the squared unimodular random matrix and a formula for the moments of its ESD
SeDAR – Semantic Detection and Ranging: Humans can localise without LiDAR, can robots?
Squeeze-and-Excitation Networks