Balancing Interpretability and Predictive Accuracy for Unsupervised Tensor Mining
The PARAFAC tensor decomposition has enjoyed an increasing success in exploratory multi-aspect data mining scenarios. A major challenge remains the estimation of the number of latent factors (i.e., the rank) of the decomposition, which yields high-quality, interpretable results. Previously, we have proposed an automated tensor mining method which leverages a well-known quality heuristic from the field of Chemometrics, the Core Consistency Diagnostic (CORCONDIA), in order to automatically determine the rank for the PARAFAC decomposition. In this work we set out to explore the trade-off between 1) the interpretability/quality of the results (as expressed by CORCONDIA), and 2) the predictive accuracy of the results, in order to further improve the rank estimation quality. Our preliminary results indicate that striking a good balance in that trade-off benefits rank estimation.
Pairing heaps: the forward variant
The pairing heap is a classical heap data structure introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. It is remarkable both for its simplicity and for its excellent performance in practice. The ‘magic’ of pairing heaps lies in the restructuring that happens after the deletion of the smallest element. The resulting collection of trees is consolidated in two rounds: a left-to-right pairing round, followed by a right-to-left accumulation round. Fredman et al. showed, via an elegant correspondence to splay trees, that all operations take logarithmic amortized time. They also proposed an arguably more natural variant, where both pairing and accumulation are performed in a combined left-to-right round (we call this the forward variant). The analogy to splaying breaks down for this variant, and its analysis was left open. Since then, no progress was made on this question. In this paper we show that the amortized costs of inserting an element and of deleting the minimum in the forward variant of pairing heaps are both
. Our analysis relies on a new potential function that captures parent-child rank-differences in the heap, and may have further applications.
Continuous-Time Flows for Deep Generative Models
Normalizing flows have been developed recently as a method for drawing samples from an arbitrary distribution. This method is attractive due to its intrinsic ability to approximate a target distribution arbitrarily well. In practice, however, normalizing flows only consist of a finite number of deterministic transformations, and thus there is no guarantees on the approximation accuracy. In this paper we study the problem of learning deep generative models with {\em continuous-time} flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. We discretize the CTF to make training feasible, and develop theory on the approximation error. A framework is then adopted to distill knowledge from a CTF to an efficient inference network. We apply the technique to deep generative models, including a CTF-based variational autoencoder and an adversarial-network-like density estimator. Experiments on various tasks demonstrate the superiority of the proposed CTF framework compared to existing techniques.
Parallel Statistical Computing with R: An Illustration on Two Architectures
To harness the full benefit of new computing platforms, it is necessary to develop software with parallel computing capabilities. This is no less true for statisticians than for astrophysicists. The R programming language, which is perhaps the most popular software environment for statisticians today, has many packages available for parallel computing. Their diversity in approach can be difficult to navigate. Some have attempted to alleviate this problem by designing common interfaces. However, these approaches offer limited flexibility to the user; additionally, they often serve as poor abstractions to the reality of modern hardware, leading to poor performance. We give a short introduction to two basic parallel computing approaches that closely align with hardware reality, allow the user to understand its performance, and provide sufficient capability to fully utilize multicore and multinode environments. We illustrate both approaches by working through a simple example fitting a random forest model. Beginning with a serial algorithm, we derive two parallel versions. Our objective is to illustrate the use of multiple cores on a single processor and the use of multiple processors in a cluster computer. We discuss the differences between the two versions and how the underlying hardware is used in each case.
Towards Understanding Adversarial Learning for Joint Distribution Matching
We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.
Discriminative Similarity for Clustering and Semi-Supervised Learning
Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns classifier from each hypothetical labeling, and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labeling. Kernel classifier is employed in our framework. By generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from hypothetical labeling is expressed as the sum of pairwise similarity between the data from different classes, parameterized by the weights of the kernel classifier. Such pairwise similarity serves as the discriminative similarity for the purpose of clustering and semi-supervised learning, and discriminative similarity with similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods.
Semantic Document Distance Measures and Unsupervised Document Revision Detection
In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large scale corpus and implemented in Apache Spark. We demonstrate that our system can more precisely detect revisions than state-of-the-art methods by utilizing the Wikipedia revision dumps
https://…/wiki-meta.html and simulated data sets.
Hamiltonian Flow Simulation of Rare Events
Hamiltonian Flow Monte Carlo(HFMC) methods have been implemented in engineering, biology and chemistry. HFMC makes large gradient based steps to rapidly explore the state space. The application of the Hamiltonian dynamics allows to estimate rare events and sample from target distributions defined as the change of measures. The estimates demonstrated a variance reduction of the presented algorithm and its efficiency with respect to a standard Monte Carlo and interacting particle based system(IPS). We tested the algorithm on the case of the barrier option pricing.
Knowledge Sharing for Reinforcement Learning: Writing a BOOK
This paper proposes a novel deep reinforcement learning (RL) method integrating the neural-network-based RL and the classical RL based on dynamic programming. In comparison to the conventional deep RL methods, our method enhances the convergence speed and the performance by delving into the following two characteristic features in the training of conventional RL: (1) Having many credible experiences is important in training RL algorithms, (2) Input states can be semantically clustered into a relatively small number of core clusters, and the states belonging to the same cluster tend to share similar Q-values given an action. By following the two observations, we propose a dictionary-type memory that accumulates the Q-value for each cluster of states as well as the corresponding action, in terms of priority. Then, we iteratively update each Q-value in the memory from the Q-value acquired from the network trained by the experiences stored in the memory. We demonstrate the effectiveness of our method through training RL algorithms on widely used game environments from OpenAI.
Visualizing and Improving Scattering Networks
Scattering Transforms (or ScatterNets) introduced by Mallat are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) – the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.
Predicting Visual Features from Text for Image and Video Caption Retrieval
This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute \emph{Word2VisualVec}, a deep neural network architecture that learns to predict a visual feature representation from textual input. Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron. We further generalize Word2VisualVec for video caption retrieval, by predicting from text both 3-D convolutional neural network features as well as a visual-audio representation. Experiments on Flickr8k, Flickr30k, the Microsoft Video Description dataset and the very recent NIST TrecVid challenge for video caption retrieval detail Word2VisualVec’s properties, its benefit over textual embeddings, the potential for multimodal query composition and its state-of-the-art results.
Resource Elasticity for Distributed Data Stream Processing: A Survey and Future Directions
Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructures, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. This paper surveys state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows for an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges on achieving elastic systems that can make efficient resource management decisions based on current load. This work examines some of these challenges and discusses solutions proposed in the literature to address them.
Deep learning: Technical introduction
This note presents in a technical though hopefully pedagogical way the three most common forms of neural network architectures: Feedforward, Convolutional and Recurrent. For each network, their fundamental building blocks are detailed. The forward pass and the update rules for the backpropagation algorithm are then derived in full.
The Calculus of M-estimation in R with geex
M-estimation, or estimating equation, methods are widely applicable for point estimation and asymptotic inference. In this paper, we present an R package that can find roots and compute the empirical sandwich variance estimator for any set of user- specified, unbiased estimating equations. Examples from the M-estimation primer by Stefanski and Boos (2002) demonstrate use of the software. The package also includes a framework for finite sample variance corrections and a website with an extensive collection of tutorials.
A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms
Probabilistic mixture models have been widely used for different machine learning and pattern recognition tasks such as clustering, dimensionality reduction, and classification. In this paper, we focus on trying to solve the most common challenges related to supervised learning algorithms by using mixture probability distribution functions. With this modeling strategy, we identify sub-labels and generate synthetic data in order to reach better classification accuracy. It means we focus on increasing the training data synthetically to increase the classification accuracy.
Visualization in Bayesian workflow
Bayesian data analysis is not only about computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Rather, practical Bayesian data analysis is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is not only helpful in each of these stages of the Bayesian workflow, it is indispensable for making inferences from the intricate, high-dimensional models of interest to applied researchers, and essential for understanding and diagnosing the increasingly complex algorithms required to fit such models.
Linking Generative Adversarial Learning and Binary Classification
In this note, we point out a basic link between generative adversarial (GA) training and binary classification — any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function.
• Degree correlations in scale-free null models
• Two-Dimensional Holstein Model: Critical Temperature, Ising Universality, and Bipolaron Liquid
• Good Code Sets from Complementary Pairs via Discrete Frequency Chips
• Optimal deep neural networks for sparse recovery via Laplace techniques
• WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
• Learning to parse from a semantic objective: It works. Is it syntax?
• Exact Inference for Relational Graphical Models with Interpreted Functions: Lifted Probabilistic Inference Modulo Theories
• To Be Connected, or Not to Be Connected: That is the Minimum Inefficiency Subgraph Problem
• MIP Formulations for the Steiner Forest Problem
• Liu-type Shrinkage Estimations in Linear Models
• WRPN: Wide Reduced-Precision Networks
• Log-ratio Lasso: Scalable, Sparse Estimation for Log-ratio Models
• A Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras
• Link the head to the ‘beak’: Zero Shot Learning from Noisy Text Description at Part Precision
• Abstraction of Linear Consensus Networks with Guaranteed Systemic Performance Measures
• Local dependencies in random fields via a Bonferroni-type inequality
• Tight paths and matchings in convex geometric hypergraphs
• Higher Cycle Operations and a Unipotent Torelli Theorem for Graphs
• Random Subspace with Trees for Feature Selection Under Memory Constraints
• A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC
• High-temperature scaling limit for directed polymers on a hierarchical lattice with bond disorder
• Is human face processing a feature- or pattern-based task? Evidence using a unified computational method driven by eye movements
• Learning Neural Word Salience Scores
• Storytelling Agents with Personality and Adaptivity
• Satirical News Detection and Analysis using Attention Mechanism and Linguistic Features
• FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
• Compositional Approaches for Representing Relations Between Words: A Comparative Study
• Using $k$-way Co-occurrences for Learning Word Embeddings
• Enumeration of N-rooted maps using quantum field theory
• Stochastic Nonlinear Model Predictive Control with State Estimation by Incorporation of the Unscented Kalman Filter
• Truncated fractional moments of stable laws
• Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization
• Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
• Machine Learning Spatial Geometry from Entanglement Features
• On the maximum value of conflict-free verex-connection number of graphs
• Electrical networks and hyperplane arrangements
• On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation
• Linear Optimal Low Rank Projection Provably Outperforms Principal Components Analysis in High-Dimensional Multi-class Data
• Order Preserving Maps of Posets
• Newton-type Methods for Inference in Higher-Order Markov Random Fields
• Push-Pull Block Puzzles are Hard
• Simple algorithms to compute $k$-tuple total dominaing sets and $k$-tuple dominating sets
• 1324-avoiding permutations revisited
• Inhomogoenous Hypergraph Clustering with Applications
• Random walk on random walks: higher dimensions
• Random walk on random walks: low densities
• Tensor Representation in High-Frequency Financial Data for Price Change Prediction
• Supervisory observer for parameter and state estimation of nonlinear systems using the DIRECT algorithm
• Passivity based design of sliding modes for optimal Load Frequency Control
• Location of hot spots in thin curved strips
• Implicit Cooperative Positioning in Vehicular Networks
• Polynomial Ensembles and Recurrence Coefficients
• Higher order corrections to the effective potential close to the jamming transition in the perceptron model
• SketchParse : Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks
• Spectral Mixture Kernels for Multi-Output Gaussian Processes
• Boosting the kernelized shapelets: Theory and algorithms for local features
• Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild
• Distributed second order methods with variable number of working nodes
• Hybrid simulation scheme for volatility modulated moving average fields
• A Unification, Generalization, and Acceleration of Exact Distributed First Order Methods
• Optimal Power Allocation and Secrecy Sum Rate in Two-Way Untrusted Relaying
• Optimal Power Allocation by Imperfect Hardware Analysis in Untrusted Relaying Networks
• Joint Relay Selection and Power Allocation in Large-Scale MIMO Systems with Untrusted Relays and Passive Eavesdroppers
• Random Pilot and Data Access in Massive MIMO for Machine-type Communications
• Learning Non-Metric Visual Similarity for Image Retrieval
• A vertex and edge deletion game on graphs
• Photometric stereo for strong specular highlights
• On the local homology of Artin groups of finite and affine type
• Speeding-up the decision making of a learning agent using an ion trap quantum processor
• Second Order Optimality Conditions and Improved Convergence Results for a Scholtes-type Regularization for a Continuous Reformulation of Cardinality Constrained Optimization Problems
• Local limits of lozenge tilings are stable under bounded boundary height perturbations
• The magnetization ripple: a nonlocal stochastic PDE perspective
• SLO-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements
• A Mathematical Framework for Resilience: Dynamics, Strategies, Shocks and Acceptable Paths
• On the Lagrangian branched transport model and the equivalence with its Eulerian formulation
• A fractal shape optimization problem in branched transport
• Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
• Towards social pattern characterization in egocentric photo-streams
• Stochastic Gradient Descent: Going As Fast As Possible But Not Faster
• Coalitional game with opinion exchange
• An Exact Approach for the Balanced k-Way Partitioning Problem with Weight Constraints and its Application to Sports Team Realignment
• A Generic Approach for Escaping Saddle points
• A general class of mosaic random fields
• Dense Face Alignment
• Common factors, trends, and cycles in large datasets
• Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
• The Devil is in the Tails: Fine-grained Classification in the Wild
• ML and Near-ML Decoding of LDPC Codes Over the BEC: Bounds and Decoding Algorithms
• 6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality
• Disordered Quantum Spin Chains with Long-Range Antiferromagnetic Interactions
• Parking cars after a trailer
• Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries
• Learning the PE Header, Malware Detection with Minimal Domain Knowledge
• Leveraging multiple datasets for deep leaf counting
• Sparsity-Aware Joint Frame Synchronization and Channel Estimation: Algorithm and USRP Implementation
• Fine-tuning deep CNN models on specific MS COCO categories
• Active Exploration for Learning Symbolic Representations
• The traffic distribution of the squared unimodular random matrix and a formula for the moments of its ESD
• SeDAR – Semantic Detection and Ranging: Humans can localise without LiDAR, can robots?
• Squeeze-and-Excitation Networks
Like this:
Like Loading...