Partial Membership Latent Dirichlet Allocation

For many years, topic models (e.g., pLSA, LDA, SLDA) have been widely used for simultaneously segmenting and recognizing objects in imagery. However, these models are confined to the analysis of categorical data, forcing a visual word to belong to one and only one topic. There are many images in which some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and associated parameter estimation algorithms. PM-LDA defines a novel partial membership model for word and document generation. We employ Gibbs sampling for parameter estimation. Experimental results on two natural image datasets and one SONAR image dataset show that PM-LDA can produce both crisp and soft semantic image segmentations, a capability that existing methods do not have.


ReOpt: an Algorithm with a Quality Guaranty for Solving the Static Relocation Problem

In a carsharing system, a fleet of cars is distributed among stations in an urban area, and customers can take and return cars at any time and at any station. To operate such a system satisfactorily, each station has to keep a good ratio between its number of free places and its number of cars. This leads to the problem of relocating cars between stations, which can be modeled within the framework of a metric task system. In this paper, we focus on the Static Relocation Problem, where the system has to be brought into a given target state, starting from the current state. We present a combinatorial approach and provide approximation factors for several different situations.


Generalized Spatial Regression with Differential Regularization

We propose a method for the analysis of data that are scattered over an irregularly shaped spatial domain and have a distribution within the exponential family. This is a generalized additive model for spatially distributed data. The model is fitted by maximizing a penalized log-likelihood function with a roughness penalty term that involves a differential operator of the spatial field over the domain of interest. Efficient spatial field estimation is achieved by resorting to the finite element method, which provides a basis for piecewise polynomial surfaces. The method is illustrated by an application to the study of criminality in the city of Portland, Oregon, USA.
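
For orientation, a penalized log-likelihood of the kind described above can be written as follows. This is only a sketch under assumptions not stated in the abstract: the differential operator is taken to be the Laplacian $\Delta$, the model contains only the spatial field $f$ (no covariates), and the symbols $\Omega$ (domain), $p_i$ (observation locations), $y_i$ (responses), $\ell$ (exponential-family log-likelihood) and $\lambda > 0$ (smoothing parameter) are notation introduced here.

$$
\hat{f} \;=\; \arg\max_{f} \; \sum_{i=1}^{n} \ell\bigl(y_i;\, f(p_i)\bigr) \;-\; \lambda \int_{\Omega} \bigl(\Delta f(p)\bigr)^{2} \, dp
$$

Under these assumptions, a larger $\lambda$ favors smoother fields, and the finite element basis would be used to represent $f$ over $\Omega$.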


Change-Point Detection and Bootstrap for Hilbert Space Valued Random Fields

The problem of testing for the presence of epidemic changes in random fields is investigated. In order to be able to deal with general changes in the marginal distribution, a Cramér-von Mises-type test is introduced which is based on Hilbert space theory. A functional central limit theorem for $\rho$-mixing Hilbert space valued random fields is proven. In order to avoid the estimation of the long-run variance and obtain critical values, Shao’s dependent wild bootstrap method is adapted to this context. For this, a joint functional central limit theorem for the original and the bootstrap sample is shown. Finally, the theoretical results are supplemented by a short simulation study.


Speed learning on the fly

The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradients, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.
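
The idea of running gradient descent on the step size itself can be illustrated with a short sketch. This is not the algorithm of the paper: it uses a simplified one-step approximation (the derivative of the current loss with respect to the step size is taken through the previous update only, which gives minus the dot product of the two latest gradients) and a normalized update of the log step size. The function name `sgd_with_adaptive_step` and the parameters `eta0` and `meta_rate` are hypothetical.

```python
import numpy as np

def sgd_with_adaptive_step(grad_fn, theta0, samples, eta0=0.01, meta_rate=0.05):
    """SGD whose step size is itself adapted online (illustrative sketch).

    grad_fn(theta, sample) must return the stochastic gradient at theta.
    Assumption: d(loss_t)/d(eta) is approximated by -g_t . g_{t-1}, i.e. only
    the dependence of theta_t on eta through the last update is tracked.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    log_eta = np.log(eta0)
    prev_grad = np.zeros_like(theta)
    eps = 1e-12  # avoids division by zero when a gradient vanishes
    for sample in samples:
        g = grad_fn(theta, sample)
        # Cosine between consecutive gradients: positive means the last step
        # was too short, negative means it overshot.
        cos = g @ prev_grad / (np.linalg.norm(g) * np.linalg.norm(prev_grad) + eps)
        # Descent step on log(eta) for the approximate loss derivative.
        log_eta += meta_rate * cos
        # Ordinary SGD step with the freshly adapted step size.
        theta -= np.exp(log_eta) * g
        prev_grad = g
    return theta, float(np.exp(log_eta))
```

For example, `sgd_with_adaptive_step(lambda th, x: th - x, np.zeros(1), stream)` would estimate the mean of a stream of scalars `stream` under a squared loss. Updating the logarithm of the step size keeps it positive and lets it change by orders of magnitude, which is one reasonable design choice among several.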


Towards Structured Deep Neural Network for Automatic Speech Recognition

In this paper we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework. This approach can learn to find the best structured object (such as a label sequence) given a structured input (such as a vector sequence) by globally considering the mapping relationships between the structures rather than item by item. When automatic speech recognition (ASR) is viewed as a special case of such a structured learning problem, where we have the acoustic vector sequence as the input and the phoneme label sequence as the output, it becomes possible to comprehensively learn utterance by utterance as a whole, rather than frame by frame. Structured Support Vector Machine (structured SVM) was previously proposed to perform ASR with structured learning, but it is limited by the linear nature of SVM. Here we propose structured DNN to use nonlinear transformations in multiple layers as a structured and deep learning approach. This approach was shown to beat structured SVM in preliminary experiments on TIMIT.


Hierarchical Variational Models

Black box inference allows researchers to easily prototype and evaluate an array of models. Recent advances in variational inference allow such algorithms to scale to high dimensions. However, a central question remains: How to specify an expressive variational distribution which maintains efficient computation? To address this, we develop hierarchical variational models. In a hierarchical variational model, the variational approximation is augmented with a prior on its parameters, such that the latent variables are conditionally independent given this shared structure. This preserves the computational efficiency of the original approximation, while admitting hierarchically complex distributions for both discrete and continuous latent variables. We study hierarchical variational models on a variety of deep discrete latent variable models. Hierarchical variational models generalize other expressive variational distributions and maintain higher fidelity to the posterior.
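
The construction described above, a prior placed on the variational parameters with the latent variables conditionally independent given them, can be written compactly. The notation is introduced here for illustration and is not taken from the abstract: $z = (z_1, \dots, z_d)$ are the latent variables, $\lambda = (\lambda_1, \dots, \lambda_d)$ the variational parameters, and $\theta$ the parameters of the prior on $\lambda$.

$$
q(z; \theta) \;=\; \int q(\lambda; \theta) \, \prod_{i=1}^{d} q(z_i \mid \lambda_i) \, d\lambda
$$

In this sketch each $z_i$ depends only on its own $\lambda_i$, so the $z_i$ are independent given $\lambda$, while marginalizing over $\lambda$ can introduce dependence among them.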


The Value Functions of Markov Decision Processes

We provide a full characterization of the set of value functions of Markov decision processes.


Combining Privileged Information to Improve Context-Aware Recommender Systems

A recommender system is an information filtering technology which can be used to predict preference ratings of items (products, services, movies, etc.) and/or to output a ranking of items that are likely to be of interest to the user. Context-aware recommender systems (CARS) learn and predict the tastes and preferences of users by incorporating available contextual information in the recommendation process. One of the major challenges in context-aware recommender systems research is the lack of automatic methods to obtain contextual information for these systems. To address this, we propose to use contextual information from topic hierarchies of the items (web pages) to improve the performance of context-aware recommender systems. The topic hierarchies are constructed by an extension of the LUPI-based Incremental Hierarchical Clustering method that considers three types of information: traditional bag-of-words (technical information), and the combination of named entities (privileged information I) with domain terms (privileged information II). We evaluated the contextual information in four context-aware recommender systems. Different weights were assigned to each type of information. The empirical results demonstrated that topic hierarchies with the combination of the two kinds of privileged information can provide better recommendations.


Efficient Multiscale Gaussian Process Regression using Hierarchical Clustering

Standard Gaussian Process (GP) regression, a powerful machine learning tool, is computationally expensive when it is applied to large datasets, and potentially inaccurate when data points are sparsely distributed in a high-dimensional feature space. To address these challenges, a new multiscale, sparsified GP algorithm is formulated, with the goal of application to large scientific computing datasets. In this approach, the data is partitioned into clusters and the cluster centers are used to define a reduced training set, resulting in an improvement over standard GPs in terms of training and evaluation costs. Further, a hierarchical technique is used to adaptively map the local covariance representation to the underlying sparsity of the feature space, leading to improved prediction accuracy when the data distribution is highly non-uniform. A theoretical investigation of the computational complexity of the algorithm is presented. The efficacy of this method is then demonstrated on simple analytical functions and on data from a direct numerical simulation of turbulent combustion.
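
The first ingredient described above, partitioning the data into clusters and using the cluster centers as a reduced training set, can be sketched in a few lines. This is an illustration under assumptions, not the algorithm of the paper: it uses plain k-means, an RBF kernel with a single global length scale (so the hierarchical, multiscale part is omitted), and it assigns to each center the mean target of its cluster. All function names and parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns centers and the label of each point."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    # Final assignment so that labels match the returned centers.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, np.argmin(dist, axis=1)

def fit_cluster_gp(x, y, n_clusters, length_scale=1.0, jitter=1e-4):
    """GP regression trained on cluster centers instead of all data points."""
    centers, labels = kmeans(x, n_clusters)
    used = np.unique(labels)              # drop clusters that ended up empty
    centers = centers[used]
    # Assumption: the target at a center is the mean target of its cluster.
    y_centers = np.array([y[labels == j].mean() for j in used])
    k_cc = rbf_kernel(centers, centers, length_scale) + jitter * np.eye(len(used))
    alpha = np.linalg.solve(k_cc, y_centers)

    def predict(x_new):
        # Posterior mean at x_new; cost scales with the number of centers.
        return rbf_kernel(x_new, centers, length_scale) @ alpha

    return predict
```

With `m` centers instead of `n` data points, the linear solve costs on the order of `m**3` rather than `n**3` operations, which is the source of the savings in training and evaluation cost mentioned above.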


A Note on Bounded Biclique Coverings of Complete Graphs

Symmetries and control in generative neural nets

The structure of $\{U_{2,5}, U_{3,5}\}$-fragile matroids

New hook-content formulas for strict partitions

Multiple Instance Dictionary Learning using Functions of Multiple Instances

Strong ratio limit theorems associated with random walks

Exponentials and $R$-recurrent random walks on groups

Difference operators for partitions under the Littlewood decomposition

Parameterized complexity of length-bounded cuts and multi-cuts

Deep Compositional Question Answering with Neural Module Networks

Bayesian Inference in Cumulative Distribution Fields

Generating Images from Captions with Attention

Finite-order correlation length for 4-dimensional weakly self-avoiding walk and $|\varphi|^4$ spins

Approximation Algorithms for Finding Maximum Induced Expanders

On the importance of being structured: instantaneous coalescence rates and a re-evaluation of human evolution

On Posterior Consistency of Tail Index for Bayesian Kernel Mixture Models

Hypercontractivity of heat semigroups on free quantum groups

A New Class of Nonsymmetric Multivariate Dependence Measures

A short proof that every finite graph has a tree-decomposition displaying its tangles

PAC-Bayesian High Dimensional Bipartite Ranking

Variance inequalities for quadratic forms with applications

Learning Instrumental Variables with Non-Gaussianity Assumptions: Theoretical Limitations and Practical Algorithms

The continuous Anderson hamiltonian in dimension two

Strong existence and higher order Fréchet differentiability of stochastic flows of fractional Brownian motion driven SDE’s with singular drift

Nash equilibria for non zero-sum ergodic stochastic differential games

The existence of Riemann-Stieltjes integrals with applications to fractional Brownian motion

The necessary and sufficient conditions in the Marchenko-Pastur theorem

Biologically Inspired Dynamic Textures for Probing Motion Perception

Rare event computation in deterministic chaotic systems using genealogical particle analysis

Cluster Algebras of Type $D_4$, Tropical Planes, and the Positive Tropical Grassmannian

Cracks in random brittle solids: From fiber bundles to continuum mechanics

Genomics and Biological Big Data: Facing Current and Future Challenges around Data and Software Sharing and Reproducibility

Computing Seshadri constants on smooth toric surfaces

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

On the existence of SLE trace: finite energy drivers and non-constant $\kappa$

Enacting textual entailment and ontologies for automated essay grading in chemical domain

An Efficient Multilinear Optimization Framework for Hypergraph Matching

Two Flow-Based Approaches for the Static Relocation Problem in Carsharing Systems

Approximate methods for dynamic ecological models

Exponential Decay of Matrix $\Phi$-Entropies on Markov Semigroups with Applications to Dynamical Evolutions of Quantum Ensembles

Toward Biochemical Probabilistic Computation

Decomposition Bounds for Marginal MAP

Optimal Dynamic Strings

Waste Makes Haste: Bounded Time Protocols for Envy-Free Cake Cutting with Free Disposal

A New Relaxation Approach to Normalized Hypergraph Cut

Faster Randomized Branching Algorithms for $r$-SAT

S-PowerGraph: Streaming Graph Partitioning for Natural Graphs by Vertex-Cut

Batch-normalized Maxout Network in Network

How far can we go without convolution: Improving fully-connected networks

On the Cauchy problem for Stochastic parabolic equations in Hölder spaces

Explicit Knowledge-based Reasoning for Visual Question Answering

The Log-Behavior of $\sqrt[n]{p(n)}$ and $\sqrt[n]{p(n)/n}$

Sentiment Expression via Emoticons on Social Media

Deep Recurrent Neural Networks for Sequential Phenotype Prediction in Genomics

Estimation for bivariate quantile varying coefficient model

Backward Iteration Algorithms for Julia sets of Möbius Semigroups

Distributed Security Constrained Economic Dispatch

On unavoidable induced subgraphs in large prime graphs

Sandwiching the marginal likelihood using bidirectional Monte Carlo

Energy and discrepancy of rotationally invariant determinantal point processes in high dimensional spheres

Order Determination of Large Dimensional Dynamic Factor Model

Asymptotics of lattice walks via analytic combinatorics in several variables

Improved Approximation Algorithms for Relay Placement

Inertia Sets For Families of Graphs

Estimating a smooth function on a large graph by Bayesian Laplacian regularisation

Algorithmic Stability for Adaptive Data Analysis

Autotuning OpenCL Workgroup Size for Stencil Patterns

Hardness and Approximation for Network Flow Interdiction

Interdicting Structured Combinatorial Optimization Problems with {0,1}-Objectives

Statistical physics of inference: Thresholds and algorithms

On Sylvester Colorings of Cubic Graphs

Graph Isomorphism for Bounded Genus Graphs In Linear Time

(Yet) Another Theoretical Model of Thinking

Learning Linguistic Biomarkers for Predicting Mild Cognitive Impairment using Compound Skip-grams

A Chinese POS Decision Method Using Korean Translation Information

Accelerating Recommender Systems using GPUs

A Study of an Modeling Method of T-S fuzzy System Based on Moving Fuzzy Reasoning and Its Application

On the chromatic number of structured Cayley graphs

A Winner-Take-All Approach to Emotional Neural Networks with Universal Approximation Property

Design of an Alarm System for Isfahan Ozone Level based on Artificial Intelligence Predictor Models

Sufficient Conditions for Graphicality of Bidegree Sequences

Max-Sum Diversification, Monotone Submodular Functions and Semi-metric Spaces

On Stabbing Queries for Generalized Longest Repeat

Spectra and Laplacian spectra of arbitrary powers of lexicographic products of graphs

The Saxl Conjecture for Fourth Powers via the Semigroup Property

Review-Level Sentiment Classification with Sentence-Level Polarity Correction

Information Extraction Under Privacy Constraints

On Schur p-groups of odd order

Frequentist tests for Bayesian models

Network-Based Analysis of a Small Ebola Outbreak

Opposites Attract: Virtual Cluster Embedding for Profit

Performance Analysis of Multiclass Support Vector Machine Classification for Diagnosis of Coronary Heart Diseases

Construction of SDE-based wind speed models with exponential autocorrelation

Markov chain order estimation with parametric significance tests of conditional mutual information

On reducing the Erdös-Szekeres problem into a constraint unsatisfiability problem regarding certain multisets

Almost sure convergence of vertex degree densities in the vertex-splitting model

The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations

The Sample Complexity of Auctions with Side Information

Estimating large deviation rate functions

A Derivative-Free Trust-Region Method for Reliability-Based Optimization

Generation and Comprehension of Unambiguous Object Descriptions

Robotics Technology in Mental Health Care

Stacked Attention Networks for Image Question Answering

Spontaneous Quantum Teleportation in a Quenched Spin Lattice

Signed Support Recovery for Single Index Models in High-Dimensions

Active Perceptual Similarity Modeling with Auxiliary Information

Self-replication with magnetic dipolar colloids

Climbing Mont Blanc – A Training Site for Energy Efficient Programming on Heterogeneous Multicore Processors

NAND-Trees, Average Choice Complexity, and Effective Resistance