A Semantic Loss Function for Deep Learning with Symbolic Knowledge

This paper develops a novel methodology for using symbolic knowledge in deep learning. From first principles, we derive a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output. An experimental evaluation shows that our semantic loss function effectively guides the learner to achieve (near-)state-of-the-art results on semi-supervised multi-class classification. Moreover, it significantly increases the ability of the neural network to predict structured objects, such as rankings and paths. These discrete concepts are tremendously difficult to learn, and benefit from a tight integration of deep learning and symbolic reasoning methods.
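As an illustration of how a loss can measure closeness to satisfying a constraint, here is a minimal sketch for one constraint: exactly one output is on. The formula is an assumption, since the abstract does not state it. The sketch treats each output as an independent Bernoulli variable and takes the negative log of the probability that the constraint holds. The function name `exactly_one_semantic_loss` is hypothetical.

```python
import math


def exactly_one_semantic_loss(probs):
    """Negative log-probability that exactly one output is on.

    Assumption: each entry of `probs` lies strictly between 0 and 1 and is
    treated as an independent Bernoulli parameter.
    """
    total = 0.0
    for i, p_i in enumerate(probs):
        # Probability that output i is on and every other output is off.
        term = p_i
        for j, p_j in enumerate(probs):
            if j != i:
                term *= 1.0 - p_j
        total += term
    return -math.log(total)


# A near one-hot prediction almost satisfies the constraint; an undecided one does not.
confident = exactly_one_semantic_loss([0.98, 0.01, 0.01])
uncertain = exactly_one_semantic_loss([0.5, 0.5, 0.5])
```

Under this formulation the loss shrinks as the output vector approaches a one-hot assignment, so it can be added to a standard training objective on unlabeled examples.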

Collaborative Filtering with Social Exposure: A Modular Approach to Social Recommendation

This paper is concerned with how to make efficient use of social information to improve recommendations. Most existing social recommender systems assume that people share similar preferences with their social friends. This assumption, however, may not hold, given the varied motivations for making online friends and the dynamics of online social networks. Inspired by recent causal-process-based recommendation methods that first model user exposures towards items and then use these exposures to guide rating prediction, we utilize social information to capture user exposures rather than user preferences. We assume that people get information about products from their online friends but do not have to share similar preferences with them, which is less restrictive and seems closer to reality. Under this new assumption, we present a novel recommendation approach (named SERec) that integrates social exposure into collaborative filtering. We propose two methods to implement SERec, namely social regularization and social boosting, each with a different way of constructing social exposures. Experiments on four real-world datasets demonstrate that our methods outperform state-of-the-art methods on top-N recommendation. A further study compares the robustness and scalability of the two proposed methods.

HSC: A Novel Method for Clustering Hierarchies of Networked Data

Hierarchical clustering is one of the most powerful solutions to the problem of clustering, because it organizes the data at multiple scales. In recent years, research on hierarchical clustering methods has attracted considerable interest due to demanding modern application domains. We present a novel divisive hierarchical clustering framework called Hierarchical Stochastic Clustering (HSC) that acts in two stages. In the first stage, it finds a primary hierarchy of clustering partitions in a dataset. In the second stage, it feeds a clustering algorithm with each of the clusters of the most detailed partition in order to settle the final result. The output is a hierarchy of clusters. Our method is based on the previous research of Meyer and Weissel on Stochastic Data Clustering and the theory of Simon and Ando on Variable Aggregation. Our experiments show that our framework builds a meaningful hierarchy of clusters and consistently benefits the clustering algorithm that acts in the second stage, not only computationally but also in terms of cluster quality. This result suggests that the HSC framework is ideal for obtaining hierarchical solutions for large volumes of data.

Multimodal Attribute Extraction

The broad goal of information extraction is to derive structured information from unstructured data. However, most existing methods focus solely on text, ignoring other types of unstructured data such as images, video, and audio, which comprise an increasing portion of the information on the web. To address this shortcoming, we propose the task of multimodal attribute extraction. Given a collection of unstructured and semi-structured contextual information about an entity (such as a textual description or visual depictions), the task is to extract the entity’s underlying attributes. In this paper, we provide a dataset containing mixed-media data for over 2 million product items, along with 7 million attribute-value pairs describing the items, which can be used to train attribute extractors in a weakly supervised manner. We provide a variety of baselines that demonstrate the relative effectiveness of the individual modes of information towards solving the task, and we also study human performance.

Variational Deep Q Network

We propose a framework that directly tackles the probability distribution of the value function parameters in a Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters. We establish the equivalence between our proposed surrogate objective and the variational inference loss. Our new algorithm achieves efficient exploration and performs well on large-scale chain Markov Decision Processes (MDPs).

Knowledge Graph Embedding with Iterative Guidance from Soft Rules

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. They also focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules, even those with moderate confidence levels, are highly beneficial to KG embedding. The code and data used for this paper can be obtained from https://…/RUGE.

Improving Latent User Models in Online Social Media

Modern social platforms are characterized by the presence of rich user-behavior data associated with the publication, sharing and consumption of textual content. Users interact with content and with each other in a complex and dynamic social environment while simultaneously evolving over time. In order to effectively characterize users and predict their future behavior in such a setting, it is necessary to overcome several challenges. Content heterogeneity and temporal inconsistency of behavior data result in severe sparsity at the user level. In this paper, we propose a novel mutual-enhancement framework to simultaneously partition and learn latent activity profiles of users. We propose a flexible user partitioning approach to effectively discover rare behaviors and tackle user-level sparsity. We extensively evaluate the proposed framework on massive datasets from real-world platforms including Q&A networks and interactive online courses (MOOCs). Our results indicate significant gains over state-of-the-art behavior models (15% on average) across a varied range of tasks, and our gains are further magnified for users with limited interaction data. The proposed algorithms are amenable to parallelization, scale linearly in the size of datasets, and provide flexibility to model diverse facets of user behavior.

How Deep Are Deep Gaussian Processes?

Recent research has shown the potential utility of probability distributions designed through hierarchical constructions which are conditionally Gaussian. This body of work is placed in a common framework and, through recursion, several classes of deep Gaussian processes are defined. The resulting samples have a Markovian structure with respect to the depth parameter and the effective depth of the process is interpreted in terms of the ergodicity, or non-ergodicity, of the resulting Markov chain.

Towards Accurate Binary Convolutional Neural Network

We introduce a novel scheme to train binary convolutional neural networks (CNNs), that is, CNNs with weights and activations constrained to {-1,+1} at run-time. It is known that using binary weights and activations drastically reduces memory size and accesses, and can replace arithmetic operations with more efficient bitwise operations, leading to much faster test-time inference and lower power consumption. However, previous works on binarizing CNNs usually result in severe degradation of prediction accuracy. In this paper, we address this issue with two major innovations: (1) approximating full-precision weights with a linear combination of multiple binary weight bases; (2) employing multiple binary activations to alleviate information loss. The implementation of the resulting binary CNN, denoted as ABC-Net, is shown to achieve performance much closer to that of its full-precision counterpart, and even to reach comparable prediction accuracy on ImageNet and forest trail datasets, given adequate binary weight bases and activations.
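The idea of approximating full-precision weights with a linear combination of binary bases can be sketched as follows. This is not the construction used by ABC-Net, which the abstract does not describe. It is a simple greedy residual binarization, swapped in for illustration: each step binarizes the current residual by sign and scales it by the mean absolute residual. All function names are hypothetical.

```python
def binary_bases(weights, num_bases):
    """Greedily fit weights ~ sum_k scale_k * base_k with base_k in {-1, +1}.

    Assumption: this residual scheme is a stand-in for illustration only.
    """
    residual = list(weights)
    scales, bases = [], []
    for _ in range(num_bases):
        # Binarize the residual by sign.
        base = [1.0 if r >= 0 else -1.0 for r in residual]
        # The mean absolute residual is the least-squares scale for a sign base.
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        bases.append(base)
        residual = [r - scale * b for r, b in zip(residual, base)]
    return scales, bases


def reconstruct(scales, bases):
    """Rebuild the approximate weights from the scaled binary bases."""
    n = len(bases[0])
    return [sum(s * b[i] for s, b in zip(scales, bases)) for i in range(n)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


weights = [0.9, -0.4, 0.1, -0.7, 0.3, -0.05]
err_1 = squared_error(weights, reconstruct(*binary_bases(weights, 1)))
err_3 = squared_error(weights, reconstruct(*binary_bases(weights, 3)))
```

In this sketch, adding bases lowers the reconstruction error (`err_3` is below `err_1`), which mirrors the abstract's point that accuracy improves given adequate binary weight bases.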

Multivariate Time Series Classification with WEASEL+MUSE

Multivariate time series (MTS) arise when multiple interconnected sensors record data over time. Dealing with this high-dimensional data is challenging for every classifier for at least two reasons: First, an MTS is characterized not only by individual feature values, but also by the co-occurrence of features in different dimensions. Second, this typically adds large amounts of irrelevant data and noise. We present our novel MTS classifier WEASEL+MUSE (Word ExtrAction for time SEries cLassification + MUltivariate Symbols and dErivatives), which addresses both challenges. WEASEL+MUSE builds a multivariate feature vector by first applying a sliding window to each dimension of the MTS and then extracting discrete features per window and dimension. The feature vector is subsequently fed through feature selection, which removes non-discriminative features, and is analysed by a machine learning classifier. The novelty of WEASEL+MUSE lies in its specific way of extracting and filtering multivariate features from MTS by encoding context information into each feature, resulting in a small yet very discriminative feature set useful for MTS classification. Based on a popular benchmark of 20 MTS datasets, we found that WEASEL+MUSE is the most accurate domain-agnostic classifier when compared to the state of the art. The outstanding robustness of WEASEL+MUSE is further confirmed on motion gesture recognition data, where it achieved, out of the box, accuracies similar to those of domain-specific methods.
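The sliding-window step described above can be sketched roughly as follows. The abstract does not say how windows are discretized, so this sketch uses a stand-in: each window is mapped to one of three symbols by comparing its mean to two cut points. Tagging each symbol with its dimension index is one simple way to encode context into a feature. The function names and the `low`/`high` thresholds are hypothetical.

```python
from collections import Counter


def window_symbol(window, low, high):
    """Map a window to 'a', 'b', or 'c' by comparing its mean to two cut points."""
    mean = sum(window) / len(window)
    if mean < low:
        return "a"
    if mean < high:
        return "b"
    return "c"


def multivariate_bag(series_by_dim, window_size, low=-0.5, high=0.5):
    """Count (dimension, symbol) features over all sliding windows of every dimension."""
    bag = Counter()
    for dim, series in enumerate(series_by_dim):
        for start in range(len(series) - window_size + 1):
            window = series[start:start + window_size]
            # The dimension index is part of the feature key.
            bag[(dim, window_symbol(window, low, high))] += 1
    return bag


# Toy MTS with two dimensions of five samples each.
mts = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [-1.0, -1.0, 0.0, 0.0, 0.0],
]
bag = multivariate_bag(mts, window_size=2)
```

The resulting counts form a sparse feature vector that could then be passed to feature selection and a classifier, as the abstract outlines.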

A Multi-Horizon Quantile Recurrent Forecaster
On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example
Gaussian Process Neurons Learn Stochastic Activation Functions
A robust version of Freiman’s $3k-4$ Theorem and applications
Paradoxes in Fair Computer-Aided Decision Making
Happiness Pursuit: Personality Learning in a Society of Agents
Detection-aided liver lesion segmentation using deep learning
A fast nonconvex Compressed Sensing algorithm for highly low-sampled MR images reconstruction
Compounding Doubly Affine Matrices
Refined existence and regularity results for a class of semilinear dissipative SPDEs
Partitioned Successive-Cancellation Flip Decoding of Polar Codes
Sample-based Population Observers
Improved Successive Cancellation Flip Decoding of Polar Codes Based on Error Distribution
Deep Learning for identifying radiogenomic associations in breast cancer
A generative graph model for electrical infrastructure networks
Mixture Models in Astronomy
Task Replication for Deadline-Constrained Vehicular Cloud Computing: Optimal Policy, Performance Analysis and Implications on Road Traffic
Optimal $(t,r)$ Broadcasts On the Infinite Grid
Towards Alzheimer’s Disease Classification through Transfer Learning
Heat kernel estimates and intrinsic metric for random walks with general speed measure under degenerate conductances
Predicting and Explaining Human Semantic Search in a Cognitive Model
Two-level value function approach to nonsmooth optimistic and pessimistic bilevel programs
Demand Side Management in the Smart Grid: an Efficiency and Fairness Tradeoff
A Product Formula for the Normalized Volume of Free Sums of Lattice Polytopes
Extending the Accuracy of the SNAP Interatomic Potential Form
Video Captioning via Hierarchical Reinforcement Learning
GANs for LIFE: Generative Adversarial Networks for Likelihood Free Inference
A Cooperative Proof of Work Scheme for Distributed Consensus Protocols
Structured learning and detailed interpretation of minimal object images
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Predicting Depression Severity by Multi-Modal Feature Engineering and Fusion
A note on time hierarchies for reasonable semantic classes without advice
Can Complex Collective Behaviour Be Generated Through Randomness, Memory and a Pinch of Luck?
Safe Exploration for Identifying Linear Systems via Robust Optimization
Towards Data Quality Assessment in Online Advertising
Combinatorial applications of the Hodge-Riemann relations
State Space LSTM Models with Particle MCMC Inference
Improved Learning in Evolution Strategies via Sparser Inter-Agent Network Topologies
Strategic Topology Switching for Security-Part II: Detection & Switching Topologies
Strategic Topology Switching for Security-Part I: Consensus & Switching Times
Phase Transitions in Approximate Ranking
A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data
Neural Response Generation with Dynamic Vocabularies
Design of Non-orthogonal Multiple Access Enhanced Backscatter Communication
Distributed Optimization on Riemannian Manifolds for multi-agent networks
Embedded Real-Time Fall Detection Using Deep Learning For Elderly Care
On deep-holes of Gabidulin codes
The Dispersion of Universal Joint Source-Channel Coding for Arbitrary Sources and Additive Channels
Keep it Fair: Equivalences
Symbol Error Rate Performance of Box-relaxation Decoders in Massive MIMO
Riemannian Stein Variational Gradient Descent for Bayesian Inference
Future Person Localization in First-Person Videos
Monte Carlo Estimation of the Density of the Sum of Dependent Random Variables
RANSAC Algorithms for Subspace Recovery and Subspace Clustering
Cache-based Document-level Neural Machine Translation
Properties on n-dimensional convolution for image deconvolution
Bayesian variable selection for multi-dimensional semiparametric regression models
Quantum Neuron: an elementary building block for machine learning on quantum computers
Asymptotic for a second order evolution equation with vanishing damping term and Tikhonov regularization
Provably noise-robust, regularised $k$-means clustering
A Closer Look at Spatiotemporal Convolutions for Action Recognition
ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene
q-Binomials and related symmetric unimodal polynomials
Signless Laplacian spectral conditions for Hamilton-connected graphs with large minimum degree
A novel graph structure for salient object detection based on divergence background and compact foreground
TCAV: Relative concept importance testing with Linear Concept Activation Vectors
Exact formulas for two interacting particles and applications in particle systems with duality
Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap
Learning to Compose Skills
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
Demand Response in the Smart Grid: the Impact of Consumers Temporal Preferences
The Martin boundary of relatively hyperbolic groups with virtually abelian parabolic subgroups
Multi-Domain Adversarial Learning for Slot Filling in Spoken Language Understanding
Variational formulation of American option prices in the Heston Model
Submodular Maximization through the Lens of Linear Programming
Unsupervised Learning for Cell-level Visual Representation in Histopathology Images with Generative Adversarial Networks
High Dynamic Range Imaging Technology
Unidirectional Random Growth with Resetting
Element Distinctness Revisited
Radially-Distorted Conjugate Translations
Generalizing Gale’s theorem on backward induction and domination of strategies
A limit field for orthogonal range searches in two-dimensional random point search trees
Descent Representations of Generalized Coinvariant Algebras
Why So Many Published Sensitivity Analyses Are False. A Systematic Review of Sensitivity Analysis Practices
3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local Contextual Cues
Learning to Learn from Weak Supervision by Full Supervision
MR image reconstruction using the learned data distribution as prior
Two-sided Facility Location
Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF
A note on power generalized extreme value distribution and its properties
Flat bands in fractal-like geometry
Low-complexity and Statistically Robust Beamformer Design for Massive MIMO Systems
Approximating Connected Safe Sets in Weighted Trees
Scalable synthesis of safety certificates from data with application to learning-based control
Leader-Based Optimal Coordination Control for the Consensus Problem of Multiagent Differential Games via Fuzzy Adaptive Dynamic Programming
Distributed Optimal Consensus Control for Nonlinear Multi-agent System with Unknown Dynamic
On reducing the communication cost of the diffusion LMS algorithm
A simple and efficient profile likelihood for semiparametric exponential family
ConvNets and ImageNet Beyond Accuracy: Explanations, Bias Detection, Adversarial Examples and Model Criticism
Towards an Understanding of Our World by GANing Videos in the Wild
A method to define the energy threshold depending on noise level for rare events searches
LATTE: Application Oriented Network Embedding
Sum of squares lower bounds from symmetry and a good story
Spatially-Adaptive Filter Units for Deep Neural Networks
Auxiliary Guided Autoregressive Variational Autoencoders
Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation
Summary of effect aliasing structure (SEAS): new descriptive statistics for factorial and supersaturated designs
Tverberg partitions as epsilon-nets
Exponential lower bounds on spectrahedral representations of hyperbolicity cones
Nonseparable Gaussian Stochastic Process: A Unified View and Computational Strategy
Convolutional Networks with Adaptive Computation Graphs
A Monte Carlo method for estimating sensitivities of reflected diffusions in convex polyhedral domains
Calculating Semantic Similarity between Academic Articles using Topic Event and Ontology
The Channel Multivariate Entropy Triangle and Balance Equation
Thermostat-assisted Continuous-tempered Hamiltonian Monte Carlo for Multimodal Posterior Sampling on Large Datasets
Coupled regularization with multiple data discrepancies
Lexical and Derivational Meaning in Vector-Based Models of Relativisation
A note on the restricted arc connectivity of oriented graphs of girth four
Single-epoch supernova classification with deep convolutional neural networks
Improved Linear Embeddings via Lagrange Duality
Bayesian inference for spectral projectors of covariance matrix
A short-ranged memory model with preferential growth
Predicting Severe Sepsis Using Text from the Electronic Health Record
Quasiparticle Density of States, Localization, and Distributed Disorder in the Cuprate Superconductors
Learning to Adapt by Minimizing Discrepancy
Embodied Question Answering
Spin glass transition in a thin-film NiO/Permalloy bilayer
Conservative model reduction for finite-volume models
Iteration complexity of an inexact Douglas-Rachford method and of a Douglas-Rachford-Tseng’s F-B four-operator splitting method for solving monotone inclusions
Pre-freezing transition in Boltzmann-Gibbs measures associated with log-correlated fields
ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Testing Conditional Independence of Discrete Distributions
Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Deep Neural Networks for Multiple Speaker Detection and Localization
Hybrid VAE: Improving Deep Generative Models using Partial Observations
Relation Networks for Object Detection
Towards High Performance Video Object Detection
Multi-agent decision-making dynamics inspired by honeybees
Outlier-robust moment-estimation via sum-of-squares
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Toward Multimodal Image-to-Image Translation