Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on its prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique to learn a model for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-the-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.
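
As a rough illustration of Bayesian Personalized Ranking, the sketch below performs stochastic gradient steps on the generic pairwise BPR objective, which rewards scoring an observed link above an unobserved one. It is not the paper's model: the dot-product scoring, the function names `margin` and `bpr_step`, and the `lr` and `reg` values are all assumptions made for this example.

```python
import math

def margin(subj, pos, neg):
    """Score gap between an observed object (pos) and an unobserved one (neg)."""
    return sum(s * (p - n) for s, p, n in zip(subj, pos, neg))

def bpr_step(subj, pos, neg, lr=0.05, reg=0.01):
    """One in-place gradient ascent step on ln(sigmoid(margin)) with L2 regularisation.

    subj, pos and neg are latent feature vectors (lists of floats) of equal length.
    lr and reg are illustrative values, not taken from the paper.
    """
    m = margin(subj, pos, neg)
    g = 1.0 - 1.0 / (1.0 + math.exp(-m))  # derivative of ln(sigmoid(m)) w.r.t. m
    for k in range(len(subj)):
        s, p, n = subj[k], pos[k], neg[k]
        subj[k] += lr * (g * (p - n) - reg * s)
        pos[k] += lr * (g * s - reg * p)
        neg[k] += lr * (-g * s - reg * n)

# Hypothetical embeddings for one subject and two candidate objects.
subject, seen_object, unseen_object = [0.1, 0.2], [0.3, 0.1], [0.1, 0.3]
for _ in range(500):
    bpr_step(subject, seen_object, unseen_object)
```

After repeated steps the observed link ends up scored above the unobserved one, which is the ranking behaviour the optimization is meant to produce. In a per-predicate setting one would presumably keep a separate set of embeddings for each predicate, but that detail is an assumption here.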


ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models

Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training processes due to violation of independence assumptions. We propose ActiveClean, a progressive cleaning approach where the model is updated incrementally instead of re-training and can guarantee accuracy on partially cleaned data. ActiveClean supports a popular class of models called convex loss models (e.g., linear regression and SVMs). ActiveClean also leverages the structure of a user's model to prioritize cleaning those records likely to affect the results. We evaluate ActiveClean on five real-world datasets (UCI Adult, UCI EEG, MNIST, Dollars For Docs, and WorldBank) with both real and synthetic errors. Our results suggest that our proposed optimizations can improve model accuracy by up to 2.5x for the same amount of data cleaned. Furthermore, for a fixed cleaning budget and on all real dirty datasets, ActiveClean returns more accurate models than uniform sampling and Active Learning.


Rank-width: Algorithmic and structural results

Rank-width is a width parameter of graphs describing whether it is possible to decompose a graph into a tree-like structure by 'simple' cuts. This survey aims to summarize known algorithmic and structural results on rank-width of graphs.


Matrix Neural Networks

Traditional neural networks assume vectorial inputs, as the network is arranged as layers of a single line of computing units called neurons. This special structure requires non-vectorial inputs such as matrices to be converted into vectors. This process can be problematic. Firstly, the spatial information among elements of the data may be lost during vectorisation. Secondly, the solution space becomes very large, which demands very special treatment of the network parameters and high computational cost. To address these issues, we propose matrix neural networks (MatNet), which take matrices directly as inputs. Each neuron senses summarised information through a bilinear mapping from lower layer units in exactly the same way as in classic feed-forward neural networks. Under this structure, a combination of backpropagation and gradient descent can be utilised to obtain network parameters efficiently. Furthermore, it can be conveniently extended for multimodal inputs. We apply MatNet to MNIST handwritten digits classification and image super resolution tasks to show its effectiveness. Without too much tweaking, MatNet achieves performance comparable to the state-of-the-art methods in both tasks with considerably reduced complexity.
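
To make the bilinear mapping concrete, here is a minimal sketch of one layer acting on a matrix input. The abstract does not give the formula, so the form sigmoid(U X V^T + B), the sigmoid activation, and the names `bilinear_layer` and `parameter_counts` are assumptions for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def bilinear_layer(x, u, v, bias):
    """Assumed layer form: sigmoid(U X V^T + B), applied elementwise.

    x is m-by-n, u is p-by-m, v is q-by-n, bias is p-by-q; the output is p-by-q.
    """
    z = matmul(matmul(u, x), transpose(v))
    return [[1.0 / (1.0 + math.exp(-(z[i][j] + bias[i][j])))
             for j in range(len(z[0]))] for i in range(len(z))]

def parameter_counts(m, n, p, q):
    """Weights plus biases for the assumed bilinear layer versus a dense layer
    that maps the vectorised m*n input to p*q outputs."""
    bilinear = p * m + q * n + p * q
    dense = (m * n) * (p * q) + p * q
    return bilinear, dense
```

Under these assumptions, mapping a 28-by-28 input to a 20-by-20 hidden matrix needs 1,520 parameters with the bilinear layer and 314,000 with a dense layer on the vectorised input, which illustrates the kind of complexity reduction the abstract refers to.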


Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.


Real-Time Association Mining in Large Social Networks

There is a growing realisation that to combat the waning effectiveness of traditional marketing, social media platform owners need to find new ways to monetise their data. Social media data contains rich information describing how real world entities relate to each other. Understanding the allegiances, communities and structure of key entities is of vital importance for decision support in a swathe of industries that have hitherto relied on expensive, small scale survey data. In this paper, we present a real-time method to query and visualise regions of networks that are closely related to a set of input vertices. The input vertices can define an industry, political party, sport etc. The key idea is that in large digital social networks measuring similarity via direct connections between nodes is not robust, but that robust similarities between nodes can be attained through the similarity of their neighbourhood graphs. We are able to achieve real-time performance by compressing the neighbourhood graphs using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines to milliseconds on standard laptops. Our method allows analysts to interactively explore strongly associated regions of large networks in real time. Our work has been deployed in software that is actively used by analysts to understand social network structure.
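
The two techniques named above can be sketched as follows: each vertex's neighbourhood is compressed into a minhash signature, and signatures are split into bands so that similar vertices collide in at least one Locality Sensitive Hashing bucket. This is a generic textbook construction, not the deployed system; the hash family, the integer vertex ids, and all function names are assumptions for the example.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; vertex ids are assumed to be smaller integers

def make_hash_params(num_hashes, seed=0):
    """Random (a, b) pairs defining hash functions h(v) = (a*v + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash_signature(neighbours, params):
    """Compress a non-empty set of neighbour ids into a fixed-length signature."""
    return tuple(min((a * v + b) % PRIME for v in neighbours) for a, b in params)

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of neighbourhood Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def lsh_buckets(signatures, bands):
    """Group vertices whose signatures agree on every row of at least one band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = {}
    for vertex, sig in signatures.items():
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets.setdefault(key, set()).add(vertex)
    return buckets

def candidates(query, signatures, buckets, bands):
    """Vertices sharing at least one bucket with the query vertex."""
    sig = signatures[query]
    rows = len(sig) // bands
    found = set()
    for band in range(bands):
        found |= buckets.get((band, sig[band * rows:(band + 1) * rows]), set())
    found.discard(query)
    return found
```

A query then only compares the input vertex against its bucket candidates rather than against every vertex in the network, which is the source of the speed-up. The number of hashes and bands controls the trade-off between recall and the number of candidates.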


Detecting and Extracting Events from Text Documents

Events of various kinds are mentioned and discussed in text documents, whether they are books, news articles, blogs or microblog feeds. The paper starts by giving an overview of how events are treated in linguistics and philosophy. We follow this discussion by surveying how events and associated information are handled computationally. In particular, we look at how textual documents can be mined to extract events and ancillary information. These days, this is mostly done through the application of various machine learning techniques. We also discuss applications of event detection and extraction systems, particularly in summarization, in the medical domain and in the context of Twitter posts. We end the paper with a discussion of challenges and future directions.


Dual-tree $k$-means with bounded iteration runtime

Markov processes with spatial delay: path space characterization, occupation time and properties

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Characterization of the Nonequilibrium Steady State of a Heterogeneous Nonlinear $q$-Voter Model with Zealotry

Formally Proving and Enhancing a Self-Stabilising Distributed Algorithm

Generation of a Supervised Classification Algorithm for Time-Series Variable Stars with an Application to the LINEAR Dataset

On the Long-Repetition-Free 2-Colorability of Trees

Towards Turkish ASR: Anatomy of a rule-based Turkish g2p

Localization of weakly disordered flat band states

A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions

On the Corrádi-Hajnal Theorem and a question of Dirac

Approximating welfare in large efficient markets

Entity-oriented spatial coding and discrete topological spatial relations

Phase coherence induced by correlated disorder

On the consistency of inversion-free parameter estimation for Gaussian random fields

Virtual Machine Migration Enabled Cloud Resource Management: A Challenging Task

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Moment explosion and tail asymptotics for the CKLS process

An Analytic Expression for the Distribution of the Generalized Shiryaev-Roberts Diffusion

Container-Based Cloud Virtual Machine Benchmarking

Inversion-symmetry breaking controls the boson peak anomaly in glasses and crystals

Ground-State Candidate for the Dipolar Kagome Ising Antiferromagnet

The hidden geometry of weighted complex networks

Mining frequent items in the time fading model

Ridge regression and asymptotic minimax estimation over spheres of growing dimension

Polynomial Pickands functions

Asymptotic of spectral covariance for linear random fields with infinite variance

Multimodal Pivots for Image Caption Translation

A combinatorial approach to small ball inequalities for sums and differences

On an application of multidimensional arrays

Stability and instability of a random multiple access model with adaptive energy harvesting

Improved graph-based SFA: Information preservation complements the slowness principle

Boundary of the range II: lower tails

The fractional non-homogeneous Poisson process

A stochastic Stefan-type problem under first order boundary conditions

An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations

A point-line incidence identity in finite fields, and applications

A new cyclic sieving phenomenon for Catalan objects

Tightening the Sample Complexity of Empirical Risk Minimization via Preconditioned Stability

Concurrent Hash Tables: Fast and General?(!)

Faster Asynchronous SGD

Powers of cycles in random graphs and hypergraphs

Funnel Libraries for Real-Time Robust Feedback Motion Planning

It’s about time: Online Macrotask Sequencing in Expert Crowdsourcing

Phase transition in the KMP model with Slow/Fast boundaries

A note on the minimum reduced reciprocal Randić index of n-vertex unicyclic graphs

Lectures on the local semicircle law for Wigner matrices

Long tail distributions near the many body localization transition

Parallel and Distributed Methods for Nonconvex Optimization–Part II: Applications