FedMark: A Marketplace for Federated Data on the Web

The Web of Data (WoD) has experienced phenomenal growth in the past. This growth is mainly fueled by tireless volunteers, government subsidies, and open data legislation. The majority of commercial data has not yet made the transition to the WoD. The problem is that it is not clear how publishers of commercial data can monetize their data in this new setting. Advertisement, which is one of the main financial engines of the World Wide Web, cannot be applied to the Web of Data, as such unwanted data can easily be filtered out automatically. This raises the question of how the WoD can (i) maintain its growth when subsidies disappear and (ii) give commercial data providers financial incentives to share their wealth of data. In this paper, we propose a marketplace for the WoD as a solution for this data monetization problem. Our approach allows a customer to transparently buy data from a combination of different providers. To that end, we introduce two different approaches for deciding which data elements to buy and compare their performance. We also introduce FedMark, a prototypical implementation of our marketplace that represents a first step towards an economically viable WoD beyond subsidies.

Discovering Context Specific Causal Relationships

With the increasing need for personalised decision making, such as personalised medicine and online recommendations, growing attention has been paid to the discovery of the context and heterogeneity of causal relationships. Most existing methods, however, assume a known cause (e.g. a new drug) and focus on identifying from data the contexts of heterogeneous effects of the cause (e.g. patient groups with different responses to the new drug). No existing approach efficiently detects context specific causal relationships directly from observational data, i.e. discovers the causes and their contexts simultaneously. In this paper, by taking advantage of highly efficient decision tree induction and the well established causal inference framework, we propose the Tree based Context Causal rule discovery (TCC) method for efficient exploration of context specific causal relationships from data. Experiments with both synthetic and real world data sets show that TCC can effectively discover context specific causal rules from the data.

The Mismatch Principle: Statistical Learning Under Large Model Uncertainties

We study the learning capacity of empirical risk minimization with regard to the squared loss and a convex hypothesis class consisting of linear functions. While these types of estimators were originally designed for noisy linear regression problems, it recently turned out that they are in fact capable of handling considerably more complicated situations, involving highly non-linear distortions. This work intends to provide a comprehensive explanation of this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple, yet generic recipe to establish theoretical error bounds for empirical risk minimization. The scope of our results is fairly general, permitting arbitrary sub-Gaussian input-output pairs, possibly with strongly correlated feature variables. Notably, the mismatch principle also generalizes to a certain extent the classical orthogonality principle for ordinary least squares. This adaptation allows us to investigate problem setups of recent interest, most importantly, high-dimensional parameter regimes and non-linear observation processes. In particular, our theoretical framework is applied to various scenarios of practical relevance, such as single-index models, variable selection, and strongly correlated designs. We thereby demonstrate the key purpose of the mismatch principle, that is, learning (semi-)parametric output rules under large model uncertainties and misspecifications.
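
The phenomenon described above can be illustrated with a small numerical sketch. It fits ordinary least squares (empirical risk minimization with the squared loss over linear functions) to data from a single-index model in which only the sign of a linear measurement is observed. The index vector `w_true`, the sign non-linearity, the noise level, the sample size and the isotropic Gaussian inputs are all assumptions chosen for illustration; none of them come from the paper.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(xs, ys):
    """Minimise the empirical squared loss over linear maps x -> <x, w>."""
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    moment = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve_linear_system(gram, moment)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
w_true = [0.6, -0.8, 0.0]  # hypothetical unit-norm index vector
xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5000)]
# Non-linear observation: only the sign of <x, w_true>, plus small noise.
ys = [
    math.copysign(1.0, sum(a * b for a, b in zip(x, w_true))) + rng.gauss(0.0, 0.1)
    for x in xs
]

w_hat = least_squares(xs, ys)
alignment = cosine(w_hat, w_true)
print(f"cosine(w_hat, w_true) = {alignment:.3f}")
```

Although the linear model is badly misspecified here, the fitted vector should point in nearly the same direction as `w_true`, only with a different length. This is the kind of behaviour the abstract sets out to explain.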

Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology

General Purpose Technologies (GPTs) that can be applied in many industries are an important driver of economic growth and national and regional competitiveness. In spite of this, the geography of their development and diffusion has not received significant attention in the literature. We address this with an analysis of Deep Learning (DL), a core technique in Artificial Intelligence (AI) increasingly being recognized as the latest GPT. We identify DL papers in a novel dataset from ArXiv, a popular preprints website, and use CrunchBase, a technology business directory, to measure industrial capabilities related to it. After showing that DL conforms to the definition of a GPT, having experienced rapid growth and diffusion into new fields where it has generated an impact, we describe changes in its geography. Our analysis shows China’s rise in AI rankings and relative decline in several European countries. We also find that initial volatility in the geography of DL has been followed by consolidation, suggesting that the window of opportunity for new entrants might be closing as new DL research hubs become dominant. Finally, we study the regional drivers of DL clustering. We find that competitive DL clusters tend to be based in regions combining research and industrial activities related to it. This could be because GPT developers and adopters located close to each other can collaborate and share knowledge more easily, thus overcoming coordination failures in GPT deployment. Our analysis also reveals a Chinese comparative advantage in DL after we control for other explanatory factors, perhaps underscoring the importance of access to data and supportive policies for the successful development of this complex, ‘omni-use’ technology.

Causal Discovery by Telling Apart Parents and Children

We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence, and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket, and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected, and show this improves their precision, recall and F1 scores by up to 20%.

Learning to Learn from Web Data through Deep Semantic Embeddings

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data perform competitively with supervised methods in the text based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.

Faster Support Vector Machines

The time complexity of support vector machines (SVMs) prohibits training on huge data sets with millions of samples. Recently, multilevel approaches to train SVMs have been developed to allow for time efficient training on huge data sets. While regular SVMs perform the entire training in one time-consuming optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments show that our new algorithm achieves speed-ups of up to two orders of magnitude while having similar or better classification quality compared with state-of-the-art algorithms.

DeeSIL: Deep-Shallow Incremental Learning

Incremental Learning (IL) is an interesting AI problem when the algorithm is assumed to work on a budget. This is especially true when IL is modeled using a deep learning approach, where two complex challenges arise: limited memory, which induces catastrophic forgetting, and delays related to the retraining needed in order to incorporate new classes. Here we introduce DeeSIL, an adaptation of a known transfer learning scheme that combines a fixed deep representation used as feature extractor with independent shallow classifiers that are learned to increase recognition capacity. This scheme tackles the two aforementioned challenges since it works well with a limited memory budget and each new concept can be added within a minute. Moreover, since no deep retraining is needed when the model is incremented, DeeSIL can integrate larger amounts of initial data that provide more transferable features. Performance is evaluated on ImageNet LSVRC 2012 against three state of the art algorithms. Results show that, at scale, DeeSIL performance is 23 and 33 points higher than the best baseline when using the same and more initial data respectively.
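
The scheme described above, a frozen feature extractor plus one independent shallow classifier per class, can be sketched as follows. This is only an illustration of the structure: `extract_features` is a hypothetical stand-in for a deep network, the shared negative pool is an assumption, and the centroid-difference linear classifier is a deliberately simple substitute for whatever shallow learner the authors use.

```python
def extract_features(sample):
    """Stand-in for a frozen deep network; here it passes features through."""
    return list(sample)


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


class IncrementalRecognizer:
    def __init__(self, negative_samples):
        # A fixed pool of negatives shared by all classes (an assumption).
        features = [extract_features(s) for s in negative_samples]
        self.negative_mean = mean_vector(features)
        self.classifiers = {}

    def add_class(self, label, positive_samples):
        """Train one linear classifier for `label`; existing ones are untouched."""
        pos_mean = mean_vector([extract_features(s) for s in positive_samples])
        # The separating direction points from the negative to the positive mean.
        weights = [p - q for p, q in zip(pos_mean, self.negative_mean)]
        midpoint = [(p + q) / 2.0 for p, q in zip(pos_mean, self.negative_mean)]
        bias = -sum(w * m for w, m in zip(weights, midpoint))
        self.classifiers[label] = (weights, bias)

    def predict(self, sample):
        features = extract_features(sample)

        def score(label):
            weights, bias = self.classifiers[label]
            return sum(w * f for w, f in zip(weights, features)) + bias

        # The class whose independent classifier responds most strongly wins.
        return max(self.classifiers, key=score)


recognizer = IncrementalRecognizer([(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)])
recognizer.add_class("cat", [(2.0, 0.0), (2.5, 0.5), (1.5, -0.5)])
cat_before = recognizer.classifiers["cat"]
# Adding a second class later requires no retraining of the first one.
recognizer.add_class("dog", [(0.0, 2.0), (0.5, 2.5), (-0.5, 1.5)])
print(recognizer.predict((3.0, 0.5)), recognizer.predict((0.2, 3.0)))
```

Because each classifier depends only on its own positives and the fixed negative pool, adding a class costs one small training step and leaves earlier classifiers unchanged, which is the property the abstract relies on.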

Towards Fine Grained Network Flow Prediction

One main challenge for the design of networks is that traffic load is not generally known in advance. This makes it hard to adequately devote resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse grained manner by aggregating flows, fine grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper shows, to the best of our knowledge, the first approach to fine grained per flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows’ behavior based on measurements. Our FKKF relies on the well known Kalman Filter in combination with a kernel to support the prediction of non linear functions. Furthermore, we change the operating space from time to frequency space. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after gleaning their key characteristics, with a Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic on average across 17 out of 20 groups of flows with an average prediction error of 6.43% around 0.49 (average) seconds in advance, whilst existing coarse grained approaches exhibit prediction errors of 77% at best.
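
For readers unfamiliar with the building block named above, here is a minimal sketch of a plain scalar Kalman filter. It is not the FKKF: there is no kernel, no STFT and no PCA, and it works in the time domain. The random-walk state model, the variance values and the sample measurements are assumptions made for illustration.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the state is assumed to stay put, but uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and measurement, weighted by the Kalman gain.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def filter_series(measurements, process_var=0.01, measurement_var=1.0):
    """Run the filter over a series and return the estimate after each step."""
    mean, var = 0.0, 1e6  # vague prior: almost no initial knowledge
    estimates = []
    for z in measurements:
        mean, var = kalman_step(mean, var, z, process_var, measurement_var)
        estimates.append(mean)
    return estimates


# Hypothetical per-interval byte counts (in KB) of a single flow.
observed = [10.2, 9.7, 10.5, 9.9, 10.1, 10.4, 9.6, 10.0]
estimates = filter_series(observed)
print(f"prediction for the next interval: {estimates[-1]:.2f}")
```

Under the random-walk assumption the latest estimate doubles as the one-step-ahead prediction. The abstract's method differs in that it predicts non-linear behaviour in frequency space.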

A Structural-Factor Approach to Modeling High-Dimensional Time Series

This paper considers a structural-factor approach to modeling high-dimensional time series where individual series are decomposed into trend, seasonal, and irregular components. For ease in analyzing many time series, we employ a time polynomial for the trend, a linear combination of trigonometric series for the seasonal component, and a new factor model for the irregular components. The new factor model can simplify the modeling process and achieve parsimony in parameterization. We propose a Bayesian Information Criterion (BIC) to consistently determine the order of the polynomial trend and the number of trigonometric functions. A test statistic is used to determine the number of common factors. The convergence rates for the estimators of the trend and seasonal components and the limiting distribution of the test statistic are established under the setting that the number of time series tends to infinity with the sample size, but at a slower rate. We use simulation to study the performance of the proposed analysis in finite samples and apply the proposed approach to two real examples. The first example considers modeling weekly PM_{2.5} data of 15 monitoring stations in the southern region of Taiwan and the second example consists of monthly value-weighted returns of 12 industrial portfolios.
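
The order-selection step described above can be sketched for a single series. The code fits a polynomial trend plus trigonometric seasonal terms by least squares and picks the trend degree and number of harmonics with a textbook BIC, `n * log(RSS / n) + p * log(n)`. The exact criterion in the paper may differ, the factor model for the irregular component is not covered, and the toy series (including the alternating perturbation that stands in for noise so the run is reproducible) is an assumption.

```python
import math
from itertools import product


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def design_row(t, n, degree, harmonics, period):
    """Regressors at time t: powers of rescaled time, then cos/sin pairs."""
    u = t / n
    row = [u ** k for k in range(degree + 1)]
    for h in range(1, harmonics + 1):
        angle = 2.0 * math.pi * h * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row


def bic(ys, degree, harmonics, period):
    """Least-squares fit followed by n*log(RSS/n) + p*log(n)."""
    n = len(ys)
    rows = [design_row(t, n, degree, harmonics, period) for t in range(n)]
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(gram, rhs)
    rss = sum(
        (y - sum(c * v for c, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys)
    )
    return n * math.log(rss / n) + p * math.log(n)


n, period = 120, 12
# Toy series: linear trend + one harmonic + alternating perturbation.
ys = [
    2.0 + 3.0 * t / n + 1.5 * math.sin(2.0 * math.pi * t / period) + 0.2 * (-1) ** t
    for t in range(n)
]
candidates = product(range(3), range(3))  # (trend degree, number of harmonics)
best = min(candidates, key=lambda dk: bic(ys, dk[0], dk[1], period))
print("selected (trend degree, harmonics):", best)
```

On this toy series the criterion should settle on a linear trend with one harmonic, since richer models barely reduce the residual sum of squares while paying a `log(n)` penalty per extra parameter.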

Adaptive Document Retrieval for Deep Question Answering

State-of-the-art systems in deep question answering proceed as follows: (1) an initial document retrieval selects relevant documents, which (2) are then processed by a neural network in order to extract the final answer. Yet the exact interplay between both components is poorly understood, especially concerning the number of candidate documents that should be retrieved. We show that choosing a static number of documents — as used in prior research — suffers from a noise-information trade-off and yields suboptimal results. As a remedy, we propose an adaptive document retrieval model. This learns the optimal candidate number for document retrieval, conditional on the size of the corpus and the query. We report extensive experimental results showing that our adaptive approach outperforms state-of-the-art methods on multiple benchmark datasets, as well as in the context of corpora with variable sizes.
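
The contrast between a static and an adaptive cutoff can be made concrete with a small sketch. The abstract does not specify how the adaptive model works, so the rule below is an assumed heuristic rather than the authors' method: keep adding ranked documents until they account for a chosen share of the total retrieval score. The function name, `threshold` and `max_docs` are hypothetical.

```python
def adaptive_top_k(scores, threshold=0.6, max_docs=20):
    """Pick how many ranked documents to pass on for one query.

    scores: non-negative retrieval scores sorted in descending order.
    Documents are added until their share of the total score mass reaches
    `threshold`, so a peaked ranking yields few documents and a flat one many.
    """
    total = sum(scores)
    if total <= 0:
        return min(len(scores), max_docs)
    cumulative = 0.0
    for k, score in enumerate(scores, start=1):
        cumulative += score / total
        if cumulative >= threshold or k == max_docs:
            return k
    return len(scores)


peaked = [9.0, 0.5, 0.3, 0.2]  # one document clearly dominates
flat = [1.0] * 8               # no document stands out
print(adaptive_top_k(peaked), adaptive_top_k(flat))
```

A static cutoff would pass the same number of documents to the reader network in both cases. The adaptive rule passes fewer when the ranking is decisive and more when it is not, which is one way to manage the noise-information trade-off the abstract mentions.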

Triangle Lasso for Simultaneous Clustering and Optimization in Graph Datasets

Recently, network lasso has drawn much attention due to its remarkable performance on simultaneous clustering and optimization. However, it usually suffers from imperfect data (noise, missing values, etc.) and yields sub-optimal solutions. The reason is that it finds similar instances directly according to their features, which are usually affected by the imperfect data. In this paper, we propose triangle lasso to avoid this disadvantage. Triangle lasso finds similar instances according to their neighbours. If two instances have many common neighbours, they tend to become similar. Even when some instances are profiled by imperfect data, it is still able to find their similar counterparts. Furthermore, we develop an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain an accurate solution with low additional time consumption. We demonstrate through extensive numerical experiments that triangle lasso is robust to imperfect data. It usually yields better performance than the state-of-the-art method when performing data analysis tasks in practical scenarios.
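
The neighbour-based notion of similarity described above can be illustrated by counting common neighbours in a small graph. This covers only that intuition, not the triangle lasso objective or its ADMM solver. The toy edge list is an assumption.

```python
from itertools import combinations


def common_neighbour_counts(edges):
    """Count shared neighbours for every pair of nodes in an undirected graph."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    counts = {}
    for u, v in combinations(sorted(neighbours), 2):
        shared = neighbours[u] & neighbours[v]
        if shared:
            counts[(u, v)] = len(shared)
    return counts


# Hypothetical toy graph: a, b and c form a tight group; d hangs off c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "x"), ("b", "x"), ("c", "d")]
counts = common_neighbour_counts(edges)
for pair, shared in sorted(counts.items(), key=lambda item: -item[1]):
    print(pair, shared)
```

Each neighbour shared by two connected nodes closes a triangle with them, so pairs with many common neighbours sit inside many triangles. Because the count depends on graph structure rather than on the instances' own features, a pair can still be recognised as similar when one node's features are noisy or missing.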

The Deconfounded Recommender: A Causal Inference Approach to Recommendation

The goal of a recommender system is to show its users items that they will like. In forming its prediction, the recommender system tries to answer: ‘what would the rating be if we ‘forced’ the user to watch the movie?’ This is a question about an intervention in the world, a causal question, and so traditional recommender systems are doing causal inference from observational data. This paper develops a causal inference approach to recommendation. Traditional recommenders are likely biased by unobserved confounders, variables that affect both the ‘treatment assignments’ (which movies the users watch) and the ‘outcomes’ (how they rate them). We develop the deconfounded recommender, a strategy to leverage classical recommendation models for causal predictions. The deconfounded recommender uses Poisson factorization on which movies users watched to infer latent confounders in the data; it then augments common recommendation models to correct for potential confounding bias. The deconfounded recommender improves recommendation and it enjoys stable performance against interventions on test sets.

The empirical likelihood prior applied to bias reduction of general estimating equations
Indoor Coverage Enhancement for mmWave Systems with Passive Reflectors: Measurements and Ray Tracing Simulations
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Optimal Control for Discrete-time Markov Jump Linear System with Control Input Delay
Improved Decision Rule Approximations for Multi-Stage Robust Optimization via Copositive Programming
Lexicosyntactic Inference in Neural Models
Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information
Spatio-temporal prediction of crimes using network analytic approach
XL-NBT: A Cross-lingual Neural Belief Tracking Framework
$Z_2\times Z_2$-cordial cycle-free hypergraphs
Dynamic Temporal Alignment of Speech to Lips
An incremental local-first community detection method for dynamic graphs
Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
Counting Connected Graphs without Overlapping Cycles
Pseudorandom Generators for Read-Once Branching Programs, in any Order
Neural Machine Translation of Text from Non-Native Speakers
Iteration-Complexity of the Subgradient Method on Riemannian Manifolds with Lower Bounded Curvature
Applying Machine Learning To Maize Traits Prediction
Person Re-Identification by Semantic Region Representation and Topology Constraint
Incremental Learning in Person Re-Identification
Multimodal speech synthesis architecture for unsupervised speaker adaptation
Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension
Seymour’s Second Neighborhood Conjecture for Subsets
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Refined Asymptotics in the Online Selection of an Increasing Subsequence
Question Generation from SQL Queries Improves Neural Semantic Parsing
Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
Reed-Solomon codes over small fields with constrained generator matrices
Analysis of ‘Learn-As-You-Go’ (LAGO) Studies
Binomial coefficients and multifactorial numbers through generative grammars
A General Framework of Multi-Armed Bandit Processes by Switching Restrictions
Stability condition of a two-dimensional QBD process and its application to estimation of efficiency for two-queue models
Group-Strategyproof mechanisms for facility location with Euclidean distance
Universal Image Manipulation Detection using Deep Siamese Convolutional Neural Network
PAC-learning is Undecidable
GPU PaaS Computation Model in Aneka Cloud Computing Environment
Wrangling Rogues: Managing Experimental Post-Moore Architectures
Optimal asset allocation for a DC plan with partial information under inflation and mortality risks
On cyclic codes of length $2^e$ over finite fields
On the error in Laplace approximations of high-dimensional integrals
A Distribution Similarity Based Regularizer for Learning Bayesian Networks
Optimal gradient estimates of heat kernels of stable-like operators
Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
Signed Graph Convolutional Network
Towards Anticipation of Architectural Smells using Link Prediction Techniques
Alzheimer’s Disease Modelling and Staging through Independent Gaussian Process Analysis of Spatio-Temporal Brain Changes
Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
Wandering chimeras in adaptive network of pulse-coupled oscillators
Spectrum of free-form Sudoku graphs
Progressive Operational Perceptron with Memory
FAMU: study of the energy dependent transfer rate $\Lambda_{\mu p \rightarrow \mu O}$
FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images
Configurable Distributed Physical Downlink Control Channel for 5G New Radio: Resource Bundling and Diversity Trade-off
Bayesian Regression for a Dirichlet Distributed Response using Stan
Amplitude Quantization for Type-2 Codebook Based CSI Feedback in New Radio System
PPP-Completeness with Connections to Cryptography
Semiparametric estimation of structural failure time model in continuous-time processes
Evolutionary, Mean-Field and Pressure-Resistance Game Modelling of Networks Security
Scalable Edge Partitioning
Dynamic Intention-Aware Recommendation with Self-Attention
Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up and Enhance Recommendations
Spillover Effects in Cluster Randomized Trials with Noncompliance
What Stands-in for a Missing Tool? A Prototypical Grounded Knowledge-based Approach to Tool Substitution
CapsDeMM: Capsule network for Detection of Munro’s Microabscess in skin biopsy images
A unified Framework for Robust Modelling of Financial Markets in discrete time
On the almost decrease of a subexponential density
An Assessment of Covariates of Nonstationary Storm Surge Statistical Behavior by Bayesian Model Averaging
Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
On the compression of messages in the multi-party setting
A Class of Non-Parametric Statistical Manifolds modelled on Sobolev Space
Reproducible evaluation of classification methods in Alzheimer’s disease: framework and application to MRI and PET data
Translational Motion Compensation for Soft Tissue Velocity Images
Learning to Dialogue via Complex Hindsight Experience Replay
Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Optimized Rate-Adaptive Protograph-Based LDPC Codes for Source Coding with Side Information
State-of-the-art Chinese Word Segmentation with Bi-LSTMs
Single-View Place Recognition under Seasonal Changes
Simultaneous synthesis of FLAIR and segmentation of white matter hypointensities from T1 MRIs
CU-Net: Coupled U-Nets
The asymmetric traveling salesman path LP has constant integrality ratio
Detecting Core-Periphery Structure in Spatial Networks
On generalized Erdös-Ginzburg-Ziv constants for $\mathbb{Z}_2^d$
Multi-View Graph Embedding Using Randomized Shortest Paths
Class-Aware Fully-Convolutional Gaussian and Poisson Denoising
Dynamic-sensitive cooperation in the presence of multiple strategy updating rules
Splitter Theorems for Graph Immersions
Detecting cognitive impairments by agreeing on interpretations of linguistic features
A Semi-Supervised and Inductive Embedding Model for Churn Prediction of Large-Scale Mobile Games
Peptide-Spectra Matching from Weak Supervision
Contract-based Incentive Mechanism for LTE over Unlicensed Channels
Improved Latency-Communication Trade-Off for Map-Shuffle-Reduce Systems with Stragglers
Learning Monocular Depth by Distilling Cross-domain Stereo Networks
Video-to-Video Synthesis