Combining multiple observational data sources to estimate causal effects

The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. As an important example in causal inference, we consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies yet preserve the consistencies of the initial estimators based solely on the validation data. The proposed framework applies to asymptotically normal estimators, including the commonly-used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. Coupled with appropriate bootstrap procedures, our method is straightforward to implement using software routines for existing estimators.

Relaxed covariate overlap and margin-based causal effect estimation

In most nonrandomized observational studies, differences between treatment groups may arise not only due to the treatment but also because of the effect of confounders. Therefore, causal inference regarding the treatment effect is not as straightforward as in a randomized trial. To adjust for confounding due to measured covariates, a variety of methods based on the potential outcomes framework are used to estimate average treatment effects. One of the key assumptions is treatment positivity, and methods for performing causal inference when this assumption is violated are relatively limited. In this article, we explore the issue of covariate overlap and discuss a new condition involving overlap in the convex hulls of treatment groups, which we term relaxed covariate overlap. An advantage of this concept is that it can be linked to a concept from machine learning, termed the margin. Introduction of relaxed covariate overlap leads to an approach in which we can perform causal inference in a three-step manner. The methodology is illustrated with two examples.

ruptures: change point detection in Python

ruptures is a Python library for offline change point detection. This package provides methods for the analysis and segmentation of non-stationary signals. Implemented algorithms include exact and approximate detection for various parametric and non-parametric models. ruptures focuses on ease of use by providing a well-documented and consistent interface. In addition, thanks to its modular structure, different algorithms and models can be connected and extended within this package.
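To make concrete what offline change point detection computes, here is a minimal pure-Python sketch. It does not use ruptures or reproduce its API: the function names `detect_changepoints` and `segment_cost` are hypothetical, and the assumed model is a piecewise-constant mean with a known number of change points, solved exactly by dynamic programming over a least-squares cost.

```python
from itertools import accumulate


def detect_changepoints(signal, n_bkps):
    """Exact least-squares segmentation into n_bkps + 1 constant-mean segments.

    Returns the sorted segment end indices (the last one is len(signal)).
    """
    n = len(signal)
    # Prefix sums let us evaluate the cost of any segment in O(1).
    s1 = [0.0] + list(accumulate(signal))
    s2 = [0.0] + list(accumulate(x * x for x in signal))

    def segment_cost(i, j):
        # Sum of squared deviations from the mean on signal[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    n_seg = n_bkps + 1
    # best[k][j]: minimal cost of splitting signal[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_seg + 1)]
    prev = [[0] * (n + 1) for _ in range(n_seg + 1)]
    best[0][0] = 0.0
    for k in range(1, n_seg + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + segment_cost(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    prev[k][j] = i
    # Backtrack the segment boundaries from the end of the signal.
    bkps, j = [], n
    for k in range(n_seg, 0, -1):
        bkps.append(j)
        j = prev[k][j]
    return sorted(bkps)


# Toy signal (an assumption for illustration): three constant levels.
signal = [0.0] * 20 + [5.0] * 20 + [1.0] * 20
print(detect_changepoints(signal, n_bkps=2))  # prints [20, 40, 60]
```

The cubic-time loop is only suitable for short signals; it is meant to show the optimization problem, not an efficient solver.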

Multiple changepoint detection for periodic autoregressive models with an application to river flow analysis

In river flow analysis and forecasting there are some key elements to consider in order to obtain reliable results. For example, seasonality is often accounted for in statistical models because climatic oscillations occurring every year have an obvious impact on river flow. Further sources of alteration could be caused by changes in reservoir management, instrumentation or even unexpected shifts in climatic conditions. When these changes are ignored the statistical results can be strongly misleading. This paper develops an automatic procedure to estimate the number and locations of changepoints in Periodic AutoRegressive models. These models have been extensively used for modelling seasonality in hydrology, climatology, economics and electrical engineering, but very few papers also address changepoint detection, and those are limited to changes in mean or variance. In our proposal we allow the model structure as a whole to change, and estimation is performed by optimizing an objective function derived from the Information Criterion using Genetic Algorithms. The proposed methodology is illustrated through the example of three river flows, for which we built models with possible changepoints and evaluated their forecasting accuracy by means of Root Mean Square Error, Mean Absolute Error and Mean Absolute Percentage Error. The last years of each data set were omitted from the selection and estimation procedure and were then used to evaluate the forecasts. Comparisons with the literature on river flow forecasting confirm the efficiency of our proposal.
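The three accuracy measures named above have standard definitions, sketched below in plain Python. The function name `forecast_errors` and the sample numbers are illustrative assumptions, not taken from the paper.

```python
import math


def forecast_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero (division by zero).
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical held-out flows and forecasts.
rmse, mae, mape = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(round(rmse, 3), round(mae, 3), round(mape, 3))
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is why the three are often reported together.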

MVG Mechanism: Differential Privacy under Matrix-Valued Query

Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often suboptimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds a matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (\epsilon,\delta)-differential privacy. Furthermore, we introduce the concept of directional noise made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides comparable utility to the non-private baseline. Our work thus presents a promising prospect for both future research and implementation of differential privacy for matrix-valued query functions.
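For contrast, the baseline that the abstract calls suboptimal (adding i.i.d. Gaussian noise to every matrix element) can be sketched as follows. This is not the MVG mechanism. It assumes the classical Gaussian-mechanism calibration sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon, which is valid for 0 < epsilon < 1, and it assumes the caller supplies the L2 (Frobenius) sensitivity of the query. The function names are hypothetical.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (needs 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def iid_gaussian_mechanism(matrix, l2_sensitivity, epsilon, delta, rng=None):
    """Add independent N(0, sigma^2) noise to each entry of a matrix-valued answer."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in matrix]


# Hypothetical 2x3 query answer with an assumed sensitivity of 1.0.
answer = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
noisy = iid_gaussian_mechanism(answer, 1.0, 0.5, 1e-5, rng=random.Random(0))
print(noisy)
```

Because every entry receives the same noise scale regardless of direction, this baseline cannot exploit matrix structure, which is the gap the directional noise idea is meant to address.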

Online Job Scheduling in Distributed Machine Learning Clusters

Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural network, multiple workers are run in parallel to train partitions of the input dataset, and update shared model parameters. In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time. Targeting a distributed machine learning system using the parameter server framework, we design an online algorithm for scheduling the arriving jobs and deciding the adjusted numbers of concurrent workers and parameter servers for each job over its course, to maximize overall utility of all jobs, contingent on their completion times. Our online algorithm design utilizes a primal-dual framework coupled with efficient dual subroutines, achieving good long-term performance guarantees with polynomial time complexity. Practical effectiveness of the online algorithm is evaluated using trace-driven simulation and testbed experiments, which demonstrate its outperformance as compared to commonly adopted scheduling algorithms in today’s cloud systems.

Advice from the Oracle: Really Intelligent Information Retrieval

What is ‘intelligent’ information retrieval? Essentially this is asking what intelligence is. In this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service.

Optimal Bayesian Transfer Learning

Transfer learning has recently attracted significant research attention, as it simultaneously learns from different source domains, which have plenty of labeled data, and transfers the relevant knowledge to the target domain with limited labeled data to improve the prediction performance. We propose a Bayesian transfer learning framework where the source and target domains are related through the joint prior density of the model parameters. The modeling of joint prior densities enables better understanding of the ‘transferability’ between domains. We define a joint Wishart density for the precision matrices of the Gaussian feature-label distributions in the source and target domains to act like a bridge that transfers the useful information of the source domain to help classification in the target domain by improving the target posteriors. Using several theorems in multivariate statistics, the posteriors and posterior predictive densities are derived in closed forms with hypergeometric functions of matrix argument, leading to our novel closed-form and fast Optimal Bayesian Transfer Learning (OBTL) classifier. Experimental results on both synthetic and real-world benchmark data confirm the superb performance of the OBTL compared to the other state-of-the-art transfer learning and domain adaptation methods.

ScreenerNet: Learning Curriculum for Neural Networks

We propose to learn a curriculum or a syllabus for supervised learning with deep neural networks. Specifically, we learn weights for each sample in training by an attached neural network, called ScreenerNet, to the original network and jointly train them in an end-to-end fashion. We show the networks augmented with our ScreenerNet achieve early convergence with better accuracy than the state-of-the-art rule-based curricular learning methods in extensive experiments using three popular vision datasets including MNIST, CIFAR10 and Pascal VOC2012, and a Cartpole task using Deep Q-learning.

Modeling Interaction Effects in Logistic Regression: Information Analysis

The Akaike information criterion (AIC) is commonly used to select a logistic regression model for predicting a discrete response variable using available regressors. In practice, there is no well-defined procedure for finding models with near-minimum AIC estimates. As an alternative approach to model selection, we propose a two-step selection scheme that first identifies the indispensable regressors as main-effect predictors and then inspects the significant interaction effects between the selected predictors so as to construct the desired logistic model. In this study, the two-step selection scheme is developed based on the analysis of mutual information between the regressors and the response variable. It is proved that the scheme yields the most parsimonious logistic model using the indispensable predictors and the fewest interaction effects. As a byproduct, it also conveniently locates the minimum AIC model in a neighborhood of the selected model. The scheme is employed to model the regression for predicting the acquisition of professional licenses in a survey of employed youth workers.
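For reference, AIC is defined as 2k minus twice the maximized log-likelihood, where k is the number of estimated parameters. The sketch below evaluates it for fitted probabilities of a binary response. The function names and the toy numbers are assumptions for illustration, and the paper's mutual-information selection scheme is not reproduced here.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 responses y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Two hypothetical fitted models for the same four responses.
y = [1, 0, 1, 1]
p_small = [0.6, 0.4, 0.6, 0.6]   # e.g. main effects only, 2 parameters
p_large = [0.7, 0.3, 0.7, 0.6]   # e.g. with an interaction, 3 parameters
print(aic(bernoulli_log_likelihood(y, p_small), 2))
print(aic(bernoulli_log_likelihood(y, p_large), 3))
```

The model with the lower value is preferred; the extra parameter must improve the log-likelihood by more than one unit to pay for itself.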

Graph Pattern Matching for Dynamic Team Formation

Finding a list of k teams of experts, referred to as top-k team formation, with the required skills and high collaboration compatibility has been extensively studied. However, existing methods have not considered the specific collaboration relationships among different team members, i.e., structural constraints, which are typically needed in practice. In this study, we first propose a novel graph pattern matching approach for top-k team formation, which incorporates both structural constraints and capacity bounds. Second, we formulate and study the dynamic top-k team formation problem due to the growing need of a dynamic environment. Third, we develop a unified incremental approach, together with an optimization technique, to handle continuous pattern and data updates, separately and simultaneously, which has not been explored before. Finally, using real-life and synthetic data, we conduct an extensive experimental study to show the effectiveness and efficiency of our graph pattern matching approach for (dynamic) top-k team formation.

A Reliability Theory of Truth

Our approach is basically a coherence approach, but we avoid the well-known pitfalls of coherence theories of truth. Consistency is replaced by reliability, which expresses support and attack, and, in principle, every theory (or agent, message) counts. At the same time, we do not require privileged access to ‘reality’. A centerpiece of our approach is that we attribute reliability also to agents, messages, etc., so an unreliable source of information will be less important in future. Our ideas can also be extended to value systems, and even actions, e.g., of animals.

Spot the Difference by Object Detection

In this paper, we propose a simple yet effective solution to a change detection task that detects the difference between two images, which we call ‘spot the difference’. Our approach uses CNN-based object detection by stacking two aligned images as input and considering the differences between the two images as objects to detect. An early-merging architecture is used as the backbone network. Our method is accurate, fast and robust while using very cheap annotation. We verify the proposed method on the task of change detection between the digital design and its photographic image of a book. Compared to verification based methods, our object detection based method outperforms other methods by a large margin and gives extra information of location. We compress the network and achieve 24 times acceleration while keeping the accuracy. Besides, as we synthesize the training data for detection using weakly labeled images, our method does not need expensive bounding box annotation.

Clustering of Data with Missing Entries

The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that provides good clustering even in the presence of missing data. The proposed technique solves an \ell_0 fusion penalty based optimization problem to recover the clusters. We theoretically analyze the conditions needed for the successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets, and is observed to perform well in the presence of large fractions of missing entries.

Social Media Analysis based on Semanticity of Streaming and Batch Data

The language people share differs across regions in accent, pronunciation and word usage. In this era, language is shared mainly through social media and blogs. Every second a large volume of such micro posts appears, which creates the need to process those micro posts in order to extract knowledge from them. Knowledge extraction differs with respect to the application, and research in cognitive science has fed the requirements for it. This work moves such research forward by extracting semantic information from streaming and batch data in applications such as Named Entity Recognition and Author Profiling. For Named Entity Recognition, the context of a single micro post is utilized, while the context that lies in a pool of micro posts is utilized to identify the sociolect aspects of the author of those micro posts. In this work, a Conditional Random Field is used for entity recognition, and a novel approach is proposed to find the sociolect aspects of the author (gender, age group).

Modeling Log-linear and Logit Models in Categorical Data Analysis

The association between categorical variables is analyzed using the mutual information approach complied with the multivariate multinomial distributions. Schematic decompositions of mutual information are employed for characterizing log-linear and logit models. A geometric analysis of the conditional mutual information is proposed for selecting indispensable predictors and their interaction effects for constructing log-linear and logit models. The new approach to selecting the most concise logit model also facilitates search for the minimum AIC model with a finite set of predictors. The proposed constructive schemes are illustrated in analyzing a contingency table of data collected in a study on the risk factors of ischemic cerebral stroke.
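The basic quantity behind this approach, the mutual information between two categorical variables, can be computed directly from a two-way contingency table of counts. The sketch below shows only that textbook definition (in nats); the paper's schematic decompositions and geometric analysis are not reproduced, and the function name and example tables are assumptions.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) between the row and column variables
    of a two-way contingency table given as a list of rows of counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute zero
                ratio = count * total / (row_sums[i] * col_sums[j])
                mi += (count / total) * math.log(ratio)
    return mi


print(mutual_information([[10, 20], [30, 60]]))  # independent rows and columns
print(mutual_information([[50, 0], [0, 50]]))    # perfectly associated
```

Multiplying the result by twice the sample size gives the likelihood-ratio statistic for testing independence in the table, which is what connects mutual information to log-linear model fitting.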

Learning automata based SVM for intrusion detection

As an indispensable defensive measure of network security, intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents. It acts as a classifier that judges whether an event is normal or malicious. The information used for intrusion detection contains some redundant features, which increase the difficulty of training the classifier and the time needed to make predictions. To simplify the training process and improve the efficiency of the classifier, it is necessary to remove these dispensable features. In this paper, we propose a novel LA-SVM scheme to automatically remove redundant features, focusing on intrusion detection. This is the first application of learning automata for solving dimension reduction problems. The simulation results indicate that the LA-SVM scheme achieves a higher accuracy and is more efficient in making predictions compared with traditional SVM.

Generalized Network Dismantling

Finding the set of nodes whose removal or (de)activation can stop the spread of (dis)information, contain an epidemic or disrupt the functioning of a corrupt/criminal organization is still one of the key challenges in network science. In this paper, we introduce the generalized network dismantling problem, which aims to find the set of nodes that, when removed from a network, results in a network fragmentation into subcritical network components at minimum cost. For unit costs, our formulation becomes equivalent to the standard network dismantling problem. Our non-unit cost generalization allows for the inclusion of topological cost functions related to node centrality and non-topological features such as the price, protection level or even social value of a node. In order to solve this optimization problem, we propose a method based on the spectral properties of a novel node-weighted Laplacian operator. The proposed method is applicable to large-scale networks with millions of nodes. It outperforms current state-of-the-art methods and opens new directions in understanding the vulnerability and robustness of complex systems.

What have we learned from deep representations for action recognition?

As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video. We show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following. First, cross-stream fusion enables the learning of true spatiotemporal features rather than simply separate appearance and motion features. Second, the networks can learn local representations that are highly class specific, but also generic representations that can serve a range of classes. Third, throughout the hierarchy of the network, features become more abstract and show increasing invariance to aspects of the data that are unimportant to desired distinctions (e.g. motion patterns across various speeds). Fourth, visualizations can be used not only to shed light on learned representations, but also to reveal idiosyncrasies of training data and to explain failure cases of the system.

Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning

The physical design of a robot and the policy that controls its motion are inherently coupled. However, existing approaches largely ignore this coupling, instead choosing to alternate between separate design and control phases, which requires expert intuition throughout and risks convergence to suboptimal designs. In this work, we propose a method that jointly optimizes over the physical design of a robot and the corresponding control policy in a model-free fashion, without any need for expert supervision. Given an arbitrary robot morphology, our method maintains a distribution over the design parameters and uses reinforcement learning to train a neural network controller. Throughout training, we refine the robot distribution to maximize the expected reward. This results in an assignment to the robot parameters and neural network policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel robot designs and walking gaits for several different morphologies, achieving performance comparable to or better than that of hand-crafted designs.

Cluster-weighted latent class modeling

Usually in Latent Class Analysis (LCA), external predictors are taken to be cluster conditional probability predictors (LC models with covariates), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class specific distribution is of interest in the distal outcome model, when the distribution of the external variable(s) is assumed to depend on LC membership. In this paper, we consider a more general formulation, typical in cluster-weighted models, which embeds both the latent class regression and the distal outcome models. This allows us to test simultaneously both whether the distribution of the covariate(s) differs across classes, and whether there are significant direct effects of the covariate(s) on the indicators, by including most of the information about the covariate(s) – latent variable relationship. We show the advantages of the proposed modeling approach through a set of population studies and an empirical application on assets ownership of Italian households.

PHOENICS: A universal deep Bayesian optimizer

In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case-study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce qualitatively and quantitatively the target dynamic behavior of this nonlinear reaction dynamics. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box unknown objective functions.

A Greedy Search Tree Heuristic for Symbolic Regression

Symbolic Regression tries to find a mathematical expression that describes the relationship of a set of explanatory variables to a measured variable. The main objective is to find a model that minimizes the error and, optionally, that also minimizes the expression size. A smaller expression can be seen as an interpretable model and therefore considered a reliable decision model. This is often performed with Genetic Programming, which represents its solutions as expression trees. The shortcoming of this algorithm lies in this representation, which defines a rugged search space and contains expressions of any size and difficulty. These pose a challenge to finding the optimal solution under computational constraints. This paper introduces a new data structure, called Interaction-Transformation (IT), that constrains the search space in order to exclude a region of larger and more complicated expressions. In order to test this data structure, a heuristic called SymTree is also introduced. The obtained results show evidence that SymTree is capable of obtaining the optimal solution whenever the target function is within the search space of the IT data structure, and competitive results when it is not. Overall, the algorithm found a good compromise between accuracy and simplicity for all the generated models.

DENSER: Deep Evolutionary Network Structured Representation

Deep Evolutionary Network Structured Representation (DENSER) is a novel approach to automatically design Artificial Neural Networks (ANNs) using Evolutionary Computation (EC). The algorithm not only searches for the best network topology (e.g., number of layers, type of layers), but also tunes hyper-parameters, such as learning parameters or data augmentation parameters. The automatic design is achieved using a representation with two distinct levels, where the outer level encodes the general structure of the network, i.e., the sequence of layers, and the inner level encodes the parameters associated with each layer. The allowed layers and hyper-parameter value ranges are defined by means of a human-readable Context-Free Grammar. DENSER was used to evolve ANNs for two widely used image classification benchmarks, obtaining an average accuracy result of up to 94.27% on the CIFAR-10 dataset, and of 78.75% on the CIFAR-100. To the best of our knowledge, our CIFAR-100 results are the highest among models generated by methods that aim at the automatic design of Convolutional Neural Networks (CNNs), and are amongst the best for manually designed and fine-tuned CNNs.

Testing Optimality of Sequential Decision-Making

This paper provides a statistical method to test whether a system that performs a binary sequential hypothesis test is optimal in the sense of minimizing the average decision times while taking decisions with given reliabilities. The proposed method requires samples of the decision times, the decision outcomes, and the true hypotheses, but does not require knowledge on the statistics of the observations or the properties of the decision-making system. The method is based on fluctuation relations for decision time distributions which are proved for sequential probability ratio tests. These relations follow from the martingale property of probability ratios and hold under fairly general conditions. We illustrate these tests with numerical experiments and discuss potential applications.
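The sequential probability ratio test referred to above can be sketched for the simplest case of Bernoulli observations. This shows Wald's test itself, with his approximate thresholds, and not the paper's optimality test based on fluctuation relations. The function name, the hypotheses p0 and p1, and the error levels are assumptions chosen for illustration.

```python
import math


def sprt_bernoulli(samples, p0, p1, alpha, beta):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on 0/1 observations.

    Returns (decision, n_used), where decision is "H0", "H1", or None if
    the samples run out before either threshold is crossed.
    """
    upper = math.log((1.0 - beta) / alpha)   # decide H1 at or above this
    lower = math.log(beta / (1.0 - alpha))   # decide H0 at or below this
    llr = 0.0  # cumulative log-likelihood ratio log(P1/P0)
    for n, x in enumerate(samples, start=1):
        if x == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, len(samples)


print(sprt_bernoulli([1] * 10, p0=0.3, p1=0.7, alpha=0.05, beta=0.05))
print(sprt_bernoulli([0] * 10, p0=0.3, p1=0.7, alpha=0.05, beta=0.05))
```

The decision time (the second element of the result) is the random quantity whose distribution the proposed method examines, together with the decision outcome and the true hypothesis.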

A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines

Many of the existing machine learning algorithms, both supervised and unsupervised, depend on the quality of the input characteristics to generate a good model. The number of these variables is also important, since performance tends to decline as the input dimensionality increases, hence the interest in using feature fusion techniques, able to produce feature sets that are more compact and higher level. A plethora of procedures to fuse original variables for producing new ones has been developed in the past decades. The most basic ones use linear combinations of the original variables, such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis), while others find manifold embeddings of lower dimensionality based on non-linear combinations, such as Isomap or LLE (Locally Linear Embedding) techniques. More recently, autoencoders (AEs) have emerged as an alternative to manifold learning for conducting nonlinear feature fusion. Dozens of AE models have been proposed lately, each with its own specific traits. Although many of them can be used to generate reduced feature sets through the fusion of the original ones, there are also AEs designed with other applications in mind. The goal of this paper is to provide the reader with a broad view of what an AE is, how they are used for feature fusion, a taxonomy gathering a broad range of models, and how they relate to other classical techniques. In addition, a set of didactic guidelines on how to choose the proper AE for a given task is supplied, together with a discussion of the software tools available. Finally, two case studies illustrate the usage of AEs with datasets of handwritten digits and breast cancer.

SpectralNet: Spectral Clustering using Deep Neural Networks

Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at https://…/SpectralNet .

Principal component analysis for big data

Big data is transforming our world, revolutionizing operations and analytics everywhere, from financial engineering to biomedical sciences. The complexity of big data often makes dimension reduction techniques necessary before conducting statistical inference. Principal component analysis, commonly referred to as PCA, has become an essential tool for multivariate data analysis and unsupervised dimension reduction, the goal of which is to find a lower dimensional subspace that captures most of the variation in the dataset. This article provides an overview of methodological and theoretical developments of PCA over the last decade, with focus on its applications to big data analytics. We first review the mathematical formulation of PCA and its theoretical development from the view point of perturbation analysis. We then briefly discuss the relationship between PCA and factor analysis as well as its applications to large covariance estimation and multiple testing. PCA also finds important applications in many modern machine learning problems, and we focus on community detection, ranking, mixture model and manifold learning in this paper.
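The goal stated above, finding the direction that captures most of the variation, can be illustrated with a small pure-Python sketch that extracts the first principal component by power iteration on the sample covariance matrix. This is a teaching sketch under stated assumptions (small dense data, a starting vector not orthogonal to the leading direction, a clear gap between the top two eigenvalues), not a method from the article. The function name and the toy data are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector and eigenvalue of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Starting vector; assumed not orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance or an unlucky start: no direction to refine
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue


# Toy 2-D data lying close to the line y = x.
data = [[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0], [4.0, 4.1]]
direction, variance = first_principal_component(data)
print(direction, variance)
```

Projecting each centered row onto `direction` yields a one-dimensional representation that retains nearly all of the variance in this toy data set.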

Intelligence Graph

There exist three genres of intelligence architectures: logics (e.g. Random Forest, A* search), neurons (e.g. CNN, LSTM) and probabilities (e.g. Naive Bayes, HMM), all of which are incompatible with each other. To construct powerful intelligence systems that combine these methods, we propose the intelligence graph (iGraph for short), which is composed of both neural and probabilistic graphs under the framework of forward-backward propagation. Following the iGraph paradigm, we design a recommendation model with a semantic principle. First, the probability distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics. Experimentally, we beat the state-of-the-art baselines and verify our conclusions.

Ontology-based Approach for Semantic Data Extraction from Social Big Data: State-of-the-art and Research Directions

The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on semantic analysis of textual data. We propose an ontology-based approach to extract the semantics of textual data and define the domain of the data. In other words, we semantically analyse the social data at two levels, i.e. the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with a specific semantic conceptual representation of the entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment with and evaluate our proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.

Reasons and Means to Model Preferences as Incomplete

Literature involving preferences of artificial agents or human beings often assumes that their preferences can be represented using a complete transitive binary relation. Much has been written, however, on different models of preferences. We review some of the reasons that have been put forward to justify more complex modeling, and review some of the techniques that have been proposed to obtain models of such preferences.

FOTS: Fast Oriented Text Spotting with a Unified Network

Incidental scene text spotting is considered one of the most difficult and valuable challenges in the document analysis community. Most existing methods treat text detection and recognition as separate tasks. In this work, we propose a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network for simultaneous detection and recognition, sharing computation and visual information between the two complementary tasks. Specifically, RoIRotate is introduced to share convolutional features between detection and recognition. Benefiting from the convolution sharing strategy, FOTS has little computation overhead compared to a baseline text detection network, and the joint training method learns more generic features, making our method perform better than two-stage methods. Experiments on the ICDAR 2015, ICDAR 2017 MLT, and ICDAR 2013 datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods, which further allows us to develop the first real-time oriented text spotting system, which surpasses all previous state-of-the-art results by more than 5% on the ICDAR 2015 text spotting task while running at 22.6 fps.

Accelerated Training for Massive Classification via Dynamic Class Selection

Massive classification, a classification task defined over a vast number of classes (hundreds of thousands or even millions), has become an essential part of many real-world systems, such as face recognition. Existing methods, including the deep networks that achieved remarkable success in recent years, were mostly devised for problems with a moderate number of classes. They would meet with substantial difficulties, e.g. excessive memory demand and computational cost, when applied to massive problems. We present a new method to tackle this problem. This method can efficiently and accurately identify a small number of ‘active classes’ for each mini-batch, based on a set of dynamic class hierarchies constructed on the fly. We also develop an adaptive allocation scheme thereon, which leads to a better tradeoff between performance and cost. On several large-scale benchmarks, our method significantly reduces the training cost and memory demand, while maintaining competitive performance.

Artificial Intelligence (AI) Methods in Optical Networks: A Comprehensive Survey

Artificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future.

Negative Binomial Matrix Factorization for Recommender Systems

We introduce negative binomial matrix factorization (NBMF), a matrix factorization technique specially designed for analyzing over-dispersed count data. It can be viewed as an extension of Poisson matrix factorization (PF) perturbed by a multiplicative term which models exposure. This term brings a degree of freedom for controlling the dispersion, making NBMF more robust to outliers. We show that NBMF makes it possible to skip traditional pre-processing stages, such as binarization, which lead to a loss of information. Two estimation approaches are presented: maximum likelihood and variational Bayes inference. We test our model on a recommendation task and show its ability to predict user tastes with better precision than PF.
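
To make the notion of over-dispersion concrete, the sketch below simulates the standard gamma-Poisson construction of a negative binomial count: a Poisson rate is multiplied by a gamma-distributed term with mean 1, which leaves the mean unchanged but inflates the variance. This is the textbook construction and is only assumed to match the "multiplicative term which models exposure" in the abstract; the parameter values and function names are hypothetical, and no matrix factorization is performed.

```python
import math
import random
import statistics

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_gamma_poisson(mean, dispersion, rng):
    """Poisson count whose rate is scaled by a gamma term with mean 1.

    The gamma term has shape `dispersion` and scale 1 / dispersion, so its
    variance is 1 / dispersion; smaller values give heavier over-dispersion.
    """
    exposure = rng.gammavariate(dispersion, 1.0 / dispersion)
    return sample_poisson(mean * exposure, rng)

rng = random.Random(0)
mean, dispersion = 5.0, 2.0  # illustrative values, not from the paper
poisson_counts = [sample_poisson(mean, rng) for _ in range(20000)]
nb_counts = [sample_gamma_poisson(mean, dispersion, rng) for _ in range(20000)]

print("Poisson  mean/variance:",
      statistics.mean(poisson_counts), statistics.variance(poisson_counts))
print("NegBinom mean/variance:",
      statistics.mean(nb_counts), statistics.variance(nb_counts))
```

Under this construction the theoretical variance is mean + mean² / dispersion, so the simulated negative binomial counts should show a variance well above their mean, whereas the Poisson counts should show a variance close to their mean.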

RobustGaSP: Robust Gaussian Stochastic Process Emulation in R

Gaussian stochastic process (GaSP) emulation is a powerful tool for approximating computationally intensive computer models. However, estimation of parameters in the GaSP emulator is a challenging task. No closed-form estimator is available and many numerical problems arise with standard estimates, e.g., the maximum likelihood estimator (MLE). In this package, we implement a marginal posterior mode estimator, for special priors and parameterizations, an estimation method that meets the robust parameter estimation criteria discussed in \cite{Gu2016thesis,Gu2016robustness}; mathematical reasons are provided therein to explain why robust parameter estimation can greatly improve predictive performance of the emulator. The package also allows inert inputs (inputs that have almost no effect on the variability of a function) to be identified from the marginal posterior mode estimation, at no extra computational cost. The package can be operated in a default mode, but also allows numerous user specifications, such as the capability of specifying trend functions and noise terms. Examples are studied herein to highlight the performance of the package in terms of out-of-sample prediction.

Identifying emergency stages in Facebook posts of police departments with convolutional and recurrent neural networks and support vector machines
Secretary problem: graphs, matroids and greedoids
Algorithms, Bounds, and Strategies for Entangled XOR Games
A Novel Approach to Skew-Detection and Correction of English Alphabets for OCR
On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML
QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts
Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer
The geometry of rank decompositions of matrix multiplication II: $3\times 3$ matrices
A Concentration Result of Estimating Phi-Divergence using Data Dependent Partition
Phylogenetic trees and homomorphisms
Utilizing Semantic Visual Landmarks for Precise Vehicle Navigation
Accounting for unobserved covariates with varying degrees of estimability in high dimensional experimental data
Panoptic Segmentation
Attack Analysis and Resilient Control Design for Discrete-time Distributed Multi-agent Systems
Eigenvalues of random lifts and polynomial of random permutations matrices
Moment bounds of a class of stochastic heat equations driven by space-time colored noise in bounded domains
A Novel Feature Descriptor for Image Retrieval by Combining Modified Color Histogram and Diagonally Symmetric Co-occurrence Texture Pattern
Deep convolutional neural networks for segmenting 3D in vivo multiphoton images of vasculature in Alzheimer disease mouse models
Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-Free Approach
Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Recovery of Point Clouds on Surfaces: Application to Image Reconstruction
Recovery of Noisy Points on Band-limited Surfaces: Kernel Methods Re-explained
Limits of memory coefficient in measuring correlated bursts
Reliable OFDM Transmission with Ultra-Low Resolution ADC
Differential Geometry for Model Independent Analysis of Images and Other Non-Euclidean Data: Recent Developments
Learning from Mutants: Using Code Mutation to Learn and Monitor Invariants of a Cyber-Physical System
Neural Networks in Adversarial Setting and Ill-Conditioned Weight Space
Circuit Complexity of Bounded Planar Cutwidth Graph Matching
Instance Embedding Transfer to Unsupervised Video Object Segmentation
Rapid Information Transfer in Networks with Delayed Self Reinforcement
Joint Content Delivery and Caching Placement via Dynamic Programming
Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation
A discrete event traffic model explaining the traffic phases of the train dynamics in a metro line system with a junction
Topological Tracking of Connected Components in Image Sequences
Secure communication over fully quantum Gel’fand-Pinsker wiretap channel
Modulus Zero-Forcing Detection for MIMO Channels
Prediction of corrosions in Gas and Oil pipelines based on the theory of records
On maximum distance separable group codes with complementary dual
A New Design Paradigm for Secure Full-Duplex Multiuser Systems
Phase Transition of Convex Programs for Linear Inverse Problems with Multiple Prior Constraints
Joint convolutional neural pyramid for depth map super-resolution
A New Wald Test for Hypothesis Testing Based on MCMC outputs
Optimal Learning from the Doob-Dynkin lemma
Integrated quantile functions: properties and applications
A Model-Free Selection Criterion For The Mixing Coefficient Of Spatial Max-Mixture Models
Sentence Object Notation: Multilingual sentence notation based on Wordnet
Joint Uplink and Downlink Resource Configuration for Ultra-reliable and Low-latency Communications
The Capacity on Degraded Relay Broadcast Channel
ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling
The risk model with stochastic premiums, dependence and a threshold dividend strategy
A Comprehensive Bayesian Treatment of the Universal Kriging parameters with Matérn correlation kernels
Connections between a system of Forward-Backward SDEs and Backward Stochastic PDEs related to the utility maximization problem
Double barrier reflected BSDEs with stochastic Lipschitz coefficient
Local probabilities of randomly stopped sums of power law lattice random variables
On the exact maximum induced density of almost all graphs and their inducibility
Implementation of Deep Convolutional Neural Network in Multi-class Categorical Image Classification
Information Bottleneck on General Alphabets
Computational complexity lower bounds of certain discrete Radon transform approximations
Gaps between prime numbers and tensor rank of multiplication in finite fields
Polynomial-based rotation invariant features
Slowing Down Top Trees for Better Worst-Case Bounds
Intrinsic Gaussian processes on complex constrained domains
Theoretical links between universal and Bayesian compressed sensing algorithms
Universality and Thouless energy in the supersymmetric Sachdev-Ye-Kitaev Model
Randomized Linear Algebra Approaches to Estimate the Von Neumann Entropy of Density Matrices
Forcing large tight components in 3-graphs
Live Intrinsic Material Estimation
Exact Calculation of Mean-Square Error of Approximation of Multiple Ito Stochastic integrals for the Method, Based on the Multiple Fourier Series
Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources
Convergence rates of Forward–Douglas–Rachford splitting method
Independence number of graphs with a prescribed number of cliques
On Periodicity Lemma for Partial Words
Generalizing the Kawaguchi-Kyan bound to stochastic parallel machine scheduling
On certain edge-transitive bicirculants
Self-learning Monte Carlo with Deep Neural Networks
Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions
Stability and pre-thermalization in chains of classical kicked rotors
Coins and Logic
Graph switching, 2-ranks, and graphical Hadamard matrices
Automorphism groups and Ramsey properties of sparse graphs
Optimization-based AMP for Phase Retrieval: The Impact of Initialization and $\ell_2$-regularization
Rapid, concurrent and adaptive extreme scale binding free energy calculation
A note on the problem of prisoners and hats
Fingerprint Distortion Rectification using Deep Convolutional Neural Networks
Predicting Chronic Disease Hospitalizations from Electronic Health Records: An Interpretable Classification Approach
Expansion formulas for European quanto options in a local volatility FX-LIBOR model
Radial basis function collocation method for decoupled fractional Laplacian wave equations
Deep Learning for Forecasting Stock Returns in the Cross-Section
Binary Extended Formulations
The distribution of overlaps between eigenvectors of Ginibre matrices
Generalized Similarity U: A Non-parametric Test of Association Based on Similarity
Probabilistic max-plus schemes for solving Hamilton-Jacobi-Bellman equations
Bounded-Velocity Stochastic Control for Dynamic Resource Allocation
A Decision-theoretic Approach to Detection-based Target Search with a UAV
A short note on doubly substochastic analog of Birkhoff’s theorem
Depth Not Needed – An Evaluation of RGB-D Feature Encodings for Off-Road Scene Understanding by Convolutional Neural Network
Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Sparse Bayesian ARX models with flexible noise distributions
Constructing Metropolis-Hastings proposals using damped BFGS updates
Cooperative Ambient Backscatter Communications for Green Internet-of-Things
Approximate Ranking from Pairwise Comparisons
Deep Learning Reconstruction for 9-View Dual Energy CT Baggage Scanner
Cross-domain Human Parsing via Adversarial Feature and Label Adaptation
ICFVR 2017: 3rd International Competition on Finger Vein Recognition
Improved Bounds on Lossless Source Coding and Guessing Moments via Rényi Measures
How to estimate the sample mean and standard deviation from the five number summary?
Ultra-Reliable and Low-Latency Wireless Communication: Tail, Risk and Scale
DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging
Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
Minimal convex majorants of functions and Demyanov–Rubinov super(sub)differentials
Ergodic BSDE with an unbounded and multiplicative underlying diffusion and application to large time behavior of viscosity solution of HJB equation
Inequality Constrained Multilevel Models
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Approximate solutions to large nonsymmetric differential Riccati problems
Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks
Microscopic Travel Time Analysis of Bottleneck Experiments
The mathematics of asymptotic stability in the Kuramoto model
Validity of Borodin and Kostochka Conjecture for {4 Times K1}-free Graphs
PixelLink: Detecting Scene Text via Instance Segmentation
Text Extraction and Retrieval from Smartphone Screenshots: Building a Repository for Life in Media
Semantic Segmentation via Highly Fused Convolutional Network with Multiple Soft Cost Functions
Understanding the connections between species distribution models
Schurity and separability of quasiregular coherent configurations
VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
Mobility and Congestion in Dynamical Multilayer Networks with Finite Storage Capacity
Covariant Schrödinger semigroups on noncompact Riemannian manifolds
A family of multigraphs with large palette index
A simple model for low variability in neural spike trains
The cost of controlling strongly degenerate parabolic equations
Codegree Turán density of complete $r$-uniform hypergraphs
Prediction Error Bounds for Linear Regression With the TREX
Demystifying MMD GANs
A fully automated framework for lung tumour detection, segmentation and analysis
String Periods in the Order-Preserving Model
Towards Application Portability on Blockchains
Practical Challenges in Explicit Ethical Machine Reasoning
Overcoming catastrophic forgetting with hard attention to the task
SmartTennisTV: Automatic indexing of tennis videos
Fit to speak – Physical fitness is associated with reduced language decline in healthy ageing
Design and Implementation of a Polar Codes Blind Detection Scheme
A Large Dataset for Improving Patch Matching
Deep Reinforcement Learning based Optimal Control of Hot Water Systems
Finite semilattices with many congruences
Variational analysis and variational rationality in behavioral sciences: stationary traps
Deep Cross Polarimetric Thermal-to-visible Face Recognition
Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the ‘Rashomon’ Perspective
The Action of Young Subgroups on the Partition Complex
Hats: all or nothing
Correlation between clustering and degree in affiliation networks
Bayesian Constraint Relaxation
A Quantitative Analysis of Multi-Winner Rules
A novel calibration framework for survival analysis when a binary covariate is measured at sparse time points
The Mutating Contact Process: Model Introduction and Qualitative Analysis of Phase Transitions in its Survival
Slugbot: An Application of a Novel and Scalable Open Domain Socialbot Framework
Improvement to the Prediction of Fuel Cost Distributions Using ARIMA Model
Bayesian uncertainty analysis establishes the link between the parameter space of a complex model of hormonal crosstalk in Arabidopsis root development and experimental measurements
Asymptotic bounds for spherical codes
An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer
A bound on the inducibility of cycles
Plan in 2D, execute in 3D: An augmented reality solution for cup placement in total hip arthroplasty
On-the-fly Augmented Reality for Orthopaedic Surgery Using a Multi-Modal Fiducial
Expansion of Triple Stratonovich Stochastic Integrals, Based on Generalized Multiple Fourier Series, Converging in the Mean: General Case of Series Summation
How Many Rounds Should You Expect in Urn Solitaire?
LoopSmart: Smart Visual SLAM Through Surface Loop Closure
Functional control of network dynamics using designed Laplacian spectra
Object Referring in Videos with Language and Human Gaze
Hitting probabilities and expected hitting times under a weak drift: on the 1/3-rule and beyond
On a class of polynomials connected to Bell polynomials
Estimation in the Spiked Wigner Model: A Short Proof of the Replica Formula
Differentially Private Releasing via Deep Generative Model
Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning
Learning $3$D-FilterMap for Deep Convolutional Neural Networks
On Convergence to Essential Singularities
Learning from Pseudo-Randomness With an Artificial Neural Network – Does God Play Pseudo-Dice?
High Throughput Low Delay Wireless Multicast via Multi-Channel Moving Window Codes
Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies
Enabling Strong Database Integrity using Trusted Execution Environments
Dynamic Island Model based on Spectral Clustering in Genetic Algorithm
Optimal Pilot Symbols Ratio in terms of Spectrum and Energy Efficiency in Uplink CoMP Networks
Deep learning for word-level handwritten Indic script identification
VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling
aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model
Parity Considerations in Rogers-Ramanujan-Gordon Type Overpartitions
Energy-Efficient User Access Control and Resource Allocation in HCNs with Non-Ideal Circuitry
Gauged Mini-Bucket Elimination for Approximate Inference
Energy Efficiency Maximization for CoMP Joint Transmission with Non-Ideal Power Amplifiers
Hydrodynamics in a condensation regime: the disordered asymmetric zero-range process
Control charts in a multi product environment: An application study
Energy Efficiency Analysis of Heterogeneous Cellular Networks With Extra Cell Range Expansion
Achieving strongly negative scattering asymmetry factor in random media composed of dual-dipolar particles
Incipient Fault Detection and Location in Distribution Networks: A Data-Driven Approach
VulDeePecker: A Deep Learning-Based System for Vulnerability Detection
Coordinated Motion Planning: Reconfiguring a Swarm of Labeled Robots with Bounded Stretch
Efficient Image Evidence Analysis of CNN Classification Results
Cross-Sensor Iris Recognition: LG4000-to-LG2200 Comparison
Empirical Bayes analysis of spike and slab posterior distributions
Moving Vehicle Detection Using AdaBoost and Haar-Like Feature in Surveillance Videos
Variable-Length Intrinsic Randomness Allowing Positive Value of the Average Variational Distance
Gatekeeping Algorithms with Human Ethical Bias: The ethics of algorithms in archives, libraries and society
Energy Efficiency Maximization of Full-Duplex Two-Way Relay With Non-Ideal Power Amplifiers and Non-Negligible Circuit Power
Tree based classification of tabla strokes
A Multi-task Learning Approach for Improving Product Title Compression with User Search Log Data
Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption
Entropy production rate as a criterion for inconsistency in decision theory
Energy-efficient resource allocation for hybrid bursty services in multi-relay OFDM networks
A relativistic extension of Hopfield neural networks via the mechanical analogy
Transformation of arbitrary distributions to the normal distribution with application to EEG test-retest reliability
Nonparametric Stochastic Contextual Bandits
Improving the Secrecy of Distributed Storage Systems using Interference Alignment
Learning Feature Representations for Keyphrase Extraction
3D-DETNet: a Single Stage Video-Based Vehicle Detector
Inverse Uncertainty Quantification using the Modular Bayesian Approach based on Gaussian Process, Part 1: Theory
Multiple addition, deletion and restriction theorems for hyperplane arrangements
Dynamic and granular loss reserving with copulae
Selective Fair Scheduling over Fading Channels
Sparse highly connected spanning subgraphs in dense directed graphs
Spatially Coupled Sparse Regression Codes: Design and State Evolution Analysis
Monte Carlo integration with a growing number of control variates
Closed-form marginal likelihood in Gamma-Poisson factorization
Neighborhood-Prime Labelings of Trees and Other Classes of Graphs
Scheduling Policies for Minimizing Age of Information in Broadcast Wireless Networks
Bayesian calibration of a numerical code for prediction
Finding the seed of uniform attachment trees
Scaling beyond the Imry-Ma length: Evidence of a glassy first-order phase transition in the $3D$ Random Field $XY$ Model
Population model learned on different stimulus ensembles predicts network responses in the retina
Towards Understanding and Answering Multi-Sentence Recommendation Questions on Tourism
Shielding Google’s language toxicity model against adversarial attacks
On the Logic (plus some history and philosophy) of Statistical Tests and Scientific Investigation
Seymour’s conjecture on 2-connected graphs of large pathwidth
Design and Performance Characterization of RADICAL-Pilot on Titan
Learning Implicit Brain MRI Manifolds with Deep Learning
Hi-Fi: Hierarchical Feature Integration for Skeleton Detection
Maximum entropy models reveal the correlation structure in cortical neural activity during wakefulness and sleep
The Arbitrarily Varying Relay Channel
Controllability of the Navier-Stokes equation in a rectangle with a little help of an interior phantom force
Complex Reaction Kinetics in Chemistry: A unified picture suggested by Mechanics in Physics
The dynamical structure of political corruption networks
Near Optimal Coded Data Shuffling for Distributed Learning