Making AI meaningful again

Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to be forthcoming. Today, we are experiencing once again a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then show an alternative approach to language-centric AI, in which we identify a role for philosophy.

Trace-back Along Capsules and Its Application on Semantic Segmentation

In this paper, we propose a capsule-based neural network model to solve the semantic segmentation problem. By taking advantage of the extractable part-whole dependencies available in capsule layers, we derive the probabilities of the class labels for individual capsules through a recursive, layer-by-layer procedure. We model this procedure as a traceback pipeline and take it as a central piece to build an end-to-end segmentation network. Under the proposed framework, image-level class labels and object boundaries are jointly sought in an explicit manner, which poses a significant advantage over the state-of-the-art fully convolutional network (FCN) solutions. Experiments conducted on modified MNIST and neuroimages demonstrate that our model considerably enhance the segmentation performance compared to the leading FCN variant.

Algorithmic Bayesian Group Gibbs Selection

Bayesian model selection, with precedents in George and McCulloch (1993) and Abramovich et al. (1998), support credibility measures that relate model uncertainty, but computation can be costly when sparse priors are approximate. We design an exact selection engine suitable for Gauss noise, t-distributed noise, and logistic learning, benefiting from data-structures derived from coordinate descent lasso. Gibbs sampler chains are stored in a compressed binary format compatible with Equi-Energy (Kou et al., 2006) tempering. We achieve a grouped-effects selection model, similar to the setting for group lasso, to determine co-entry of coefficients into the model. We derive a functional integrand for group inclusion, and introduce a new MCMC switching step to avoid numerical integration. Theorems show this step has exponential convergence to target distribution. We demonstrate a role for group selection to inform on genetic decomposition in a diallel experiment, and identify potential quantitative trait loci in p > 40K Heterogenous Stock haplotype/phenotype studies.

An interval-valued GARCH model for range-measured return processes

Range-measured return contains more information than the traditional scalar-valued return. In this paper, we propose to model the [low, high] price range as a random interval and suggest an interval-valued GARCH (Int-GARCH) model for the corresponding range-measured return process. Under the general framework of random sets, the model properties are investigated. Parameters are estimated by the maximum likelihood method, and the asymptotic properties are established. Empirical application to stocks and financial indices data sets suggests that our Int-GARCH model overall outperforms the traditional GARCH for both in-sample estimation and out-of-sample prediction of volatility.

A Truthful FPTAS Mechanism for Emergency Demand Response in Colocation Data Centers

Demand response (DR) is not only a crucial solution to the demand side management but also a vital means of electricity market in maintaining power grid reliability, sustainability and stability. DR can enable consumers (e.g. data centers) to reduce their electricity consumption when the supply of electricity is a shortage. The consumers will be rewarded in the case of DR if they reduce or shift some of their energy usage during peak hours. Aiming at solving the efficiency of DR, in this paper, we present MEDR, a mechanism on emergency DR in colocation data center. First, we formalize the MEDR problem and propose a dynamic programming to solve the optimization version of the problem. We then design a deterministic mechanism as a solution to solve the MEDR problem. We show that our proposed mechanism is truthful. Next, we prove that our mechanism is an FPTAS, i.e., it can be approximated within 1 + \epsilon for any given \epsilon > 0, while the running time of our mechanism is polynomial in n and 1/\epsilon, where n is the number of tenants in the datacenter. Furthermore, we also give an auction system covering the efficient FPTAS algorithm as bidding decision program for DR in colocation datacenter. Finally, we choose a practical smart grid dataset to build a large number of datasets for simulation in performance evaluation. By evaluating metrics of the approximation ratio of our mechanism, the non-negative utility of tenants and social cost of colocation datacenter, the results demonstrate the effectiveness of our work.

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification problems. In this paper, we study NAS for semantic image segmentation, an important computer vision task that assigns a semantic label to every pixel in an image. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Without any ImageNet pretraining, our architecture searched specifically for semantic image segmentation attains state-of-the-art performance.

Nonparametric Multiple Change Point Detection for Non-Stationary Times Series

This article considers a nonparametric method for detecting change points in non-stationary time series. The proposed method will divide the time series into several segments so that between two adjacent segments, the normalized spectral density functions are different. The theory is based on the assumption that within each segment, time series is a linear process, which means that our method works not only for classic time series models, e.g., causal and invertible ARMA process, but also preserves good performance for non-invertible moving average process. We show that our estimations for change points are consistent. Also, a Bayesian information criterion is applied to estimate the member of change points consistently. Simulation results as well as empirical results will be presented.

Towards Optimised Data Transport and Analytics for Edge Computing

Industrial organisations, particularly Small and Medium-sized Enterprises (SME), face a number of challenges with regard to the adoption of Industrial Internet of Things (IIoT) technologies and methods. The scope of analytics processing that can be performed on data from IIoT-enabled industrial processes is typically limited by the compute and storage resources that are available, and any investment in additional hardware that is sufficiently flexible and scalable is difficult to justify in terms of Return On Investment (ROI). We describe a distributed model of data transport and processing that eases the take-up of IIoT, whilst also enabling a capability to securely deliver more complex analysis and future insight discovery, than would be possible with traditional network architectures.

GM-PLL: Graph Matching based Partial Label Learning

Partial Label Learning (PLL) aims to learn from the data where each training example is associated with a set of candidate labels, among which only one is correct. The key to deal with such problem is to disambiguate the candidate label sets and obtain the correct assignments between instances and their candidate labels. In this paper, we interpret such assignments as instance-to-label matchings, and reformulate the task of PLL as a matching selection problem. To model such problem, we propose a novel Graph Matching based Partial Label Learning (GM-PLL) framework, where Graph Matching (GM) scheme is incorporated owing to its excellent capability of exploiting the instance and label relationship. Meanwhile, since conventional one-to-one GM algorithm does not satisfy the constraint of PLL problem that multiple instances may correspond to the same label, we extend a traditional one-to-one probabilistic matching algorithm to the many-to-one constraint, and make the proposed framework accommodate to the PLL problem. Moreover, we also propose a relaxed matching prediction model, which can improve the prediction accuracy via GM strategy. Extensive experiments on both artificial and real-world data sets demonstrate that the proposed method can achieve superior or comparable performance against the state-of-the-art methods.

Active Learning for One-Class Classification Using Two One-Class Classifiers

This paper introduces a novel, generic active learning method for one-class classification. Active learning methods play an important role to reduce the efforts of manual labeling in the field of machine learning. Although many active learning approaches have been proposed during the last years, most of them are restricted on binary or multi-class problems. One-class classifiers use samples from only one class, the so-called target class, during training and hence require special active learning strategies. The few strategies proposed for one-class classification either suffer from their limitation on specific one-class classifiers or their performance depends on particular assumptions about datasets like imbalance. Our proposed method bases on using two one-class classifiers, one for the desired target class and one for the so-called outlier class. It allows to invent new query strategies, to use binary query strategies and to define simple stopping criteria. Based on the new method, two query strategies are proposed. The provided experiments compare the proposed approach with known strategies on various datasets and show improved results in almost all situations.

Automating the search for a patent’s prior art with a full text similarity search

More than ever, technical inventions are the symbol of our society’s advance. Patents guarantee their creators protection against infringement. For an invention being patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes similar inventions to a given patent application needs to be performed. Currently, this so-called search for prior art is executed with semi-automatically composed keyword queries, which is not only time consuming, but also prone to errors. In particular, errors may systematically arise by the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed, where the full text of a given patent application is compared to existing patents using machine learning and natural language processing techniques to automatically detect inventions that are similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition to that, the quality of the current search process is assessed based on ratings of a domain expert. The evaluation results show that our automated approach, besides accelerating the search process, also improves the search results for prior art with respect to their quality.

A Flexible Parametric Modelling Framework for Survival Analysis

We introduce a general, flexible, parametric survival modelling framework which encompasses key shapes of hazard function (constant, increasing, decreasing, up-then-down, down-then-up), various common survival distributions (log-logistic, Burr type XII, Weibull, Gompertz), and includes defective distributions (i.e., cure models). This generality is achieved using four basic distributional parameters: two scale-type parameters and two shape parameters. Generalising to covariate dependence, the scale-type regression components correspond to accelerated failure time (AFT) and proportional hazards (PH) models. Therefore, this general formulation unifies the most popular survival models which allows us to consider the practical value of possible modelling choices for survival data. Furthermore, in line with our proposed flexible baseline distribution, we advocate the use of multi-parameter regression in which more than one distributional parameter depends on covariates – rather than the usual convention of having a single covariate-dependent (scale) parameter. While many choices are available, we suggest introducing covariates through just one or other of the two scale parameters, which covers AFT and PH models, in combination with a `power’ shape parameter, which allows for more complex non-AFT/non-PH effects, while the other shape parameter remains covariate-independent, and handles automatic selection of the baseline distribution. We explore inferential issues in simulations, both with and without a covariate, with particular focus on evidence concerning the need, or otherwise, to include both AFT and PH parameters. We illustrate the efficacy of our modelling framework by investigating differences between treatment groups using data from a lung cancer study and a melanoma study. Censoring is accommodated throughout.

Efficient Bayesian Decision Tree Algorithm

Bayesian Decision Trees are known for their probabilistic interpretability. However, their construction can sometimes be costly. In this article we present a general Bayesian Decision Tree algorithm applicable to both regression and classification problems. The algorithm does not apply Markov Chain Monte Carlo and does not require a pruning step. While it is possible to construct a weighted probability tree space we find that one particular tree, the greedy-modal tree (GMT), explains most of the information contained in the numerical examples. This approach seems to perform similarly to Random Forests.

Smoothing Spline Semiparametric Density Models

Density estimation plays a fundamental role in many areas of statistics and machine learning. Parametric, nonparametric and semiparametric density estimation methods have been proposed in the literature. Semiparametric density models are flexible in incorporating domain knowledge and uncertainty regarding the shape of the density function. Existing literature on semiparametric density models is scattered and lacks a systematic framework. In this paper, we consider a unified framework based on the reproducing kernel Hilbert space for modeling, estimation, computation and theory. We propose general semiparametric density models for both a single sample and multiple samples which include many existing semiparametric density models as special cases. We develop penalized likelihood based estimation methods and computational methods under different situations. We establish joint consistency and derive convergence rates of the proposed estimators for both the finite dimensional Euclidean parameters and an infinite-dimensional functional parameter. We validate our estimation methods empirically through simulations and an application.

Penalized estimation of flexible hidden Markov models for time series of counts

Hidden Markov models are versatile tools for modeling sequential observations, where it is assumed that a hidden state process selects which of finitely many distributions generates any given observation. Specifically for time series of counts, the Poisson family often provides a natural choice for the state-dependent distributions, though more flexible distributions such as the negative binomial or distributions with a bounded range can also be used. However, in practice, choosing an adequate class of (parametric) distributions is often anything but straightforward, and an inadequate choice can have severe negative consequences on the model’s predictive performance, on state classification, and generally on inference related to the system considered. To address this issue, we propose an effectively nonparametric approach to fitting hidden Markov models to time series of counts, where the state-dependent distributions are estimated in a completely data-driven way without the need to select a distributional family. To avoid overfitting, we add a roughness penalty based on higher-order differences between adjacent count probabilities to the likelihood, which is demonstrated to produce smooth probability mass functions of the state-dependent distributions. The feasibility of the suggested approach is assessed in a simulation experiment, and illustrated in two real-data applications, where we model the distribution of i) major earthquake counts and ii) acceleration counts of an oceanic whitetip shark (Carcharhinus longimanus) over time.

Accelerated Flow for Probability distributions

This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. In particular, we extend the recent variational formulation of accelerated gradient methods in (wibisono, et. al. 2016) from vector valued variables to probability distributions. The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton’s equations for the optimal gradient flow. The Hamilton’s equation are shown to achieve the accelerated form of density transport from any initial probability distribution to a target probability distribution. A quantitative estimate on the asymptotic convergence rate is provided based on a Lyapunov function construction, when the objective functional is displacement convex. Two numerical approximations are presented to implement the Hamilton’s equations as a system of N interacting particles. The continuous limit of the Nesterov’s algorithm is shown to be a special case with N=1. The algorithm is illustrated with numerical examples.