A Hierarchical Spectral Method for Extreme Classification

Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is a tree decomposition of the output space. While this typically leads to training and inference that scale sublinearly with the number of outputs, it also results in reduced statistical performance. In this work, we identify two shortcomings of tree decomposition methods, and describe two heuristic mitigations. We compose these with a novel eigenvalue technique for constructing the tree which is essentially hierarchical orthonormal partial least squares. The end result is a computationally efficient algorithm that provides good statistical performance on several extreme data sets.


Conditional Poisson process approximation

Point processes are an essential tool when we are interested in where in time or space events occur. The basic starting point for point processes is usually the Poisson process. Over the years, Stein’s method has been developed with a great deal of success for Poisson point process approximation. When studying rare events, though, one typically only begins modelling after the occurrence of such an event. As a result, a point process that is conditional upon at least one atom is arguably more appropriate in certain applications. In this paper, we develop Stein’s method for conditional Poisson point process approximation, and closely examine what sort of difficulties this conditioning entails. By utilising a characterising immigration-death process, we calculate bounds for the Stein factors.


Stochastic Expectation Propagation for Large Scale Gaussian Process Classification

A method for large scale Gaussian process classification has been recently proposed based on expectation propagation (EP). Such a method allows Gaussian process classifiers to be trained on very large datasets that were out of the reach of previous deployments of EP and has been shown to be competitive with related techniques based on stochastic variational inference. Nevertheless, the memory resources required scale linearly with the dataset size, unlike in variational methods. This is a severe limitation when the number of instances is very large. Here we show that this problem is avoided when stochastic EP is used to train the model.


Taxonomy of Pathways to Dangerous AI

In order to properly handle a dangerous Artificially Intelligent (AI) system it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books) AIs/Robots become self-aware and as a result rebel against humanity and decide to destroy it. While it is one possible scenario, it is probably the least likely path to the appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances which might lead to the arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs (Özkural, 2014) or reviewed specific undesirable behaviors AGIs can exhibit at different stages of their development (Alexey Turchin, July 10, 2015).


Generalized Multiple Importance Sampling

Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.
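
To make the difference between weighting interpretations concrete, the following is a minimal sketch, not the paper's framework. It assumes a hypothetical one-dimensional target (an equal mixture of two unit-variance Gaussians with mean 0.5), two Gaussian proposals, and a fixed number of samples drawn from each proposal. It contrasts two well-known weightings: dividing by the density of the proposal that generated the sample ("standard"), and dividing by the equal-weight mixture of all proposals ("mixture"). All names and parameter values are illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def target_pdf(x):
    """Hypothetical target: 0.5*N(-2, 1) + 0.5*N(3, 1), whose mean is 0.5."""
    return 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0)

# Assumed proposals, given as (mean, std) pairs.
PROPOSALS = [(-2.0, 1.5), (3.0, 1.5)]

def mis_estimate_mean(proposals, samples_per_proposal, weighting, rng):
    """Self-normalized MIS estimate of the target mean.

    weighting="standard": w = target(x) / q_j(x), with q_j the proposal
                          that generated x.
    weighting="mixture":  w = target(x) / ((1/N) * sum_k q_k(x)).
    """
    xs, ws = [], []
    for mean, std in proposals:
        for _ in range(samples_per_proposal):
            x = rng.gauss(mean, std)
            if weighting == "standard":
                denom = normal_pdf(x, mean, std)
            elif weighting == "mixture":
                denom = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            else:
                raise ValueError("weighting must be 'standard' or 'mixture'")
            xs.append(x)
            ws.append(target_pdf(x) / denom)
    total = sum(ws)
    return sum(w * x for w, x in zip(ws, xs)) / total

if __name__ == "__main__":
    for scheme in ("standard", "mixture"):
        est = mis_estimate_mean(PROPOSALS, 2000, scheme, random.Random(0))
        print(scheme, est)
```

In this toy setting the mixture weights stay bounded because the mixture of proposals covers both target modes, whereas the standard weights can occasionally become very large when a sample from one proposal lands in the other mode. Repeating the run over several seeds and comparing the spread of the two estimates is one simple way to observe this.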


Reducing the Training Time of Neural Networks by Partitioning

This paper presents a new method for pre-training neural networks that can decrease the total training time for a neural network while maintaining the final performance, which motivates its use on deep neural networks. By partitioning the training task into multiple training subtasks with sub-models, which can be performed independently and in parallel, it is shown that the size of the sub-models reduces almost quadratically with the number of subtasks created, quickly scaling down the sub-models used for the pre-training. The sub-models are then merged to provide a pre-trained initial set of weights for the original model. The proposed method is independent of the other aspects of the training, such as the architecture of the neural network, training method, and objective, making it compatible with a wide range of existing approaches. The speedup without loss of performance is validated experimentally on the MNIST and CIFAR10 data sets, also showing that even performing the subtasks sequentially can decrease the training time. Moreover, we show that larger models may present higher speedups and conjecture about the benefits of the method in distributed learning systems.


Black-box $α$-divergence Minimization

A service system with packing constraints: Greedy randomized algorithm achieving sublinear in scale optimality gap

Cooperative epidemics on multiplex networks

Learning Communities in the Presence of Errors

On the geometry of output-code multi-class learning

Checkpointing with Minimal Recover in Adhocnet based TMR

Sliced Wasserstein Kernels for Probability Distributions

Accelerated Newton Iteration: Roots of Black Box Polynomials and Matrix Eigenvalues

Between primitive and $2$-transitive: Synchronization and its friends

Towards three-dimensional conformal probability

Semi-supervised Tuning from Temporal Coherence

Ultrasensitivity and sharp threshold theorems for multisite systems

A Study on Splay Trees

Jeffreys priors for mixture estimation

Asynchronous Decentralized 20 Questions for Adaptive Search

Sequence-structure relations of biopolymers

The CLLC conjecture holds for cyclic outer permutations

k-way Hypergraph Partitioning via n-Level Recursive Bisection

Out of the Cage of Shadows

A tight relation between series-parallel graphs and Bipartite Distance Hereditary graphs

USFD: Twitter NER with Drift Compensation and Linked Data

The CTU Prague Relational Learning Repository

Cacti with maximum Kirchhoff index

Scenario generation for stochastic programs with tail risk measures

Solutions of Reeder’s Puzzle

Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing

Investigating the stylistic relevance of adjective and verb simile markers

Improvement of code behaviour in a design of experiments by metamodeling

Tiling with Small Tiles

Semantic processing of EHR data for clinical research

Learning With Adversary

Information retrieval in folktales using natural language processing

Factor Copula Models for Spatial Data

Identification by Edge Contraction in Linear Structural Equation Models

Power law in random multiplicative processes with spatio-temporal correlated multipliers

Optimality of Training/Test Size and Resampling Effectiveness of Cross-Validation Estimators of the Generalization Error

On s-extremal singly even self-dual [24k+8,12k+4,4k+2] codes

Uniform Integrability of the OLS Estimators, and the Convergence of their Moments

PCS: Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services

Asymptotic expansion of the invariant measure for ballistic random walk in the low disorder regime

ExtraPush for Convex Smooth Decentralized Optimization over Directed Networks

Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models

Diagonal Form of the Varchenko Matrices of Oriented Matroids

Detecting events and key actors in multi-person videos

Spectral-Spatial Classification of Hyperspectral Image Using Autoencoders

A New Framework for Strong Connectivity and 2-Connectivity in Directed Graphs

Efficient Construction of Local Parametric Reduced Order Models Using Machine Learning Techniques

A disembodied developmental robotic agent called Samu Bátfai

Hodge Theory for Combinatorial Geometries

Visual Language Modeling on CNN Image Representations