• A Hierarchical Spectral Method for Extreme Classification
Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is via a tree decomposition of the output space. While this typically leads to training and inference that scales sublinearly with the number of outputs, it also results in reduced statistical performance. In this work, we identify two shortcomings of tree decomposition methods, and describe two heuristic mitigations. We compose these with a novel eigenvalue technique for constructing the tree which is essentially hierarchical orthonormal partial least squares. The end result is a computationally efficient algorithm that provides good statistical performance on several extreme data sets.
• Conditional Poisson process approximation
Point processes are an essential tool when we are interested in where in time or space events occur. The basic starting point for point processes is usually the Poisson process. Over the years, Stein’s method has been developed with a great deal of success for Poisson point process approximation. When studying rare events, though, one typically begins modelling only after such an event has occurred. As a result, a point process that is conditional upon at least one atom is arguably more appropriate in certain applications. In this paper, we develop Stein’s method for conditional Poisson point process approximation, and closely examine what sort of difficulties this conditioning entails. By utilising a characterising immigration-death process, we calculate bounds for the Stein factors.
• Stochastic Expectation Propagation for Large Scale Gaussian Process Classification
A method for large scale Gaussian process classification has been recently proposed based on expectation propagation (EP). Such a method allows Gaussian process classifiers to be trained on very large datasets that were out of the reach of previous deployments of EP and has been shown to be competitive with related techniques based on stochastic variational inference. Nevertheless, the memory resources required scale linearly with the dataset size, unlike in variational methods. This is a severe limitation when the number of instances is very large. Here we show that this problem is avoided when stochastic EP is used to train the model.
• Taxonomy of Pathways to Dangerous AI
In order to properly handle a dangerous Artificially Intelligent (AI) system it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books), AIs/robots become self-aware and as a result rebel against humanity and decide to destroy it. While this is one possible scenario, it is probably the least likely path to the appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances that might lead to the arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs (Özkural, 2014) or reviewed specific undesirable behaviors AGIs can exhibit at different stages of their development (Alexey Turchin, July 10 2015, July 10, 2015).
• Generalized Multiple Importance Sampling
Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.
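To make the distinction between weighting interpretations concrete, here is a minimal sketch that is not taken from the paper. It assumes a standard normal target, two hypothetical Gaussian proposals, and the test function f(x) = x^2, whose true expectation under the target is 1. It compares two commonly used weightings: each sample weighted by its own proposal density, and each sample weighted by the equal-weight mixture of all proposals. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution N(mean, std^2)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def mis_estimates(n_per_proposal=20000, seed=0):
    """Estimate E[x^2] under N(0, 1) with two proposals and two weightings.

    The proposal parameters below are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    means = np.array([-1.0, 2.0])
    stds = np.array([1.0, 1.5])

    # Row n holds the samples drawn from proposal n.
    samples = rng.normal(means[:, None], stds[:, None], size=(2, n_per_proposal))

    target = normal_pdf(samples, 0.0, 1.0)

    # Weighting 1: denominator is the density of the proposal that
    # generated the sample.
    own_density = normal_pdf(samples, means[:, None], stds[:, None])
    w_own = target / own_density

    # Weighting 2: denominator is the equal-weight mixture of all
    # proposals, evaluated at every sample.
    mixture_density = np.mean(
        [normal_pdf(samples, m, s) for m, s in zip(means, stds)], axis=0
    )
    w_mix = target / mixture_density

    f = samples ** 2
    return {
        "own_proposal": float(np.mean(w_own * f)),
        "mixture": float(np.mean(w_mix * f)),
    }


if __name__ == "__main__":
    for name, value in mis_estimates().items():
        print(f"{name}: {value:.4f}")
```

Both estimators are unbiased for this normalized target, so both should land close to 1. They differ only in the denominator of the weight, which is the kind of choice the abstract says can change the variance of the estimator.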
• Reducing the Training Time of Neural Networks by Partitioning
This paper presents a new method for pre-training neural networks that can decrease the total training time for a neural network while maintaining the final performance, which motivates its use on deep neural networks. By partitioning the training task into multiple training subtasks with sub-models, which can be performed independently and in parallel, it is shown that the size of the sub-models reduces almost quadratically with the number of subtasks created, quickly scaling down the sub-models used for the pre-training. The sub-models are then merged to provide a pre-trained initial set of weights for the original model. The proposed method is independent of the other aspects of the training, such as the architecture of the neural network, training method, and objective, making it compatible with a wide range of existing approaches. The speedup without loss of performance is validated experimentally on the MNIST and CIFAR10 data sets, also showing that even performing the subtasks sequentially can decrease the training time. Moreover, we show that larger models may present higher speedups and conjecture about the benefits of the method in distributed learning systems.
• Black-box $α$-divergence Minimization
• A service system with packing constraints: Greedy randomized algorithm achieving sublinear in scale optimality gap
• Cooperative epidemics on multiplex networks
• Learning Communities in the Presence of Errors
• On the geometry of output-code multi-class learning
• Checkpointing with Minimal Recover in Adhocnet based TMR
• Sliced Wasserstein Kernels for Probability Distributions
• Accelerated Newton Iteration: Roots of Black Box Polynomials and Matrix Eigenvalues
• Between primitive and $2$-transitive: Synchronization and its friends
• Towards three-dimensional conformal probability
• Semi-supervised Tuning from Temporal Coherence
• Ultrasensitivity and sharp threshold theorems for multisite systems
• A Study on Splay Trees
• Jeffreys priors for mixture estimation
• Asynchronous Decentralized 20 Questions for Adaptive Search
• Sequence-structure relations of biopolymers
• The CLLC conjecture holds for cyclic outer permutations
• k-way Hypergraph Partitioning via n-Level Recursive Bisection
• Out of the Cage of Shadows
• A tight relation between series-parallel graphs and Bipartite Distance Hereditary graphs
• USFD: Twitter NER with Drift Compensation and Linked Data
• The CTU Prague Relational Learning Repository
• Cacti with maximum Kirchhoff index
• Scenario generation for stochastic programs with tail risk measures
• Solutions of Reeder’s Puzzle
• Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing
• Investigating the stylistic relevance of adjective and verb simile markers
• Improvement of code behaviour in a design of experiments by metamodeling
• Tiling with Small Tiles
• Semantic processing of EHR data for clinical research
• Learning With Adversary
• Information retrieval in folktales using natural language processing
• Factor Copula Models for Spatial Data
• Identification by Edge Contraction in Linear Structural Equation Models
• Power law in random multiplicative processes with spatio-temporal correlated multipliers
• Optimality of Training/Test Size and Resampling Effectiveness of Cross-Validation Estimators of the Generalization Error
• On s-extremal singly even self-dual [24k+8,12k+4,4k+2] codes
• Uniform Integrability of the OLS Estimators, and the Convergence of their Moments
• PCS: Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services
• Asymptotic expansion of the invariant measure for ballistic random walk in the low disorder regime
• ExtraPush for Convex Smooth Decentralized Optimization over Directed Networks
• Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
• Diagonal Form of the Varchenko Matrices of Oriented Matroids
• Detecting events and key actors in multi-person videos
• Spectral-Spatial Classification of Hyperspectral Image Using Autoencoders
• A New Framework for Strong Connectivity and 2-Connectivity in Directed Graphs
• Efficient Construction of Local Parametric Reduced Order Models Using Machine Learning Techniques
• A disembodied developmental robotic agent called Samu Bátfai
• Hodge Theory for Combinatorial Geometries
• Visual Language Modeling on CNN Image Representations