Daleel: Simplifying Cloud Instance Selection Using Machine Learning

Decision making in cloud environments is quite challenging due to the diversity in service offerings and pricing models, especially considering that the cloud market is an incredibly fast moving one. In addition, there are no hard and fast rules, each customer has a specific set of constraints (e.g. budget) and application requirements (e.g. minimum computational resources). Machine learning can help address some of the complicated decisions by carrying out customer-specific analytics to determine the most suitable instance type(s) and the most opportune time for starting or migrating instances. We employ machine learning techniques to develop an adaptive deployment policy, providing an optimal match between the customer demands and the available cloud service offerings. We provide an experimental study based on extensive set of job executions over a major public cloud infrastructure.


Convex Relaxation Regression: Efficient Optimization of Smooth Functions by Learning Their Convex Envelopes

Finding efficient and provable methods to solve non-convex optimization problems is an outstanding challenge in machine learning. A popular approach used to tackle non-convex problems is to use convex relaxation techniques to find a convex surrogate for the problem. Unfortunately, convex relaxations typically must be found on a problem-by-problem basis. Thus, providing a general-purpose strategy to estimate a convex relaxation would have a wide reaching impact. Here, we introduce Convex Relaxation Regression (CoRR), an approach for learning the convex relaxation of a wide class of smooth functions. The main idea behind our approach is to estimate the convex envelope of a function f by evaluating f at random points and then fitting a convex function to these function evaluations. We prove that, as the number T of function evaluations grows, the solution of our algorithm converges to the global minimum of f with a polynomial rate in T. Also our result scales polynomially with the dimension. Our approach enables the use of convex optimization tools to solve a broad class of non-convex optimization problems.


BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the full-information supervised online learning problem has a non-trivial regret guarantee (and efficient).


WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making

We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a web site, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is required to have sophisticated high-level reasoning based on natural languages and efficient sequential decision making capability to succeed. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we make WikiNav, a dataset constructed from the English Wikipedia containing approximately 5 million articles and more than 12 million queries for training. We evaluate two different agents based on neural networks on the WikiNav and provide the human performance. Our results show the difficulty of the task for both humans and machines. With this benchmark, we expect faster progress in developing artificial agents with natural language understanding and planning skills.


A Deep Learning Approach to Unsupervised Ensemble Learning

We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is {\em equivalent} to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can be instead estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.


ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called ‘matching dependencies’ (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) Building a classifier for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) Use of MDs for supporting the blocking phase of ML; (c) Record merging on the basis of the classifier results; and (d) The use of the declarative language ‘LogiQL’ -an extended form of Datalog supported by the ‘LogicBlox’ platform- for all activities related to data processing, and the specification and enforcement of MDs.


Exploring the Limits of Language Modeling

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 24.2. We also release these models for the NLP and ML community to study and improve upon.


Efficient Algorithms for Adversarial Contextual Learning

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently in one of the contexts in the set. Our algorithms fall into the follow the perturbed leader family \cite{Kalai2005} and achieve regret O(T^{3/4}\sqrt{K\log(N)}) in the transductive setting and O(T^{2/3} d^{3/4} K\sqrt{\log(N)}) in the separator setting, where K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.


Binarized Neural Networks

In this work we introduce a binarized deep neural network (BDNN) model. BDNNs are trained using a novel binarized back propagation algorithm (BBP), which uses binary weights and binary neurons during the forward and backward propagation, while retaining precision of the stored weights in which gradients are accumulated. At test phase, BDNNs are fully binarized and can be implemented in hardware with low circuit complexity. The proposed binarized networks can be implemented using binary convolutions and proxy matrix multiplications with only standard binary XNOR and population count (popcount) operations. BBP is expected to reduce energy consumption by at least two orders of magnitude when compared to the hardware implementation of existing training algorithms. We obtained near state-of-the-art results with BDNNs on the permutation-invariant MNIST, CIFAR-10 and SVHN datasets.


Guarantees in Wasserstein Distance for the Langevin Monte Carlo Algorithm

We study the problem of sampling from a distribution \target using the Langevin Monte Carlo algorithm and provide rate of convergences for this algorithm in terms of Wasserstein distance of order 2. Our result holds as long as the continuous diffusion process associated with the algorithm converges exponentially fast to the target distribution along with some technical assumptions. While such an exponential convergence holds for example in the log-concave measure case, it also holds for the more general case of asymptoticaly log-concave measures. Our results thus extends the known rates of convergence in total variation and Wasserstein distances which have only been obtained in the log-concave case. Moreover, using a sharper approximation bound of the continuous process, we obtain better asymptotic rates than traditional results. We also look into variations of the Langevin Monte Carlo algorithm using other discretization schemes. In a first time, we look into the use of the Ozaki’s discretization but are unable to obtain any significative improvement in terms of convergence rates compared to the Euler’s scheme. We then provide a (sub-optimal) way to study more general schemes, however our approach only holds for the log-concave case.


Phase Structure of 1d Interacting Floquet Systems I: Abelian SPTs

A Note on Alternating Minimization Algorithm for the Matrix Completion Problem

Dimensions of the irreducible representations of the symmetric and alternating group

Probabilistic Extension to the Concurrent Constraint Factor Oracle Model for Music Improvisation

On Column Selection in Approximate Kernel Canonical Correlation Analysis

A Decomposition of Parking Functions by Undesired Spaces

Bayesian Regularized Regression for Treatment Effect Estimation from Observational Data

Active Information Acquisition

Maximum Likelihood Estimation for Semiparametric Transformation Models with Interval-Censored Data

Sparse Kalman Filtering Approaches to Covariance Estimation from High Frequency Data in the Presence of Jumps

Endomorphisms of The Hamming Graph and Related Graphs

Using particle swarm optimization to search for locally $D$-optimal designs for mixed factor experiments with binary response

On minimising a portfolio’s shortfall probability

Robustness Metric for Quantifying Causal Model Confidence and Parameter Uncertainty

Rebuttal of the ‘Letter to the Editor’ of Annals of Applied Statistics on Lambert W x F Distributions and the IGMM Algorithm

Efficient Second Order Online Learning via Sketching

Non-universality for longest increasing subsequence of a random walk

Classification Accuracy as a Proxy for Two Sample Testing

Fuzzy Maximum Satisfiability

Swivel: Improving Embeddings by Noticing What’s Missing

Strongly-Typed Recurrent Neural Networks

Variational Hamiltonian Monte Carlo via Score Matching

Improved Dropout for Shallow and Deep Learning

Sample path large deviations for Laplacian models in $(1+1)$-dimensions

Reducing training requirements through evolutionary based dimension reduction and subject transfer

Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation

Deep Cross-Modal Hashing

A Tractable Fully Bayesian Method for the Stochastic Block Model

Simplicial orders and chordality

Recovery guarantee of weighted low-rank approximation via alternating minimization

Disjoint non-monochromatic triangles in the plane

Secure and Dependable Virtual Network Embedding

An algorithm for approximating the second moment of the normalizing constant estimate from a particle filter

How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks

Importance Sampling for Minibatches

Embedding tetrahedra into quasirandom hypergraphs

On a Turán problem in weakly quasirandom 3-uniform hypergraphs

Discrepancy and Eigenvalues of Cayley Graphs

On Efficient Distributed Construction of Near Optimal Routing Schemes

The Cayley isomorphism property for Cayley maps

Some remarks on the extremal function for uniformly two-path dense hypergraphs

Line arrangements and configurations of points with an unusual geometric property

On the structure of dense graphs with fixed clique number

Variational Inference with Rényi Divergence

Universality for a class of random band matrices

A class of random Cantor sets

Endomorphism algebras for a class of negative Calabi-Yau categories

The Diffusion Geometry of Fibre Bundles

Scalable Text Mining with Sparse Generative Models

Stratified Bayesian Optimization

Dynamic Selection of Virtual Machines for Application Servers in Cloud Environments

Control of Directional Errors in Fixed Sequence Multiple Testing

Inverting the Rational Sweep Map

Linear recurrence relations in $Q$-systems via lattice points in polyhedra

Solving Ridge Regression using Sketched Preconditioned SVRG

Hyperparameter optimization with approximate gradient

NED: An Inter-Graph Node Metric on Edit Distance

Difference sets are not multiplicatively closed

On the circuit complexity of the standard and the Karatsuba methods of multiplying integers

Supervised and Semi-Supervised Text Categorization using One-Hot LSTM for Region Embeddings

Monodromy and K-theory of Schubert curves via generalized jeu de taquin

Find an Optimal Path in Static System and Dynamical System within Polynomial Runtime

Disentangled Representations in Neural Models

Network Inference by Learned Node-Specific Degree Prior

Ensemble Robustness of Deep Learning Algorithms

An Information-Theoretic Foundation for the Weighted Updating Model

Non-Stationary Dynamic Factor Models for Large Datasets

Configurations of conjugate permutations

A decomposition theorem for {ISK4,wheel}-free trigraphs

Lasso Estimation of an Interval-Valued Multiple Regression Model

A mathematical formalization of data parallel operations

The Hairer–Quastel universality result in equilibrium

On uniformly bounded orthonormal Sidon systems

A Scalable Multi-Resolution Spatio-Temporal Model for Brain Activation and Connectivity in fMRI Data

A Simple Practical Accelerated Method for Finite Sums

Loss factorization, weakly supervised learning and label noise robustness

Strengthening theorems of Dirac and Erdős on disjoint cycles

Overfitting hidden Markov models with an unknown number of states

Particle Swarm Optimized Power Consumption of Trilateration

Quasiperiodic driving of Anderson localized waves in one dimension

Convergence of random sums and statistics constructed from samples with random sizes to the Linnik and Mittag-Leffler distributions and their generalizations

Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling

Two-sample tests for high-dimension, strongly spiked eigenvalue models

How fast can Maker win in fair biased games?

The ‘Sprekend Nederland’ project and its application to accent location

Sharp thresholds for Ramsey properties of strictly balanced nearly bipartite graphs

Wikipedia Tools for Google Spreadsheets

Fast k-means with accurate bounds

Multi-view Kernel Completion

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

Semidefinite bounds for nonbinary codes based on quadruples

The isomorphic version of Brualdies nestedness is in P

Just Another Gibbs Additive Modeller: Interfacing JAGS and mgcv

Dynamic Spatial Autoregressive Models with Autoregressive and Heteroskedastic Disturbances

Homogeneity of Cluster Ensembles

Generalized chord diagram expansions of Dyson-Schwinger equations

Invariant Gaussian Fields on Homogeneous Spaces : Explicit Constructions and Geometric Measure of the Zero-set

A Random Growth Model for Power Grids and Other Spatially Embedded Infrastructure Networks

DECOrrelated feature space partitioning for distributed sparse regression

Edge Lower Bounds for List Critical Graphs, via Discharging

Corrected Discrete Approximations for the Conditional and Unconditional Distributions of the Continuous Scan Statistic

Statistical estimate of the proportional hazard premium of loss under random censoring

Hidden Gibbs random fields model selection using Block Likelihood Information Criterion

Metric Dimension of Bounded Tree-length Graphs

Adaptive imputation of missing values for incomplete pattern classification

Scalability and Total Recall with Fast CoveringLSH

On p-adic approximation of sums of binomial coefficients

Around the Complete Intersection Theorem

Generating Images with Perceptual Similarity Metrics based on Deep Networks

LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

Graying the black box: Understanding DQNs

Exploiting Cyclic Symmetry in Convolutional Neural Networks

A convolution formula for Tutte polynomials of arithmetic matroids and other combinatorial structures

The happiness paradox: your friends are happier than you

A Variational Analysis of Stochastic Gradient Algorithms

Model and Objective Separation with Conditional Lower Bounds: Disjunction is Harder than Conjunction

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

Macdonald’s solid-angle sum for real dilations of rational polygons

Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks

Two-Bit Messages are Sufficient to Implement Atomic Read/Write Registers in Crash-prone Systems

On the Integrity of Deep Learning Oracles

Defining Cross-Cloud Systems

Compressed Online Dictionary Learning for Fast fMRI Decomposition

Indistinguishable Bandits Dueling with Decoys on a Poset

Markov loops, coverings and fields

Is the corporate elite disintegrating? Interlock boards and the Mizruchi hypothesis

Generalization of the Kimeldorf-Wahba correspondence for constrained interpolation

Monotone Subsequences in High-Dimensional Permutations

Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations

Local and Global Convergence of a General Inertial Proximal Splitting Scheme

Probabilistic modeling and global sensitivity analysis for $CO_2$ storage in geological formations: a spectral approach

On Determining if Tree-based Networks Contain Fixed Trees

Rare region induced avoided quantum criticality in disordered three-dimensional Dirac and Weyl semimetals