Rich Component Analysis

In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don’t have samples from the true model but only samples after complex perturbations.

Optimal self-assembly of finite shapes at temperature 1 in 3D

Working in a three-dimensional variant of Winfree’s abstract Tile Assembly Model, we show that, for an arbitrary finite, connected shape X \subset \mathbb{Z}^2, there is a tile set that uniquely self-assembles into a 3D representation of a scaled-up version of $X$ at temperature 1 in 3D with optimal program-size complexity (the ‘program-size complexity’, also known as ’tile complexity’, of a shape is the minimum number of tile types required to uniquely self-assemble it). Moreover, our construction is ‘just barely’ 3D in the sense that it only places tiles in the $z = 0$ and $z = 1$ planes. Our result is essentially a just-barely 3D temperature 1 simulation of a similar 2D temperature 2 result by Soloveichik and Winfree (SICOMP 2007).

Sum-of-Squares Lower Bounds for Sparse PCA

This paper establishes a statistical versus computational trade-off for solving a basic high-dimensional machine learning problem via a basic convex relaxation method. Specifically, we consider the {\em Sparse Principal Component Analysis} (Sparse PCA) problem, and the family of {\em Sum-of-Squares} (SoS, aka Lasserre/Parillo) convex relaxations. It was well known that in large dimension $p$, a planted $k$-sparse unit vector can be {\em in principle} detected using only $n \approx k\log p$ (Gaussian or Bernoulli) samples, but all {\em efficient} (polynomial time) algorithms known require $n \approx k^2 \log p$ samples. It was also known that this quadratic gap cannot be improved by the the most basic {\em semi-definite} (SDP, aka spectral) relaxation, equivalent to a degree-2 SoS algorithms. Here we prove that also degree-4 SoS algorithms cannot improve this quadratic gap. This average-case lower bound adds to the small collection of hardness results in machine learning for this powerful family of convex relaxation algorithms. Moreover, our design of moments (or ‘pseudo-expectations’) for this lower bound is quite different than previous lower bounds. Establishing lower bounds for higher degree SoS algorithms for remains a challenging problem.

Adaptive Elastic Net Method for Cox Model

In this paper, we study the Adaptive Elastic Net method for the Cox model. We prove the grouping effect and oracle property of its estimators. Finally, we show these two properties by an empirical analysis and a numerical simulation, respectively.

A formula for the number of the spanning trees of line graphs

Let $G=(V,E)$ be a loopless graph and $\mathcal{T}(G)$ be the set of all spanning trees of $G$. Let $L(G)$ be the line graph of the graph $G$ and $t(L(G))$ be the number of spanning trees of $L(G)$. Then, by using techniques from electrical networks, we obtain the following formula: $$ t(L(G)) = \frac{1}{\prod_{v\in V}d^2(v)}\sum_{T\subseteq \mathcal{T}(G)}\big[\prod_{e = xy\in T}d(x)d(y)\big]\big[\prod_{e = uv\in E\backslash T}[d(u)+d(v)]\big]. $$ As a result, we provide a very simple and different proof of the formula on the number of spanning trees of some irregular line graphs, and give a positive answer to a conjecture proposed by Yan [J. Combin. Theory Ser. A 120 (2013) no. 7, 1642-1648]. By applying our formula we also derive the number of spanning trees of circulant line graphs.

The Odd Generalized Exponential Gompertz

In this paper we propose a new lifetime model, called the odd generalized exponential gompertz distribution, We obtain some of its mathematical properties. Some structural properties of the new distribution are studied. The method of maximum likelihood method is used for estimating the model parameters and the observed Fisher’s information matrix is derived. We illustrate the usefulness of the proposed model by applications to real data.

Convergence rate in precise asymptotics for Davis law of large numbers

Let $\{X_n,n\geq 1\}$ be a sequence of i.i.d. random variables with partial sums $\{S_n,n\geq 1\}$. Based on the classical Baum-Katz theorem, a paper by Heyde in 1975 initiated the precise asymptotics for the sum $\sum_{n\geq 1}\mbox{P}(|S_n|\geq\epsilon n)$ as $\epsilon$ goes to zero. Later, Klesov determined the convergence rate in Heyde’s theorem. The aim of this paper is to extend Klesov’s result to the precise asymptotics for Davis law of large numbers, a theorem in Gut and Sp\u{a}taru [2000a].

Arbitrariness of peer review: A Bayesian analysis of the NIPS experiment

The principle of peer review is central to the evaluation of research, by ensuring that only high-quality items are funded or published. But peer review has also received criticism, as the selection of reviewers may introduce biases in the system. In 2014, the organizers of the “Neural Information Processing Systems\rq\rq{} conference conducted an experiment in which $10\%$ of submitted manuscripts (166 items) went through the review process twice. Arbitrariness was measured as the conditional probability for an accepted submission to get rejected if examined by the second committee. This number was equal to $60\%$, for a total acceptance rate equal to $22.5\%$. Here we present a Bayesian analysis of those two numbers, by introducing a hidden parameter which measures the probability that a submission meets basic quality criteria. The standard quality criteria usually include novelty, clarity, reproducibility, correctness and no form of misconduct, and are met by a large proportions of submitted items. The Bayesian estimate for the hidden parameter was equal to $56\%$ ($95\%$CI: $ I = (0.34, 0.83)$), and had a clear interpretation. The result suggested the total acceptance rate should be increased in order to decrease arbitrariness estimates in future review processes.

A unified approach to a priori estimates for supersolutions of BSDEs in general filtrations

We provide a unified approach to a priori estimates for supersolutions of BSDEs in general filtrations, which may not be quasi left-continuous. As an example of application, we prove that reflected BSDEs are well-posed in a general framework.

An Erdős-Ko-Rado theorem for finite 2-transitive groups

We prove an analogue of the classical Erd\H{o}s-Ko-Rado theorem for intersecting sets of permutations in finite 2-transitive groups. Given a finite group G acting faithfully and 2-transitively on the set X, we show that an intersecting set of maximal size in G has cardinality |G|/|X|. This generalises and gives a unifying proof of some similar recent results in the literature.

Dynamic Matrix Factorization with Priors on Unknown Values

Advanced and effective collaborative filtering methods based on explicit feedback assume that unknown ratings do not follow the same model as the observed ones (\emph{not missing at random}). In this work, we build on this assumption, and introduce a novel dynamic matrix factorization framework that allows to set an explicit prior on unknown values. When new ratings, users, or items enter the system, we can update the factorization in time independent of the size of data (number of users, items and ratings). Hence, we can quickly recommend items even to very recent users. We test our methods on three large datasets, including two very sparse ones, in static and dynamic conditions. In each case, we outrank state-of-the-art matrix factorization methods that do not use a prior on unknown ratings.

Infrared-Induced Sluggish Dynamics in the GeSbTe Electron Glass

The electron-glass dynamics of Anderson-localized GeSbTe films is dramatically slowed-down following a brief infrared illumination that increases the system carrier-concentration (and thus its conductance). These results demonstrate that the dynamics exhibited by electron-glasses is more sensitive to carrier-concentration than to disorder. In turn, this seems to imply that many-body effects such as the Orthogonality Catastrophe must play a role in the sluggish dynamics observed in the intrinsic electron-glasses.

Diseases transmission in a z-ary tree

We extend some results of Itai Benjamini and Yuri Lima (see \href{http://…/1305.2610.pdf}{\cite{Benjamini}}). In this paper they consider a binary tree $\mathbb T_n$ of height $n$, each leaf is either infected by one of $k$ diseases or not infected at all. In other words, $x$ at generation $n$ is infected by the $i$-th infection with probability $p_i$ and sane with $p_{k+1}$. Moreover the infections are independently distributed for each leaf. Infections spread along the tree based on specific rules. In their paper they study the limit distribution of the root of $\mathbb T_n$ as $n$ goes to infinity. Here we want to study the more general case of a Galton-Watson tree and a $z$-ary tree.

The gradient flow approach to hydrodynamic limits for the simple exclusion process

We give a new approach to the well-known convergence to the hydrodynamic limit for the symmetric simple exclusion process (SSEP). More precisely, we characterize any possible limit of its empirical density measures as solutions to the heat equation by passing to the limit in the gradient flow structure of the particle system.

On the Economic Efficiency of the Combinatorial Clock Auction

Since the 1990s spectrum auctions have been implemented world-wide. This has provided for a practical examination of an assortment of auction mechanisms and, amongst these, two simultaneous ascending price auctions have proved to be extremely successful. These are the simultaneous multiround ascending auction (SMRA) and the combinatorial clock auction (CCA). It has long been known that, for certain classes of valuation functions, the SMRA provides good theoretical guarantees on social welfare. However, no such guarantees were known for the CCA. In this paper, we show that CCA does provide strong guarantees on social welfare provided the price increment and stopping rule are well-chosen. This is very surprising in that the choice of price increment has been used primarily to adjust auction duration and the stopping rule has attracted little attention. The main result is a polylogarithmic approximation guarantee for social welfare when the maximum number of items demanded $\mathcal{C}$ by a bidder is fixed. Specifically, we show that either the revenue of the CCA is at least an $\Omega\Big(\frac{1}{\mathcal{C}^{2}\log n\log^2m}\Big)$-fraction of the optimal welfare or the welfare of the CCA is at least an $\Omega\Big(\frac{1}{\log n}\Big)$-fraction of the optimal welfare, where $n$ is the number of bidders and $m$ is the number of items. As a corollary, the welfare ratio — the worst case ratio between the social welfare of the optimum allocation and the social welfare of the CCA allocation — is at most $O(\mathcal{C}^2 \cdot \log n \cdot \log^2 m)$. We emphasize that this latter result requires no assumption on bidders valuation functions. Finally, we prove that such a dependence on $\mathcal{C}$ is necessary. In particular, we show that the welfare ratio of the CCA is at least $\Omega \Big(\mathcal{C} \cdot \frac{\log m}{\log \log m}\Big)$.

Optimization Theory in Statistics

This paper addresses the issues of optimization theory and related numerical issues within the context of Statistics. Focusing on the problem of concave regression, several estimation techniques for nonparametric shape-constrained regression are classified, analyzed and compared qualitatively and quantitatively through numerical simulations. In particular, their main features, strengths and limitations for solving large instances of the problem are examined through this paper. Several improvements to enhance numerical stability and bound the computational cost are proposed. For each analyzed algorithm, the pseudo-code and its corresponding code in Scilab are provided. The results from this study demonstrate that the choice of the optimization approach strongly impact algorithmic performances. Interestingly, it is also shown that, currently, are not available methods able to solve efficiently large instance of the concave regression problems (more than many thousands of points). We suggest that further research to fill this gap in the literature should focus on finding a way to exploit and adapt classical multi-scale strategy to compute an approximate solution.

Brillinger mixing of determinantal point processes and statistical applications

Stationary determinantal point processes are proved to be Brillinger mixing. This property is an important step towards asymptotic statistics for these processes. As an important example, a central limit theorem for a wide class of functionals of determinantal point processes is established. This result yields in particular the asymptotic normality of the estimator of the intensity of a stationary determinantal point process and of the kernel estimator of its pair correlation.

Consistent estimation of the filtering and marginal smoothing distributions in nonparametric hidden Markov models

In this paper, we consider the filtering and smoothing recursions in nonparametric finite state space hidden Markov models (HMMs) when the parameters of the model are unknown and replaced by estimators. We provide an explicit and time uniform control of the filtering and smoothing errors in total variation norm as a function of the parameter estimation errors. We prove that the risk for the filtering and smoothing errors may be uniformly upper bounded by the risk of the estimators. It has been proved very recently that statistical inference for finite state space nonparametric HMMs is possible. We study how the recent spectral methods developed in the parametric setting may be extended to the nonparametric framework and we give explicit upper bounds for the L2-risk of the nonparametric spectral estimators. When the observation space is compact, this provides explicit rates for the filtering and smoothing errors in total variation norm. The performance of the spectral method is assessed with simulated data for both the estimation of the (nonparametric) conditional distribution of the observations and the estimation of the marginal smoothing distributions.

Online Paintability: The Slow-Coloring Game

The slow-coloring game is played by Lister and Painter on a graph $G$. On each round, Lister marks a nonempty subset $M$ of the remaining vertices, scoring $|M|$ points. Painter then deletes a subset of $M$ that is independent in $G$. The game ends when all vertices are deleted. Painter’s goal is to minimize the total score; Lister seeks to maximize it. The score that each player can guarantee doing no worse than is the cost of $G$, written $\mathring{s}(G)$. The game is a variant of online list coloring. We prove several results. Always $\frac{|V(G)|}{2\alpha(G)}+\frac{1}{2}\leq \frac{\mathring{s}(G)}{|V(G)|}\leq \max\left\{\frac{|V(H)|}{\alpha(H)}\colon\,\!H \subset G\right\}$, where $\alpha(G)$ is the independence number. Among $n$-vertex trees, $\mathring{s}$ is minimized by the star and maximized by the path. Trivially $\mathring{s}(G)$ is at most the sum-paintability of $G$, with equality only when all components of $G$ are complete. Also $\mathring{s}(G)$ is at least the chromatic sum (the minimum sum of vertex colors in a proper coloring by positive integers), with equality if $\alpha(G)\leq 2$. Finally, we give good bounds on $\mathring{s}(K_{r,s})$.

How to burn a graph

We introduce a new graph parameter called the burning number, inspired by contact processes on graphs such as graph bootstrap percolation, and graph searching paradigms such as Firefighter. The burning number measures the speed of the spread of contagion in a graph; the lower the burning number, the faster the contagion spreads. We provide a number of properties of the burning number, including characterizations and bounds. The burning number is computed for several graph classes, and is derived for the graphs generated by the Iterated Local Transitivity model for social networks.

Deep Recurrent Q-Learning for Partially Observable MDPs

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN) exhibits similar performance on standard Atari 2600 MDPs but better performance on equivalent partially observed domains featuring flickering game screens. Results indicate that given the same length of history, recurrency allows partial information to be integrated through time and is superior to alternatives such as stacking a history of frames in the network’s input layer. We additionally show that when trained with partial observations, DRQN’s performance at evaluation time scales as a function of observability. Similarly, when trained with full observations and evaluated with partial observations, DRQN’s performance degrades more gracefully than that of DQN. We therefore conclude that when dealing with partially observed domains, the use of recurrency confers tangible benefits.

Manitest: Are classifiers really invariant?

Invariance to geometric transformations is a highly desirable property of automatic classifiers in many image recognition tasks. Nevertheless, it is unclear to which extent state-of-the-art classifiers are invariant to basic transformations such as rotations and translations. This is mainly due to the lack of general methods that properly measure such an invariance. In this paper, we propose a rigorous and systematic approach for quantifying the invariance to geometric transformations of any classifier. Our key idea is to cast the problem of assessing a classifier’s invariance as the computation of geodesics along the manifold of transformed images. We propose the Manitest method, built on the efficient Fast Marching algorithm to compute the invariance of classifiers. Our new method quantifies in particular the importance of data augmentation for learning invariance from data, and the increased invariance of convolutional neural networks with depth. We foresee that the proposed generic tool for measuring invariance to a large class of geometric transformations and arbitrary classifiers will have many applications for evaluating and comparing classifiers based on their invariance, and help improving the invariance of existing classifiers.

Human Pose Estimation with Iterative Error Feedback

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved strong performance on a variety of classification tasks using purely feedforward processing. Feedforward architectures can learn rich representations of the input space but do not explicitly model dependencies in the output spaces, that are quite structured for tasks such as articulated human pose estimation or object segmentation. Here we propose a framework that expands the expressive power of hierarchical feature extractors to encompass both input and output spaces, by introducing top-down feedback. Instead of directly predicting the target outputs in one go, we use a self-correcting model that progressively changes an initial solution by feeding back error predictions, in a process we call Iterative Error Feedback (IEF). We show that IEF improves over the state-of-the-art on the task of articulated human pose estimation on the challenging MPII dataset.

Learning Weak Constraints in Answer Set Programming

This paper contributes to the area of inductive logic programming by presenting a new learning framework that allows the learning of weak constraints in Answer Set Programming (ASP). The framework, called Learning from Ordered Answer Sets, generalises our previous work on learning ASP programs without weak constraints, by considering a new notion of examples as ordered pairs of partial answer sets that exemplify which answer sets of a learned hypothesis (together with a given background knowledge) are preferred to others. In this new learning task inductive solutions are searched within a hypothesis space of normal rules, choice rules, and hard and weak constraints. We propose a new algorithm, ILASP2, which is sound and complete with respect to our new learning framework. We investigate its applicability to learning preferences in an interview scheduling problem and also demonstrate that when restricted to the task of learning ASP programs without weak constraints, ILASP2 can be much more efficient than our previously proposed system.

The asymptotic distribution of the pathwise mean-square displacement in single-particle tracking experiments

Recent advances in light microscopy have spawned new research frontiers in microbiology by working around the diffraction barrier and allowing for the observation of nanometric biological structures. Microrheology is the study of the properties of complex fluids, such as those found in biology, through the dynamics of small embedded particles, typically latex beads. Statistics based on the recorded sample paths are then used by biophysicists to infer rheological properties of the fluid. In the biophysical literature, the main statistic for characterizing diffusivity is the so-named mean square displacement (MSD) of the tracer particles. Notwithstanding the central role played by the MSD, its asymptotic distribution in different cases has not yet been established. In this paper, we tackle this problem. We take a pathwise approach and assume that the particle movement undergoes a Gaussian, stationary-increment stochastic process. We show that as the sample and the increment lag sizes go to infinity, the MSD displays Gaussian or non-Gaussian limiting distributions, as well as distinct convergence rates, depending on the diffusivity parameter. We illustrate our results analytically and computationally based on fractional Brownian motion and the (integrated) fractional Ornstein-Uhlenbeck process.

Two Murnaghan-Nakayama rules in Schubert calculus

The Murnaghan-Nakayama rule expresses the product of a Schur function with a Newton power sum in the basis of Schur functions. We establish a version of the Murnaghan-Nakayama rule for Schubert polynomials and a version for the quantum cohomology ring of the Grassmannian. These rules compute all intersections of Schubert cycles with tautological classes coming from the Chern character.

Multi-scale exploration of convex functions and bandit convex optimization

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. We use this map to solve a decade-old open problem in adversarial bandit convex optimization by showing that the minimax regret for this problem is $\tilde{O}(\mathrm{poly}(n) \sqrt{T})$, where $n$ is the dimension and $T$ the number of rounds. This bound is obtained by studying the dual Bayesian maximin regret via the information ratio analysis of Russo and Van Roy, and then using the multi-scale exploration to solve the Bayesian problem.

The Smith and Critical Groups of the Square Rook’s Graph and its Complement

Let $R_{n}$ denote the graph with vertex set consisting of the squares of an $n \times n$ grid, with two squares of the grid adjacent when they lie in the same row or column. This is the square rook’s graph, and can also be thought of as the Cartesian product of two complete graphs of order $n$, or the line graph of the complete bipartite graph $K_{n,n}$. In this paper we compute the Smith group and critical group of the graph $R_{n}$ and its complement. This is equivalent to determining the Smith normal form of both the adjacency and Laplacian matrix of each of these graphs. In doing so we verify a 1986 conjecture of Rushanan.

Chromatic functors of graphs

Does there exist a natural bijection between regular $n$-colorings of two graphs which have the same chromatic polynomial? We address this problem by using a functorial formulation. Fix a simple graph $G$. Then for each set $X$ we can associate the set of $X$-colorings. This defines a functor (‘chromatic functor’) from the category of sets with injections to itself. The first main result verifies that two finite graphs determine isomorphic chromatic functors if and only if they have the same chromatic polynomial. Chromatic functors can be defined for arbitrary (possibly infinite) graphs. This fact enables us to investigate functorial chromatic theory for infinite graphs. We prove that chromatic functors satisfy Cantor-Bernstein-Schr\’oder property. We also prove that countable connected trees determine isomorphic chromatic functors. Finally we present a pair of infinite graphs which determine non-isomorphic chromatic functors.

LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation

We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics that are generated. Also, users may find it difficult to search correlated topics and correlated documents. LDAExplore, tries to alleviate these problems by visualizing topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users, who have minimal knowledge of LDA or Topic Modelling methods. To evaluate our design, we run a pilot study which uses the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home’s electricity demand. Recently, deep neural networks have driven remarkable improvements in classification performance in neighbouring machine learning fields such as image classification and automatic speech recognition. In this paper, we adapt three deep neural network architectures to energy disaggregation: 1) a form of recurrent neural network called `long short-term memory’ (LSTM); 2) denoising autoencoders; and 3) a network which regresses the start time, end time and average power demand of each appliance activation. We test the performance of these algorithms on real aggregate power data from five appliances against seven metrics. Tests are performed against houses seen during training and houses not seen during training. We find that all three neural nets achieve better F1 scores (averaged over all five appliances) than either combinatorial optimisation or factorial hidden Markov models.

Rigorizing and Extending the Cox-Jaynes Derivation of Probability: Implications for Statistical Practice

There have been three attempts to date to establish foundations for the discipline of probability, namely the efforts of Kolmogorov (frequentist), de Finetti (Bayesian: belief and betting odds) and RT Cox (Bayesian: reasonable expectation)/ET Jaynes (Bayesian: optimal information processing). The original ‘proof’ of the validity of the Cox-Jaynes approach has been shown to be incomplete, and attempts to date to remedy this situation are themselves not entirely satisfactory. Here we offer a new axiomatization that both rigorizes the Cox-Jaynes derivation of probability and extends it – from apparent dependence on finite additivity to (1) countable additivity and (2) the ability to simultaneously make uncountably many probability assertions in a logically-internally-consistent manner – and we discuss the implications of this work for statistical methodology and applications. This topic is sharply relevant for statistical practice, because continuous expression of uncertainty – for example, taking the set $\Theta$ of possible values of an unknown $\theta$ to be $(0,1)$, or $\mathbb{R}$, or the space of all cumulative distribution functions on $\mathbb{R}$ – is ubiquitous, but has not previously been rigorously supported under at least one popular Bayesian axiomatization of probability. The most important area of statistical methodology that our work has now justified from a Cox-Jaynes perspective is Bayesian non-parametric inference, a topic of fundamental importance in applied statistics. We present two interesting foundational findings, one of which is that Kolmogorov’s probability function $\mathbb{P}_K (A)$ of the single argument A is isomorphic to a version of the Cox-Jaynes two-argument probability map $\mathbb{P}_{CJ} (A \mid B)$ in which Kolmogorov’s B has been ‘hard-wired at the factory’ to coincide with his sample space $\Omega$, thus Kolmogorov is a special case of Cox-Jaynes.

Optimal Learning Rates for Localized SVMs

One of the limiting factors of using support vector machines (SVMs) in large scale applications are their super-linear computational requirements in terms of the number of training samples. To address this issue, several approaches that train SVMs on many small chunks of large data sets separately have been proposed in the literature. So far, however, almost all these approaches have only been empirically investigated. In addition, their motivation was always based on computational requirements. In this work, we consider a localized SVM approach based upon a partition of the input space. For this local SVM, we derive a general oracle inequality. Then we apply this oracle inequality to least squares regression using Gaussian kernels and deduce local learning rates that are essentially minimax optimal under some standard smoothness assumptions on the regression function. This gives the first motivation for using local SVMs that is not based on computational requirements but on theoretical predictions on the generalization performance. We further introduce a data-dependent parameter selection method for our local SVM approach and show that this method achieves the same learning rates as before. Finally, we present some larger scale experiments for our localized SVM showing that it achieves essentially the same test performance as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better.

Robust Monotone Submodular Function Maximization

Instances of monotone submodular function maximization with cardinality constraint occur often in practical applications. One example is feature selection in machine learning, where in many models, adding a new feature to an existing set of features always improves the modeling power (monotonicity) and the marginal benefit of adding a new feature decreases as we consider larger sets (submodularity). However, we would like to choose a robust set of features such that, there is not too much dependence on a small subset of chosen features %Another application is sensor placement over large networks. However, some of the sensors placed may fail and thus, we would like our chosen set to be robust to removal of a small number of elements [Krause et al.\ (08), Globerson \& Roweis (06)]. We consider a formulation of this problem that was formally introduced by Krause et al.\ (08). It is not difficult to show that even if we look at robustness to removing a single element, the problem is at least as hard as the ordinary maximization problem, which is approximable upto a factor of $1-1/e$. For this case of single element removal, we give: (i) an algorithm with parameter $m$, that has asymptotic guarantee $(1-1/e)-\Omega(1/m)$ using $O(n^{m+1})$ queries and (ii) a fast asymptotically 0.5547 approximate algorithm. For the general case of subset removal, we give a fast algorithm with asymptotic guarantee $0.285$. These are the first constant factor results for this problem.

Alexandria: Extensible Framework for Rapid Exploration of Social Media

The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing ‘domain models’ (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and visualizing the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics system, as has been demonstrated through an integration with IBM Research’s SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics.

Supervised Collective Classification for Crowdsourcing

Crowdsourcing utilizes the wisdom of crowds for collective classification via information (e.g., labels of an item) provided by labelers. Current crowdsourcing algorithms are mainly unsupervised methods that are unaware of the quality of crowdsourced data. In this paper, we propose a supervised collective classification algorithm that aims to identify reliable labelers from the training data (e.g., items with known labels). The reliability (i.e., weighting factor) of each labeler is determined via a saddle point algorithm. The results on several crowdsourced data show that supervised methods can achieve better classification accuracy than unsupervised methods, and our proposed method outperforms other algorithms.

Clustering of Modal Valued Symbolic Data

Symbolic Data Analysis is based on special descriptions of data – symbolic objects (SO). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special kind of symbolic object is a representation with frequency or probability distributions (modal values). This representation enables us to consider in the clustering process the variables of all measurement types at the same time. In the paper a clustering criterion function for SOs is proposed such that the representative of each cluster is again composed of distributions of variables’ values over the cluster. The corresponding leaders clustering method is based on this result. It is also shown that for the corresponding agglomerative hierarchical method a generalized Ward’s formula holds. Both methods are compatible – they are solving the same clustering optimization problem. The leaders method efficiently solves clustering problems with large number of units; while the agglomerative method can be applied alone on the smaller data set, or it could be applied on leaders, obtained with compatible nonhierarchical clustering method. Such a combination of two compatible methods enables us to decide upon the right number of clusters on the basis of the corresponding dendrogram. The proposed methods were applied on different data sets. In the paper, some results of clustering of ESS data are presented.

Thresholds for half-random games

We study biased Maker-Breaker positional games between two players, one of whom is playing randomly against an opponent with an optimal strategy. In both such scenarios, that is when Maker plays randomly and when Breaker plays randomly, we determine the sharp threshold bias of classical graph games, such as connectivity, Hamiltonicity, and minimum degree-k. The traditional, deterministic version of these games, with two optimal players playing, are known to obey the so-called probabilistic intuition. That is, the threshold bias of these games is asymptotically equal to the threshold bias of their random counterpart, where players just take edges uniformly at random. We find, that despite this remarkably precise agreement of the results of the deterministic and the random games, playing randomly against an optimal opponent is not a good idea: the threshold bias becomes significantly higher in favor of the ‘clever’ player. An important qualitative aspect of the probabilistic intuition carries through nevertheless: the bottleneck for Maker to occupy a connected graph is still the ability to avoid isolated vertices in her graph.

Improved Answer-Set Programming Encodings for Abstract Argumentation

The design of efficient solutions for abstract argumentation problems is a crucial step towards advanced argumentation systems. One of the most prominent approaches in the literature is to use Answer-Set Programming (ASP) for this endeavor. In this paper, we present new encodings for three prominent argumentation semantics using the concept of conditional literals in disjunctions as provided by the ASP-system clingo. Our new encodings are not only more succinct than previous versions, but also outperform them on standard benchmarks.

The Anatomy of Large-Scale Distributed Graph Algorithms

The increasing complexity of the software/hardware stack of modern supercomputers results in explosion of parameters. The performance analysis becomes a truly experimental science, even more challenging in the presence of massive irregularity and data dependency. We analyze how the existing body of research handles the experimental aspect in the context of distributed graph algorithms (DGAs). We distinguish algorithm-level contributions, often prioritized by authors, from runtime-level concerns that are harder to place. We show that the runtime is such an integral part of DGAs that experimental results are difficult to interpret and extrapolate without understanding the properties of the runtime used. We argue that in order to gain understanding about the impact of runtimes, more information needs to be gathered. To begin this process, we provide an initial set of recommendations for describing DGA results based on our analysis of the current state of the field.

Combinatorial Micro-Macro Dynamical Systems

The second law of thermodynamics states that the entropy of an isolated system almost always increases. We explore the possibilities for the second law within a purely combinatorial context. We focus on the extreme cases: full applicability and maximal failure of the law.