An $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$‘s are independent random vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In this paper, we obtain a refined structural understanding of PMDs by analyzing their Fourier transform. As our core structural result, we prove that the Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following applications: {\bf Learning Theory.} We design the first computationally efficient learning algorithm for PMDs with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD within variation distance $\epsilon$ using a near-optimal sample size of $\widetilde{O}_k(1/\epsilon^2),$ and runs in time $\widetilde{O}_k(1/\epsilon^2) \cdot \log n.$ Previously, no algorithm with a $\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k=3.$ {\bf Game Theory.} We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized anonymous games with $n$ players and $k$ strategies, our algorithm computes a well-supported $\epsilon$-Nash equilibrium in time $n^{O(k^3)} \cdot (k/\epsilon)^{O(k^3\log(k/\epsilon)/\log\log(k/\epsilon))^{k-1}}.$ The best previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k},$ where $f(k) = \Omega(k^{k^2})$, for any $k>2.$ {\bf Statistics.} We prove a multivariate central limit theorem (CLT) that relates an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance. Our new CLT strengthens the CLT of Valiant and Valiant by completely removing the dependence on $n$ in the error bound.
It has been a long time, since data mining technologies have made their ways to the field of data management. Classification is one of the most important data mining tasks for label prediction, categorization of objects into groups, advertisement and data management. In this paper, we focus on the standard classification problem which is predicting unknown labels in Euclidean space. Most efforts in Machine Learning communities are devoted to methods that use probabilistic algorithms which are heavy on Calculus and Linear Algebra. Most of these techniques have scalability issues for big data, and are hardly parallelizable if they are to maintain their high accuracies in their standard form. Sampling is a new direction for improving scalability, using many small parallel classifiers. In this paper, rather than conventional sampling methods, we focus on a discrete classification algorithm with O(n) expected running time. Our approach performs a similar task as sampling methods. However, we use column-wise sampling of data, rather than the row-wise sampling used in the literature. In either case, our algorithm is completely deterministic. Our algorithm, proposes a way of combining 2D convex hulls in order to achieve high classification accuracy as well as scalability in the same time. First, we thoroughly describe and prove our O(n) algorithm for finding the convex hull of a point set in 2D. Then, we show with experiments our classifier model built based on this idea is very competitive compared with existing sophisticated classification algorithms included in commercial statistical applications such as MATLAB.
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of \nodes, but the goal remains to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of utmost importance. A motivating example for federated optimization arises when we keep the training data locally on users’ mobile devices rather than logging it to a data center for training. Instead, the mobile devices are used as nodes performing computation on their local data in order to update a global model. We suppose that we have an extremely large number of devices in our network, each of which has only a tiny fraction of data available totally; in particular, we expect the number of data points available locally to be much smaller than the number of devices. Additionally, since different users generate data with different patterns, we assume that no device has a representative sample of the overall distribution. We show that existing algorithms are not suitable for this setting, and propose a new algorithm which shows encouraging experimental results. This work also sets a path for future research needed in the context of federated optimization.
Much of information sits in an unprecedented amount of text data. Managing allocation of these large scale text data is an important problem for many areas. Topic modeling performs well in this problem. The traditional generative models (PLSA,LDA) are the state-of-the-art approaches in topic modeling and most recent research on topic generation has been focusing on improving or extending these models. However, results of traditional generative models are sensitive to the number of topics K, which must be specified manually. The problem of generating topics from corpus resembles community detection in networks. Many effective algorithms can automatically detect communities from networks without a manually specified number of the communities. Inspired by these algorithms, in this paper, we propose a novel method named Hierarchical Latent Semantic Mapping (HLSM), which automatically generates topics from corpus. HLSM calculates the association between each pair of words in the latent topic space, then constructs a unipartite network of words with this association and hierarchically generates topics from this network. We apply HLSM to several document collections and the experimental comparisons against several state-of-the-art approaches demonstrate the promising performance.
In last decades, diversity and accuracy have been regarded as two essential measures in evaluating recommender systems. However, a clear concern is that a model focusing excessively on one measure will put the other one at risk, which means that it is not easy to greatly improve diversity and accuracy simultaneously. In this paper, we propose a method which can be used in many being diffuse-like algorithms, like ProbS, BHC and HHP, to get more relevant and personalized recommendation results on user-object bipartite networks. The main idea is to enhance the RA similarity of objects in the transfer equations, by giving a tuneable exponent on the shoulder of RA similarity, to intentionally regulate the resource allocation on objects of different degrees. Experiments on three benchmark data sets, MovieLens, Netflix, and RYM show that the modified algorithms can yield remarkable performance improvement compared with the original models.
In our study we implemented and compared seven sequential and parallel sorting algorithms: bitonic sort, multistep bitonic sort, adaptive bitonic sort, merge sort, quicksort, radix sort and sample sort. Sequential algorithms were implemented on a central processing unit using C++, whereas parallel algorithms were implemented on a graphics processing unit using CUDA platform. We chose these algorithms because to the best of our knowledge their sequential and parallel implementations were not yet compared all together in the same execution environment. We improved the above mentioned implementations and adopted them to be able to sort input sequences of arbitrary length. We compared algorithms on six different input distributions, which consisted of 32-bit numbers, 32-bit key-value pairs, 64-bit numbers and 64-bit key-value pairs. In this report we give a short description of seven sorting algorithms and all the results obtained by our tests.
Many existing dynamical systems including all living objects exhibit signatures of what can be called a spontaneous dynamical long-range order (DLRO). Its omnipresence has long been recognized by the scientific community as is evidenced by a whole myriad of the related concepts, theoretical and phenomenological frameworks, and experimental phenomena such as turbulence, 1/f noise, dynamical complexity, chaos and the butterfly effect, Richter scale for earthquakes and the scale-free statistics of other sudden processes, self organization and pattern formation, self-organized criticality etc. Even though several successful approaches to some of the realizations of the DLRO have been established, the universal theoretical understanding of this phenomenon remained elusive. The possibility to construct a unified theory of the DLRO has emerged recently within the approximation-free supersymmetric theory of stochastics (STS). There, the DLRO is the spontaneous breakdown of the topological or De Rahm supersymmetry that all stochastic differential equations (SDEs) possess. This theory may be interesting to researchers with very different backgrounds because the ubiquitous DLRO is a truly interdisciplinary entity. The STS is too an interdisciplinary construction. It is based on the dynamical systems theory, the cohomological field theories, the theory of pseudo-Hermitian operators, and the conventional theory of SDEs. Reviewing the literature on all of these mathematical disciplines can be time consuming. As such, a concise and self-contained introduction to the STS, the goal of this review, may be useful.
We present a semi-supervised learning algorithm for learning discrete factor analysis models with arbitrary structure on the latent variables. Our algorithm assumes that every latent variable has an ‘anchor’, an observed variable with only that latent variable as its parent. Given such anchors, we show that it is possible to consistently recover moments of the latent variables and use these moments to learn complete models. We also introduce a new technique for improving the robustness of method-of-moment algorithms by optimizing over the marginal polytope or its relaxations. We evaluate our algorithm using two real-world tasks, tag prediction on questions from the Stack Overflow website and medical diagnosis in an emergency department.
We celebrate the 50th anniversary of one the most classical models in probability theory. In this survey, we describe the main results of first passage percolation, paying special attention to the recent burst of advances of the past 5 years. The purpose of these notes is twofold. In the first chapters, we give self-contained proofs of seminal results obtained in the ’80s and ’90s on limit shapes and geodesics, while covering the state of the art of these questions. Second, aside from these classical results, we discuss recent perspectives and directions including (1) the connection between Busemann functions and geodesics, (2) the proof of sublinear variance under 2+log moments of passage times and (3) the role of growth and competition models. We also provide a collection of (old and new) open questions, hoping to solve them before the 100th birthday.