A Primer on Neural Network Models for Natural Language Processing

Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.

Analytic Posteriors for Pearson’s Correlation Coefficient

For bivariate normal data, all (marginal) posterior moments of Pearson’s correlation coefficient are given in analytic form.

Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large scale learning, as utilized by many Big Data applications, efficient parallelization of SGD is in the focus of active research. Recently, we were able to show that the asynchronous communication paradigm can be applied to achieve a fast and scalable parallelization of SGD. Asynchronous Stochastic Gradient Descent (ASGD) outperforms other, mostly MapReduce based, parallel algorithms solving large scale machine learning problems. In this paper, we investigate the impact of asynchronous communication frequency and message size on the performance of ASGD applied to large scale ML on HTC cluster and cloud environments. We introduce a novel algorithm for the automatic balancing of the asynchronous communication load, which allows to adapt ASGD to changing network bandwidths and latencies.

Boosting in the presence of outliers: adaptive classification with non-convex loss functions

This paper examines the role and efficiency of the non-convex loss functions for binary classification problems. In particular, we investigate how to design a simple and effective boosting algorithm that is robust to the outliers in the data. The analysis of the role of a particular non-convex loss for prediction accuracy varies depending on the diminishing tail properties of the gradient of the loss — the ability of the loss to efficiently adapt to the outlying data, the local convex properties of the loss and the proportion of the contaminated data. In order to use these properties efficiently, we propose a new family of non-convex losses named \gamma-robust losses. Moreover, we present a new boosting framework, {\it Arch Boost}, designed for augmenting the existing work such that its corresponding classification algorithm is significantly more adaptable to the unknown data contamination. Along with the Arch Boosting framework, the non-convex losses lead to the new class of boosting algorithms, named adaptive, robust, boosting (ARB). Furthermore, we present theoretical examples that demonstrate the robustness properties of the proposed algorithms. In particular, we develop a new breakdown point analysis and a new influence function analysis that demonstrate gains in robustness. Moreover, we present new theoretical results, based only on local curvatures, which may be used to establish statistical and optimization properties of the proposed Arch boosting algorithms with highly non-convex loss functions. Extensive numerical calculations are used to illustrate these theoretical properties and reveal advantages over the existing boosting methods when data exhibits a number of outliers.

Clustering over Logistic Random Dot Product Graphs

Inference of low-dimensional structures, such as clusters, on large networks is a central problem in network science. An important class of models describing such structures is the Random Dot Product Graph (RDPG), which assigns low dimensional latent position vectors to nodes and computes edge probabilities using dot products between these vectors. The RDPG provides a more flexible network model compared with the standard Stochastic Block Model (SBM). In this paper, we introduce the Logistic RDPG, which uses a logistic link function mapping from latent positions to edge probabilities. The logistic RDPG includes most SBMs as well as other low-dimensional structures, such as degree-corrected models, that are not described by SBMs. Over this model, we derive a method for efficient, asymptotically exact maximum-likelihood inference of latent position vectors. Our method involves computing the top eigenvectors of the mean-centered adjacency matrix and performing a logistic regression step to recover the appropriate eigenvalue scaling. Applied to the network clustering problem on diverse synthetic network models, we illustrate that our method is more accurate and more robust than existing spectral and semidefinite network clustering methods.

Convergence Analysis of a Stochastic Projection-free Algorithm

This paper presents and analyzes a stochastic version of the Frank-Wolfe algorithm (a.k.a. conditional gradient method or projection-free algorithm) for constrained convex optimization. We first prove that when the quality of gradient estimate improves as {\cal O}( \sqrt{ \eta_t^{\Delta} / t } ), where t is the iteration index and \eta_t^{\Delta} is an increasing sequence, then the objective value of the stochastic Frank-Wolfe algorithm converges in at least the same order. When the optimal solution lies in the interior of the constraint set, the convergence rate is accelerated to {\cal O}(\eta_t^{\Delta} /t). Secondly, we study how the stochastic Frank-Wolfe algorithm can be applied to a few practical machine learning problems. Tight bounds on the gradient estimate errors for these examples are established. Numerical simulations support our findings.

Design and Management of Vehicle Sharing Systems: A Survey of Algorithmic Approaches

Vehicle (bike or car) sharing represents an emerging transportation scheme which may comprise an important link in the green mobility chain of smart city environments. This chapter offers a comprehensive review of algorithmic approaches for the design and management of vehicle sharing systems. Our focus is on one-way vehicle sharing systems (wherein customers are allowed to pick-up a vehicle at any location and return it to any other station) which best suits typical urban journey requirements. Along this line, we present methods dealing with the so-called asymmetric demand-offer problem (i.e. the unbalanced offer and demand of vehicles) typically experienced in one-way sharing systems which severely affects their economic viability as it implies that considerable human (and financial) resources should be engaged in relocating vehicles to satisfy customer demand. The chapter covers all planning aspects that affect the effectiveness and viability of vehicle sharing systems: the actual system design (e.g. number and location of vehicle station facilities, vehicle fleet size, vehicles distribution among stations); customer incentivisation schemes to motivate customer-based distribution of bicycles/cars (such schemes offer meaningful incentives to users so as to leave their vehicle to a station different to that originally intended and satisfy future user demand); cost-effective solutions to schedule operator-based repositioning of bicycles/cars (by employees explicitly enrolled in vehicle relocation) based on the current and future (predicted) demand patterns (operator-based and customer-based relocation may be thought as complementary methods to achieve the intended distribution of vehicles among stations).

Distributed Parameter Map-Reduce

This paper describes how to convert a machine learning problem into a series of map-reduce tasks. We study logistic regression algorithm. In logistic regression algorithm, it is assumed that samples are independent and each sample is assigned a probability. Parameters are obtained by maxmizing the product of all sample probabilities. Rapid expansion of training samples brings challenges to machine learning method. Training samples are so many that they can be only stored in distributed file system and driven by map-reduce style programs. The main step of logistic regression is inference. According to map-reduce spirit, each sample makes inference through a separate map procedure. But the premise of inference is that the map procedure holds parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only samples, but also parameters are distributed in nodes of distributed filesystem. Through a series of map-reduce tasks, we assign each sample parameters for its features, make inference for the sample and update paramters of the model. The above processes are excuted looply until convergence. We test the proposed algorithm in actual hadoop production environment. Experiments show that the acceleration of the algorithm is in linear relationship with the number of cluster nodes.

Implicit stochastic approximation

The need to carry out parameter estimation from massive data has reinvigorated interest in iterative estimation methods, in statistics and machine learning. Classic work includes deterministic gradient-based methods, such as quasi-Newton, and stochastic gradient descent and its variants, including adaptive learning rates, acceleration and averaging. Current work increasingly relies on methods that employ proximal operators, leading to updates defined through implicit equations, which need to be solved at each iteration. Such methods are especially attractive in modern problems with massive data because they are numerically stable and converge with minimal assumptions, among other reasons. However, while the majority of existing methods can be subsumed into the gradient-free stochastic approximation framework developed by Robbins and Monro (1951), there is no such framework for methods with implicit updates. Here, we conceptualize a gradient-free implicit stochastic approximation procedure, and develop asymptotic and non-asymptotic theory for it. This new framework provides a theoretical foundation for gradient-based procedures that rely on implicit updates, and opens the door to iterative estimation methods that do not require a gradient, nor a fully known likelihood.

Intelligent Search Optimization using Artificial Fuzzy Logics

Information on the web is prodigious; searching relevant information is difficult making web users to rely on search engines for finding relevant information on the web. Search engines index and categorize web pages according to their contents using crawlers and rank them accordingly. For given user query they retrieve millions of webpages and display them to users according to web-page rank. Every search engine has their own algorithms based on certain parameters for ranking web-pages. Search Engine Optimization (SEO) is that technique by which webmasters try to improve ranking of their websites by optimizing it according to search engines ranking parameters. It is the aim of this research to identify the most popular SEO techniques used by search engines for ranking web-pages and to establish their importance for indexing and categorizing web data. The research tries to establish that using more SEO parameters in ranking algorithms helps in retrieving better search results thus increasing user satisfaction. In the accomplished research, a web based Meta search engine is proposed to aggregates search results from different search engines and rank web-pages based on new page ranking algorithm which will assign heuristic page rank to web-pages based on SEO parameters such as title tag, Meta description, sitemap etc. The research also provides insight into techniques which webmasters can use for better ranking their websites in Google and Bing. Initial results has shown that using certain SEO parameters in present ranking algorithm helps in retrieving more useful results for user queries. These results generated from Meta search engine outperformed existing search engines in terms of better retrieved search results and high precision.

Learning in Unlabeled Networks – An Active Learning and Inference Approach

The task of determining labels of all network nodes based on the knowledge about network structure and labels of some training subset of nodes is called the within-network classification. It may happen that none of the labels of the nodes is known and additionally there is no information about number of classes to which nodes can be assigned. In such a case a subset of nodes has to be selected for initial label acquisition. The question that arises is: ‘labels of which nodes should be collected and used for learning in order to provide the best classification accuracy for the whole network?’. Active learning and inference is a practical framework to study this problem. A set of methods for active learning and inference for within network classification is proposed and validated. The utility score calculation for each node based on network structure is the first step in the process. The scores enable to rank the nodes. Based on the ranking, a set of nodes, for which the labels are acquired, is selected (e.g. by taking top or bottom N from the ranking). The new measure-neighbour methods proposed in the paper suggest not obtaining labels of nodes from the ranking but rather acquiring labels of their neighbours. The paper examines 29 distinct formulations of utility score and selection methods reporting their impact on the results of two collective classification algorithms: Iterative Classification Algorithm and Loopy Belief Propagation. We advocate that the accuracy of presented methods depends on the structural properties of the examined network. We claim that measure-neighbour methods will work better than the regular methods for networks with higher clustering coefficient and worse than regular methods for networks with low clustering coefficient. According to our hypothesis, based on clustering coefficient we are able to recommend appropriate active learning and inference method.

Tree Realization Problem: Depth, Height and Subtree Size Sequences

The rooted tree is an important data structure, and the height, depth and subtree size are naturally defined attributes of every node. We consider the problem of the realization of a tree given a sequence of one of these attributes. We also consider the problem of the existence of a tree when multiple sequences of attributes are given or when a sequence of tuples of attributes is given. Our most significant results are the NP-Completeness of problems related to subtree size sequences, which we prove by non-trivial reductions from Numerical Matching with Target Sums. By contrast we give polynomial time algorithms for depth and height sequences. Even for the subtree size problem, we show that the realization of some classes of trees can be solved in O(log(n)) time.

A combinatorial theory of random matrices III: random walks on $\mathfrak{S}(N)$, ramified coverings and the $\mathfrak{S}(\infty)$ Yang-Mills measure

A Common-Factor Approach for Multivariate Data Cleaning with an Application to Mars Phoenix Mission Data

A Geometric View of Posterior Approximation

A Note on a Latent Space Network Model

A Note on Finding Dual Feedback Vertex Set

A note on structured means analysis for a single group

A Piecewise Deterministic Markov Toy Model for Traffic/Maintenance and Associated Hamilton-Jacobi Integrodifferential Systems on Networks

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit

A unified approach for large queue asymptotics in a heterogeneous multiserver queue

Accuracy of Bayesian Latent Variable Estimation with Redundant Dimension

Ages of records in random walks

Approximate Fisher Kernels of non-iid Image Models for Image Categorization

Asymptotic Minimaxity, Optimal Posterior Concentration and Asymptotic Bayes Optimality of Horseshoe-type Priors Under Sparsity

Bayesian inference for misspecified exponential random graph models

Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family

Blocking estimators and inference under the Neyman-Rubin model

Calculating entropy at different scales among diverse communication systems

Characterizing $2$-Distance Graphs and Solving the Equations $T_2(X)=kP_2$ or $K_m \cup K_n$

Client Profiling for an Anti-Money Laundering System

Comparing Trajectories on the Size and Shape Space

Controlling Chimera States – The influence of excitable units

Cross-Device Tracking: Matching Devices and Cookies

Decomposition of Graphs into $(k,r)$-Fans and Single Edges

Deep convolutional acoustic word embeddings using word-pair side information

Disentangling the role of structure and friction in shear jamming

Dynamic and spectral properties of transmission eigenchannels in random media

Existence of Timewise Optimal Feedback Controls for the Stochastic Navier Stokes Equation in 2D

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Finite Dimensional Fokker-Planck Equations for Continuous Time Random Walks

Finite system scheme for infinite rate mutually catalytic branching with infinite branching rate

Fluctuation Phenomena in Chaotic Dirac Quantum Dots: Artificial Atoms on Graphene Flakes

From a Kac-like particle system to the Landau equation for hard potentials and Maxwell molecules

Geodesics and the competition interface for the corner growth model

Graph Theory

Heavy traffic approximation for the stationary distribution of a generalized Jackson network: the BAR approach

Hospital Mortality Rate Estimation for Public Reporting

Illustration of iterative linear solver behavior on simple 1D and 2D problems

Improved error bounds for quantization based numerical schemes for BSDE and nonlinear filtering

Isometric embeddings of dual polar graphs in Grassmann graphs over finite fields

It is not all downhill from here: Syllable Contact Law in Persian

Local spectral statistics of Gaussian matrices with correlated entries

Logarithmic tails of sums of products of positive random variables bounded by one

Machine Learning for Machine Data from a CATI Network

Mathematical aspects of Wiener index

Minimal number of edges in hypergraph guaranteeing perfect fractional matching and MMS conjecture, complete version

Monitoring Potential Drug Interactions and Reactions via Network Analysis of Instagram User Timelines

Nonlinear Spectral Analysis via One-homogeneous Functionals – Overview and Future Prospects

On a Selection Problem for Small Noise Perturbation in Multidimensional Case

On mutations in the branching model for multitype populations

On the delayed stochastic integration and continuity in the Hurst parameter

On the number of unary-binary tree-like structures with restrictions on the unary height

On the Rate of Convergence of Mean-Field Models: Stein’s Method Meets the Perturbation Theory

Partial Information Decomposition as a Unified Approach to the Specification of Neural Goal Functions

Partitions of the set of natural numbers and symplectic homology

Permanent index of matrices associated with graphs

P-trac Procedure: The Dispersion and Neutralization of Contrasts in Lexicon

Quadratic Optimization with Orthogonality Constraints: Explicit Lojasiewicz Exponent and Linear Convergence of Line-Search Methods

Quantifying and mitigating the effect of preferential sampling on phylodynamic inference

Random walks on Baumslag-Solitar groups

Range and critical generations of a random walk on Galton-Watson trees

Rank Aggregation: New Bounds for MCx

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

Reduced Precision Checking to Detect Errors in Floating Point Arithmetic

Relaxed Multiple-Instance SVM with Application to Object Discovery

Routing in Unit Disk Graphs

Signed Enumeration of Upper-Right Corners in Path Shuffles

Sparse Density Representations for Simultaneous Inference on Large Spatial Datasets

Stationary cocycles and Busemann functions for the corner growth model

Stochastic differential equations revisited

Superfluid density and quasi-long-range order in the one-dimensional disordered Bose-Hubbard model

Symmetry and dynamics universality of supermetal in quantum chaos

Systolic Expanders of Every Dimension

Testing for Characteristics of Attribute Linked Infinite Networks based on Small Samples

The Bruss-Robertson Inequality: Elaborations, Extensions, and Applications

The Coulomb potential V(r)=1/r and other radial problems on the Bethe lattice

The Strong Chromatic Index of graphs with maximum degree $Δ$

Tight Variational Bounds via Random Projections and I-Projections

Time-dependent community structure in legislation cosponsorship networks in the Congress of the Republic of Peru

Total positivity in Markov structures

Total weight choosability of d-degenerate graphs

Urban Scaling in Europe