Benchmarking of LSTM Networks

LSTM (Long Short-Term Memory) recurrent neural networks have been highly successful in a number of application areas. This technical report describes the use of the MNIST and UW3 databases for benchmarking LSTM networks and explores the effect of different architectural and hyperparameter choices on performance. Significant findings include: (1) LSTM performance depends smoothly on learning rates, (2) batching and momentum has no significant effect on performance, (3) softmax training outperforms least square training, (4) peephole units are not useful, (5) the standard non-linearities (tanh and sigmoid) perform best, (6) bidirectional training combined with CTC performs better than other methods.

Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

The scan statistic is by far the most popular method for anomaly detection, being popular in syndromic surveillance, signal and image processing, and target detection based on sensor networks, among other applications. The use of the scan statistics in such settings yields an hypotheses testing procedure, where the null hypothesis corresponds to the absence of anomalous behavior. If the null distribution is known, then calibration of a scan-based test is relatively easy, as it can be done by Monte-Carlo simulation. When the null distribution is unknown, it is not clear what the best way to proceed is. We propose two procedures. One is a calibration by permutation and the other is a rank-based scan test, which is distribution-free and less sensitive to outliers. Furthermore, the rank-scan test requires only a one-time calibration for a given data size making it computationally more appealing. In both cases, we quantify the performance loss with respect to an oracle scan test that knows the null distribution, and show one incurs only a very small loss in the context of a natural exponential family. These results include the classical normal location model, as well as Poisson model popular in syndromic surveillance. We perform numerical experiments on simulated data further supporting our theory, and also experiments with a real dataset from genomics.

Learning to Hire Teams

Crowdsourcing and human computation has been employed in increasingly sophisticated projects that require the solution of a heterogeneous set of tasks. We explore the challenge of building or hiring an effective team, for performing tasks required for such projects on an ongoing basis, from an available pool of applicants or workers who have bid for the tasks. The recruiter needs to learn workers’ skills and expertise by performing online tests and interviews, and would like to minimize the amount of budget or time spent in this process before committing to hiring the team. How can one optimally spend budget to learn the expertise of workers as part of recruiting a team? How can one exploit the similarities among tasks as well as underlying social ties or commonalities among the workers for faster learning? We tackle these decision-theoretic challenges by casting them as an instance of online learning for best action selection. We present algorithms with PAC bounds on the required budget to hire a near-optimal team with high confidence. Furthermore, we consider an embedding of the tasks and workers in an underlying graph that may arise from task similarities or social ties, and that can provide additional side-observations for faster learning. We then quantify the improvement in the bounds that we can achieve depending on the characteristic properties of this graph structure. We evaluate our methodology on simulated problem instances as well as on real-world crowdsourcing data collected from the oDesk platform. Our methodology and results present an interesting direction of research to tackle the challenges faced by a recruiter for contract-based crowdsourcing.

OOASP: Connecting Object-oriented and Logic Programming

Most of contemporary software systems are implemented using an object-oriented approach. Modeling phases — during which software engineers analyze requirements to the future system using some modeling language — are an important part of the development process, since modeling errors are often hard to recognize and correct. In this paper we present a framework which allows the integration of Answer Set Programming into the object-oriented software development process. OOASP supports reasoning about object-oriented software models and their instantiations. Preliminary results of the OOASP application in CSL Studio, which is a Siemens internal modeling environment for product configurators, show that it can be used as a lightweight approach to verify, create and transform instantiations of object models at runtime and to support the software development process during design and testing.

RCR: Robust Compound Regression for Robust Estimation of Errors-in-Variables Model

The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach – the robust compound regression (RCR) analysis method for the robust estimation of EIV models. We first introduce a robust and efficient estimator called least sine squares (LSS). Taking full advantage of both the new LSS method and the compound regression analysis method developed in our own group, we subsequently propose the RCR approach as a generalization of those two, which provides a robust counterpart of the entire class of the maximum likelihood estimation (MLE) solutions of the EIV model, in a 1-1 mapping. Technically, our approach gives users the flexibility to select from a class of RCR estimates the optimal one with a predefined regression efficiency criterion satisfied. Simulation studies and real-life examples are provided to illustrate the effectiveness of the RCR approach.

Space-efficient detection of unusual words

Detecting all the strings that occur in a text more frequently or less frequently than expected according to an IID or a Markov model is a basic problem in string mining, yet current algorithms are based on data structures that are either space-inefficient or incur large slowdowns, and current implementations cannot scale to genomes or metagenomes in practice. In this paper we engineer an algorithm based on the suffix tree of a string to use just a small data structure built on the Burrows-Wheeler transform, and a stack of O(\sigma^2\log^2 n) bits, where n is the length of the string and \sigma is the size of the alphabet. The size of the stack is o(n) except for very large values of \sigma. We further improve the algorithm by removing its time dependency on \sigma, by reporting only a subset of the maximal repeats and of the minimal rare words of the string, and by detecting and scoring candidate under-represented strings that \textit{do not occur} in the string. Our algorithms are practical and work directly on the BWT, thus they can be immediately applied to a number of existing datasets that are available in this form, returning this string mining problem to a manageable scale.

The Goulden-Jackson cluster method for monoid networks and an application to lattice path enumeration

The Goulden-Jackson cluster method is a powerful tool for obtaining generating functions for words in free monoids by occurrences of a set of subwords. We introduce a generalization of the cluster method for monoid networks, which generalize the combinatorial framework of free monoids. As a sample application of the monoid network cluster method, we compute bivariate and multivariate generating functions for Motzkin paths—both with height bounded and unbounded—by statistics corresponding to the number of occurrences of various subwords, yielding both closed-form and continued fraction formulae.

A Dramatically Growing Shear Rigidity Length in Simulated Model Metallic Glass Former

A Family of the Zeckendorf Theorem Related Identities

A group action on increasing sequences of set-indexed Brownian motions

A Lundberg-type inequality for an inhomogeneous renewal risk model

Accessible Proof of Standard Monomial Basis for Coordinatization of Schubert Sets of Flags

Algebraic flow theory of infinite graphs

Algorithmic Design of Majorizers for Large-Scale Inverse Problems

Are Slepian-Wolf Rates Necessary for Distributed Parameter Estimation?

Bayesian Dropout

Bayesian Variable Selection with Structure Learning: Applications in Integrative Genomics

Binary words avoiding xx^Rx and strongly unimodal sequences

Bootstrap Random Walks

Bounds for codes on pentagon and other cycles

Central limit theorem for functionals of a generalized self-similar process

Construction of maximum likelihood estimator in the mixed fractional–fractional Brownian motion model with double long-range dependence

Convergence of random walks to Brownian motion in phylogenetic tree-space

Convergence rates of sub-sampled Newton methods

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

Editing to a Planar Graph of Given Degrees

Estimate of the truncation error of a finite volume discretisation of the Navier-Stokes equations on colocated grids

Expected Supremum Representation of the Value of a Singular Stochastic Control Problem

Extremes and Limit Theorems for Difference of Chi-type processes

Fast $L_2$-approximation of integral-type functionals of Markov processes

FFT-Based Fast Computation of Multivariate Kernel Estimators with Unconstrained Bandwidth Matrices

From Cutting Planes Algorithms to Compression Schemes and Active Learning

Fullerenes with distant pentagons

Galton’s Family Heights Data Revisited

Identifiability of logistic regression with homoscedastic error: Berkson model

Incidence Geometry in a Weyl Chamber I: $GL_n$

Manifold regularization in structured output space for semi-supervised structured output prediction

Many-body localization and thermalization in disordered Hubbard chains

Maximum Entropy Vector Kernels for MIMO system identification

Minimal Length Maximal Green Sequences and Triangulations of Polygons

Modelling x-ray tomography using integer compositions

Newton’s method in practice: finding all roots of polynomials of degree one million efficiently

No Regret Bound for Extreme Bandits

On interval edge-colorings of bipartite graphs of small order

On the classification of certain ternary codes of length 12

On the Convergence of SGD Training of Neural Networks

On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference

On the Feynman–Kac semigroup for some Markov processes

Operational risk models and maximum likelihood estimation error for small sample-sizes

Permutation totally symmetric self-complementary plane partitions

Permutations $r_j$ such that $\sum_{i=1}^n \prod_{j=1}^k r_j(i)$ is maximized or minimized

Possible Mechanisms for Neural Reconfigurability and their Implications

Risk aggregation with empirical margins: Latin hypercubes, empirical copulas, and convergence of sum distributions

Technical Note: Split Algorithm in O(n) for the Vehicle Routing Problem

The Causal Effects for a Causal Loglinear Model

The Effects of Hyperparameters on SGD Training of Neural Networks

The predictive power of the business and bank sentiment of firms: A high-dimensional Granger Causality approach

The rate of convergence to the normal law in terms of pseudomoments

Uncertainty for calculating transport on Titan: a probabilistic description of bimolecular diffusion parameters