Large-scale Artificial Neural Network: MapReduce-based Deep Learning

Faced with continuously increasing scale of data, original back-propagation neural network based machine learning algorithm presents two non-trivial challenges: huge amount of data makes it difficult to maintain both efficiency and accuracy; redundant data aggravates the system workload. This project is mainly focused on the solution to the issues above, combining deep learning algorithm with cloud computing platform to deal with large-scale data. A MapReduce-based handwriting character recognizer will be designed in this project to verify the efficiency improvement this mechanism will achieve on training and practical large-scale data. Careful discussion and experiment will be developed to illustrate how deep learning algorithm works to train handwritten digits data, how MapReduce is implemented on deep learning neural network, and why this combination accelerates computation. Besides performance, the scalability and robustness will be mentioned in this report as well. Our system comes with two demonstration software that visually illustrates our handwritten digit recognition/encoding application.

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural networks equipped with learnable sequential memory blocks in the hidden layers. In this work, we have applied FSMN to several language modeling (LM) tasks. Experimental results have shown that the memory blocks in FSMN can learn effective representations of long history. Experiments have shown that FSMN based language models can significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs.

Some Theory For Practical Classifier Validation

We compare and contrast two approaches to validating a trained classifier while using all in-sample data for training. One is simultaneous validation over an organized set of hypotheses (SVOOSH), the well-known method that began with VC theory. The other is withhold and gap (WAG). WAG withholds a validation set, trains a holdout classifier on the remaining data, uses the validation data to validate that classifier, then adds the rate of disagreement between the holdout classifier and one trained using all in-sample data, which is an upper bound on the difference in error rates. We show that complex hypothesis classes and limited training data can make WAG a favorable alternative.

Controlled Experiments for Word Embeddings

An experimental approach to studying the properties of word embeddings is proposed. Controlled experiments, achieved through modifications of the training corpus, permit the demonstration of direct relations between word properties and word vector direction and length. The approach is demonstrated using the word2vec CBOW model with experiments that independently vary word frequency and word co-occurrence noise. The experiments reveal that word vector length depends more or less linearly on both word frequency and the level of noise in the co-occurrence distribution of the word. The coefficients of linearity depend upon the word. The special point in feature space, defined by the (artificial) word with pure noise in its co-occurrence distribution, is found to be small but non-zero.

Functional Frank-Wolfe Boosting for General Loss Functions

Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using l_1 regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. By using exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods, in terms of making calls to a base learning algorithm with different weights update. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost is not degraded with larger rounds in boosting, which is consistent with the theoretical analysis.

New Optimisation Methods for Machine Learning

A thesis submitted for the degree of Doctor of Philosophy of The Australian National University. In this work we introduce several new optimisation methods for problems in machine learning. Our algorithms broadly fall into two categories: optimisation of finite sums and of graph structured objectives. The finite sum problem is simply the minimisation of objective functions that are naturally expressed as a summation over a large number of terms, where each term has a similar or identical weight. Such objectives most often appear in machine learning in the empirical risk minimisation framework in the non-online learning setting. The second category, that of graph structured objectives, consists of objectives that result from applying maximum likelihood to Markov random field models. Unlike the finite sum case, all the non-linearity is contained within a partition function term, which does not readily decompose into a summation. For the finite sum problem, we introduce the Finito and SAGA algorithms, as well as variants of each. For graph-structured problems, we take three complementary approaches. We look at learning the parameters for a fixed structure, learning the structure independently, and learning both simultaneously. Specifically, for the combined approach, we introduce a new method for encouraging graph structures with the ‘scale-free’ property. For the structure learning problem, we establish SHORTCUT, a O(n^{2.5}) expected time approximate structure learning method for Gaussian graphical models. For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.

Prognostic factors associated with success rates of posterior orthodontic miniscrew implant

Recovering a Hidden Community Beyond the Spectral Limit in $O(|E| \log^*|V|)$ Time

A rate balance principle and its application to queueing models

What did Erwin Mean? The Physics of Information from the Materials Genomics of Aperiodic Crystals and Water to Molecular Information Catalysts and Life

Early Inference in Energy-Based Models Approximates Back-Propagation

Qubit stabilizer states are complex projective 3-designs

Organic direct and indirect effects with post-treatment common causes of mediator and outcome

Quasistatic dynamics with intermittency

Conditional Risk Minimization for Stochastic Processes

Colonization and Collapse

A nonlinear population Monte Carlo scheme for the Bayesian estimation of parameters of $α$-stable distributions

An Improved Combinatorial Polynomial Algorithm for the Linear Arrow-Debreu Market

Hamiltonian Properties of DCell Networks

A weighted sum over generalized Tesler matrices

Velocity of the $L$-branching Brownian motion

Enumeration of diagonally colored Young diagrams

Technical Report of Participation in Higgs Boson Machine Learning Challenge

Empirical variogram of the underlying Gaussian fields in the plurigaussian models

Universality of Load Balancing Schemes on Diffusion Scale

Reconstructing Compact Metrizable Spaces

Representations of lift bicircular matroids

The Price of Connectivity for Feedback Vertex Set

Efficient Ranking of Lyndon Words and Decoding Lexicographically Minimal de Bruijn Sequence

Injective Edge Chromatic Index of a Graph

An iterative technique for bounding derivatives of solutions of Stein equations

Energy-Efficient Infrastructure Sensor Network for Ad Hoc Cognitive Radio Network

Non-differentiability of the effective potential and the replica symmetry breaking in the random energy model

Sequential Monte Carlo Methods for State and Parameter Estimation in Abruptly Changing Environments

A randomness test for functional panels

On the Long-range Directed Polymer

Asymptotic Analysis of the Random-Walk Metropolis Algorithm on Ridged Densities

A Model-Based Approach to Climate Reconstruction Using Tree-Ring Data

Generalized Cramer-Rao Bound for Joint Estimation of Target Position and Velocity for Active and Passive Radar Networks

Characterization of ECG patterns with the Synchrosqueezing Transform

Multi-arm incipient infinite clusters in 2D: scaling limits and winding numbers

On path decompositions of 2k-regular graphs

On chromatic number of Latin square graphs

Differential Evolution with Generalized Mutation Operator

Differential Evolution with Universal Mutation

Path Weights, Networked Partial Correlations and their Application to the Analysis of Genetic Interactions

Statistical Analysis of Persistence Intensity Functions