Representative Approach
We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. Given a partition of the big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. In terms of efficiency, its accuracy in estimating parameters is better than that of the divide-and-conquer method. Based on comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models, or for generalized linear models with a flat inverse link function and moderate coefficients for the continuous variables, we recommend mean representatives (MR). For other cases, we recommend score-matching representatives (SMR). In an illustrative application to the Airline on-time performance data, MR and SMR are as good as the full-data estimate when it is available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers. …
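The MR construction is simple enough to sketch: collapse each block to its componentwise mean and fit a weighted regression on the block means, weighted by block size. Below is a minimal sketch for a linear model; the function names, the random partitioning in the demo, and the plain weighted least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mean_representatives(X, y, block_ids):
    """Collapse each data block to its mean (MR); return the
    representative points together with block sizes as weights."""
    blocks = np.unique(block_ids)
    Xr = np.array([X[block_ids == b].mean(axis=0) for b in blocks])
    yr = np.array([y[block_ids == b].mean() for b in blocks])
    w = np.array([(block_ids == b).sum() for b in blocks], dtype=float)
    return Xr, yr, w

def fit_weighted_ols(Xr, yr, w):
    """Weighted least squares on the representatives:
    solves (Xr' W Xr) beta = Xr' W yr, with W = diag(w)."""
    Xw = Xr * w[:, None]
    return np.linalg.solve(Xr.T @ Xw, Xw.T @ yr)

# Demo: 100,000 points reduced to 100 representatives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100_000)
block_ids = np.arange(100_000) % 100
print(fit_weighted_ols(*mean_representatives(X, y, block_ids)))
```

The fit touches only one point per block, which is what makes the approach as cheap as subsampling while using information from every observation.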

Open Speech and Music Interpretation by Large Space Extraction (openSMILE)
The openSMILE feature extraction tool lets you extract large audio feature spaces and apply machine learning methods to classify and analyze your data in real time. It combines features from Music Information Retrieval and Speech Processing. SMILE is an acronym for Speech & Music Interpretation by Large-space Extraction. openSMILE is written in C++ and is available both as a standalone command-line executable and as a dynamic library. Its main features are its capability for on-line incremental processing and its modularity: feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file. New components can be added to openSMILE via an easy binary plugin interface and a comprehensive API.
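As a concrete taste of the command-line workflow, the sketch below calls the standalone SMILExtract binary from Python. The config name and file paths are placeholders for your own install; `-C`, `-I`, and `-O` are openSMILE's standard options for the configuration file, the input audio, and the output file.

```python
import subprocess

# A minimal sketch: run one extraction pass with a chosen feature-set
# configuration. Paths are assumptions; adapt them to your installation.
subprocess.run([
    "SMILExtract",
    "-C", "config/IS09_emotion.conf",  # configuration file defining the feature set
    "-I", "speech.wav",                # input audio file
    "-O", "features.csv",              # where the extracted features are written
], check=True)
```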

Nonlinearity Coefficient
For a long time, designing neural architectures that exhibit high performance was considered a dark art requiring expert hand-tuning. One of the few well-known guidelines for architecture design is the avoidance of exploding gradients, though even this guideline has remained relatively vague and circumstantial. We introduce the nonlinearity coefficient (NLC), a gradient-based measure of the complexity of the function computed by a neural network. Via an extensive empirical study, we show that the NLC is a powerful predictor of test error and that attaining a right-sized NLC is essential for optimal performance. The NLC exhibits a range of intriguing and important properties. It is closely tied to the amount of information gained from computing a single network gradient. It is tied to the error incurred when replacing the nonlinearity operations in the network with linear operations. It is not susceptible to the confounders of multiplicative scaling, additive bias, and layer width. It is stable from layer to layer. Hence, we argue that the NLC is the first robust predictor of overfitting in deep networks. …
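The abstract does not reproduce the NLC's definition, but a gradient-based statistic in its spirit can be sketched: compare the expected "energy" of the network's input-output Jacobian, propagated through the input covariance, against the output covariance. The normalization below is an assumption chosen so that a purely linear network scores 1 (connecting to the linear-replacement property above); it is not necessarily the paper's exact formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny tanh MLP f(x) = W2 tanh(W1 x); its Jacobian is W2 diag(1 - tanh^2) W1.
d_in, d_hid, d_out = 5, 16, 3
W1 = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hid)) / np.sqrt(d_hid)

def f(x):
    return W2 @ np.tanh(W1 @ x)

def jacobian(x):
    h = np.tanh(W1 @ x)
    return W2 @ ((1.0 - h**2)[:, None] * W1)

# Estimate input/output covariances and the expected Jacobian energy
# E_x tr(J(x) C_x J(x)^T); the NLC-like statistic is the square-rooted ratio.
X = rng.normal(size=(2000, d_in))
C_x = np.cov(X, rowvar=False)
F = np.array([f(x) for x in X])
C_f = np.cov(F, rowvar=False)
jac_energy = np.mean([np.trace(jacobian(x) @ C_x @ jacobian(x).T) for x in X])
nlc = np.sqrt(jac_energy / np.trace(C_f))
print(f"NLC-like statistic: {nlc:.3f}")  # ~1 means close to a linear map
```

Note that rescaling the output multiplies numerator and denominator alike, which illustrates the claimed insensitivity to multiplicative scaling.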