We introduce a new deep convolutional neural network, CrescendoNet, built by stacking simple building blocks without residual connections. Each Crescendo block contains independent convolution paths of increasing depth. The numbers of convolution layers and parameters increase only linearly in Crescendo blocks. In experiments, CrescendoNet with only 15 layers outperforms almost all networks without residual connections on the benchmark datasets CIFAR10, CIFAR100, and SVHN. Given a sufficient amount of data, as in the SVHN dataset, CrescendoNet with 15 layers and 4.1M parameters can match the performance of DenseNet-BC with 250 layers and 15.3M parameters. CrescendoNet provides a new way to construct high-performance deep convolutional neural networks without residual connections. Moreover, by investigating the behavior and performance of subnetworks in CrescendoNet, we note that its high performance may come from its implicit ensemble behavior, which distinguishes it from FractalNet, another deep convolutional neural network without residual connections. Furthermore, the independence between paths in CrescendoNet allows us to introduce a new path-wise training procedure, which can reduce the memory needed for training.
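As a rough illustration of the block structure described above, the following Python sketch builds independent paths of increasing depth and combines their outputs. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `path_depths`, `run_path`, and `crescendo_block`, the `interval` parameter, the scalar stand-in layers, and the use of averaging to merge the paths are all hypothetical choices that the abstract does not specify.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function on a scalar; a real network would use
# convolution layers on tensors (assumption made for illustration only).
Layer = Callable[[float], float]


def path_depths(num_paths: int, interval: int = 1) -> List[int]:
    """Return the depth of each path: interval, 2*interval, ... (assumed rule)."""
    return [interval * k for k in range(1, num_paths + 1)]


def run_path(x: float, layers: Sequence[Layer]) -> float:
    """Apply the layers of one path in order; paths share no state."""
    for layer in layers:
        x = layer(x)
    return x


def crescendo_block(x: float, paths: Sequence[Sequence[Layer]]) -> float:
    """Run every path on the same input and average the outputs (assumed merge)."""
    outputs = [run_path(x, path) for path in paths]
    return sum(outputs) / len(outputs)


def toy_layer(v: float) -> float:
    """Hypothetical stand-in for a convolution layer."""
    return 0.5 * v + 1.0


# Three independent paths with depths 1, 2, and 3.
depths = path_depths(num_paths=3)
paths = [[toy_layer] * d for d in depths]
block_output = crescendo_block(0.0, paths)
```

Because no path reads another path's activations in this sketch, each path could in principle be evaluated or trained on its own, which is the property a path-wise training procedure would rely on.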
Recommendation systems occupy an expanding role in everyday decision making, from choice of movies and household goods to consequential medical and legal decisions. The data used to train and test these systems is algorithmically confounded in that it is the result of a feedback loop between human choices and an existing algorithmic recommendation system. Using simulations, we demonstrate that algorithmic confounding can disadvantage algorithms in training, bias held-out evaluation, and amplify homogenization of user behavior without gains in utility.
The ability to dynamically and efficiently allocate resources to meet the needs of an increasingly diverse range of services and user behaviors marks the future of wireless networks, giving rise to intelligent processing, which aims at enabling the system to perceive and assess the available resources, to autonomously learn to adapt to the perceived wireless environment, and to reconfigure its operating mode to maximize the utility of the available resources. Perception capability and reconfigurability are the essential features of cognitive technology, while modern machine learning techniques show promise for effective system adaptation. In this paper, we discuss the development of cognitive technology and machine learning techniques and emphasize their roles in improving both the spectrum and energy efficiency of future wireless networks. We describe in detail the state of the art in cognitive technology, covering spectrum sensing and access approaches that may enhance spectrum utilization and curtail energy consumption. We discuss powerful machine learning algorithms that enable spectrum- and energy-efficient communications in dynamic wireless environments. We also present practical applications of these techniques to existing and future wireless communication systems, such as heterogeneous networks and device-to-device communications, and identify some research opportunities and challenges in cognitive technology and machine learning as applied to future wireless networks.
We study the theoretical properties of the generalized dynamic principal components introduced in Peña and Yohai (2016). In particular, we prove that when the data follows a dynamic factor model, the reconstruction provided by the procedure converges in mean square to the common part of the model as the number of series and periods diverge to infinity. The results of a simulation study support our findings.
We present a method for the simultaneous Bayesian learning of the correlation matrix and graphical model of a multivariate dataset, using Metropolis-within-Gibbs inference. Here, the data comprise measurements of a vector-valued observable, which we model using a high-dimensional Gaussian Process (GP), such that the likelihood of the GP parameters given the data is Matrix-Normal, defined by a mean matrix and between-rows and between-columns covariance matrices. We marginalise over the between-rows matrices to achieve a closed-form likelihood of the between-columns correlation matrix given the data. This correlation matrix is updated in the first block of an iteration, given the data, and the (generalised Binomial) graph is updated in the second block, at the partial correlation matrix that is computed given the updated correlation. We also learn the 95% Highest Probability Density credible regions of the correlation matrix as well as the graphical model of the data. The difference made by acknowledging measurement errors in learning the graphical model is demonstrated on a small simulated dataset, while the large human disease-symptom network–with nodes–is learnt using real data. Data on the vino-chemical attributes of Portuguese red and white wine samples are employed to learn the correlation structure and graphical model of each dataset, to then compute the distance between the learnt graphical models.
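The step in which a partial correlation matrix is computed from an updated correlation matrix can be illustrated with the standard identity that relates partial correlations to the inverse of the correlation matrix: the partial correlation between variables i and j is -P_ij / sqrt(P_ii P_jj), where P is that inverse. The Python sketch below is only an illustration of this identity; the abstract does not say how the authors compute the matrix, and the function names `invert` and `partial_correlation`, the pure-Python inversion routine, and the example matrix are assumptions introduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def partial_correlation(corr: Matrix) -> Matrix:
    """Partial correlations from a correlation matrix; the diagonal is set to 1."""
    precision = invert(corr)
    n = len(corr)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical 3-variable example with a chain-like correlation structure.
r = 0.5
example_corr = [[1.0, r, r * r], [r, 1.0, r], [r * r, r, 1.0]]
example_pcorr = partial_correlation(example_corr)
```

In this example the first and third variables are correlated only through the second, so their partial correlation is numerically zero even though their marginal correlation is 0.25. An edge between them would therefore be absent in a graph built from partial correlations.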
Due to the complex nature of machine learning models, it is hard to characterize the ways in which they can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e., inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples by searching in the semantic space of dense and continuous data representations, utilizing recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers in a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
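The idea of searching a continuous representation space, rather than perturbing the raw input, can be sketched in Python as follows. This is a toy illustration under heavy assumptions and not the authors' algorithm: `encode` and `generate` are hypothetical stand-ins for components that would come from a trained generative model, `classify` stands in for the black-box classifier, and the expanding-shell sampling schedule (`step`, `samples_per_shell`, `max_radius`) is one possible search strategy chosen here for simplicity.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def latent_search(
    x: Vector,
    encode: Callable[[Vector], Vector],
    generate: Callable[[Vector], Vector],
    classify: Callable[[Vector], int],
    step: float = 0.1,
    samples_per_shell: int = 32,
    max_radius: float = 5.0,
    seed: int = 0,
) -> Optional[Vector]:
    """Return a generated input near x whose predicted label differs, or None."""
    rng = random.Random(seed)
    z = encode(x)
    original_label = classify(x)
    inner = 0.0
    while inner < max_radius:
        outer = inner + step
        best, best_radius = None, math.inf
        for _ in range(samples_per_shell):
            # Draw a random direction and a radius inside the current shell.
            direction = [rng.gauss(0.0, 1.0) for _ in z]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            radius = rng.uniform(inner, outer)
            z_new = [zi + radius * d / norm for zi, d in zip(z, direction)]
            x_new = generate(z_new)
            # Only the classifier's output label is used (black-box access).
            if classify(x_new) != original_label and radius < best_radius:
                best, best_radius = x_new, radius
        if best is not None:
            return best
        inner = outer  # nothing found: widen the search
    return None


# Toy components (hypothetical): an invertible linear "generator" and a
# threshold classifier on the sum of the input features.
def toy_encode(x: Vector) -> Vector:
    return [(v - 1.0) / 2.0 for v in x]


def toy_generate(z: Vector) -> Vector:
    return [2.0 * v + 1.0 for v in z]


def toy_classify(x: Vector) -> int:
    return int(sum(x) > 4.0)


original = [1.5, 2.0]
adversary = latent_search(original, toy_encode, toy_generate, toy_classify)
```

Every candidate is produced by the generator, so a search of this kind stays on the set of inputs the generative model can produce, which is the intuition behind obtaining natural rather than arbitrary perturbations.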
Low-rank tensor regression, a new model class that learns high-order correlation from data, has recently received considerable attention. At the same time, Gaussian processes (GP) are well-studied machine learning models for structure learning. In this paper, we demonstrate interesting connections between the two, especially for multi-way data analysis. We show that low-rank tensor regression is essentially learning a multi-linear kernel in Gaussian processes, and that the low-rank assumption translates to a constrained Bayesian inference problem. We prove the oracle inequality and derive the average-case learning curve for the equivalent GP model. Our findings imply that low-rank tensor regression, though empirically successful, is highly dependent on the eigenvalues of covariance functions as well as variable correlations.
The relatively recent adoption of Knowledge Graphs as an enabling technology in multiple high-profile artificial intelligence and cognitive applications has led to growing interest in the Semantic Web technology stack. Many semantics-related tools, however, are focused on serving experts with a deep understanding of semantic technologies. For example, triplification of relational data is available, but there is no open-source tool that allows a user unfamiliar with OWL/RDF to import data into a semantic triple store in an intuitive manner. Further, many tools require users to have a working understanding of SPARQL to query data. Casual users interested in benefiting from the power of Knowledge Graphs have few tools available for exploring, querying, and managing semantic data. We present SemTK, the Semantics Toolkit, a user-friendly suite of tools that offers both expert and non-expert semantics users convenient ingestion of relational data, simplified query generation, and more. The exploration of ontologies and instance data is performed through SPARQLgraph, an intuitive web-based user interface in SemTK that is understandable and navigable by a lay user. The open-source version of SemTK is available at http://semtk.research.ge.com.
TF Boosted Trees (TFBT) is a new open-source framework for the distributed training of gradient boosted trees. It is based on TensorFlow, and its distinguishing features include a novel architecture, automatic loss differentiation, layer-by-layer boosting that results in smaller ensembles and faster prediction, principled multi-class handling, and a number of regularization techniques to prevent overfitting.
Six simple, dynamic soft sensor methodologies with two update conditions were compared on two experimentally obtained datasets and one simulated dataset. The soft sensors investigated were: moving window partial least squares regression (and a recursive variant), moving window random forest regression, feedforward neural networks, mean moving window, and a novel random forest partial least squares regression ensemble (RF-PLS). We found that, on two of the datasets studied, very small window sizes (4 samples) led to the lowest prediction errors. The RF-PLS method offered the lowest one-step-ahead prediction errors compared to those of the other methods, and demonstrated greater stability at larger time lags than moving window PLS alone. We found that this method most adequately modeled the datasets that did not feature purely monotonic increases in property values. In general, we observed that linear models deteriorated most rapidly under more delayed model update conditions, while nonlinear methods tended to provide predictions that approached those from a simple mean moving window. Other data-dependent findings are presented and discussed.
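The simplest of the methods listed above, the mean moving window, can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' code: it assumes that the one-step-ahead prediction is the mean of the most recent `window` observed values and that the model is updated at every step; the function names and the toy series are hypothetical.

```python
from typing import List, Sequence


def moving_window_mean_forecast(series: Sequence[float], window: int = 4) -> List[float]:
    """One-step-ahead forecasts: entry i predicts series[window + i]
    as the mean of the `window` samples immediately before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(series[t - window:t]) / window
        for t in range(window, len(series))
    ]


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between forecasts and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Toy, steadily increasing series (hypothetical data).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forecasts = moving_window_mean_forecast(series, window=4)
error = mean_absolute_error(forecasts, series[4:])
```

On this steadily increasing toy series the forecasts are 2.5 and 3.5 against observed values of 5.0 and 6.0, so a windowed mean lags behind a monotonic trend by construction.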
Learning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test inputs. Alternatively, a more recent approach to meta-learning aims to acquire deep representations that can be effectively fine-tuned, via standard gradient descent, to new tasks. In this paper, we consider the meta-learning problem from the perspective of universality, formalizing the notion of learning algorithm approximation and comparing the expressive power of the aforementioned recurrent models to the more recent approaches that embed gradient descent into the meta-learner. In particular, we seek to answer the following question: does deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm? We find that this is indeed true, and further find, in our experiments, that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.
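The fine-tuning step that gradient-based meta-learning relies on, adapting a learned initialization to a new task with a few steps of standard gradient descent, can be sketched as follows. This Python example is a minimal illustration under stated assumptions and covers only the adaptation step, not the meta-training that produces the initialization: the one-parameter linear model, the squared-error loss, the learning rate, and the names `mse_loss`, `mse_grad`, and `adapt` are all hypothetical.

```python
from typing import Sequence


def mse_loss(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the model y = w * x on one task's data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Derivative of mse_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adapt(
    w_init: float,
    xs: Sequence[float],
    ys: Sequence[float],
    lr: float = 0.05,
    steps: int = 1,
) -> float:
    """Fine-tune the initialization on a new task by plain gradient descent."""
    w = w_init
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w


# A new task drawn from y = 2x (hypothetical), with two support examples.
task_xs = [1.0, 2.0]
task_ys = [2.0, 4.0]
w_meta = 0.0  # stands in for a meta-learned initialization
w_one_step = adapt(w_meta, task_xs, task_ys, steps=1)
w_two_steps = adapt(w_meta, task_xs, task_ys, steps=2)
```

In this toy task the parameter moves from 0.0 to 0.5 after one step and to 0.875 after two, approaching the task's true slope of 2. A meta-learner of this kind would aim to choose the initialization so that such short adaptations work well across many tasks.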