We introduce a principled approach for unsupervised structure learning of deep neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Thus, the depth of the network is determined inherently (equal to the maximal order of independence in the input distribution). The proposed method casts the problem of neural network structure learning as a problem of Bayesian network structure learning. Then, instead of directly learning the discriminative structure, it learns a generative graph, constructs its stochastic inverse, and then constructs a discriminative graph. We prove that conditional-dependency relations among the latent variables in the generative graph are preserved in the class-conditional discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense) of common networks can be replaced by significantly smaller learned structures, while maintaining classification accuracy—state-of-the-art on tested benchmarks. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.
Deeper convolutional neural networks provide more capacity to approximate complex mapping functions. However, increasing network depth imposes difficulties on training and increases model complexity. This paper presents a new nonlinear computational layer of considerably high capacity to the deep convolutional neural network architectures. This layer performs a set of comprehensive convolution operations that mimics the overall function of the human visual system (HVS) via focusing on learning structural information in its input. The core of its computations is evaluating the components of the structural similarity metric (SSIM) in a setting that allows the kernels to learn to match structural information. The proposed SSIMLayer is inherently nonlinear and hence, it does not require subsequent nonlinear transformations. Experiments conducted on CIFAR-10 benchmark demonstrates that the SSIMLayer provides better convergence than the traditional convolutional layer, bypasses the need for nonlinear transformations and shows more robustness against noise perturbations and adversarial attacks.
The growth in the number of devices connected to the Internet of Things (IoT) poses major challenges in security. The integrity and trustworthiness of data and data analytics are increasingly important concerns in IoT applications. These are compounded by the highly distributed nature of IoT devices, making it infeasible to prevent attacks and intrusions on all data sources. Adversaries may hijack devices and compromise their data. As a result, reactive countermeasures, such as intrusion detection and resilient analytics, become vital components of security. This paper overviews algorithms for secure distributed inference in IoT.
The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits its deployment on mobile devices. To this end, we proposed a simple yet effective scheme for compressing convolutions though applying k-means clustering on the weights, compression is achieved through weight-sharing, by only recording $K$ cluster centers and weight assignment indexes. We then introduced a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate energy consumption of CNN hardware implementations, whose estimation results are verified to be consistent with previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluated Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://…/Deep-K-Means
The interpretation of the experimental data collected by testing systems across input datasets and model parameters is of strategic importance for system design and implementation. In particular, finding relationships between variables and detecting the latent variables affecting retrieval performance can provide designers, engineers and experimenters with useful if not necessary information about how a system is performing. This paper discusses the use of Structural Equation Modelling (SEM) in providing an in-depth explanation of evaluation results and an explanation of failures and successes of a system; in particular, we focus on the case of Information Retrieval.
Recently, advanced cyber attacks, which consist of a sequence of steps that involve many vulnerabilities and hosts, compromise the security of many well-protected businesses. This has led to the solutions that ubiquitously monitor system activities in each host (big data) as a series of events, and search for anomalies (abnormal behaviors) for triaging risky events. Since fighting against these attacks is a time-critical mission to prevent further damage, these solutions face challenges in incorporating expert knowledge to perform timely anomaly detection over the large-scale provenance data. To address these challenges, we propose a novel stream-based query system that takes as input, a real-time event feed aggregated from multiple hosts in an enterprise, and provides an anomaly query engine that queries the event feed to identify abnormal behaviors based on the specified anomalies. To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 1.1TB of real system monitoring data (containing 3.3 billion events). Our evaluations on a broad set of attack behaviors and micro-benchmarks show that our system has a low detection latency (<2s) and a high system throughput (110,000 events/s; supporting ~4000 hosts), and is more efficient in memory utilization than the existing stream-based complex event processing systems.
This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone.
Advertisements