Although timing and synchronization of a dynamically-changing set of elements and their related power considerations are essential to many cyber-physical systems (CPS), they are absent from today’s programming languages, forcing programmers to handle these matters outside of the language and on a case-by-case basis. This paper proposes a framework for adding time-related concepts to languages. Complementing prior work in this area, this paper develops the notion of dynamically federated islands of variable-precision synchronization and coordinated entities through synergistic activities at the language, system, network, and device levels. At the language level, we explore constructs that capture key timing and synchronization concepts and, at the system level, we propose a flexible intermediate language that represents both program logic and timing constraints together with run-time mechanisms. At the network level, we argue for architectural extensions that permit the network to act as a combined computing, communication, storage, and synchronization platform and at the device level, we explore architectural concepts that can lead to greater interoperability, easy establishment of timing constraints, and more power-efficient designs.
We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real world datasets on Warafin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.
In real-world networks the interactions between network elements are inherently time-delayed. These time-delays can not only slow the network but can have a destabilizing effect on the network’s dynamics leading to poor performance. The same is true in computational networks used for machine learning etc. where time-delays increase the network’s memory but can degrade the network’s ability to be trained. However, not all networks can be destabilized by time-delays. Previously, it has been shown that if a network or high-dimensional dynamical system is intrinsically stable, which is a stronger form of the standard notion of global stability, then it maintains its stability when constant time-delays are introduced into the system. Here we show that intrinsically stable systems, including intrinsically stable networks and a broad class of switched systems, i.e. systems whose mapping is time-dependent, remain stable in the presence of any type of time-varying time-delays whether these delays are periodic, stochastic, or otherwise. We apply these results to a number of well-studied systems to demonstrate that the notion of intrinsic stability is both computationally inexpensive, relative to other methods, and can be used to improve on some of the best known stability results. We also show that the asymptotic state of an intrinsically stable switched system is exponentially independent of the system’s initial conditions.
Reducing the latency variance in machine learning inference is a key requirement in many applications. Variance is harder to control in a cloud deployment in the presence of stragglers. In spite of this challenge, inference is increasingly being done in the cloud, due to the advent of affordable machine learning as a service (MLaaS) platforms. Existing approaches to reduce variance rely on replication which is expensive and partially negates the affordability of MLaaS. In this work, we argue that MLaaS platforms also provide unique opportunities to cut the cost of redundancy. In MLaaS platforms, multiple inference requests are concurrently received by a load balancer which can then create a more cost-efficient redundancy coding across a larger collection of images. We propose a novel convolutional neural network model, Collage-CNN, to provide a low-cost redundancy framework. A Collage-CNN model takes a collage formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We then augment a collection of traditional single image classifiers with a single Collage-CNN classifier which acts as a low-cost redundant backup. Collage-CNN then provides backup classification results if a single image classification straggles. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th percentile tail latency of inference can be reduced by 1.47X compared to replication based approaches while providing high accuracy. Also, variation in inference latency can be reduced by 9X with a slight increase in average inference latency.
The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remains elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
The proliferation of automated inference algorithms in Bayesian statistics has provided practitioners newfound access to fast, reproducible data analysis and powerful statistical models. Designing automated methods that are also both computationally scalable and theoretically sound, however, remains a significant challenge. Recent work on Bayesian coresets takes the approach of compressing the dataset before running a standard inference algorithm, providing both scalability and guarantees on posterior approximation error. But the automation of past coreset methods is limited because they depend on the availability of a reasonable coarse posterior approximation, which is difficult to specify in practice. In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family. This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no inputs aside from the dataset, probabilistic model, desired coreset size, and sample size used for Monte Carlo estimates. In addition to being easier to use than past methods, experiments demonstrate that the proposed algorithm achieves state-of-the-art Bayesian dataset summarization.
To understand causal relationships between events in the world, it is useful to pinpoint when actions occur in videos and to examine the state of the world at and around that time point. For example, one must accurately detect the start of an audience response — laughter in a movie, cheering at a sporting event — to understand the cause of the reaction. In this work, we focus on the problem of accurately detecting action starts rather than isolated events or action ends. We introduce a novel structured loss function based on matching predictions to true action starts that is tailored to this problem; it more heavily penalizes extra and missed action start detections over small misalignments. Recurrent neural networks are used to minimize a differentiable approximation of this loss. To evaluate these methods, we introduce the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions. The dataset was labeled by experts for the purpose of neuroscience research on causally relating neural activity to behavior. On this dataset, we demonstrate that the structured loss leads to significantly higher accuracy than a baseline of mean-squared error loss.
Recent advances in generative modeling of text have demonstrated remarkable improvements in terms of fluency and coherency. In this work we investigate to which extent a machine can discriminate real from machine generated text. This is important in itself for automatic detection of computer generated stories, but can also serve as a tool for further improving text generation. We show that learning a dedicated scoring function to discriminate between real and fake text achieves higher precision than employing the likelihood of a generative model. The scoring functions generalize to other generators than those used for training as long as these generators have comparable model complexity and are trained on similar datasets.
Medical image analysis using supervised deep learning methods remains problematic because of the reliance of deep learning methods on large amounts of labelled training data. Although medical imaging data repositories continue to expand there has not been a commensurate increase in the amount of annotated data. Hence, we propose a new unsupervised feature learning method that learns feature representations to then differentiate dissimilar medical images using an ensemble of different convolutional neural networks (CNNs) and K-means clustering. It jointly learns feature representations and clustering assignments in an end-to-end fashion. We tested our approach on a public medical dataset and show its accuracy was better than state-of-the-art unsupervised feature learning methods and comparable to state-of-the-art supervised CNNs. Our findings suggest that our method could be used to tackle the issue of the large volume of unlabelled data in medical imaging repositories.
In this paper, we introduce a novel semantic description approach inspired on Prototype Theory foundations. We propose a Computational Prototype Model (CPM) that encodes and stores the central semantic meaning of objects category: the semantic prototype. Also, we introduce a Prototype-based Description Model that encodes the semantic meaning of an object while describing its features using our CPM model. Our description method uses semantic prototypes computed by CNN-classifications models to create discriminative signatures that describe an object highlighting its most distinctive features within the category. Our experiments show that: i) our CPM model (semantic prototype + distance metric) is able to describe the internal semantic structure of objects categories; ii) our semantic distance metric can be understood as the object visual typicality score within a category; iii) our descriptor encoding is semantically interpretable and significantly outperforms other image global encodings in clustering and classification tasks.
The impact of designing for security of AI is critical for humanity in the AI era. With humans increasingly becoming dependent upon AI, there is a need for neural networks that work reliably, inspite of Adversarial attacks. The vision for Safe and secure AI for popular use is achievable. To achieve safety of AI, this paper explores strategies and a novel deep learning architecture. To guard AI from adversaries, paper explores combination of 3 strategies: 1. Introduce randomness at inference time to hide the representation learning from adversaries. 2. Detect presence of adversaries by analyzing the sequence of inferences. 3. Exploit visual similarity. To realize these strategies, this paper designs a novel architecture, Dynamic Neural Defense, DND. This defense has 3 deep learning architectural features: 1. By hiding the way a neural network learns from exploratory attacks using a random computation graph, DND evades attack. 2. By analyzing input sequence to cloud AI inference engine with LSTM, DND detects attack sequence. 3. By inferring with visual similar inputs generated by VAE, any AI defended by DND approach does not succumb to hackers. Thus, a roadmap to develop reliable, safe and secure AI is presented.
With convenient access to observational data, learning individual causal effects from such data draws more attention in many influential research areas such as economics, healthcare, and education. For example, we aim to study how a medicine (treatment) would affect the health condition (outcome) of a certain patient. To validate causal inference from observational data, we need to control the influence of confounders – the variables which causally influence both the treatment and the outcome. Along this line, existing work for learning individual treatment effect overwhelmingly relies on the assumption that there are no hidden confounders. However, in real-world observational data, this assumption is untenable and can be unrealistic. In fact, an important fact ignored by them is that observational data can come with network information that can be utilized to infer hidden confounders. For example, in an observational study of the individual treatment effect of a medicine, instead of randomized experiments, the medicine is assigned to individuals based on a series of factors. Some factors (e.g., socioeconomic status) are hard to measure directly and therefore become hidden confounders of observational datasets. Fortunately, the socioeconomic status of an individual can be reflected by whom she is connected in social networks. With this fact in mind, we aim to exploit the network structure to recognize patterns of hidden confounders in the task of learning individual treatment effects from observational data. In this work, we propose a novel causal inference framework, the network deconfounder, which learns representations of confounders by unraveling patterns of hidden confounders from the network structure between instances of observational data. Empirically, we perform extensive experiments to validate the effectiveness of the network deconfounder on various datasets.
Given the overwhelming number of emails, an effective subject line becomes essential to better inform the recipient of the email’s content. In this paper, we propose and study the task of email subject line generation: automatically generating an email subject line from the email body. We create the first dataset for this task and find that email subject line generation favor extremely abstractive summary which differentiates it from news headline generation or news single document summarization. We then develop a novel deep learning method and compare it to several baselines as well as recent state-of-the-art text summarization systems. We also investigate the efficacy of several automatic metrics based on correlations with human judgments and propose a new automatic evaluation metric. Our system outperforms competitive baselines given both automatic and human evaluations. To our knowledge, this is the first work to tackle the problem of effective email subject line generation.
This paper studies the problem of learning a sequence of sentiment classification tasks. The learned knowledge from each task is retained and used to help future or subsequent task learning. This learning paradigm is called Lifelong Learning (LL). However, existing LL methods either only transfer knowledge forward to help future learning and do not go back to improve the model of a previous task or require the training data of the previous task to retrain its model to exploit backward/reverse knowledge transfer. This paper studies reverse knowledge transfer of LL in the context of naive Bayesian (NB) classification. It aims to improve the model of a previous task by leveraging future knowledge without retraining using its training data. This is done by exploiting a key characteristic of the generative model of NB. That is, it is possible to improve the NB classifier for a task by improving its model parameters directly by using the retained knowledge from other tasks. Experimental results show that the proposed method markedly outperforms existing LL baselines.
Deep neural networks have achieved great success in classification tasks during the last years. However, one major problem to the path towards artificial intelligence is the inability of neural networks to accurately detect novel class distributions and therefore, most of the classification algorithms proposed make the assumption that all classes are known prior to the training stage. In this work, we propose a methodology for training a neural network that allows it to efficiently detect novel class distributions without compromising much of its classification accuracy on the test examples of known classes. Experimental results on the CIFAR 100 and MiniImagenet data sets demonstrate the effectiveness of the proposed algorithm. The way this method was constructed also makes it suitable for training any classification algorithm that is based on Maximum Likelihood methods.
Explainable machine learning (ML) has been implemented in numerous open source and proprietary software packages and explainable ML is an important aspect of commercial predictive modeling. However, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing, and for other malevolent purposes like model stealing. This text discusses definitions, examples, and guidelines that promote a holistic and human-centered approach to ML which includes interpretable (i.e. white-box ) models and explanatory, debugging, and disparate impact analysis techniques.
A key component of most neural network architectures is the use of normalization layers, such as Batch Normalization. Despite its common use and large utility in optimizing deep architectures that are otherwise intractable, it has been challenging both to generically improve upon Batch Normalization and to understand specific circumstances that lend themselves to other enhancements. In this paper, we identify four improvements to the generic form of Batch Normalization and the circumstances under which they work, yielding performance gains across all batch sizes while requiring no additional computation during training. These contributions include proposing a method for reasoning about the current example in inference normalization statistics which fixes a training vs. inference discrepancy; recognizing and validating the powerful regularization effect of Ghost Batch Normalization for small and medium batch sizes; examining the effect of weight decay regularization on the scaling and shifting parameters; and identifying a new normalization algorithm for very small batch sizes by combining the strengths of Batch and Group Normalization. We validate our results empirically on four datasets: CIFAR-100, SVHN, Caltech-256, and ImageNet.
As the core component of Natural Language Processing (NLP) system, Language Model (LM) can provide word representation and probability indication of word sequences. Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve the performance of traditional LMs. A survey on NNLMs is performed in this paper. The structure of classic NNLMs is described firstly, and then some major improvements are introduced and analyzed. We summarize and compare corpora and toolkits of NNLMs. Further, some research directions of NNLMs are discussed.