Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs

In practice, it is common to find oneself with far too little text data to train a deep neural network. This ‘Big Data Wall’ represents a challenge for minority language communities on the Internet, organizations, laboratories and companies that compete with the GAFAM (Google, Amazon, Facebook, Apple, Microsoft). While most of the research effort in text data augmentation aims at the long-term goal of finding end-to-end learning solutions, which is equivalent to ‘using neural networks to feed neural networks’, this engineering work focuses on the use of practical, robust, scalable and easy-to-implement data augmentation pre-processing techniques similar to those that are successful in computer vision. Several text augmentation techniques were tried. Some existing ones, such as noise injection or the use of regular expressions, were tested for comparison purposes. Others are modified or improved techniques, like lexical replacement. Finally, more innovative ones, such as the generation of paraphrases using back-translation or by the transformation of syntactic trees, are based on robust, scalable, and easy-to-use NLP Cloud APIs. All the text augmentation techniques studied, with an amplification factor of only 5, increased the accuracy of the results in a range of 4.3% to 21.6%, with significant statistical fluctuations, on a standardized task of text polarity prediction. Some standard deep neural network architectures were tested: the multilayer perceptron (MLP), the long short-term memory recurrent network (LSTM) and the bidirectional LSTM (biLSTM). The classical XGBoost algorithm was also tested, with improvements of up to 2.5%.
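
To make the idea of noise injection with an amplification factor concrete, here is a minimal sketch. It is not the authors' implementation: the function names `inject_noise` and `augment`, the choice of adjacent-letter swaps as the noise model, the `rate` parameter, and the reading of "amplification factor 5" as "each original plus four noisy copies" are all assumptions made for illustration.

```python
import random


def inject_noise(text, rate=0.05, rng=None):
    """Return a copy of `text` with random swaps of adjacent letters.

    Assumption: character swaps stand in for whatever noise model is used
    in practice; `rate` is the per-position swap probability.
    """
    rng = rng or random.Random()
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def augment(dataset, factor=5, rate=0.05, seed=0):
    """Amplify (text, label) pairs: keep each original, add factor-1 noisy copies."""
    rng = random.Random(seed)
    out = []
    for text, label in dataset:
        out.append((text, label))
        for _ in range(factor - 1):
            # The label is copied unchanged: light noise is assumed not to flip polarity.
            out.append((inject_noise(text, rate, rng), label))
    return out
```

Calling `augment([("a good movie", 1)], factor=5)` returns five labelled examples, the first of which is the untouched original. Swaps preserve the length and the multiset of characters, which keeps the perturbation mild.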

GC-LSTM: Graph Convolution Embedded LSTM for Dynamic Link Prediction

Dynamic link prediction is a hot research topic in the area of complex networks, especially given its wide applications in biology, social networks, economics and industry. Compared with static link prediction, the dynamic variant is much more difficult because the network structure evolves over time. Currently, most research focuses on static link prediction, which cannot achieve the expected performance on dynamic networks. To address low AUC, high error rates, and the difficulty of predicting added and removed links, we propose GC-LSTM, a Graph Convolution Network (GCN) embedded Long Short-Term Memory network (LSTM), for end-to-end dynamic link prediction. To the best of our knowledge, this is the first time that a GCN-embedded LSTM has been put forward for link prediction in dynamic networks. In this new deep model, the GCN learns the node structure of the network snapshot at each time slice, while the LSTM learns the temporal features across network snapshots. Moreover, whereas current dynamic link prediction methods can only handle removed links, GC-LSTM can predict both added and removed links at the same time. Extensive experiments are carried out to test its performance in terms of prediction accuracy, error rate, add/remove link prediction and key link prediction. The results show that GC-LSTM outperforms current state-of-the-art methods.
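
As a rough illustration of the structural half of such a model, the sketch below applies one graph-convolution propagation step to every snapshot of a dynamic network. It uses the widely known symmetric-normalization rule ReLU(D^{-1/2}(A+I)D^{-1/2} X W); the abstract does not state which graph-convolution operator GC-LSTM uses or how it is wired into the LSTM gates, so the operator, the function names and the shared weight matrix are assumptions, and the LSTM that would consume the per-snapshot outputs is omitted.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix with zero diagonal."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the added self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    z = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in z]


def snapshot_embeddings(snapshots, features, weights):
    """Apply the same graph convolution to each snapshot of the time series."""
    return [graph_conv(adj, features, weights) for adj in snapshots]
```

The list returned by `snapshot_embeddings` has one node-embedding matrix per time slice; in a full model, a recurrent network would read this sequence to capture how the structure changes over time.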

Rethink and Redesign Meta learning

Recently, Meta-learning has been shown to be a promising way to improve the ability to learn from little data in Computer Vision. However, previous Meta-learning approaches exhibit the following problems: 1) they ignore the importance of the attention mechanism for the Meta-learner, so that the Meta-learner is disturbed by unimportant information; 2) they ignore the importance of past knowledge, which can help the Meta-learner accurately understand the input data and express them as high-level representations, and they train the Meta-learner to solve few-shot learning tasks directly on the few original input data instead of on the high-level representations; 3) they suffer from a problem that we name task-over-fitting (TOF), which is probably caused by the fact that they are required to solve few-shot learning tasks on the original high-dimensional input data, where redundant input information makes them more prone to TOF. In this paper, we rethink the Meta-learning algorithm and propose that the attention mechanism and past knowledge are crucial for the Meta-learner, and that the Meta-learner should make good use of its past knowledge and express the input data as high-level representations to solve few-shot learning tasks. Moreover, the Meta-learning approach should be free from the TOF problem. Based on these arguments, we redesign the Meta-learning algorithm to solve the three aforementioned problems, and propose three methods. Extensive experiments demonstrate the effectiveness of our design and methods, with state-of-the-art performance on several few-shot learning benchmarks. The source code of our proposed methods will be released soon.

Anomaly Generation using Generative Adversarial Networks in Host Based Intrusion Detection

Generative adversarial networks have been able to generate striking results in various domains. This generative capability can generalize as the networks gain a deep understanding of the data distribution. In many domains, this data distribution consists of anomalies and normal data, with anomalies occurring relatively rarely, creating datasets that are imbalanced. The capabilities that generative adversarial networks offer can be leveraged to examine these anomalies and help alleviate the challenge that imbalanced datasets pose by creating synthetic anomalies. This anomaly generation can be specifically beneficial in domains that have costly data creation processes as well as inherently imbalanced datasets. One of the domains that fits this description is the host-based intrusion detection domain. In this work, the ADFA-LD dataset, which contains system calls of small-footprint next-generation attacks, is chosen as the dataset of interest. The data is first converted into images, and then a Cycle-GAN is used to create images of anomalous data from images of normal data. The generated data is combined with the original dataset and is used to train a model to detect anomalies. By doing so, it is shown that the classification results are improved, with the AUC rising from 0.55 to 0.71, and the anomaly detection rate rising from 17.07% to 80.49%. The results are also compared to SMOTE, showing the potential presented by generative adversarial networks in anomaly generation.

Considering Race a Problem of Transfer Learning

As biometric applications are fielded to serve large population groups, issues of performance differences between individual sub-groups are becoming increasingly important. In this paper we examine cases where we believe race is one such factor. We look in particular at two forms of problem: facial classification and image synthesis. We take the novel approach of considering race as a boundary for transfer learning in both the task (facial classification) and the domain (synthesis over distinct datasets). We demonstrate a series of techniques to improve transfer learning of facial classification, outperforming similar models trained in the target’s own domain. We conduct a study to evaluate the performance drop of Generative Adversarial Networks trained to conduct image synthesis; in this process, we produce a new annotation of the CelebA dataset by race. These networks are trained solely on one race and tested on another, demonstrating that the subsets of CelebA are distinct domains for this task.

Strong-Weak Distribution Alignment for Adaptive Object Detection

We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel approach for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our approach on several detection datasets comprising both large and small domain shifts.

Can I trust you more? Model-Agnostic Hierarchical Explanations

Interactions such as double negation in sentences and scene interactions in images are common forms of complex dependencies captured by state-of-the-art machine learning models. We propose Mahé, a novel approach to provide Model-agnostic hierarchical éxplanations of how powerful machine learning models, such as deep neural networks, capture these interactions as either dependent on or free of the context of data instances. Specifically, Mahé provides context-dependent explanations by a novel local interpretation algorithm that effectively captures any-order interactions, and obtains context-free explanations through generalizing context-dependent interactions to explain global behaviors. Experimental results show that Mahé obtains improved local interaction interpretations over state-of-the-art methods and successfully explains interactions that are context-free.

Kernel Treelets

A new method for hierarchical clustering is presented. It combines treelets, a particular multiscale decomposition of data, with a projection on a reproducing kernel Hilbert space. The proposed approach, called kernel treelets (KT), effectively substitutes the correlation coefficient matrix used in treelets with a symmetric, positive semi-definite matrix efficiently constructed from a kernel function. Unlike most clustering methods, which require data sets to be numeric, KT can be applied to more general data and yields a multi-resolution sequence of bases on the data directly in feature space. The effectiveness and potential of KT in clustering analysis are illustrated with some examples.

Linking Artificial Intelligence Principles

Artificial Intelligence principles define social and ethical considerations for developing future AI. They come from research institutes, government organizations and industries. Each version of AI principles reflects different considerations, covers different perspectives and places different emphasis. None of them can be considered complete or can cover the rest of the AI principle proposals. Here we introduce LAIP, an effort and platform for linking and analyzing different Artificial Intelligence Principles. We want to explicitly establish the common topics and links among AI Principles proposed by different organizations and investigate their uniqueness. Based on these efforts, for the long-term future of AI, instead of directly adopting any one set of AI principles, we argue for the necessity of incorporating various AI Principles into a comprehensive framework and focusing on how they can interact with and complement each other.

Spatial-Temporal Subset-based Digital Image Correlation: A General Framework

A comprehensive and systematic framework for easily extending and implementing the spatial-temporal subset-based digital image correlation (DIC) algorithm is presented. The framework decouples the three main factors (shape function, correlation criterion, and optimization algorithm) in DIC, and represents different algorithms in a uniform form. Users can freely choose and combine the three factors to meet their own needs, or freely add more parameters to extract analytic results. Subpixel translation and a simulated image series with different velocity characteristics are analyzed using different algorithms based on the proposed framework. An application that mitigates air disturbance due to heat haze using spatial-temporal DIC (ST-DIC) is also demonstrated, showing the applicability of the framework.
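
The decoupling of correlation criterion and optimization algorithm can be pictured with a small sketch. Here the criterion is zero-normalized cross-correlation (ZNCC), a commonly used DIC criterion, and the "optimizer" is an exhaustive integer-pixel search; the shape function is a pure translation. None of these choices is taken from the paper: the function names, the flat-list subset layout, and the omission of subpixel and temporal terms are assumptions made to keep the example short.

```python
import math


def zncc(ref, cur):
    """Zero-normalized cross-correlation of two equal-length intensity lists."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_c = sum(cur) / n
    dr = [r - mean_r for r in ref]
    dc = [c - mean_c for c in cur]
    num = sum(a * b for a, b in zip(dr, dc))
    den = math.sqrt(sum(a * a for a in dr) * sum(b * b for b in dc))
    return num / den if den else 0.0  # flat subsets carry no texture


def subset(img, cy, cx, half):
    """Flatten the (2*half+1)^2 square subset centred at (cy, cx)."""
    return [img[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def integer_search(ref_img, cur_img, cy, cx, half=1, reach=2, criterion=zncc):
    """Exhaustively test integer translations and return the best (dy, dx).

    `criterion` is pluggable, so another correlation measure can be swapped
    in without touching the search loop.
    """
    ref = subset(ref_img, cy, cx, half)
    best_score, best_shift = -math.inf, (0, 0)
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            score = criterion(ref, subset(cur_img, cy + dy, cx + dx, half))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The caller is responsible for keeping `cy ± (half + reach)` and `cx ± (half + reach)` inside the image. A practical DIC code would follow this coarse search with a subpixel optimizer and a richer shape function.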

Distributed Anomaly Detection using Autoencoder Neural Networks in WSN for IoT

Wireless sensor networks (WSN) are fundamental to the Internet of Things (IoT) by bridging the gap between the physical and the cyber worlds. Anomaly detection is a critical task in this context as it is responsible for identifying various events of interest such as equipment faults and undiscovered phenomena. However, this task is challenging because of the elusive nature of anomalies and the volatility of the ambient environments. In a resource-scarce setting like WSN, this challenge is further elevated and weakens the suitability of many existing solutions. In this paper, for the first time, we introduce autoencoder neural networks into WSN to solve the anomaly detection problem. We design a two-part algorithm that resides on sensors and the IoT cloud respectively, such that (i) anomalies can be detected at sensors in a fully distributed manner without the need for communicating with any other sensors or the cloud, and (ii) the relatively more computation-intensive learning task can be handled by the cloud with a much lower (and configurable) frequency. In addition to the minimal communication overhead, the computational load on sensors is also very low (of polynomial complexity) and readily affordable by most COTS sensors. Using a real WSN indoor testbed and sensor data collected over 4 consecutive months, we demonstrate via experiments that our proposed autoencoder-based anomaly detection mechanism achieves high detection accuracy and low false alarm rate. It is also able to adapt to unforeseeable and new changes in a non-stationary environment, thanks to the unsupervised learning feature of our chosen autoencoder neural networks.
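
The sensor/cloud split described above can be sketched as follows. The sensor only runs a forward pass through a small autoencoder and compares the reconstruction error with a threshold; the cloud is assumed to retrain the weights elsewhere and to refresh the threshold from recent errors. The tanh hidden layer, the mean-plus-k-standard-deviations threshold rule and all function names are assumptions for illustration; the abstract does not specify them, and training itself is left out.

```python
import math
from statistics import mean, pstdev


def reconstruct(x, w_enc, b_enc, w_dec, b_dec):
    """Forward pass of a one-hidden-layer autoencoder (tanh hidden, linear output)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_enc, b_enc)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_dec, b_dec)]


def reconstruction_error(x, params):
    """Euclidean distance between a reading and its reconstruction."""
    x_hat = reconstruct(x, *params)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def cloud_update_threshold(errors, k=3.0):
    """Cloud side (infrequent): derive a threshold from errors on recent data.

    Assumption: mean + k * standard deviation is used as the cut-off.
    """
    return mean(errors) + k * pstdev(errors)


def sensor_is_anomaly(x, params, threshold):
    """Sensor side (every reading): purely local, no communication needed."""
    return reconstruction_error(x, params) > threshold
```

The per-reading cost on the sensor is two small matrix-vector products, which matches the low computational load the abstract emphasizes; only the weights and the threshold need to travel from the cloud to the sensor.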

STEP: A Distributed Multi-threading Framework Towards Efficient Data Analytics

Various general-purpose distributed systems have been proposed to cope with high-diversity applications in the pipeline of Big Data analytics. Most of them provide simple yet effective primitives to simplify distributed programming. While the rigid primitives offer great ease of use to savvy programmers, they probably compromise efficiency in performance and flexibility in data representation and programming specifications, which are critical properties in real systems. In this paper, we discuss the limitations of coarse-grained primitives and aim to provide an alternative for users to have flexible control over distributed programs and operate globally shared data more efficiently. We develop STEP, a novel distributed framework based on in-memory key-value store. The key idea of STEP is to adapt multi-threading in a single machine to a distributed environment. STEP enables users to take fine-grained control over distributed threads and apply task-specific optimizations in a flexible manner. The underlying key-value store serves as distributed shared memory to keep globally shared data. To ensure ease-of-use, STEP offers plentiful effective interfaces in terms of distributed shared data manipulation, cluster management, distributed thread management and synchronization. We conduct extensive experimental studies to evaluate the performance of STEP using real data sets. The results show that STEP outperforms the state-of-the-art general-purpose distributed systems as well as a specialized ML platform in many real applications.

Real-Time Anomaly Detection With HMOF Feature

Anomaly detection is a challenging problem in intelligent video surveillance. Most existing methods are computationally expensive and cannot satisfy real-time requirements. In this paper, we propose a real-time anomaly detection framework with low computational complexity and high efficiency. A new feature, named Histogram of Magnitude Optical Flow (HMOF), is proposed to capture the motion of video patches. Compared with existing feature descriptors, HMOF is more sensitive to motion magnitude and more efficient at distinguishing anomaly information. The HMOF features are computed for foreground patches, and are reconstructed by the auto-encoder for better clustering. Then, we use Gaussian Mixture Model (GMM) classifiers to distinguish anomalies from normal activities in videos. Experimental results show that our framework outperforms state-of-the-art methods, and can reliably detect anomalies in real time.
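
A magnitude-only flow histogram for one patch might look like the sketch below. It is a plausible reading of the feature's name, not the authors' definition: the uniform bin layout, the clipping of large magnitudes into the last bin, the normalization to unit sum, and the names `hmof`, `n_bins` and `max_mag` are assumptions. The optical flow itself is taken as given, as a list of `(u, v)` vectors for the pixels of a foreground patch.

```python
import math


def hmof(flow_patch, n_bins=8, max_mag=8.0):
    """Histogram of optical-flow magnitudes for one patch.

    flow_patch: iterable of (u, v) flow vectors, one per pixel.
    Magnitudes are binned uniformly on [0, max_mag); larger values are
    clipped into the last bin. The histogram is normalized to sum to 1.
    """
    hist = [0.0] * n_bins
    for u, v in flow_patch:
        magnitude = math.hypot(u, v)
        index = min(int(magnitude / max_mag * n_bins), n_bins - 1)
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Because direction is discarded, a fast object produces the same descriptor whichever way it moves, which is one way a feature can be made sensitive to motion magnitude alone.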

Causal inference, social networks, and chain graphs

Traditionally, statistical and causal inference on human subjects relies on the assumption that individuals are independently affected by treatments or exposures. However, recently there has been increasing interest in settings, such as social networks, where treatments may spill over from the treated individual to his or her social contacts and outcomes may be contagious. Existing models proposed for causal inference using observational data from networks have two major shortcomings. First, they often require a level of granularity in the data that is often practically infeasible to collect, and second, the models are generally high-dimensional and often too big to fit to the available data. In this paper we propose and justify a parsimonious parameterization for social network data with interference and contagion. Our parameterization corresponds to a particular family of graphical models known as chain graphs. We demonstrate that, in some settings, chain graph models approximate the observed marginal distribution, which is missing most of the time points from the full data. We illustrate the use of chain graphs for causal inference about collective decision making in social networks using data from U.S. Supreme Court decisions between 1994 and 2004.

Fission: A Provably Fast, Scalable, and Secure Permissionless Blockchain

We present Fission, a new permissionless blockchain that achieves scalability in terms of both system throughput and transaction confirmation time, while at the same time retaining blockchain’s core values of equality and decentralization. Fission overcomes the system throughput bottleneck by employing a novel Eager-Lazy pipelining model that achieves very high system throughput via block pipelining, an adaptive partitioning mechanism that auto-scales to transaction volumes, and a provably secure energy-efficient consensus protocol to ensure security and robustness. Fission applies a hybrid network which consists of a relay network and a peer-to-peer network. The goal of the relay network is to minimize the transaction confirmation time by minimizing the information propagation latency. To optimize the performance of the relay network in the presence of churn, dynamic network topologies, and network heterogeneity, we propose an ultra-fast game-theoretic relay selection algorithm that achieves near-optimal performance in a fully distributed manner. Fission’s peer-to-peer network complements the relay network and provides very high data availability by enabling users to contribute their storage and bandwidth for information dissemination (with incentive). We propose a distributed online data retrieval strategy that optimally offloads the relay network without degrading the system performance. By re-innovating all the core elements of blockchain technology (computation, networking, and storage) in a holistic manner, Fission aims to achieve the best balance among scalability, security and decentralization.

Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses

The effectiveness of learning in massive open online courses (MOOCs) can be significantly enhanced by introducing personalized intervention schemes which rely on building predictive models of student learning behaviors, such as engagement or performance indicators. A major challenge that has to be addressed when building such models is to design handcrafted features that are effective for the prediction task at hand. In this paper, we make the first attempt to solve the feature learning problem by taking the unsupervised learning approach to learn a compact representation of the raw features with a large degree of redundancy. Specifically, in order to capture the underlying learning patterns in the content domain and the temporal nature of the clickstream data, we train a modified auto-encoder (AE) combined with the long short-term memory (LSTM) network to obtain a fixed-length embedding for each input sequence. When compared with the original features, the new features that correspond to the embedding obtained by the modified LSTM-AE are not only more parsimonious but also more discriminative for our prediction task. Using simple supervised learning models, the learned features can improve the prediction accuracy by up to 17% compared with the supervised neural networks and reduce overfitting to the dominant low-performing group of students, specifically in the task of predicting students’ performance. Our approach is generic in the sense that it is not restricted to a specific supervised learning model or a specific prediction task for MOOC learning analytics.

Recent Advances in Autoencoder-Based Representation Learning

Learning useful representations with little or no supervision is a key challenge in artificial intelligence. We provide an in-depth review of recent advances in representation learning with a focus on autoencoder-based models. To organize these results we make use of meta-priors believed useful for downstream tasks, such as disentanglement and hierarchical organization of features. In particular, we uncover three main mechanisms to enforce such properties, namely (i) regularizing the (approximate or aggregate) posterior distribution, (ii) factorizing the encoding and decoding distribution, or (iii) introducing a structured prior distribution. While there are some promising results, implicit or explicit supervision remains a key enabler and all current methods use strong inductive biases and modeling assumptions. Finally, we provide an analysis of autoencoder-based representation learning through the lens of rate-distortion theory and identify a clear tradeoff between the amount of prior knowledge available about the downstream tasks, and how useful the representation is for this task.