Recent neural-network-based text-to-speech (speech synthesis) pipelines have employed recurrent Seq2seq architectures that can synthesize realistic-sounding speech directly from text characters. These systems, however, have complex architectures and take a substantial amount of time to train. We introduce several modifications to these Seq2seq architectures that allow for faster training while also reducing the complexity of the model architecture. We show that our proposed model can achieve attention alignment much faster than previous architectures and that good audio quality can be achieved with a much smaller model. Sample audio is available at https://…/tts-samples-for-cmpt-419.
Collecting, analyzing and gaining insight from large volumes of data is now the norm in an ever-increasing number of industries. Data analytics techniques, such as machine learning, are powerful tools for analyzing these large volumes of data. Synthetic data sets are routinely relied upon to train and develop such data analytics methods for several reasons: to generate larger data sets than are available, to generate diverse data sets, to preserve anonymity in data sets with sensitive information, etc. Processing, transmitting and storing data are key issues when handling large data sets. This paper presents an ‘on the fly’ framework for generating big synthetic data sets, suitable for these data analytics methods, that is both computationally efficient and applicable to a diverse set of problems. An example application of the proposed framework is presented, along with a mathematical analysis of its computational efficiency that demonstrates its effectiveness.
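The abstract does not detail the framework, so the following is only a minimal sketch of the general ‘on the fly’ idea, assuming a seeded pseudo-random generator and a hypothetical noisy linear labelling rule (neither is taken from the paper). Batches are produced lazily when a consumer asks for them, so only the seed and a few sizes need to be stored or transmitted, never the data itself.

```python
import random
from typing import Iterator, List, Tuple

Sample = Tuple[List[float], int]


def synthetic_batches(seed: int, n_batches: int, batch_size: int,
                      n_features: int) -> Iterator[List[Sample]]:
    """Lazily yield labelled batches; no batch is kept after it is consumed.

    Hypothetical generative model: Gaussian features, label 1 when a noisy
    weighted sum of the features is positive. The seed fully determines the
    data set, so regenerating it only requires the seed and the sizes.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    for _ in range(n_batches):
        batch: List[Sample] = []
        for _ in range(batch_size):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            score = sum(w * xi for w, xi in zip(weights, x)) + rng.gauss(0.0, 0.1)
            batch.append((x, int(score > 0)))
        yield batch
```

A training loop such as `for batch in synthetic_batches(42, 1000, 256, 20): ...` only ever holds one batch in memory, which is the property an on-the-fly generator trades against recomputation cost.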
Self-organization can be broadly defined as the ability of a system to display ordered spatio-temporal patterns solely as the result of the interactions among the system components. Processes of this kind characterize both living and artificial systems, making self-organization a concept that lies at the basis of several disciplines, from physics to biology to engineering. Placed at the frontiers between disciplines, Artificial Life (ALife) has heavily borrowed concepts and tools from the study of self-organization, providing mechanistic interpretations of life-like phenomena as well as useful constructivist approaches to artificial system design. Despite its broad usage within ALife, the concept of self-organization has often been excessively stretched or misinterpreted, calling for a clarification that could help trace the borders between what can and cannot be considered self-organization. In this review, we discuss the fundamental aspects of self-organization and list its main uses within three primary ALife domains, namely ‘soft’ (mathematical/computational modeling), ‘hard’ (physical robots), and ‘wet’ (chemical/biological systems) ALife. Finally, we discuss the usefulness of self-organization within ALife studies, point to perspectives for future research, and list open questions.
The dying ReLU problem refers to ReLU neurons becoming inactive and outputting only 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about the problem theoretically. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU problem. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU problem. We thus propose a new initialization procedure, namely a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU problem. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
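To make the phenomenon concrete, here is a small empirical sketch (not the paper's proof or its randomized asymmetric initialization, whose distributions are not given in the abstract). It assumes a narrow fully connected ReLU network with a symmetric Gaussian initialization (He-scaled weights, N(0, 1) biases, both choices ours) and counts a network as collapsed when some hidden layer outputs all zeros on every probe input, which makes the network constant on those probes.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu_layer(h: Vector, W: Matrix, b: Vector) -> Vector:
    """One fully connected layer followed by ReLU: max(0, W h + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)]


def collapses(weights: List[Matrix], biases: List[Vector],
              probes: List[Vector]) -> bool:
    """True if some hidden ReLU layer is all zeros on every probe input.

    After such a layer every later layer sees the same zero input, so the
    network is constant on the probes: an empirical sign of a dead network.
    """
    hs = [list(x) for x in probes]
    for W, b in zip(weights, biases):
        hs = [relu_layer(h, W, b) for h in hs]
        if all(v == 0.0 for h in hs for v in h):
            return True
    return False


def dead_fraction(width: int, depth: int, trials: int, seed: int,
                  n_probes: int = 50) -> float:
    """Fraction of randomly initialized width x width networks that collapse
    under a symmetric Gaussian initialization (assumed, for illustration)."""
    rng = random.Random(seed)
    std = (2.0 / width) ** 0.5
    dead = 0
    for _ in range(trials):
        weights = [[[rng.gauss(0.0, std) for _ in range(width)]
                    for _ in range(width)] for _ in range(depth)]
        biases = [[rng.gauss(0.0, 1.0) for _ in range(width)]
                  for _ in range(depth)]
        probes = [[rng.uniform(-1.0, 1.0) for _ in range(width)]
                  for _ in range(n_probes)]
        dead += collapses(weights, biases, probes)
    return dead / trials
```

Comparing, say, `dead_fraction(2, 10, 200, 0)` with `dead_fraction(2, 50, 200, 0)` should show the collapse rate growing with depth, in line with the result stated above; a finite probe set can only suggest, not prove, that a network is dead everywhere.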
Weather prediction today is performed with numerical weather prediction (NWP) models. These are deterministic simulation models that describe the dynamics of the atmosphere and evolve the current conditions forward in time to obtain a prediction of future atmospheric states. To account for uncertainty in NWP models, it has become common practice to employ ensembles of NWP forecasts. However, NWP ensembles often exhibit forecast biases and dispersion errors and thus require statistical postprocessing to improve the reliability of the ensemble forecasts. This work proposes an extension of a recently developed postprocessing model that utilizes autoregressive information present in the forecast errors of the raw ensemble members. The original approach is modified to let the variance parameter depend on the ensemble spread, yielding a two-fold heteroscedastic model. Furthermore, an additional high-resolution forecast is included in the postprocessing model, yielding improved predictive performance. Finally, it is outlined how the autoregressive model can be used to postprocess ensemble forecasts with longer forecast horizons without making fundamental changes to the original model. We accompany the new methodology with an implementation within the R package ensAR to make our method available to other researchers working in this area. To illustrate the performance of the heteroscedastic extension of the autoregressive model, and its use for longer forecast horizons, we present a case study on a data set containing 12 years of temperature forecasts and observations over Germany. The case study indicates that the autoregressive model yields particularly strong improvements for forecast horizons beyond 24 hours.
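The core autoregressive idea can be illustrated for a single ensemble member: model its forecast error as an AR(1) process and add the predicted next error to the next raw forecast. This is a hedged sketch only, assuming an AR(1) order and a mean fixed at the sample mean; it is not the ensAR API, and it omits the spread-dependent variance, the high-resolution forecast, and the predictive distribution described above.

```python
from statistics import mean
from typing import List, Tuple


def fit_ar1(errors: List[float]) -> Tuple[float, float]:
    """Fit e_t - m = a * (e_{t-1} - m) + noise with m set to the sample mean
    and a chosen by least squares; returns (m, a)."""
    m = mean(errors)
    d = [e - m for e in errors]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return m, (num / den if den > 0 else 0.0)


def ar1_corrected_forecast(past_forecasts: List[float],
                           past_observations: List[float],
                           next_forecast: float) -> float:
    """Add the AR(1)-predicted error of one ensemble member to its next forecast."""
    errors = [y - f for f, y in zip(past_forecasts, past_observations)]
    m, a = fit_ar1(errors)
    predicted_error = m + a * (errors[-1] - m)
    return next_forecast + predicted_error
```

A member that is consistently 1 degree too cold gets shifted up by 1 degree, while alternating errors (a close to -1) are corrected with the opposite sign of the last error.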
In the current landscape of US domestic flights, there have been numerous instances of flight delays and cancellations. American Airlines, Inc. (AA) has been one of the most trusted airlines in the United States and the world’s largest airline in terms of number of destinations served. But when it comes to domestic flights, AA has not lived up to expectations in terms of punctuality or on-time performance. Flight delays also cause airline companies operating commercial flights to incur huge losses, so these companies are taking measures to prevent or avoid flight delays and cancellations. This study analyzes flight information for US domestic flights operated by American Airlines at the five busiest US airports and predicts possible arrival delays using data mining and machine learning approaches. A Gradient Boosting Classifier model is trained and hyper-parameter tuned, achieving a maximum accuracy of 85.73%. Such an intelligent system is essential for forecasting flights’ on-time performance.
Deep neural networks are widely used for nonlinear function approximation with applications ranging from computer vision to control. Although these networks involve the composition of simple arithmetic operations, it can be very challenging to verify whether a particular network satisfies certain input-output properties. This article surveys methods that have emerged recently for soundly verifying such properties. These methods borrow insights from reachability analysis, optimization, and search. We discuss fundamental differences and connections between existing algorithms. In addition, we provide pedagogical implementations of existing methods and compare them on a set of benchmark problems.
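As a concrete instance of the reachability-style reasoning mentioned above, interval bound propagation pushes an input box through each layer with interval arithmetic. The sketch below is our own minimal version for small fully connected ReLU networks given as (weights, bias) pairs; it is not one of the survey's pedagogical implementations. It is sound (a returned bound always holds) but incomplete (a true property may fail to be certified).

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]
Layer = Tuple[List[List[float]], List[float]]  # (weights W, bias b)


def affine_bounds(W: List[List[float]], b: List[float], box: Box) -> Box:
    """Per-output bounds of W x + b over an axis-aligned input box."""
    out = []
    for row, b_i in zip(W, b):
        lo = b_i + sum(w * (l if w >= 0 else u) for w, (l, u) in zip(row, box))
        hi = b_i + sum(w * (u if w >= 0 else l) for w, (l, u) in zip(row, box))
        out.append((lo, hi))
    return out


def output_bounds(layers: Sequence[Layer], box: Box) -> Box:
    """Propagate the input box through affine layers with ReLU between them."""
    for i, (W, b) in enumerate(layers):
        box = affine_bounds(W, b, box)
        if i < len(layers) - 1:  # no ReLU after the output layer
            box = [(max(0.0, l), max(0.0, u)) for l, u in box]
    return box


def forward(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Concrete evaluation of the same network, for sanity checks."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b_i for row, b_i in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def certify_upper_bound(layers: Sequence[Layer], box: Box, threshold: float) -> bool:
    """True guarantees every output <= threshold on the box; False only means
    the interval bounds were too loose to decide."""
    return all(u <= threshold for _, u in output_bounds(layers, box))
```

For the two-layer network `[([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]` on the unit square, the computed output bound is [0, 3] although the true maximum is 2, so the property "output <= 2.5" holds but is not certified: exactly the looseness that optimization- and search-based verifiers try to remove.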
Principal Component Analysis (PCA) is one of the most important methods for handling high-dimensional data. However, most studies on PCA aim to minimize the loss after projection, which is usually measured by Euclidean distance, although in some fields angular distance is known to be more important and critical for analysis. In this paper, we propose a method that adds constraints on the factors to unify Euclidean distance and angular distance. However, due to the nonconvexity of the objective and constraints, the optimal solution is not easy to obtain. We propose an alternating linearized minimization method to solve it with a provable convergence rate and guarantee. Experiments on synthetic and real-world datasets have validated the effectiveness of our method and demonstrated its advantages over state-of-the-art clustering methods.
In multi-person videos, especially team sport videos, a semantic event is usually represented as a confrontation between two teams of players, which can be represented as collective motion. In broadcast basketball videos, specific camera motions are used to present specific events. Therefore, a semantic event in broadcast basketball videos is closely related to both the global motion (camera motion) and the collective motion. A semantic event in basketball videos can generally be divided into three stages: pre-event, event occurrence (event-occ), and post-event. In this paper, we propose an ontology-based global and collective motion pattern (On_GCMP) algorithm for basketball event classification. First, a two-stage GCMP-based event classification scheme is proposed. The GCMP is extracted using optical flow. The two-stage scheme progressively combines a five-class event classification algorithm on event-occs and a two-class event classification algorithm on pre-events. Both algorithms use sequential convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract the spatial and temporal features of the GCMP for event classification. Second, we use post-event segments to predict success/failure with algorithms based on deep features of images in the video frames (RGB_DF_VF). Finally, the event classification results and the success/failure classification results are integrated to obtain the final results. To evaluate the proposed scheme, we collected a new dataset called NCAA+, which is automatically obtained from the NCAA dataset by extending the fixed-length video clips forward and backward around the corresponding semantic events. The experimental results demonstrate that the proposed scheme achieves a mean average precision of 59.22% on NCAA+, which is 7.62% higher than the state of the art on NCAA.
We consider a cognitive radio-based Internet-of-Things (CR-IoT) network consisting of one primary IoT (PIoT) system and one secondary IoT (SIoT) system. The IoT devices of the PIoT and the SIoT each monitor one physical process and send randomly generated status updates to their associated access points (APs). The timeliness of the status updates is important, as the systems are interested in the latest condition (e.g., temperature, speed and position) of the IoT devices. In this context, two natural questions arise: (1) How can the timeliness of the status updates in CR-IoT systems be characterized? (2) Which scheme, overlay or underlay, is better in terms of the timeliness of the status updates? To answer these two questions, we adopt a new performance metric, the age of information (AoI). We analyze the average peak AoI of the PIoT and the SIoT for the overlay and underlay schemes, respectively. Simple asymptotic expressions for the average peak AoI are also derived for when the PIoT operates at high signal-to-noise ratio (SNR). Based on the asymptotic expressions, we characterize a critical generation rate of the PIoT system, which determines which of the overlay and underlay schemes achieves the lower average peak AoI for the SIoT. Numerical results validate the theoretical analysis and reveal that either the overlay or the underlay scheme can outperform the other in terms of the average peak AoI of the SIoT, depending on the system setup.
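For readers new to the metric, the average peak AoI can be computed directly from a trace of generation and delivery times. The sketch below is a generic illustration, not the paper's analytical expressions: it assumes first-come-first-served delivery (updates arrive in generation order) and treats updates lost to outage as simply absent from the trace. Just before update k arrives, the freshest information at the AP was generated at the time of update k - 1, so the peak age is delivered[k] - generated[k - 1].

```python
from typing import Sequence


def average_peak_aoi(generated: Sequence[float], delivered: Sequence[float]) -> float:
    """Average peak AoI over successfully delivered updates, in delivery order."""
    if len(generated) != len(delivered) or len(generated) < 2:
        raise ValueError("need two equally long sequences with at least 2 updates")
    # The age peaks right before each delivery, then drops to delivered[k] - generated[k].
    peaks = [delivered[k] - generated[k - 1] for k in range(1, len(generated))]
    return sum(peaks) / len(peaks)
```

For updates generated at times 0, 2, 5 and delivered at 1, 3, 6, the age peaks at 3 and 4, giving an average peak AoI of 3.5.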
Many Natural Language Processing works on emotion analysis focus only on simple emotion classification, without exploring the potential of placing emotion in an ‘event context’, and ignore the analysis of emotion-related events. One main reason is the lack of such corpora. Here we present the Cause-Emotion-Action Corpus, which manually annotates not only emotions but also cause events and action events. We propose two new tasks based on the data set: emotion causality and emotion inference. The first task is to extract a (cause, emotion, action) triple. The second task is to infer the probable emotion. We are releasing the data set, with 10,603 samples and 15,892 events, together with a basic statistical analysis and baselines for both the emotion causality and emotion inference tasks. Baseline performance demonstrates that there is much room for improvement on both tasks.
Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic and probabilistic methods to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work applying shallow or deep neural networks to the ranking problem in IR, referred to in this paper as neural ranking models. The power of neural ranking models lies in their ability to learn from raw text inputs for the ranking problem, avoiding many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since a large variety of neural ranking models have been proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain insights for future development. In contrast to existing reviews, in this survey we take a deep look into neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models on benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We also discuss what is missing in the current literature and what the promising and desired future directions are.
Unlike conventional frame-based sensors, event-based visual sensors output information through spikes at a high temporal resolution. By encoding only changes in pixel intensity, they offer a low-power, low-latency approach to visual information sensing. To use this information for higher-level sensory tasks such as object recognition and tracking, an essential simplification step is the extraction and learning of features. An ideal feature descriptor must be robust to (i) local transformations and (ii) re-appearances of a local event pattern. To that end, we propose a novel spatiotemporal feature representation learning algorithm based on slow feature analysis (SFA). Using SFA, smoothly changing linear projections are learned that are robust to local visual transformations. To determine whether the features can learn to be invariant to various visual transformations, feature point tracking tasks are used for evaluation. Extensive experiments on two datasets demonstrate the adaptability of the spatiotemporal feature learner to translation, scaling and rotation of the feature points. More importantly, we find that the obtained feature representations are able to exploit the high temporal resolution of such event-based cameras to generate better feature tracks.
This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating deep neural network (DNN) training on Sunway TaihuLight, the current fastest supercomputer in the world, which adopts a unique many-core heterogeneous architecture with 40,960 SW26010 processors connected through a customized communication network. First, we identify several principles for fully exploiting the performance of this innovative many-core architecture. Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe. Third, we put forward a topology-aware parameter synchronization scheme to scale the synchronous Stochastic Gradient Descent (SGD) method to multiple processors efficiently. We evaluate our framework by training a variety of widely used neural networks on the ImageNet dataset. On a single node, swCaffe achieves between 23% and 119% of the overall performance of Caffe running on a K40m GPU. Compared with Caffe on a CPU, swCaffe runs 3.04x to 7.84x faster on all the networks. Finally, we present the scalability of swCaffe for training ResNet-50 and AlexNet at the scale of 1024 nodes.
We introduce the use of neural networks as classifiers for classical disordered systems with no spatial ordering. In this study, we implement a convolutional neural network trained to identify the spin-glass state in the three-dimensional Edwards-Anderson Ising spin-glass model from an input of Monte Carlo sampled configurations at a given temperature. The neural network is designed to be flexible with respect to the input size and can accurately perform inference over a small sample of the instances in the test set. Using the neural network to classify instances of the three-dimensional Edwards-Anderson Ising spin glass in a (random) field, we show that the inferred phase boundary is consistent with the absence of an Almeida-Thouless line.
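For reference, the configurations fed to the network are samples of the Edwards-Anderson Hamiltonian H = -sum_<ij> J_ij s_i s_j - sum_i h_i s_i on a periodic cubic lattice. The sketch below only evaluates this energy and the replica overlap q = (1/N) sum_i s_i^a s_i^b (the usual spin-glass order parameter); it does no Monte Carlo sampling or classification. Gaussian couplings are an assumption here (a ±J distribution is also common), and L >= 3 is required so that each bond is counted once.

```python
import random
from typing import Dict, List, Tuple

Site = Tuple[int, int, int]


def sites(L: int) -> List[Site]:
    return [(x, y, z) for x in range(L) for y in range(L) for z in range(L)]


def neighbour(site: Site, d: int, L: int) -> Site:
    """Forward neighbour of `site` along axis d with periodic boundaries."""
    s = list(site)
    s[d] = (s[d] + 1) % L
    return (s[0], s[1], s[2])


def gaussian_couplings(L: int, rng: random.Random) -> Dict[Tuple[Site, int], float]:
    """One Gaussian coupling J for each forward bond (site, axis)."""
    return {(s, d): rng.gauss(0.0, 1.0) for s in sites(L) for d in range(3)}


def ea_energy(spins: Dict[Site, int], J: Dict[Tuple[Site, int], float],
              h: Dict[Site, float], L: int) -> float:
    """H = -sum_<ij> J_ij s_i s_j - sum_i h_i s_i, each bond counted once."""
    energy = 0.0
    for s in sites(L):
        for d in range(3):
            energy -= J[(s, d)] * spins[s] * spins[neighbour(s, d, L)]
        energy -= h[s] * spins[s]
    return energy


def overlap(spins_a: Dict[Site, int], spins_b: Dict[Site, int]) -> float:
    """Replica overlap q = (1/N) sum_i s_i^a s_i^b."""
    return sum(spins_a[s] * spins_b[s] for s in spins_a) / len(spins_a)
```

With zero field the energy is invariant under flipping every spin; a nonzero field h_i breaks that symmetry, which is why the in-field case probes the (absent) Almeida-Thouless line.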
Visual Query Answering (VQA) is of great significance in offering people convenience: one can ask a question about the details of objects, or about a high-level understanding of the scene, in an image. This paper proposes a novel method to address the VQA problem. In contrast to prior works, our method, which targets single-scene VQA, relies on graph-based techniques and involves reasoning. In a nutshell, our approach is centered on three graphs. The first graph, referred to as the inference graph G_I, is constructed via learning over labeled data. The other two graphs, referred to as the query graph Q and the entity-attribute graph G_EA, are generated from the user-issued natural language query Q_nl and image Img, respectively. As G_EA often does not contain sufficient information to answer Q, we develop techniques to infer the missing information of G_EA with G_I. Based on G_EA and Q, we provide techniques to find matches of Q in G_EA as the answer to Q_nl in Img. Unlike commonly used VQA methods that are based on end-to-end neural networks, our graph-based method exhibits well-designed reasoning capability and is thus highly interpretable. We also create a dataset on soccer matches (Soccer-VQA) with rich annotations. The experimental results show that our approach outperforms the state-of-the-art method and has high potential for future investigation.
Federated learning on edge devices poses new challenges arising from workers that misbehave, privacy needs, etc. We propose a new robust federated optimization algorithm, with provable convergence and robustness under non-IID settings. Empirical results show that the proposed algorithm stabilizes the convergence and tolerates data poisoning on a small number of workers.
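The abstract does not spell out the aggregation rule, so as a swapped-in illustration of robust aggregation in general, here is the coordinate-wise trimmed mean, a standard rule in this literature (not necessarily the one the proposed algorithm uses). With b trimmed from each side and at most b misbehaving workers, every kept value lies between the smallest and largest honest values, because at least one honest value sits among any b + 1 extreme values.

```python
from typing import List, Sequence


def trimmed_mean(updates: Sequence[Sequence[float]], b: int) -> List[float]:
    """Coordinate-wise trimmed mean of worker updates.

    For every coordinate, the b largest and b smallest values are discarded
    before averaging, so up to b poisoned workers cannot push the result
    outside the range of the honest values.
    """
    n = len(updates)
    if n <= 2 * b:
        raise ValueError("need more than 2*b workers")
    aggregated = []
    for j in range(len(updates[0])):
        kept = sorted(u[j] for u in updates)[b:n - b]
        aggregated.append(sum(kept) / len(kept))
    return aggregated
```

With honest updates near (1, 1) and one poisoned update (100, -100), the plain mean of the first coordinate jumps to about 25.8, while `trimmed_mean(..., 1)` stays at 1.1.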
A model-based task transfer learning (MBTTL) method is presented. We consider a constrained nonlinear dynamical system and assume that a dataset of state and input pairs that solve a task T1 is available. Our objective is to find a feasible state-feedback policy for a second task, T2, by using stored data from T1. Our approach applies to tasks T2 that are composed of the same subtasks as T1, but in a different order. In this paper we formally define subtasks and the MBTTL problem, and provide examples of MBTTL in the fields of autonomous cars and manipulators. Then, a computationally efficient approach to solve the MBTTL problem is presented, along with proofs of feasibility for constrained linear dynamical systems. Simulation results show the effectiveness of the proposed method.
High-dimensional time series are characterized by a large number of measurements and complex dependence, and often involve abrupt change points. We propose a new procedure to detect change points in the mean of high-dimensional time series data. The proposed procedure incorporates the spatial and temporal dependence of the data and is able to test for and estimate change points that occur near the boundary of the time series. We study its asymptotic properties under mild conditions. Simulation studies demonstrate its robust performance in comparison with other existing methods. Our procedure is applied to an fMRI dataset.
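As a baseline for what such a procedure improves on, the classical CUSUM approach estimates a single mean change by maximizing, over split points k, the sum across coordinates of the squared contrast sqrt(k (T - k) / T) times (mean before k minus mean after k). This sketch is that textbook statistic only; unlike the proposed procedure it ignores spatial and temporal dependence, provides no test threshold, and is known to be weak near the boundaries.

```python
from math import sqrt
from typing import Sequence


def cusum_statistic(series: Sequence[Sequence[float]], k: int) -> float:
    """Sum over coordinates of the squared CUSUM contrast at split k (1 <= k < T)."""
    T = len(series)
    total = 0.0
    for j in range(len(series[0])):
        left = sum(series[t][j] for t in range(k)) / k
        right = sum(series[t][j] for t in range(k, T)) / (T - k)
        contrast = sqrt(k * (T - k) / T) * (left - right)
        total += contrast * contrast
    return total


def estimate_change_point(series: Sequence[Sequence[float]]) -> int:
    """Index k maximizing the statistic; the mean is estimated to change at time k."""
    return max(range(1, len(series)), key=lambda k: cusum_statistic(series, k))
```

For a noise-free 3-dimensional series of length 20 whose first and third coordinates step up at t = 8, the statistic peaks exactly at k = 8.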
High-dimensional data in many machine learning applications leads to computational and analytical complexities. Feature selection provides an effective way of solving these problems by removing irrelevant and redundant features, thus reducing model complexity and improving the accuracy and generalization capability of the model. In this paper, we present a novel teacher-student feature selection (TSFS) method in which a ‘teacher’ (a deep neural network or a complicated dimension reduction method) is first employed to learn the best low-dimensional representation of the data. Then a ‘student’ network (a simple neural network) is used to perform feature selection by minimizing the reconstruction error of the low-dimensional representation. Although the teacher-student scheme is not new, to the best of our knowledge, this is the first time it has been employed for feature selection. The proposed TSFS can be used for both supervised and unsupervised feature selection. The method is evaluated on different datasets and compared with state-of-the-art feature selection methods. The results show that TSFS performs better in terms of classification and clustering accuracy and reconstruction error. Moreover, experimental evaluations demonstrate that the proposed method has a low sensitivity to parameter selection.
In this paper, we present an easy-to-implement asynchronous approximate gradient method called DSPG (Decentralized Simultaneous Perturbation Stochastic Approximations, with Constant Sensitivity Parameters). It is obtained by modifying SPSA (Simultaneous Perturbation Stochastic Approximations) to allow for decentralized optimization in multi-agent learning and distributed control scenarios. SPSA is a popular approximate gradient method developed by Spall that is used in robotics and learning. In the multi-agent learning setup considered herein, the agents are assumed to be asynchronous (agents abide by their local clocks) and to communicate via a wireless medium that is prone to losses and delays. We analyze the gradient estimation bias that arises from setting the sensitivity parameters to a single value, and the bias that arises from communication losses and delays. Specifically, we show that these biases can be countered through better and more frequent communication and/or by choosing a small fixed value for the sensitivity parameters. We also discuss the variance of the gradient estimator and its effect on the rate of convergence. Finally, we present numerical results supporting DSPG and the aforementioned theory and discussion.
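The building block that DSPG modifies is the SPSA gradient estimate with a fixed sensitivity parameter c: all coordinates are perturbed at once by a random ±1 vector, so only two function evaluations are needed regardless of dimension. The sketch below is the centralized, synchronous estimator only; the decentralized, asynchronous, lossy-communication aspects of DSPG are not modeled.

```python
import random
from typing import Callable, List


def spsa_gradient(f: Callable[[List[float]], float], x: List[float],
                  c: float, rng: random.Random) -> List[float]:
    """One simultaneous-perturbation gradient estimate with a fixed sensitivity c."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]  # Rademacher perturbation
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)
    return [diff / (2.0 * c * di) for di in delta]
```

A single estimate is noisy (for f(x) = x1^2 + x2^2 at (1, 2) its first component is either 6 or -2), but its average approaches the true gradient (2, 4). For this quadratic the fixed c introduces no bias; for general f, the bias from a fixed c is what the analysis above addresses.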
Time Series Classification (TSC) problems are encountered in many real-life data mining tasks ranging from medicine and security to human activity recognition and food safety. With the recent success of deep neural networks in various domains such as computer vision and natural language processing, researchers have started adopting these techniques for solving time series data mining problems. However, to the best of our knowledge, no previous work has considered the vulnerability of deep learning models to adversarial time series examples, which could potentially make them unreliable in situations where the classifier's decision is crucial, such as in medicine and security. For computer vision problems, such attacks have been shown to be very easy to perform: altering the image by adding an imperceptible amount of noise can trick the network into misclassifying the input image. Following this line of work, we propose to leverage existing adversarial attack mechanisms to add a special noise to the input time series in order to decrease the network’s confidence when classifying instances at test time. Our results reveal that current state-of-the-art deep learning time series classifiers are vulnerable to adversarial attacks, which can have major consequences in multiple domains such as food safety and quality assurance.
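One existing attack mechanism of this kind is the fast gradient sign method (FGSM): shift every time step by eps in the direction that increases the classifier's loss. To keep the sketch self-contained, a hypothetical logistic-regression classifier over the raw time steps stands in for the deep network; its input gradient of the cross-entropy loss has the closed form (p - label) * w, whereas a deep classifier would obtain it by backpropagation.

```python
from math import exp
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def predict_proba(w: List[float], b: float, series: List[float]) -> float:
    """P(class 1 | series) for a linear (logistic) classifier over time steps."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, series)) + b)


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: List[float], b: float, series: List[float], label: int,
         eps: float) -> List[float]:
    """Fast gradient sign method: move each time step by eps along the sign of
    the loss gradient dL/dx = (p - label) * w of the logistic model."""
    p = predict_proba(w, b, series)
    return [xi + eps * sign((p - label) * wi) for xi, wi in zip(series, w)]
```

Each time step moves by at most eps, so the perturbation stays small in the max-norm while the classifier's confidence in the true label drops.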
Most existing methods for anomaly detection use only positive data to learn the data distribution, so they usually need a pre-defined threshold at the detection stage to determine whether a test instance is an outlier. Unfortunately, a good threshold is vital for performance, and finding an optimal one is very hard. In this paper, we take the discriminative information implied in unlabeled data into consideration and propose a new method for anomaly detection that can learn the labels of unlabeled data directly. Our proposed method has an end-to-end architecture with one encoder and two decoders that are trained to model the data distributions of inliers and outliers in a competitive way. This architecture works in a discriminative manner without suffering from overfitting, and the training algorithm of our model is based on SGD, so it is efficient and scalable even for large-scale datasets. Empirical studies on 7 datasets, including KDD99, MNIST, Caltech-256 and ImageNet, show that our model outperforms the state-of-the-art methods.
In this work, we aim to make predictive functionality native to database systems, with a focus on time series data. We propose a system architecture, Time Series Predict DB, that enables predictive queries in any existing time series database by building an additional ‘prediction index’ for time series data. To be effective, such an index needs to be built incrementally while keeping up with database throughput, scale with the volume of data, provide accurate predictions for heterogeneous data, and allow for ‘predictive’ querying with latency comparable to traditional database queries. Building upon a recently developed model-agnostic time series algorithm and making it incremental and scalable, we build such a system on top of PostgreSQL. Using extensive experimentation, we show that our incremental prediction index updates faster than PostgreSQL ($1\mu s$ per data point for the prediction index vs. $4\mu s$ per data point for PostgreSQL) and thus does not affect the throughput of the database. Across a variety of time series data, we find that our incremental, model-agnostic algorithm provides better accuracy than the best state-of-the-art time series libraries (median improvement in the range of 3.29x to 4.19x over Facebook's Prophet and 1.27x to 1.48x over AMELIA in R). Relative to SELECT queries (0.5ms), the latency of predictive queries is < 1.9x (0.8ms) for imputation and < 7.6x (3ms) for forecasting across machine platforms. As a by-product, we find that the incremental, scalable variant we propose improves the accuracy of the batch prediction algorithm, which may be of interest in its own right.
readPTU is a Python package designed to analyze time-correlated single-photon counting data. The use of the library promotes storing the complete arrival-time information of the photons and allows full flexibility in post-processing data for analysis. The library supports the computation of time-resolved signals with external triggers, and second-order autocorrelation function analysis can be performed using multiple algorithms that offer the user different trade-offs between speed and accuracy. Additionally, a thresholding algorithm for time post-selection is available. The library has been designed with performance and extensibility in mind, allowing future users to implement support for additional file extensions and algorithms without having to deal with low-level details. We demonstrate the performance of readPTU by analyzing the second-order autocorrelation function of the resonance fluorescence from a single quantum dot in a two-dimensional semiconductor.
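To illustrate what a second-order autocorrelation analysis computes from stored photon arrival times, here is a generic coincidence-histogram sketch for a two-detector (Hanbury Brown-Twiss) setup. It is not readPTU's API: the function names are hypothetical, PTU file parsing is omitted, and the normalization assumes continuous excitation, where uncorrelated light gives n_a * n_b * bin_width / duration coincidences per bin. A single-photon source such as a quantum dot then shows a dip below 1 near zero delay.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def coincidence_histogram(tags_a: Sequence[float], tags_b: Sequence[float],
                          bin_width: float, max_delay: float) -> List[int]:
    """Histogram of delays tau = t_b - t_a with -max_delay <= tau < max_delay.

    tags_b must be sorted; both sequences use the same time unit.
    Bin i covers [-max_delay + i*bin_width, -max_delay + (i+1)*bin_width).
    """
    n_bins = int(round(2 * max_delay / bin_width))
    counts = [0] * n_bins
    for ta in tags_a:
        # Only detector-B tags within the delay window can contribute.
        lo = bisect_left(tags_b, ta - max_delay)
        hi = bisect_right(tags_b, ta + max_delay)
        for tb in tags_b[lo:hi]:
            idx = int((tb - ta + max_delay) // bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return counts


def normalize_g2(counts: Sequence[int], n_a: int, n_b: int,
                 bin_width: float, duration: float) -> List[float]:
    """Divide by the coincidences expected per bin for uncorrelated light
    under continuous excitation: n_a * n_b * bin_width / duration."""
    expected = n_a * n_b * bin_width / duration
    return [c / expected for c in counts]
```

The binary search keeps the cost roughly proportional to the number of photon pairs inside the window rather than to the product of the two tag counts, one example of the speed/accuracy trade-offs such algorithms make.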
It has been proved that gradient descent converges linearly to the global minima when training deep neural networks in the over-parameterized regime. However, according to \citet{allen2018convergence}, the width of each layer should grow at least polynomially with the depth (the number of layers) for residual networks (ResNets) in order to guarantee the linear convergence of gradient descent, which shows no obvious advantage over feedforward networks. In this paper, we remove the dependence of the width on the depth of the network for ResNets and conclude that training a deep residual network can be as easy as training a two-layer network. This theoretically justifies the benefit of skip connections in facilitating the convergence of gradient descent. Our experiments also confirm that the width of a ResNet needed to guarantee successful training is much smaller than that of a deep feedforward neural network.
We propose a topic-guided variational autoencoder (TGVAE) model for text generation. Distinct from existing variational autoencoder (VAE) based approaches, which assume a simple Gaussian prior for the latent code, our model specifies the prior as a Gaussian mixture model (GMM) parametrized by a neural topic module. Each mixture component corresponds to a latent topic, which provides guidance for generating sentences under that topic. The neural topic module and the VAE-based neural sequence module in our model are learned jointly. In particular, a sequence of invertible Householder transformations is applied to endow the approximate posterior of the latent code with high flexibility during model inference. Experimental results show that our TGVAE outperforms alternative approaches on both unconditional and conditional text generation, generating semantically meaningful sentences on various topics.
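A Householder transformation H = I - 2 v v^T / (v^T v) reflects a vector across the hyperplane orthogonal to v. Because each H is orthogonal, a sequence of them reshapes the approximate posterior sample without changing its norm, and the log-determinant of the Jacobian is zero, so no extra density correction is needed. The sketch below applies a given list of vectors v; in a model like the one above these vectors would come from the inference network, which is omitted here.

```python
from typing import List, Sequence


def householder(v: Sequence[float], z: Sequence[float]) -> List[float]:
    """Apply H = I - 2 v v^T / (v^T v) to z without forming the matrix."""
    vv = sum(vi * vi for vi in v)
    vz = sum(vi * zi for vi, zi in zip(v, z))
    return [zi - 2.0 * vi * vz / vv for vi, zi in zip(v, z)]


def householder_flow(vs: Sequence[Sequence[float]], z: Sequence[float]) -> List[float]:
    """Apply the reflections in order: z_K = H_K ... H_1 z."""
    out = list(z)
    for v in vs:
        out = householder(v, out)
    return out
```

Each reflection is its own inverse, so applying the same vectors in reverse order recovers the original sample, which is what makes the transformation invertible.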