The paper presents the comparative study of the nature of stock markets in short-term and long-term time scales with and without structural break in the stock data. Structural break point has been identified by applying Zivot and Andrews structural trend break model to break the original time series (TSO) into time series before structural break (TSB) and time series after structural break (TSA). The empirical mode decomposition based Hurst exponent and variance techniques have been applied to the TSO, TSB and TSA to identify the time scales in short-term and long-term from the decomposed intrinsic mode functions. We found that for TSO, TSB and TSA the short-term time scales and long-term time scales are within the range of few days to 3 months and greater than 5 months respectively, which indicates that the short-term and long-term time scales are present in the stock market. The Hurst exponent is $\sim 0.5$ and $\geq 0.75$ for TSO, TSB and TSA in short-term and long-term respectively, which indicates that the market is random in short-term and strongly correlated in long-term. The identification of time scales at short-term and long-term investment horizon will be useful for investors to design investment and trading strategies.
This article studies the financial time series data processing for machine learning. It introduces the most frequent scaling methods, then compares the resulting stationarity and preservation of useful information for trend forecasting. It proposes an empirical test based on the capability to learn simple data relationship with simple models. It also speaks about the data split method specific to time series, avoiding unwanted overfitting and proposes various labelling for classification and regression.
We introduce a set of axioms for the notion of computation, and show that P= NP is not derivable from this set of axioms.
Decision making often requires information that must be Provided with the rich data format. Addressing these new requirements appropriately makes it necessary for government agencies to orchestrate large amounts of information from different sources and formats, to be efficiently delivered through the devices commonly used by people, such as computers, netbooks, tablets and smartphones. To overcome these problems, a model is proposed for the conceptual representation of the State’s organizational units, seen as georeferenced entities of Electronic Government, based on ontologies designed under the principles of Linked Open Data, which allows the automatic extraction of information through the machines, which supports the process of governmental decision making and gives citizens full access to find and process through mobile technologies.
Previous survey papers offer knowledge of deep learning hardware devices and software frameworks. This paper introduces benchmarking principles, surveys machine learning devices including GPUs, FPGAs, and ASICs, and reviews deep learning software frameworks. It also reviews these technologies with respect to benchmarking from the angles of our 7-metric approach to frameworks and 12-metric approach to hardware platforms. After reading the paper, the audience will understand seven benchmarking principles, generally know that differential characteristics of mainstream AI devices, qualitatively compare deep learning hardware through our 12-metric approach for benchmarking hardware, and read benchmarking results of 16 deep learning frameworks via our 7-metric set for benchmarking frameworks.
Goal-conditioned policies are used in order to break down complex reinforcement learning (RL) problems by using subgoals, which can be defined either in state space or in a latent feature space. This can increase the efficiency of learning by using a curriculum, and also enables simultaneous learning and generalization across goals. A crucial requirement of goal-conditioned policies is to be able to determine whether the goal has been achieved. Having a notion of distance to a goal is thus a crucial component of this approach. However, it is not straightforward to come up with an appropriate distance, and in some tasks, the goal space may not even be known a priori. In this work we learn a distance-to-goal estimate which is computed in terms of the number of actions that would need to be carried out in a self-supervised approach. Our method solves complex tasks without prior domain knowledge in the online setting in three different scenarios in the context of goal-conditioned policies a) the goal space is the same as the state space b) the goal space is given but an appropriate distance is unknown and c) the state space is accessible, but only a subset of the state space represents desired goals, and this subset is known a priori. We also propose a goal-generation mechanism as a secondary contribution.
Knowledge bases store information about the semantic types of entities, which can be utilized in a range of information access tasks. This information, however, is often incomplete, due to new entities emerging on a daily basis. We address the task of automatically assigning types to entities in a knowledge base from a type taxonomy. Specifically, we present two neural network architectures, which take short entity descriptions and, optionally, information about related entities as input. Using the DBpedia knowledge base for experimental evaluation, we demonstrate that these simple architectures yield significant improvements over the current state of the art.
Matrix completion has become an extremely important technique as data scientists are routinely faced with large, incomplete datasets on which they wish to perform statistical inferences. We investigate how error introduced via matrix completion affects statistical inference. Furthermore, we prove recovery error bounds which depend upon the matrix recovery error for several common statistical inferences. We consider matrix recovery via nuclear norm minimization and a variant, $\ell_1$-regularized nuclear norm minimization for data with a structured sampling pattern. Finally, we run a series of numerical experiments on synthetic data and real patient surveys from MyLymeData, which illustrate the relationship between inference recovery error and matrix recovery error. These results indicate that exact matrix recovery is often not necessary to achieve small inference recovery error.
The decision-making process of many state-of-the-art machine learning models is inherently inscrutable to the extent that it is impossible for a human to interpret the model directly: they are black box models. This has led to a call for research on explaining black box models, for which there are two main approaches. Global explanations that aim to explain a model’s decision making process in general, and local explanations that aim to explain a single prediction. Since it remains challenging to establish fidelity to black box models in globally interpretable approximations, much attention is put on local explanations. However, whether local explanations are able to reliably represent the black box model and provide useful insights remains an open question. We present Global Aggregations of Local Explanations (GALE) with the objective to provide insights in a model’s global decision making process. Overall, our results reveal that the choice of aggregation matters. We find that the global importance introduced by Local Interpretable Model-agnostic Explanations (LIME) does not reliably represent the model’s global behavior. Our proposed aggregations are better able to represent how features affect the model’s predictions, and to provide global insights by identifying distinguishing features.
Imitation Learning (IL) is a machine learning approach to learn a policy from a dataset of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL) but it can also be useful on its own, e.g. to learn to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single ‘average’ policy based on a dataset that possibly contains demonstrations of numerous different types of behaviors. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we are able to adapt the behavior of the policy – in-between games – to reach a performance beyond that of the traditional IL baseline approach.
Computer vision (CV) is the process of using machines to understand and analyze imagery, which is an integral branch of artificial intelligence. Among various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem, and has become ubiquitous in diverse real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, \eg, species of birds or models of cars. The small inter-class variations and the large intra-class variations caused by the fine-grained nature makes it a challenging problem. During the booming of deep learning, recent years have witnessed remarkable progress of FGIA using deep learning techniques. In this paper, we aim to give a survey on recent advances of deep learning based FGIA techniques in a systematic way. Specifically, we organize the existing studies of FGIA techniques into three major categories: fine-grained image recognition, fine-grained image retrieval and fine-grained image generation. In addition, we also cover some other important issues of FGIA, such as publicly available benchmark datasets and its related domain specific applications. Finally, we conclude this survey by highlighting several directions and open problems which need be further explored by the community in the future.
Deep Learning Accelerators are prone to faults which manifest in the form of errors in Neural Networks. Fault Tolerance in Neural Networks is crucial in real-time safety critical applications requiring computation for long durations. Neural Networks with high regularisation exhibit superior fault tolerance, however, at the cost of classification accuracy. In the view of difference in functionality, a Neural Network is modelled as two separate networks, i.e, the Feature Extractor with unsupervised learning objective and the Classifier with a supervised learning objective. Traditional approaches of training the entire network using a single supervised learning objective is insufficient to achieve the objectives of the individual components optimally. In this work, a novel multi-criteria objective function, combining unsupervised training of the Feature Extractor followed by supervised tuning with Classifier Network is proposed. The unsupervised training solves two games simultaneously in the presence of adversary neural networks with conflicting objectives to the Feature Extractor. The first game minimises the loss in reconstructing the input image for indistinguishability given the features from the Extractor, in the presence of a generative decoder. The second game solves a minimax constraint optimisation for distributional smoothening of feature space to match a prior distribution, in the presence of a Discriminator network. The resultant strongly regularised Feature Extractor is combined with the Classifier Network for supervised fine-tuning. The proposed Adversarial Fault Tolerant Neural Network Training is scalable to large networks and is independent of the architecture. The evaluation on benchmarking datasets: FashionMNIST and CIFAR10, indicates that the resultant networks have high accuracy with superior tolerance to stuck at ‘0’ faults compared to widely used regularisers.
In statistical data assimilation (SDA) and supervised machine learning (ML), we wish to transfer information from observations to a model of the processes underlying those observations. For SDA, the model consists of a set of differential equations that describe the dynamics of a physical system. For ML, the model is usually constructed using other strategies. In this paper, we develop a systematic formulation based on Monte Carlo sampling to achieve such information transfer. Following the derivation of an appropriate target distribution, we present the formulation based on the standard Metropolis-Hasting (MH) procedure and the Hamiltonian Monte Carlo (HMC) method for performing the high dimensional integrals that appear. To the extensive literature on MH and HMC, we add (1) an annealing method using a hyperparameter that governs the precision of the model to identify and explore the highest probability regions of phase space dominating those integrals, and (2) a strategy for initializing the state space search. The efficacy of the proposed formulation is demonstrated using a nonlinear dynamical model with chaotic solutions widely used in geophysics.
This paper studies aligning knowledge graphs from different sources or languages. Most existing methods train supervised methods for the alignment, which usually require a large number of aligned knowledge triplets. However, such a large number of aligned knowledge triplets may not be available or are expensive to obtain in many domains. Therefore, in this paper we propose to study aligning knowledge graphs in fully-unsupervised or weakly-supervised fashion, i.e., without or with only a few aligned triplets. We propose an unsupervised framework to align the entity and relation embddings of different knowledge graphs with an adversarial learning framework. Moreover, a regularization term which maximizes the mutual information between the embeddings of different knowledge graphs is used to mitigate the problem of mode collapse when learning the alignment functions. Such a framework can be further seamlessly integrated with existing supervised methods by utilizing a limited number of aligned triples as guidance. Experimental results on multiple datasets prove the effectiveness of our proposed approach in both the unsupervised and the weakly-supervised settings.
Given a graph over which the contagions (e.g. virus, gossip) propagate, leveraging subgraphs with highly correlated nodes is beneficial to many applications. Yet, challenges abound. First, the propagation pattern between a pair of nodes may change in various temporal dimensions. Second, not always the same contagion is propagated. Hence, state-of-the-art text mining approaches ranging from similarity measures to topic-modeling cannot use the textual contents to compute the weights between the nodes. Third, the word-word co-occurrence patterns may differ in various temporal dimensions, which increases the difficulty to employ current word embedding approaches. We argue that inseparable multi-aspect temporal collaborations are inevitably needed to better calculate the correlation metrics in dynamical processes. In this work, we showcase a sophisticated framework that on the one hand, integrates a neural network based time-aware word embedding component that can collectively construct the word vectors through an assembly of infinite latent temporal facets, and on the other hand, uses an elaborate generative model to compute the edge weights through heterogeneous temporal attributes. After computing the intra-nodes weights, we utilize our Max-Heap Graph cutting algorithm to exploit subgraphs. We then validate our model through comprehensive experiments on real-world propagation data. The results show that the knowledge gained from the versatile temporal dynamics is not only indispensable for word embedding approaches but also plays a significant role in the understanding of the propagation behaviors. Finally, we demonstrate that compared with other rivals, our model can dominantly exploit the subgraphs with highly coordinated nodes.
This paper studies the capacity limits of graph neural networks (GNN). Rather than focusing on a specific architecture, the networks considered here are those that fall within the message-passing framework, a model that encompasses several state-of-the-art networks. Two main results are presented. First, GNN are shown to be Turing universal under sufficient conditions on their depth, width, node identification, and layer expressiveness. In addition, it is discovered that GNN can lose a significant portion of their power when their depth and width is restricted. The proposed impossibility statements stem from a new technique that enables the re-purposing of seminal results from theoretical computer science. This leads to lower bounds for an array of decision, optimization, and estimation problems involving graphs. Strikingly, several of these problems are deemed impossible unless the product of a GNN’s depth and width exceeds the graph size; this dependence remains significant even for tasks that appear simple or when considering approximation.
Deep networks realize complex mappings that are often understood by their locally linear behavior at or around points of interest. For example, we use the derivative of the mapping with respect to its inputs for sensitivity analysis, or to explain (obtain coordinate relevance for) a prediction. One key challenge is that such derivatives are themselves inherently unstable. In this paper, we propose a new learning problem to encourage deep networks to have stable derivatives over larger regions. While the problem is challenging in general, we focus on networks with piecewise linear activation functions. Our algorithm consists of an inference step that identifies a region around a point where linear approximation is provably stable, and an optimization step to expand such regions. We propose a novel relaxation to scale the algorithm to realistic models. We illustrate our method with residual and recurrent networks on image and sequence datasets.
Event factuality prediction (EFP) is the task of assessing the degree to which an event mentioned in a sentence has happened. For this task, both syntactic and semantic information are crucial to identify the important context words. The previous work for EFP has only combined these information in a simple way that cannot fully exploit their coordination. In this work, we introduce a novel graph-based neural network for EFP that can integrate the semantic and syntactic information more effectively. Our experiments demonstrate the advantage of the proposed model for EFP.
Advances in embedded systems have enabled integration of many lightweight sensory devices within our daily life. In particular, this trend has given rise to continuous expansion of wearable sensors in a broad range of applications from health and fitness monitoring to social networking and military surveillance. Wearables leverage machine learning techniques to profile behavioral routine of their end-users through activity recognition algorithms. Current research assumes that such machine learning algorithms are trained offline. In reality, however, wearables demand continuous reconfiguration of their computational algorithms due to their highly dynamic operation. Developing a personalized and adaptive machine learning model requires real-time reconfiguration of the model. Due to stringent computation and memory constraints of these embedded sensors, the training/re-training of the computational algorithms need to be memory- and computation-efficient. In this paper, we propose a framework, based on the notion of online learning, for real-time and on-device machine learning training. We propose to transform the activity recognition problem from a multi-class classification problem to a hierarchical model of binary decisions using cascading online binary classifiers. Our results, based on Pegasos online learning, demonstrate that the proposed approach achieves 97% accuracy in detecting activities of varying intensities using a limited memory while power usages of the system is reduced by more than 40%.