These tips provide a quick and concentrated guide for beginners in the analysis of network data.
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample which can be modified slightly to intentionally cause an ML classifier to misclassify it. In this work, we investigate white-box and grey-box evasion attacks to an ML-based malware detector and conducted performance evaluations in a real-world setting. We propose a framework for deploying grey-box and black-box attacks to malware detection systems. We compared the defense approaches in mitigating the attacks.
The majority of public transport vehicles are fitted with Automatic Vehicle Location (AVL) systems generating a continuous stream of data. The availability of this data has led to a substantial body of literature addressing the development of algorithms to predict Estimated Times of Arrival (ETA). Here research literature reporting the development of ETA prediction systems specific to busses is reviewed to give an overview of the state of the art. Generally, reviews in this area categorise publications according to the type of algorithm used, which does not allow an objective comparison. Therefore this survey will categorise the reviewed publications according to the input data used to develop the algorithm. The review highlighted inconsistencies in reporting standards of the literature. The inconsistencies were found in the varying measurements of accuracy preventing any comparison and the frequent omission of a benchmark algorithm. Furthermore, some publications were lacking in overall quality. Due to these highlighted issues, any objective comparison of prediction accuracies is impossible. The bus ETA research field therefore requires a universal set of standards to ensure the quality of reported algorithms. This could be achieved by using benchmark datasets or algorithms and ensuring the publication of any code developed.
The quest of `can machines think’ and `can machines do what human do’ are quests that drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data intensive applications, it still lacks the ability of learning from limited exemplars and fast generalizing to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. Particularly, a machine learning problem called Few-Shot Learning (FSL) targets at this case. It can rapidly generalize to new tasks of limited supervised experience by turning to prior knowledge, which mimics human’s ability to acquire knowledge from few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and antidote for rare cases learning. With extensive works on FSL emerging, we give a comprehensive survey for it. We first give the formal definition for FSL. Then we point out the core issues of FSL, which turns the problem from ‘how to solve FSL’ to ‘how to deal with the core issues’. Accordingly, existing works from the birth of FSL to the most recent published ones are categorized in a unified taxonomy, with thorough discussion of the pros and cons for different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications and theory, hoping to provide insights to both beginners and experienced researchers.
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models with a significant margin. In particular for metric learning, it allows students to outperform their teachers’ performance, achieving the state of the arts on standard benchmark datasets.
We propose the Neuralogram — a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture. Through a series of probing signals, we show how our representation can encapsulate pitch, timbre and rhythm-based information, and other attributes. This representation suggests a method for revealing meaningful relationships in arbitrarily long audio signals that are not readily represented by existing algorithms. This has the potential for numerous applications in audio understanding, music recommendation, meta-data extraction to name a few.
The study of correlated time-series is ubiquitous in statistical analysis, and the matrix decomposition of the cross-correlations between time series is a universal tool to extract the principal patterns of behavior in a wide range of complex systems. Despite this fact, no general result is known for the statistics of eigenvectors of the cross-correlations of correlated time-series. Here we use supersymmetric theory to provide novel analytical results that will serve as a benchmark for the study of correlated signals for a vast community of researchers.
Deep network compression has been achieved notable progress via knowledge distillation, where a teacher-student learning manner is adopted by using predetermined loss. Recently, more focuses have been transferred to employ the adversarial training to minimize the discrepancy between distributions of output from two networks. However, they always emphasize on result-oriented learning while neglecting the scheme of process-oriented learning, leading to the loss of rich information contained in the whole network pipeline. Inspired by the assumption that, the small network can not perfectly mimic a large one due to the huge gap of network scale, we propose a knowledge transfer method, involving effective intermediate supervision, under the adversarial training framework to learn the student network. To achieve powerful but highly compact intermediate information representation, the squeezed knowledge is realized by task-driven attention mechanism. Then, the transferred knowledge from teacher network could accommodate the size of student network. As a result, the proposed method integrates merits from both process-oriented and result-oriented learning. Extensive experimental results on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves highly superior performances against other state-of-the-art methods.
One of the challenges manifested after global growth of social networks and the exponential growth of user-generated data is to identify user needs based on the data they share or tend to like. ‘Big Data’ is a term referring to data that exist in huge volume and various formats, i.e. structured or semi structured. The inherent features of this data have forced organizations to seek to identify desirable patterns amongst big data and make their fundamental decisions based on this information, in order to improve their customer services and enhance their business. As long as the big data that is being used is not of good quality, the business needs would not be expected to be met. As a result, big data quality needs to be taken into consideration seriously. Since there is no systematic review in the big data quality area, this study aims to present a systematic literature review of the research efforts on big data quality for those researchers who attempt to enter this area. In this systematic review, and after determining the basic requirements, a total of 419 studies are initially considered to be relevant. Then, with a review of the abstracts of the studies, 170 papers are included and ultimately after the complete study, 88 papers have been added to the final papers pool. Through careful study and analysis of these papers, the desired information has been extracted. As a result, a research tree is presented that divides the studies based on the type of processing, task, and technique. Then the active venues and other interesting profiles, as well as the classification of the new challenges of this field are discussed.
In the framework of fair learning, we consider clustering methods that avoid or limit the influence of a set of protected attributes, $S$, (race, sex, etc) over the resulting clusters, with the goal of producing a fair clustering. For this, we introduce perturbations to the Euclidean distance that take into account $S$ in a way that resembles attraction-repulsion in charged particles in Physics and results in dissimilarities with an easy interpretation. Cluster analysis based on these dissimilarities penalizes homogeneity of the clusters in the attributes $S$, and leads to an improvement in fairness. We illustrate the use of our procedures with both synthetic and real data.
Question Answering has recently received high attention from artificial intelligence communities due to the advancements in learning technologies. Early question answering models used rule-based approaches and moved to the statistical approach to address the vastly available information. However, statistical approaches are shown to underperform in handling the dynamic nature and the variation of language. Therefore, learning models have shown the capability of handling the dynamic nature and variations in language. Many deep learning methods have been introduced to question answering. Most of the deep learning approaches have shown to achieve higher results compared to machine learning and statistical methods. The dynamic nature of language has profited from the nonlinear learning in deep learning. This has created prominent success and a spike in work on question answering. This paper discusses the successes and challenges in question answering question answering systems and techniques that are used in these challenges.
This paper seeks to model human language by the mathematical framework of quantum physics. With the well-designed mathematical formulations in quantum physics, this framework unifies different linguistic units in a single complex-valued vector space, e.g. words as particles in quantum states and sentences as mixed systems. A complex-valued network is built to implement this framework for semantic matching. With well-constrained complex-valued components, the network admits interpretations to explicit physical meanings. The proposed complex-valued network for matching (CNM) achieves comparable performances to strong CNN and RNN baselines on two benchmarking question answering (QA) datasets.
Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.
Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements.
Machine learning pipeline potentially consists of several stages of operations like data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete conditional hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state of the art methods like Auto-sklearn , TPOT, Tree Parzen Window, and Random Search.
Mid-price movement prediction based on limit order book (LOB) data is a challenging task due to the complexity and dynamics of the LOB. So far, there have been very limited attempts for extracting relevant features based on LOB data. In this paper, we address this problem by designing a new set of handcrafted features and performing an extensive experimental evaluation on both liquid and illiquid stocks. More specifically, we implement a new set of econometrical features that capture statistical properties of the underlying securities for the task of mid-price prediction. Moreover, we develop a new experimental protocol for online learning that treats the task as a multi-objective optimization problem and predicts i) the direction of the next price movement and ii) the number of order book events that occur until the change takes place. In order to predict the mid-price movement, the features are fed into nine different deep learning models based on multi-layer perceptrons (MLP), convolutional neural networks (CNN) and long short-term memory (LSTM) neural networks. The performance of the proposed method is then evaluated on liquid and illiquid stocks, which are based on TotalView-ITCH US and Nordic stocks, respectively. For some stocks, results suggest that the correct choice of a feature set and a model can lead to the successful prediction of how long it takes to have a stock price movement.
Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.