In complex processes, various events can happen in different sequences. The prediction of the next event activity given an a-priori process state is of importance in such processes. Recent methods leverage deep learning techniques such as recurrent neural networks to predict event activities from raw process logs. However, deep learning techniques cannot efficiently model logical behaviors of complex processes. In this paper, we take advantage of Petri nets as a powerful tool in modeling logical behaviors of complex processes. We propose an approach which first discovers Petri nets from event logs utilizing a recent process mining algorithm. In a second step, we enhance the obtained model with time decay functions to create timed process state samples. Finally, we use these samples in combination with token movement counters and Petri net markings to train a deep learning model that predicts the next event activity. We demonstrate significant performance improvements and outperform the state-of-the-art methods on eight out of nine real-world benchmark event logs in accuracy.
Slimmable networks are a family of neural networks that can instantly adjust the runtime width. The width can be chosen from a predefined widths set to adaptively optimize accuracy-efficiency trade-offs at runtime. In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. We further propose two improved training techniques for US-Nets, named the sandwich rule and inplace distillation, to enhance training process and boost testing accuracy. We show improved performance of universally slimmable MobileNet v1 and MobileNet v2 on ImageNet classification task, compared with individually trained ones and 4-switch slimmable network baselines. We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning. Extensive ablation experiments on these representative tasks demonstrate the effectiveness of our proposed methods. Our discovery opens up the possibility to directly evaluate FLOPs-Accuracy spectrum of network architectures. Code and models will be available at: https://…/slimmable_networks
Consider a general machine learning setting where the output is a set of labels or sequences. This output set is unordered and its size varies with the input. Whereas multi-label classification methods seem a natural first resort, they are not readily applicable to set-valued outputs because of the growth rate of the output space; and because conventional sequence generation doesn’t reflect sets’ order-free nature. In this paper, we propose a unified framework–sequential set generation (SSG)–that can handle output sets of labels and sequences. SSG is a meta-algorithm that leverages any probabilistic learning method for label or sequence prediction, but employs a proper regularization such that a new label or sequence is generated repeatedly until the full set is produced. Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences. We perform experiments with both benchmark and synthetic data sets and demonstrate SSG’s strong performance over baseline methods.
This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine Learning. We describe the challenges and proposes a reference architecture.
Learning to program could possibly be analogous to acquiring expertise in abstract mathematics, which may be boring or dull for a majority of students. Thus, among the countless options to approach learning coding [1-14], acquiring concepts through game creation could possibly be the most enriching experience for students. Consequently, it is important to select a lucid and familiar game for students. Then, the following step is to choose a language that introduces the basic concepts of object-oriented programming really well. For this paper, we chose the game of Tic-Tac-Toe, which is straight-forward for most people. The programming language chosen here is C++.
Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. Particularly, learning from corrective advice garners advantages over evaluative feedback as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be a effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) to adapt the learning rate to the teacher’s learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, both for simulated and real human teachers.
Shannon’s entropy and other entropy-based concepts are derived from the new, more general concept of relative divergence of one ‘grading’ function on a linearly ordered set from another such function. The definition of relative divergence is derived based on ‘common sense’ assumptions about comparing grading functions. Shannon’s entropy formulas emerge from the respective relative divergence ones, entropy based methods are extended to more general cases and some new applications.
We propose a new model selection criterion, the Neyman-Pearson criterion (NPC), for asymmetric binary classification problems such as cancer diagnosis, where the two types of classification errors have vastly different priorities. The NPC is a general prediction-based criterion that works for most classification methods including logistic regression, support vector machines, and random forests. We study the theoretical model selection properties of the NPC for nonparametric plug-in methods. Simulation studies show that the NPC outperforms the classical prediction-based criterion that minimizes the overall classification error under various asymmetric classification scenarios. A real data case study of breast cancer suggests that the NPC is a practical criterion that leads to the discovery of novel gene markers with both high sensitivity and specificity for breast cancer diagnosis. The NPC is available in an R package NPcriterion.
We propose a new low-cost machine-learning-based methodology which assists designers in reducing the gap between the problem and the solution in the design process. Our work applies reinforcement learning (RL) to find the optimal task-oriented design solution through the construction of the design action for each task. For this task-oriented design, the 3D design process in product design is assigned to an action space in Deep RL, and the desired 3D model is obtained by training each design action according to the task. By showing that this method achieves satisfactory design even when applied to a task pursuing multiple goals, we suggest the direction of how machine learning can contribute to the design process. Also, we have validated with product designers that this methodology can assist the creative part in the process of design.
Accurate segmenting nuclei instances is a crucial step in computer-aided image analysis to extract rich features for cellular estimation and following diagnosis as well as treatment. While it still remains challenging because the wide existence of nuclei clusters, along with the large morphological variances among different organs make nuclei instance segmentation susceptible to over-/under-segmentation. Additionally, the inevitably subjective annotating and mislabeling prevent the network learning from reliable samples and eventually reduce the generalization capability for robustly segmenting unseen organ nuclei. To address these issues, we propose a novel deep neural network, namely Contour-aware Informative Aggregation Network (CIA-Net) with multi-level information aggregation module between two task-specific decoders. Rather than independent decoders, it leverages the merit of spatial and texture dependencies between nuclei and contour by bi-directionally aggregating task-specific features. Furthermore, we proposed a novel smooth truncated loss that modulates losses to reduce the perturbation from outliers. Consequently, the network can focus on learning from reliable and informative samples, which inherently improves the generalization capability. Experiments on the 2018 MICCAI challenge of Multi-Organ-Nuclei-Segmentation validated the effectiveness of our proposed method, surpassing all the other 35 competitive teams by a significant margin.
We propose a novel Bayesian approach to the problem of variable selection in multiple linear regression models. In particular, we present a hierarchical setting which allows for direct specification of a-priori beliefs about the number of nonzero regression coefficients as well as a specification of beliefs that given coefficients are nonzero. To guarantee numerical stability, we adopt a $g$-prior with an additional ridge parameter for the unknown regression coefficients. In order to simulate from the joint posterior distribution an intelligent random walk Metropolis-Hastings algorithm which is able to switch between different models is proposed. Testing our algorithm on real and simulated data illustrates that it performs at least on par and often even better than other well-established methods. Finally, we prove that under some nominal assumptions, the presented approach is consistent in terms of model selection.
In real-world machine learning applications, there is a cost associated with sampling of different features. Budgeted learning can be used to select which feature-values to acquire from each instance in a dataset, such that the best model is induced under a given constraint. However, this approach is not possible in the domain of online learning since one may not retroactively acquire feature-values from past instances. In online learning, the challenge is to find the optimum set of features to be acquired from each instance upon arrival from a data stream. In this paper we introduce the issue of online budgeted learning and describe a general framework for addressing this challenge. We propose two types of feature value acquisition policies based on the multi-armed bandit problem: random and adaptive. Adaptive policies perform online adjustments according to new information coming from a data stream, while random policies are not sensitive to the information that arrives from the data stream. Our comparative study on five real-world datasets indicates that adaptive policies outperform random policies for most budget limitations and datasets. Furthermore, we found that in some cases adaptive policies achieve near-optimal results.
This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. Current approaches to identify sub-events within a given event, such as a goal during a soccer match, essentially do not exploit the sequential nature of social media streams. We address this shortcoming by framing the sub-event detection problem in social media streams as a sequence labeling task and adopt a neural sequence architecture that explicitly accounts for the chronological order of posts. Specifically, we (i) establish a neural baseline that outperforms a graph-based state-of-the-art method for binary sub-event detection (2.7% micro-F1 improvement), as well as (ii) demonstrate superiority of a recurrent neural network model on the posts sequence level for labeled sub-events (2.4% bin-level F1 improvement over non-sequential models).
Understanding the development of trends and identifying trend reversals in decadal time series is becoming more and more important. Many climatological and atmospheric time series are characterized by autocorrelation, heteroskedasticity and seasonal effects. Additionally, missing observations due to instrument failure or unfavorable measurement conditions are common in such series. This is why it is crucial to apply methods which work reliably under these circumstances. The goal of this paper is to provide a toolbox which can be used to determine the presence and form of changes in trend functions using parametric as well as nonparametric techniques. We consider bootstrap inference on broken linear trends and smoothly varying nonlinear trends. In particular, for the broken trend model, we propose a bootstrap method for inference on the break location and the corresponding changes in slope. For the smooth trend model we construct simultaneous confidence bands around the nonparametrically estimated trend. Our autoregressive wild bootstrap approach combined with a seasonal filter, is able to handle all issues mentioned above. We apply our methods to a set of atmospheric ethane series with a focus on the measurements obtained above the Jungfraujoch in the Swiss Alps. Ethane is the most abundant non-methane hydrocarbon in the Earth’s atmosphere, an important precursor of tropospheric ozone and a good indicator of oil and gas production as well as transport. Its monitoring is therefore crucial for the characterization of air quality and of the transport of tropospheric pollution.
Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former one is a visual perception task while the latter one corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved due to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual perception and visual context reasoning, is still in its early stage. It is the core task of many different computer vision applications, such as object detection, visual semantic segmentation, visual relationship detection or scene graph generation. Since it helps to enhance the accuracy and the consistency of the resulting interpretation, visual context reasoning is often incorporated with visual perception in current deep end-to-end visual semantic information pursuit methods. However, a comprehensive review for this exciting area is still lacking. In this survey, we present a unified theoretical paradigm for all these methods, followed by an overview of the major developments and the future trends in each potential direction. The common benchmark datasets, the evaluation metrics and the comparisons of the corresponding methods are also introduced.
Financial market forecasting is one of the most attractive practical applications of sentiment analysis. In this paper, we investigate the potential of using sentiment \emph{attitudes} (positive vs negative) and also sentiment \emph{emotions} (joy, sadness, etc.) extracted from financial news or tweets to help predict stock price movements. Our extensive experiments using the \emph{Granger-causality} test have revealed that (i) in general sentiment attitudes do not seem to Granger-cause stock price changes; and (ii) while on some specific occasions sentiment emotions do seem to Granger-cause stock price changes, the exhibited pattern is not universal and must be looked at on a case by case basis. Furthermore, it has been observed that at least for certain stocks, integrating sentiment emotions as additional features into the machine learning based market trend prediction model could improve its accuracy.
We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs.We validate the utility ofMMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.
Because the choice and tuning of the optimizer affects the speed, and ultimately the performance of deep learning, there is significant past and recent research in this area. Yet, perhaps surprisingly, there is no generally agreed-upon protocol for the quantitative and reproducible evaluation of optimization strategies for deep learning. We suggest routines and benchmarks for stochastic optimization, with special focus on the unique aspects of deep learning, such as stochasticity, tunability and generalization. As the primary contribution, we present DeepOBS, a Python package of deep learning optimization benchmarks. The package addresses key challenges in the quantitative assessment of stochastic optimizers, and automates most steps of benchmarking. The library includes a wide and extensible set of ready-to-use realistic optimization problems, such as training Residual Networks for image classification on ImageNet or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. The package also provides realistic baseline results for the most popular optimizers on these test problems, ensuring a fair comparison to the competition when benchmarking new optimizers, and without having to run costly experiments. It comes with output back-ends that directly produce LaTeX code for inclusion in academic publications. It supports TensorFlow and is available open source.
Deep learning techniques are rapidly advanced recently, and becoming a necessity component for widespread systems. However, the inference process of deep learning is black-box, and not very suitable to safety-critical systems which must exhibit high transparency. In this paper, to address this black-box limitation, we develop a simple analysis method which consists of 1) structural feature analysis: lists of the features contributing to inference process, 2) linguistic feature analysis: lists of the natural language labels describing the visual attributes for each feature contributing to inference process, and 3) consistency analysis: measuring consistency among input data, inference (label), and the result of our structural and linguistic feature analysis. Our analysis is simplified to reflect the actual inference process for high transparency, whereas it does not include any additional black-box mechanisms such as LSTM for highly human readable results. We conduct experiments and discuss the results of our analysis qualitatively and quantitatively, and come to believe that our work improves the transparency of neural networks. Evaluated through 12,800 human tasks, 75% workers answer that input data and result of our feature analysis are consistent, and 70% workers answer that inference (label) and result of our feature analysis are consistent. In addition to the evaluation of the proposed analysis, we find that our analysis also provide suggestions, or possible next actions such as expanding neural network complexity or collecting training data to improve a neural network.
This paper presents a hardness-aware deep metric learning (HDML) framework. Most previous deep metric learning methods employ the hard negative mining strategy to alleviate the lack of informative samples for training. However, this mining strategy only utilizes a subset of training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hard levels and generate corresponding label-preserving synthetics for recycled training, so that information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty. Our method achieves very competitive performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
This paper describes a baseline for the second iteration of the Fact Extraction and VERification shared task (FEVER2.0) which explores the resilience of systems through adversarial evaluation. We present a collection of simple adversarial attacks against systems that participated in the first FEVER shared task. FEVER modeled the assessment of truthfulness of written claims as a joint information retrieval and natural language inference task using evidence from Wikipedia. A large number of participants made use of deep neural networks in their submissions to the shared task. The extent as to whether such models understand language has been the subject of a number of recent investigations and discussion in literature. In this paper, we present a simple method of generating entailment-preserving and entailment-altering perturbations of instances by common patterns within the training data. We find that a number of systems are greatly affected with absolute losses in classification accuracy of up to $29\%$ on the newly perturbed instances. Using these newly generated instances, we construct a sample submission for the FEVER2.0 shared task. Addressing these types of attacks will aid in building more robust fact-checking models, as well as suggest directions to expand the datasets.
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as part of this submission. The results show that on Intent classification Watson significantly outperforms the other platforms, namely, Dialogflow, LUIS and Rasa; though these also perform well. Interestingly, on Entity Type recognition, Watson performs significantly worse due to its low Precision. Again, Dialogflow, LUIS and Rasa perform well on this task.
Data augmentation is a popular technique which helps improve generalization capabilities of deep neural networks. It plays a pivotal role in remote-sensing scenarios in which the amount of high-quality ground truth data is limited, and acquiring new examples is costly or impossible. This is a common problem in hyperspectral imaging, where manual annotation of image data is difficult, expensive, and prone to human bias. In this letter, we propose online data augmentation of hyperspectral data which is executed during the inference rather than before the training of deep networks. This is in contrast to all other state-of-the-art hyperspectral augmentation algorithms which increase the size (and representativeness) of training sets. Additionally, we introduce a new principal component analysis based augmentation. The experiments revealed that our data augmentation algorithms improve generalization of deep networks, work in real-time, and the online approach can be effectively combined with offline techniques to enhance the classification accuracy.
Matrix factorization is a powerful data analysis tool. It has been used in multivariate time series analysis, leading to the decomposition of the series in a small set of latent factors. However, little is known on the statistical performances of matrix factorization for time series. In this paper, we extend the results known for matrix estimation in the i.i.d setting to time series. Moreover, we prove that when the series exhibit some additional structure like periodicity or smoothness, it is possible to improve on the classical rates of convergence.
We study linear programming and general LP-type problems in several big data (streaming and distributed) models. We mainly focus on low dimensional problems in which the number of constraints is much larger than the number of variables. Low dimensional LP-type problems appear frequently in various machine learning tasks such as robust regression, support vector machines, and core vector machines. As supporting large-scale machine learning queries in database systems has become an important direction for database research, obtaining efficient algorithms for low dimensional LP-type problems on massive datasets is of great value. In this paper we give both upper and lower bounds for LP-type problems in distributed and streaming models. Our bounds are almost tight when the dimensionality of the problem is a fixed constant.
The spatio-temporal graph learning is becoming an increasingly important object of graph study. Many application domains involve highly dynamic graphs where temporal information is crucial, e.g. traffic networks and financial transaction graphs. Despite the constant progress made on learning structured data, there is still a lack of effective means to extract dynamic complex features from spatio-temporal structures. Particularly, conventional models such as convolutional networks or recurrent neural networks are incapable of revealing the temporal patterns in short or long terms and exploring the spatial properties in local or global scope from spatio-temporal graphs simultaneously. To tackle this problem, we design a novel multi-scale architecture, Spatio-Temporal U-Net (ST-UNet), for graph-structured time series modeling. In this U-shaped network, a paired sampling operation is proposed in spacetime domain accordingly: the pooling (ST-Pool) coarsens the input graph in spatial from its deterministic partition while abstracts multi-resolution temporal dependencies through dilated recurrent skip connections; based on previous settings in the downsampling, the unpooling (ST-Unpool) restores the original structure of spatio-temporal graphs and resumes regular intervals within graph sequences. Experiments on spatio-temporal prediction tasks demonstrate that our model effectively captures comprehensive features in multiple scales and achieves substantial improvements over mainstream methods on several real-world datasets.