Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks

Deep neural networks have shown promising results for various clinical prediction tasks such as diagnosis, mortality prediction, predicting duration of stay in hospital, etc. However, training deep networks, such as those based on Recurrent Neural Networks (RNNs), requires large amounts of labeled data, high computational resources, and significant hyperparameter tuning effort. In this work, we investigate to what extent transfer learning can address these issues when using deep RNNs to model multivariate clinical time series. We consider transferring the knowledge captured in an RNN trained on several source tasks simultaneously using a large labeled dataset to build the model for a target task with limited labeled data. An RNN pre-trained on several tasks provides generic features, which are then used to build simpler linear models for new target tasks without training task-specific RNNs. For evaluation, we train a deep RNN to identify several patient phenotypes on time series from the MIMIC-III database, and then use the features extracted using that RNN to build classifiers for identifying previously unseen phenotypes, and also for a seemingly unrelated task of in-hospital mortality. We demonstrate that (i) models trained on features extracted using a pre-trained RNN outperform or, in the worst case, perform as well as task-specific RNNs; (ii) the models using features from pre-trained models are more robust to the size of labeled data than task-specific RNNs; and (iii) features extracted using a pre-trained RNN are generic enough and perform better than typical statistical hand-crafted features.

Mining Periodic Patterns with a MDL Criterion

The quantity of event logs available is increasing rapidly, be they produced by industrial processes, computing systems, or life tracking, for instance. It is thus important to design effective ways to uncover the information they contain. Because event logs often record repetitive phenomena, mining periodic patterns is especially relevant when considering such data. Indeed, capturing such regularities is instrumental in providing condensed representations of the event sequences. We present an approach for mining periodic patterns from event logs while relying on a Minimum Description Length (MDL) criterion to evaluate candidate patterns. Our goal is to extract a set of patterns that suitably characterises the periodic structure present in the data. We evaluate the interest of our approach on several real-world event log datasets.
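As a rough illustration of how an MDL criterion can score a periodic pattern, the sketch below compares the cost of listing event timestamps one by one with the cost of describing them as a cycle (start, period, count) plus small per-occurrence corrections. The uniform codes, the `max_shift` bound, and the function names are assumptions made for this example; they are not the encoding used in the paper.

```python
import math

def cost_individual(timestamps, horizon):
    """Bits needed to list every timestamp with a uniform code over [0, horizon]."""
    return len(timestamps) * math.log2(horizon + 1)

def cost_cycle(timestamps, horizon, max_shift):
    """Bits needed to describe the timestamps as one cycle plus shift corrections.

    Hypothetical encoding: start, period and count each cost log2(horizon + 1)
    bits; every inter-occurrence gap costs log2(2 * max_shift + 1) bits for its
    deviation from the period. Returns infinity if a deviation exceeds max_shift.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    period = sorted(gaps)[len(gaps) // 2]          # median gap as the period
    shifts = [g - period for g in gaps]
    if any(abs(s) > max_shift for s in shifts):
        return math.inf
    header = 3 * math.log2(horizon + 1)
    corrections = len(shifts) * math.log2(2 * max_shift + 1)
    return header + corrections

regular = [5, 15, 25, 36, 46, 56]      # roughly every 10 time units
irregular = [3, 4, 40, 41, 90]         # no usable period

for name, ts in [("regular", regular), ("irregular", irregular)]:
    print(name, cost_individual(ts, 100), cost_cycle(ts, 100, 2))
```

Under this toy scheme the regular sequence is cheaper to describe as a cycle, so the pattern would be kept, while the irregular one is not.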

Dynamic network identification from non-stationary vector autoregressive time series

Learning the dynamics of complex systems features a large number of applications in data science. Graph-based modeling and inference underpin the most prominent family of approaches to learn complex dynamics due to their ability to capture the intrinsic sparsity of direct interactions in such systems. They also provide the user with interpretable graphs that unveil behavioral patterns and changes. To cope with the time-varying nature of interactions, this paper develops an estimation criterion and a solver to learn the parameters of a time-varying vector autoregressive model supported on a network of time series. The notion of local breakpoint is proposed to accommodate changes at individual edges. It contrasts with existing works, which assume that changes at all nodes are aligned in time. Numerical experiments validate the proposed schemes.

BayesGrad: Explaining Predictions of Graph Convolutional Networks

Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: ‘how do we explain the predictions of graph convolutional networks?’ A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the training data is often small and/or a label imbalance problem occurs, where a few samples belong to a single class and the majority of samples belong to the other classes. This can lead to uncertainty related to the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, utilizing the Bayesian predictive distribution, to define the importance of each node in an input graph, which is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in the artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types.

Transfer with Model Features in Reinforcement Learning

A key question in Reinforcement Learning is which representation an agent can learn to efficiently reuse knowledge between different tasks. Recently the Successor Representation was shown to have empirical benefits for transferring knowledge between tasks with shared transition dynamics. This paper presents Model Features: a feature representation that clusters behaviourally equivalent states and that is equivalent to a Model-Reduction. Further, we present a Successor Feature model which shows that learning Successor Features is equivalent to learning a Model-Reduction. A novel optimization objective is developed and we provide bounds showing that minimizing this objective results in an increasingly improved approximation of a Model-Reduction. Further, we provide transfer experiments on randomly generated MDPs which vary in their transition and reward functions but approximately preserve behavioural equivalence between states. These results demonstrate that Model Features are suitable for transfer between tasks with varying transition and reward functions.

A Comparative Study of Containers and Virtual Machines in Big Data Environment

Container technology has been gaining increasing attention in recent years and has become an alternative to traditional virtual machines. Some of the primary motivations for enterprises to adopt container technology include its convenience for encapsulating and deploying applications, lightweight operations, as well as efficiency and flexibility in resource sharing. However, there is still a lack of in-depth and systematic comparison studies on how big data applications, such as Spark jobs, perform in a container environment versus a virtual machine environment. In this paper, by running various Spark applications with different configurations, we evaluate the two environments from many interesting aspects, such as how conveniently the execution environment can be set up, what the makespans of different workloads running in each setup are, how efficiently the hardware resources, such as CPU and memory, are utilized, and how well each environment can scale. The results show that compared with virtual machines, containers provide an easier-to-deploy and more scalable environment for big data workloads. The research work in this paper can help practitioners and researchers make more informed decisions on tuning their cloud environment and configuring big data applications, so as to achieve better performance and higher resource utilization.

TFLMS: Large Model Support in TensorFlow by Graph Rewriting

While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit within the memory limits of accelerators for training. We propose an approach to tackle this problem by rewriting the computational graph of a neural network, in which swap-out and swap-in operations are inserted to temporarily store intermediate results in CPU memory. In particular, we first revise the concept of a computational graph by defining a concrete semantics for variables in a graph. We then formally show how to derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. To realize our approach, we developed a module in TensorFlow, named TFLMS. TFLMS is published as a pull request in the TensorFlow repository for contributing to the TensorFlow community. With TFLMS, we were able to train ResNet-50 and 3DUNet with 4.7x and 2x larger batch size, respectively. In particular, we were able to train 3DUNet using images of size 192^3 for image segmentation, which, without TFLMS, had been done only by dividing the images into smaller images, which affects the accuracy.
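The graph-rewriting idea can be sketched on a toy graph. The example below represents a graph as a list of `(name, inputs)` pairs in execution order and inserts a swap-out/swap-in pair on every edge whose consumer runs more than `threshold` steps after its producer. The list-based graph, the distance threshold, and the node naming are assumptions made for illustration; they are not the TFLMS module or the TensorFlow API.

```python
def insert_swaps(ops, threshold):
    """Rewrite a toy op list by adding swap-out/swap-in nodes on long-lived edges.

    ops: list of (name, [input names]) in execution order (hypothetical format).
    An edge is "long-lived" when the consumer runs more than `threshold`
    positions after the producer.
    """
    position = {name: i for i, (name, _) in enumerate(ops)}
    # Producers with at least one distant consumer need a swap-out node.
    far_producers = {
        src
        for i, (_, inputs) in enumerate(ops)
        for src in inputs
        if i - position[src] > threshold
    }
    rewritten = []
    for i, (name, inputs) in enumerate(ops):
        new_inputs = []
        for src in inputs:
            if i - position[src] > threshold:
                # Bring the tensor back just before the distant consumer.
                swap_in = f"swap_in({src})->{name}"
                rewritten.append((swap_in, [f"swap_out({src})"]))
                new_inputs.append(swap_in)
            else:
                new_inputs.append(src)
        rewritten.append((name, new_inputs))
        if name in far_producers:
            # Move the tensor to host memory right after it is produced.
            rewritten.append((f"swap_out({name})", [name]))
    return rewritten

graph = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"]), ("e", ["d", "a"])]
for node in insert_swaps(graph, threshold=2):
    print(node)
```

Here only the edge from `a` to `e` is long-lived, so `a` is swapped out after it is produced and swapped back in just before `e` consumes it; the nearby consumer `b` still reads `a` directly.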

Shannon entropy for intuitionistic fuzzy information

The paper presents an extension of Shannon fuzzy entropy to the intuitionistic fuzzy case. First, we present a new formula for calculating the distance and similarity of intuitionistic fuzzy information. Then, we construct measures for information features such as score, certainty and uncertainty. A new concept, escort fuzzy information, is also introduced. Using the escort fuzzy information, Shannon’s formula for intuitionistic fuzzy information is then obtained. It should be underlined that Shannon’s entropy for intuitionistic fuzzy information satisfies the four defining conditions of intuitionistic fuzzy uncertainty. The measures of its two components are also identified: fuzziness (ambiguity) and incompleteness (ignorance).

Restructuring Batch Normalization to Accelerate CNN Training

Because CNN models are compute-intensive, where billions of operations can be required just for an inference over a single input image, a variety of CNN accelerators have been proposed and developed. For the early CNN models, the research mostly focused on convolutional and fully-connected layers because the two layers consumed most of the computation cycles. For more recent CNN models, however, non-convolutional layers have become comparably important because of the popular use of newly designed non-convolutional layers and because of the reduction in the number and size of convolutional filters. Non-convolutional layers, including batch normalization (BN), typically have relatively lower computational intensity compared to the convolutional or fully-connected layers, and hence are often constrained by main-memory bandwidth. In this paper, we focus on accelerating the BN layers among the non-convolutional layers, as BN has become a core design block of modern CNNs. A typical modern CNN has a large number of BN layers. BN requires mean and variance calculations over each mini-batch during training. Therefore, the existing memory-access reduction techniques, such as fusing multiple CONV layers, are not effective for accelerating BN due to their inability to optimize mini-batch related calculations. To address this increasingly important problem, we propose to restructure BN layers by first splitting each BN layer into two sub-layers and then combining the first sub-layer with its preceding convolutional layer and the second sub-layer with the following activation and convolutional layers. The proposed solution can significantly reduce main-memory accesses while training the latest CNN models, and the experiments on a chip multiprocessor with our modified Caffe implementation show that the proposed BN restructuring can improve the performance of DenseNet with 121 convolutional layers by 28.4%.
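The split described above can be sketched with scalar stand-ins. In the example below, the first BN sub-layer (accumulating the mini-batch statistics) runs inside the loop that produces the "convolution" outputs, and the second sub-layer (normalization) is applied together with a ReLU activation. The scalar multiply standing in for a convolution, the one-pass variance formula, and the function names are assumptions for illustration; they are not the paper's Caffe implementation.

```python
import math

def conv_with_stats(inputs, weight):
    """Stand-in convolution fused with BN sub-layer 1 (mini-batch statistics).

    Each output is written once while its sum and sum of squares are
    accumulated, so no separate pass over the outputs is needed.
    """
    outputs, total, total_sq = [], 0.0, 0.0
    for x in inputs:
        y = weight * x              # placeholder for a real convolution
        outputs.append(y)
        total += y
        total_sq += y * y
    n = len(outputs)
    mean = total / n
    var = total_sq / n - mean * mean    # one-pass (biased) variance
    return outputs, mean, var

def normalize_relu(outputs, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """BN sub-layer 2 (normalize, scale, shift) fused with ReLU."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [max(0.0, gamma * (y - mean) * inv_std + beta) for y in outputs]

outputs, mean, var = conv_with_stats([1.0, 2.0, 3.0, 4.0], weight=2.0)
activated = normalize_relu(outputs, mean, var)
print(mean, var, activated)
```

The fused version reads the convolution outputs once for normalization instead of once for the statistics and again for normalization, which is the kind of memory-access saving the abstract refers to.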

A Convolutional Neural Network for Aspect Sentiment Classification

With the development of the Internet, natural language processing (NLP), in which sentiment analysis is an important task, has become vital in information processing. Sentiment analysis includes aspect sentiment classification. Aspect sentiment can provide complete and in-depth results, with increased attention at the aspect level. Different context words in a sentence influence the sentiment polarity of the sentence to different degrees, and the polarity varies with the aspect considered. Take the sentence ‘I bought a new camera. The picture quality is amazing but the battery life is too short.’ as an example. If the aspect is picture quality, then the expected sentiment polarity is ‘positive’; if the battery life aspect is considered, then the sentiment polarity should be ‘negative’. Therefore, the aspect is important to consider when we explore aspect sentiment in a sentence. The recurrent neural network (RNN) is regarded as a good model for natural language processing, and RNNs have achieved good performance on aspect sentiment classification, including Target-Dependent LSTM (TD-LSTM), Target-Connection LSTM (TC-LSTM) (Tang, 2015a, b), AE-LSTM, AT-LSTM, and AEAT-LSTM (Wang et al., 2016). There is also extensive literature on sentiment classification using convolutional neural networks, but little on aspect sentiment classification using convolutional neural networks. In our paper, we develop attention-based input layers in which aspect information is taken into account by the input layer. We then incorporate the attention-based input layers into a convolutional neural network (CNN) to introduce context word information. In our experiments on a benchmark dataset from Twitter, incorporating aspect information into the CNN improves its aspect sentiment classification performance and outperforms the other models, without using a syntactic parser or external sentiment lexicons.

Recommendation Systems and Self Motivated Users

Modern recommendation systems rely on the wisdom of the crowd to learn the optimal course of action. This induces an inherent mis-alignment of incentives between the system’s objective to learn (explore) and the individual users’ objective to take the contemporaneous optimal action (exploit). The design of such systems must account for this and also for additional information available to the users. A prominent, yet simple, example is when agents arrive sequentially and each agent observes the action and reward of his predecessor. We provide an incentive compatible and asymptotically optimal mechanism for that setting. The complexity of the mechanism suggests that the design of such systems for general settings is a challenging task.

Pontogammarus Maeoticus Swarm Optimization: A Metaheuristic Optimization Algorithm

Nowadays, metaheuristic optimization algorithms are used to find the global optima in difficult search spaces. Pontogammarus Maeoticus Swarm Optimization (PMSO) is a metaheuristic algorithm imitating aquatic nature and foraging behavior. Pontogammarus Maeoticus, also called Gammarus for short, is a tiny creature found mostly on the coast of the Caspian Sea in Iran. In this algorithm, the global optimum is modeled as the sea edge (coast), toward which Gammarus creatures tend to move in order to rest from sea waves and forage in the sand. Sea waves provide exploration, and foraging models exploitation. The strength of a sea wave is determined by the distance of the Gammarus from the sea edge. The angles of the waves applied to several particles are set randomly, which helps the algorithm avoid getting stuck in local optima. Meanwhile, the neighborhood of each particle changes adaptively, resulting in more efficient search progress. The proposed algorithm, although applicable to any optimization problem, is tested on a partially shaded solar PV array. Experiments on CEC05 benchmarks, as well as on the solar PV array, show the effectiveness of this optimization algorithm.

Logistic Regression, Neural Networks and Dempster-Shafer Theory: a New Perspective

We revisit logistic regression and its nonlinear extensions, including multilayer feedforward neural networks, by showing that these classifiers can be viewed as converting input or higher-level features into Dempster-Shafer mass functions and aggregating them by Dempster’s rule of combination. The probabilistic outputs of these classifiers are the normalized plausibilities corresponding to the underlying combined mass function. This mass function is more informative than the output probability distribution. In particular, it makes it possible to distinguish between lack of evidence (when none of the features provides discriminant information) and conflicting evidence (when different features support different classes). This expressivity of mass functions allows us to gain insight into the role played by each input feature in logistic regression, and to interpret hidden unit outputs in multilayer neural networks. It also makes it possible to use alternative decision rules, such as interval dominance, which select a set of classes when the available evidence does not unambiguously point to a single class, thus trading reduced error rate for higher imprecision.

An IDE-Based Context-Aware Meta Search Engine

Traditional web search forces developers to leave their working environments and look for solutions in web browsers. It often does not consider the context of their programming problems. The context-switching between the web browser and the working environment is time-consuming and distracting, and the keyword-based traditional search often does not help much in problem solving. In this paper, we propose an Eclipse IDE-based web search solution that collects data from three web search APIs (Google, Yahoo, Bing) and a programming Q&A site (Stack Overflow). It then provides search results within the IDE, taking into account not only the content of the selected error but also the problem context, popularity and search engine recommendation of the result links. Experiments with 25 runtime errors and exceptions show that the proposed approach outperforms the keyword-based search approaches with a recommendation accuracy of 96%. We also validate the results with a user study involving five prospective participants where we get a result agreement of 64.28%. While the preliminary results are promising, the approach needs to be further validated with more errors and exceptions followed by a user study with more participants to establish itself as a complete IDE-based web search solution.

A Boo(n) for Evaluating Architecture Performance

We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-n performance ($\text{Boo}_n$) as a way to correct these problems.
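To make the "expected best-out-of-n" idea concrete, the sketch below estimates the expected maximum of n training runs from a sample of observed results, treating the runs as independent draws (with replacement) from the empirical distribution. The function name and this particular estimator are assumptions for illustration; the abstract does not spell out the estimator or the normalization step, so neither is claimed to match the paper.

```python
def expected_best_of_n(results, n):
    """Estimate E[max of n i.i.d. draws] from the empirical distribution of results.

    With m sorted results, the probability that the maximum of n draws equals
    the i-th smallest value is (i/m)**n - ((i-1)/m)**n.
    """
    xs = sorted(results)
    m = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        prob_is_max = (i / m) ** n - ((i - 1) / m) ** n
        total += x * prob_is_max
    return total

accuracies = [0.70, 0.72, 0.75, 0.71]     # hypothetical results of four runs
for n in (1, 2, 5):
    print(n, expected_best_of_n(accuracies, n))
```

For n = 1 the estimate is just the sample mean, and it grows toward the best observed result as n increases, which shows how strongly a reported "best single model" figure depends on how many runs were tried.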

Model-based Clustering

Mixture models extend the toolbox of clustering methods available to the data analyst. They allow for an explicit definition of the cluster shapes and structure within a probabilistic framework and exploit estimation and inference techniques available for statistical models in general. In this chapter an introduction to cluster analysis is provided, model-based clustering is related to standard heuristic clustering methods and an overview of different ways to specify the cluster model is given. Post-processing methods to determine a suitable clustering, infer cluster distribution characteristics and validate the cluster solution are discussed. The versatility of the model-based clustering approach is illustrated by giving an overview of the different areas of application.
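A minimal sketch of model-based clustering, assuming the simplest possible cluster model: a mixture of two univariate Gaussians fitted with the EM algorithm, followed by a hard assignment of each point to its most probable component. The initialization, the variance floor, and the toy data are assumptions for this example and are not taken from the chapter.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM; return parameters and labels."""
    means = [min(data), max(data)]          # simple initialization (assumption)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)   # floor to avoid a degenerate component
    # Post-processing: hard clustering by maximum posterior probability.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, labels = fit_two_gaussians(data)
print(means, labels)
```

Unlike a purely heuristic method, the fitted model also yields component weights, variances and per-point posterior probabilities, which is what makes the post-processing and validation steps mentioned above possible.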

Uncertainty Quantification of Electronic and Photonic ICs with Non-Gaussian Correlated Process Variations
Synthetic contrast enhancement in cardiac CT with Deep Learning
Hidden Undernutrition: How universal cutoffs can fail to capture stunting in low and middle income countries
Transfer Learning From Synthetic To Real Images Using Variational Autoencoders For Precise Position Detection
An efficient quantum circuits optimizing scheme compared with QISKit
Bring a friend! Privately or Publicly
A Formal Ontology-Based Classification of Lexemes and its Applications
LaneNet: Real-Time Lane Detection Networks for Autonomous Driving
Proximal algorithms for large-scale statistical modeling and optimal sensor/actuator selection
Global Transition-based Non-projective Dependency Parsing
Graph functionality
Accelerated First-order Methods on the Wasserstein Space for Bayesian Inference
Massively-Parallel Break Detection for Satellite Data
Path integral for quantum Mabuchi K-energy
Learning Personalized Representation for Inverse Problems in Medical Imaging Using Deep Neural Network
Random cherry graphs
Seq2RDF: An end-to-end application for deriving Triples from Natural Language Text
Direct Uncertainty Prediction with Applications to Healthcare
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Program Language Translation Using a Grammar-Driven Tree-to-Tree Model
MITOS-RCNN: A Novel Approach to Mitotic Figure Detection in Breast Cancer Histopathology Images using Region Based Convolutional Neural Networks
Existential monadic second order logic of undirected graphs: a disproof of the Le Bars conjecture
Regularizing Autoencoder-Based Matrix Completion Models via Manifold Learning
Feature-based reformulation of entities in triple pattern queries
Optimal Ball Recycling
Deep Cross-modality Adaptation via Semantics Preserving Adversarial Learning for Sketch-based 3D Shape Retrieval
Discrete Sampling using Semigradient-based Product Mixtures
Mean-field avalanche size exponent for sandpiles on Galton-Watson trees
Systems of infinite horizon and ergodic BSDE arising in regime switching forward performance processes
Renewal-scaled solutions of the Kolmogorov forward equation for residual times
Tracy-Widom asymptotics for a river delta model
The Turán Number for Spanning Linear Forests
PortraitGAN for Flexible Portrait Manipulation
Learning Theory and Algorithms for Revenue Management in Sponsored Search
Per-decision Multi-step Temporal Difference Learning with Control Variates
Sanity Check: A Strong Alignment and Information Retrieval Baseline for Question Answering
Metamorphic Moving Horizon Estimation
Practical and Scalable Security Verification of Secure Architectures
Zipf’s law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation
On the Menezes-Teske-Weng’s conjecture
Privacy-preserving Machine Learning through Data Obfuscation
Detecting Tiny Moving Vehicles in Satellite Videos
An integrated localization-navigation scheme for distance-based docking of UAVs
Hunting the Ethereum Smart Contract: Color-inspired Inspection of Potential Attacks
Road surface 3d reconstruction based on dense subpixel disparity map estimation
Surprising strategies obtained by stochastic optimization in partially observable games
General self-similarity properties for Markov processes and exponential functionals of Lévy processes
Chinese Lexical Analysis with Deep Bi-GRU-CRF Network
A Single Shot Text Detector with Scale-adaptive Anchors
Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds
Elementary proof of congruences modulo 25 for broken $k$-diamond partitions
Denoising Auto-encoder with Recurrent Skip Connections and Residual Regression for Music Source Separation
A Bayesian model for lithology/fluid class prediction using a Markov mesh prior fitted from a training image
Multi-robot Path Planning in Well-formed Infrastructures: Prioritized Planning vs. Prioritized Wait Adjustment (Preliminary Results)
A multiple-try Metropolis-Hastings algorithm with tailored proposals
Smart Meter Privacy: Adversarial Hypothesis Testing Models
Counting Induced Subgraphs: A Topological Approach to #W[1]-hardness
Branching Processes – A General Concept
Evaluating the impact of the 2012 Olympic Games policy on the regeneration of East London using spatio-temporal big data
Directed Continuous-Time Random Walk with memory
Placement and Implementation of Grid-Forming and Grid-Following Virtual Inertia
Incremental Relational Lenses
Volumetric performance capture from minimal camera viewpoints
Subpixel-Precise Tracking of Rigid Objects in Real-time
Lattice based Conceptual Spaces to Explore Cognitive Functionalities for Prosthetic Arm
Neural Language Codes for Multilingual Acoustic Models
Low Overhead Weighted-Graph-Coloring-Based Two-Layer Precoding for FDD Massive MIMO Systems
Deeply-Sparse Signal rePresentations ($\text{D}\text{S}^2\text{P}$)
Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks
Partitioning Vectors into Quadruples: Worst-Case Analysis of a Matching-Based Algorithm
Model-free Consensus Maximization for Non-Rigid Shapes
Open Logo Detection Challenge
Variational Bayesian dropout: pitfalls and fixes
Arcades: A deep model for adaptive decision making in voice controlled smart-home
Beef Cattle Instance Segmentation Using Fully Convolutional Neural Network
Universality of jamming of non-spherical particles
Optimal Portfolio in Intraday Electricity Markets Modelled by Lévy-Ornstein-Uhlenbeck Processes
Delay-induced stochastic bursting in excitable noisy systems
Perspective-Aware CNN For Crowd Counting
Quantum Symmetric Cooperative Game with a Harmonious Coalition
Acquire, Augment, Segment & Enjoy: Weakly Supervised Instance Segmentation of Supermarket Products
A solution to a linear integral equation with an application to statistics of infinitely divisible moving averages
Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition
DNA Computing for Combinational Logic
Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders
Mean field systems on networks, with singular interaction through hitting times
Detection and Analysis of Content Creator Collaborations in YouTube Videos using Face- and Speaker-Recognition
Quantum circuits for floating-point arithmetic
Reflection Analysis for Face Morphing Attack Detection
Consistent Generative Query Networks
Frame-constrained Total Variation Regularization for White Noise Regression
Towards a simplified ontology for better e-commerce search
Joint Neural Network Equalizer and Decoder
Real-Time Subpixel Fast Bilateral Stereo
Graphs in perturbation theory: Algebraic structure and asymptotics
Searching for dense subsets in a graph via the partition function
Solving isomorphism problems about 2-designs from disjoint difference families
Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
A three-operator splitting perspective of a three-block ADMM for convex quadratic semidefinite programming and extensions
Representing scenarios for process evolution management
Modelling aspects of multi-mode antennas for direction-of-arrival estimation
Goal-oriented Trajectories for Efficient Exploration
Combining Background Subtraction Algorithms with Convolutional Neural Network
A Gauss-Newton Approach to Real-Time Monocular Multiple Object Tracking
Contextual Bandits under Delayed Feedback
On circumcenters of finite sets in Hilbert spaces
$n$-arc and $n$-circle connected graph-like spaces
MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip
The Role of the Propensity Score in Fixed Effect Models