Mobile phones can record individual’s daily behavioral data as a time-series. In this paper, we present an effective time-series segmentation technique that extracts optimal time segments of individual’s similar behavioral characteristics utilizing their mobile phone data. One of the determinants of an individual’s behavior is the various activities undertaken at various times-of-the-day and days-of-the-week. In many cases, such behavior will follow temporal patterns. Currently, researchers use either equal or unequal interval-based segmentation of time for mining mobile phone users’ behavior. Most of them take into account static temporal coverage of 24-h-a-day and few of them take into account the number of incidences in time-series data. However, such segmentations do not necessarily map to the patterns of individual user activity and subsequent behavior because of not taking into account the diverse behaviors of individuals over time-of-the-week. Therefore, we propose a behavior-oriented time segmentation (BOTS) technique that takes into account not only the temporal coverage of the week but also the number of incidences of diverse behaviors dynamically for producing similar behavioral time segments over the week utilizing time-series data. Experiments on the real mobile phone datasets show that our proposed segmentation technique better captures the user’s dominant behavior at various times-of-the-day and days-of-the-week enabling the generation of high confidence temporal rules in order to mine individual mobile phone users’ behavior.
Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in NVIDIA’s cuDNN library. We use the resulting modified simulator, which we plan to make available publicly with this paper, to study some simple deep learning workloads. With our changes to GPGPU-Sim’s functional simulation model, we find GPGPU-Sim performance model running a cuDNN enabled implementation of LeNet for MNIST reports results within 30% of real hardware. Using GPGPU-Sim’s AerialVision performance analysis tool we observe that cuDNN API calls contain many varying phases and appear to include potentially inefficient microarchitecture behaviour such as DRAM partition bank camping, at least when executed on GPGPU-Sim’s current performance model.
Machine learning is becoming an ever present part in our lives as many decisions, e.g. to lend a credit, are no longer made by humans but by machine learning algorithms. However those decisions are often unfair and discriminating individuals belonging to protected groups based on race or gender. With the recent General Data Protection Regulation (GDPR) coming into effect, new awareness has been raised for such issues and with computer scientists having such a large impact on peoples lives it is necessary that actions are taken to discover and prevent discrimination. This work aims to give an introduction into discrimination, legislative foundations to counter it and strategies to detect and prevent machine learning algorithms from showing such behavior.
We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature’s contribution can be explicitly investigated. We study the key factors about these features that have the most impact on the performance, and also visualize the learned visual features for relationships and investigate the efficacy of our model. We won the champion of the OpenImages Visual Relationship Detection Challenge on Kaggle, where we outperform the 2nd place by 5\% (20\% relatively). We believe an accurate scene graph generator is a fundamental stepping stone for higher-level vision-language tasks such as image captioning and visual QA, since it provides a semantic, structured comprehension of an image that is beyond pixels and objects.
An increasing number of datasets contain multiple views, such as video, sound and automatic captions. A basic challenge in representation learning is how to leverage multiple views to learn better representations. This is further complicated by the existence of a latent alignment between views, such as between speech and its transcription, and by the multitude of choices for the learning objective. We explore an advanced, correlation-based representation learning method on a 4-way parallel, multimodal dataset, and assess the quality of the learned representations on retrieval-based tasks. We show that the proposed approach produces rich representations that capture most of the information shared across views. Our best models for speech and textual modalities achieve retrieval rates from 70.7% to 96.9% on open-domain, user-generated instructional videos. This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.
Estimating the individual treatment effect (ITE) from observational data is essential in medicine. A central challenge in estimating the ITE is handling confounders, which are factors that affect both an intervention and its outcome. Most previous work relies on the unconfoundedness assumption, which posits that all the confounders are measured in the observational data. However, if there are unmeasurable (latent) confounders, then confounding bias is introduced. Fortunately, noisy proxies for the latent confounders are often available and can be used to make an unbiased estimate of the ITE. In this paper, we develop a novel adversarial learning framework to make unbiased estimates of the ITE using noisy proxies.
We propose an estimation method that we call functional average variance estimation (FAVE), for estimating the EDR space in functional semiparametric regression model, based on kernel estimates of density and regression. Consistency results are then established for the estimator of the interest operator, and for the directions of EDR space. A simulation study that shows that the proposed approach performs as well as traditional ones is presented.
Grey-box fuzzers such as American Fuzzy Lop (AFL) are popular tools for finding bugs and potential vulnerabilities in programs. While these fuzzers have been able to find vulnerabilities in many widely used programs, they are not efficient; of the millions of inputs executed by AFL in a typical fuzzing run, only a handful discover unseen behavior or trigger a crash. The remaining inputs are redundant, exhibiting behavior that has already been observed. Here, we present an approach to increase the efficiency of fuzzers like AFL by applying machine learning to directly model how programs behave. We learn a forward prediction model that maps program inputs to execution traces, training on the thousands of inputs collected during standard fuzzing. This learned model guides exploration by focusing on fuzzing inputs on which our model is the most uncertain (measured via the entropy of the predicted execution trace distribution). By focusing on executing inputs our learned model is unsure about, and ignoring any input whose behavior our model is certain about, we show that we can significantly limit wasteful execution. Through testing our approach on a set of binaries released as part of the DARPA Cyber Grand Challenge, we show that our approach is able to find a set of inputs that result in more code coverage and discovered crashes than baseline fuzzers with significantly fewer executions.
Deep neural networks have shown remarkable performance across a wide range of vision-based tasks, particularly due to the availability of large-scale datasets for training and better architectures. However, data seen in the real world are often affected by distortions that not accounted for by the training datasets. In this paper, we address the challenge of robustness and stability of neural networks and propose a general training method that can be used to make the existing neural network architectures more robust and stable to input visual perturbations while using only available datasets for training. Proposed training method is convenient to use as it does not require data augmentation or changes in the network architecture. We provide theoretical proof as well as empirical evidence for the efficiency of the proposed training method by performing experiments with existing neural network architectures and demonstrate that same architecture when trained with the proposed training method perform better than when trained with conventional training approach in the presence of noisy datasets.
Nowadays, many fields of study are have to deal with large and sparse data matrixes, but the most important issue is finding the inverse of these matrixes. Thankfully, Krylov subspace methods can be used in solving these types of problem. However, it is difficult to understand mathematical principles behind these methods. In the first part of the article, Krylov methods are discussed in detail. Thus, readers equipped with a basic knowledge of linear algebra should be able to understand these methods. In this part, the knowledge of Krylov methods are put into some examples for simple implementations of a commonly known Krylov method GMRES. In the second part, the article talks about CG iteration, a wildly known method which is very similar to Krylov methods. By comparison between CG iteration and Krylov methods, readers can get a better comprehension of Krylov methods based on CG iteration. In the third part of the article, aiming to improve the efficiency of Krylov methods, preconditioners are discussed. In addition, the restarting GMRES is briefly introduced to reduce the space consumption of Krylov methods in this part.
Emerging technologies that generate a huge amount of data such as the Internet of Things (IoT) services need latency aware computing platforms to support time-critical applications. Due to the on-demand services and scalability features of cloud computing, Big Data application processing is done in the cloud infrastructure. Managing Big Data applications exclusively in the cloud is not an efficient solution for latency-sensitive applications related to smart transportation systems, healthcare solutions, emergency response systems and content delivery applications. Thus, the Fog computing paradigm that allows applications to perform computing operations in-between the cloud and the end devices has emerged. In Fog architecture, IoT devices and sensors are connected to the Fog devices which are located in close proximity to the users and it is also responsible for intermediate computation and storage. Most computations will be done on the edge by eliminating full dependencies on the cloud resources. In this chapter, we investigate and survey Fog computing architectures which have been proposed over the past few years. Moreover, we study the requirements of IoT applications and platforms, and the limitations faced by cloud systems when executing IoT applications. Finally, we review current research works that particularly focus on Big Data application execution on Fog and address several open challenges as well as future research directions.
Personalized medicine aims at identifying best treatments for a patient with given characteristics. It has been shown in the literature that these methods can lead to great improvements in medicine compared to traditional methods prescribing the same treatment to all patients. Subgroup identification is a branch of personalized medicine which aims at finding subgroups of the patients with similar characteristics for which some of the investigated treatments have a better effect than the other treatments. A number of approaches based on decision trees has been proposed to identify such subgroups, but most of them focus on the two-arm trials (control/treatment) while a few methods consider quantitative treatments (defined by the dose). However, no subgroup identification method exists that can predict the best treatments in a scenario with a categorical set of treatments. We propose a novel method for subgroup identification in categorical treatment scenarios. This method outputs a decision tree showing the probabilities of a given treatment being the best for a given group of patients as well as labels showing the possible best treatments. The method is implemented in an R package \textbf{psica} available at CRAN. In addition to numerical simulations based on artificial data, we present an analysis of a community-based nutrition intervention trial that justifies the validity of our method.
There are multiple real-world problems in which training data is unavailable, and still, the ambition is to learn values of the system parameters, at which test data on an observable is realised, subsequent to the learning of the functional relationship between these variables. We present a novel Bayesian method to deal with such a problem, in which we learn a system function of a stationary dynamical system, for which only test data on a vector-valued observable is available, and training data is unavailable. This exercise borrows heavily from the state space probability density function ($pdf$), that we also learn. As there is no training data available for either sought function, we cannot learn its correlation structure, and instead, perform inference (using Metropolis-within-Gibbs), on the discretised form of the sought system function and of the ${pdf}$, where this $pdf$ is constructed such that the unknown system parameters are embedded within its support. Likelihood of the unknowns given the available data, is defined in terms of such a ${pdf}$. We make an application to the learning of the density of all gravitational matter in a real galaxy.
In the Dictionary-based String Matching (DSM) problem, a retrieval system has access to a source sequence and stores the position of a certain number of strings in a posting table. When a user inquires the position of a string, the retrieval system, instead of searching in the source sequence directly, relies on the the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is a proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description while the strings stored in the posting sequence are described as a code. Through this formulation, we are able to define the efficiency of the retrieval system as the average cost in answering a users’ query in the limit of sufficiently long source sequence. This formulation is used to study the retrieval performance for the case in which (i) all the strings of a given length, referred to as k-grams , and (ii) prefix-free codes.
Word sense induction (WSI), or the task of automatically discovering multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which differ very much among words, from aardvark with one sense, to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, which we find both to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) throwing garbage senses and (b) additionally inducing fine-grained word senses. Results show great improvements over the state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task where the sense granularity problem is more evident and show that AutoSense is evidently better than competing models. We share our data and code here: https://…/AutoSense.
The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using Extract-Transform-Load platforms, with significant manual involvement in specifying, configuring or tuning many of them. Cost-effective data wrangling processes need to ensure that data wrangling steps benefit from automation wherever possible. In this paper, we define a methodology to fully automate an end-to-end data wrangling process incorporating data context, which associates portions of a target schema with potentially spurious extensional data of types that are commonly available. Instance-based evidence together with data profiling paves the way to inform automation in several steps within the wrangling process, specifically, matching, mapping validation, value format transformation, and data repair. The approach is evaluated with real estate data showing substantial improvements in the results of automated wrangling.
Deep learning models for survival analysis have gained significant attention in the literature, but they suffer from severe performance deficits when the dataset contains many irrelevant features. We give empirical evidence for this problem in real-world medical settings using the state-of-the-art model DeepHit. Furthermore, we develop methods to improve the deep learning model through novel approaches to feature selection in survival analysis. We propose filter methods for \textit{hard} feature selection and a neural network architecture that weights features for \textit{soft} feature selection. Our experiments on two real-world medical datasets demonstrate that substantial performance improvements against the original models are achievable.
Pruning methods have shown to be effective at reducing the size of deep neural networks while keeping accuracy almost intact. Among the most effective methods are those that prune a network while training it with a sparsity prior loss and learnable dropout parameters. A shortcoming of these approaches however is that neither the size nor the inference speed of the pruned network can be controlled directly; yet this is a key feature for targeting deployment of CNNs on low-power hardware. To overcome this, we introduce a budgeted regularized pruning framework for deep convolutional neural networks. Our approach naturally fits into traditional neural network training as it consists of a learnable masking layer, a novel budget-aware objective function, and the use of knowledge distillation. We also provide insights on how to prune a residual network and how this can lead to new architectures. Experimental results reveal that CNNs pruned with our method are more accurate and less compute-hungry than state-of-the-art methods. Also, our approach is more effective at preventing accuracy collapse in case of severe pruning; this allows us to attain pruning factors up to 16x without significantly affecting the accuracy.
An accurate load forecast is always important for the power industry and energy players as it enables stakeholders to make critical decisions. In addition, its importance is further increased with growing uncertainties in the generation sector due to the high penetration of renewable energy and the introduction of demand side management strategies. An incremental improvement in grid-level demand forecast of anomalous days can potentially save millions of dollars. However, due to an increasing penetration of renewable energy resources and their dependency on several meteorological and exogenous variables, accurate load forecasting of anomalous days has now become very challenging. To improve the prediction accuracy of the load forecasting, an ensemble forecast framework (ENFF) is proposed with a systematic combination of three multiple predictors, namely Elman neural network (ELM), feedforward neural network (FNN) and radial basis function (RBF) neural network. These predictors are trained using global particle swarm optimization (GPSO) to improve their prediction capability in the ENFF. The outputs of individual predictors are combined using a trim aggregation technique by removing forecasting anomalies. Real recorded data of New England ISO grid is used for training and testing of the ENFF for anomalous days. The forecast results of the proposed ENFF indicate a significant improvement in prediction accuracy in comparison to autoregressive integrated moving average (ARIMA) and back-propagation neural networks (BPNN) based benchmark models.
The overturning of the Internet Privacy Rules by the Federal Communications Commissions (FCC) in late March 2017 allows Internet Service Providers (ISPs) to collect, share and sell their customers’ Web browsing data without their consent. With third-party trackers embedded on Web pages, this new rule has put user privacy under more risk. The need arises for users on their own to protect their Web browsing history from any potential adversaries. Although some available solutions such as Tor, VPN, and HTTPS can help users conceal their online activities, their use can also significantly hamper personalized online services, i.e., degraded utility. In this paper, we design an effective Web browsing history anonymization scheme, PBooster, aiming to protect users’ privacy while retaining the utility of their Web browsing history. The proposed model pollutes users’ Web browsing history by automatically inferring how many and what links should be added to the history while addressing the utility-privacy trade-off challenge. We conduct experiments to validate the quality of the manipulated Web browsing history and examine the robustness of the proposed approach for user privacy protection.
This paper presents Dokei, an effective supervised domain adaptation method to transform a pre-trained CNN model to one involving efficient grouped convolution. The basis of this approach is formalised as a novel optimisation problem constrained by group sparsity pattern (GSP), and a practical solution based on structured regularisation and maximal bipartite matching is provided. We show that it is vital to keep the connections specified by GSP when mapping pre-trained weights to grouped convolution. We evaluate Dokei on various domains and hardware platforms to demonstrate its effectiveness. The models resulting from Dokei are shown to be more accurate and slimmer than prior work targeting grouped convolution, and more regular and easier to deploy than other pruning techniques.
The training method of repetitively feeding all samples into a pre-defined network for image classification has been widely adopted by current state-of-the-art. In this work, we provide a new method, which can be leveraged to train classification networks in a more efficient way. Starting with a warm-up step, we propose to continually repeat a Drop-and-Pick (DaP) learning strategy. In particular, we drop those easy samples to encourage the network to focus on studying hard ones. Meanwhile, by picking up all samples periodically during training, we aim to recall the memory of the networks to prevent catastrophic forgetting of previously learned knowledge. Our DaP learning method can recover 99.88%, 99.60%, 99.83% top-1 accuracy on ImageNet for ResNet-50, DenseNet-121, and MobileNet-V1 but only requires 75% computation in training compared to those using the classic training schedule. Furthermore, our pre-trained models are equipped with strong knowledge transferability when used for downstream tasks, especially for hard cases. Extensive experiments on object detection, instance segmentation and pose estimation can well demonstrate the effectiveness of our DaP training method.
Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals. Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws can slip by both authors and reviewers. One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets. However, in this experiment, models which have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations. What we find is that if we take a well-established baseline and evaluate it under the same experimental settings, it shows significant improvements in performance. In fact, it outperforms or performs competitively with numerous to several methods that when they were introduced claimed to be superior to the very same baseline method. Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare and benchmark methods in Bayesian Deep Learning.
Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches.
The performance of modern machine learning methods highly depends on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. Those default values are usually chosen in an ad-hoc manner to work good enough on a wide variety of datasets. To address this problem, different automatic hyperparameter configuration algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance, but adds additional algorithmic complexity and computational costs to the training procedure. As an alternative to this, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the approach we propose in comparison to random search and Bayesian Optimization.
Designing neural architectures is a fundamental step in deep learning applications. As a partner technique, model compression on neural networks has been widely investigated to gear the needs that the deep learning algorithms could be run with the limited computation resources on mobile devices. Currently, both the tasks of architecture design and model compression require expertise tricks and tedious trials. In this paper, we integrate these two tasks into one unified framework, which enables the joint architecture search with quantization (compression) policies for neural networks. This method is named JASQ. Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices. Technically, a multi-objective evolutionary search algorithm is introduced to search the models under the balance between model size and performance accuracy. In experiments, we find that our approach outperforms the methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide them with learning-based quantization policies, and outperforms their 2 bits, 4 bits, 8 bits, and 16 bits counterparts. It can yield higher accuracies than the float models, for example, over 1.02% higher accuracy on MobileNet-v1. 2) What is more, under the balance between model size and performance accuracy, two models are obtained with joint search of architectures and quantization policies: a high-accuracy model and a small model, JASQNet and JASQNet-Small that achieves 2.97% error rate with 0.9 MB on CIFAR-10.
In many practical machine-learning applications, it is critical to allow knowledge to be transferred from external domains while preserving user privacy. Unfortunately, existing transfer-learning works do not have a privacy guarantee. In this paper, for the first time, we propose a method that can simultaneously transfer knowledge from external datasets while offering an $\epsilon$-differential privacy guarantee. First, we show that a simple combination of the hypothesis transfer learning and the privacy preserving logistic regression can address the problem. However, the performance of this approach can be poor as the sample size in the target domain may be small. To address this problem, we propose a new method which splits the feature set in source and target data into several subsets, and trains models on these subsets before finally aggregating the predictions by a stacked generalization. Feature importance can also be incorporated into the proposed method to further improve performance. We prove that the proposed method has an $\epsilon$-differential privacy guarantee, and further analysis shows that its performance is better than above simple combination given the same privacy budget. Finally, experiments on MINST and real-world RUIJIN datasets show that our proposed method achieves the start-of-the-art performance.
Despite the progress throughout the last decades, weather forecasting is still a challenging and computationally expensive task. Most models which are currently operated by meteorological services around the world rely on numerical weather prediction, a system based on mathematical algorithms describing physical effects. Recent progress in artificial intelligence however demonstrates that machine learning can be successfully applied to many research fields, especially areas dealing with big data that can be used for training. Current approaches to predict thunderstorms often focus on indices describing temperature differences in the atmosphere. If these indices reach a critical threshold, the forecast system emits a thunderstorm warning. Other meteorological systems such as radar and lightning detection systems are added for a more precise prediction. This paper describes a new approach to the prediction of lightnings based on machine learning rather than complex numerical computations. The error of optical flow algorithms applied to images of meteorological satellites is interpreted as a sign for convection potentially leading to thunderstorms. These results are used as the base for the feature generation phase incorporating different convolution steps. Tree classifier models are then trained to predict lightnings within the next few hours (called nowcasting) based on these features. The evaluation section compares the predictive power of the different models and the impact of different features on the classification result.
Shearer’s inequality bounds the sum of joint entropies of random variables in terms of the total joint entropy. We give another lower bound for the same sum in terms of the individual entropies when the variables are functions of independent random seeds. The inequality involves a constant characterizing the expansion properties of the system. Our results generalize to entropy inequalities used in recent work in invariant settings, including the edge-vertex inequality for factor-of-IID processes, Bowen’s entropy inequalities, and Bollob\’as’s entropy bounds in random regular graphs. The proof method yields inequalities for other measures of randomness, including covariance. As an application, we give upper bounds for independent sets in both finite and infinite graphs.
We propose the general way of study the universal estimator for the regression problem in learning theory considered in ‘Universal algorithms for learning theory Part I: piecewise constant functions’ and ‘Universal algorithms for learning theory Part II: piecewise constant functions’ written by Binev, P., Cohen, A., Dahmen, W., DeVore, R., Temlyakov, V. This new approch allows us to improve results.