If you did not already know

L1-Norm Kernel PCA google
We present the first model and algorithm for L1-norm kernel PCA. While L2-norm kernel PCA has been widely studied, there has been no work on L1-norm kernel PCA. For this non-convex and non-smooth problem, we offer geometric understandings through reformulations and present an efficient algorithm where the kernel trick is applicable. To attest the efficiency of the algorithm, we provide a convergence analysis including linear rate of convergence. Moreover, we prove that the output of our algorithm is a local optimal solution to the L1-norm kernel PCA problem. We also numerically show its robustness when extracting principal components in the presence of influential outliers, as well as its runtime comparability to L2-norm kernel PCA. Lastly, we introduce its application to outlier detection and show that the L1-norm kernel PCA based model outperforms especially for high dimensional data. …

swCaffe google
This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating deep neural networks (DNNs) training on Sunway TaihuLight, the current fastest supercomputer in the world that adopts a unique many-core heterogeneous architecture, with 40,960 SW26010 processors connected through a customized communication network. First, we point out some insightful principles to fully exploit the performance of the innovative many-core architecture. Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe. Third, we put forward a topology-aware parameter synchronization scheme to scale the synchronous Stochastic Gradient Descent (SGD) method to multiple processors efficiently. We evaluate our framework by training a variety of widely used neural networks with the ImageNet dataset. On a single node, swCaffe can achieve 23\%\~{}119\% overall performance compared with Caffe running on K40m GPU. As compared with the Caffe on CPU, swCaffe runs 3.04\~{}7.84x faster on all the networks. Finally, we present the scalability of swCaffe for the training of ResNet-50 and AlexNet on the scale of 1024 nodes. …

K-Indicators google
The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some ‘feature spaces’, as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a scalability bottleneck with respect to the number of clusters K as this number grows in big data applications. In this work, we promote a closely related model called K-indicators model and construct an efficient, semi-convex-relaxation algorithm that requires no randomized initializations. We present extensive empirical results to show advantages of the new algorithm when K is large. In particular, using the new algorithm to start the K-means algorithm, without any replication, can significantly outperform the standard K-means with a large number of currently state-of-the-art random replications. …

Outline Generation google
In this paper, we introduce and tackle the Outline Generation (OG) task, which aims to unveil the inherent content structure of a multi-paragraph document by identifying its potential sections and generating the corresponding section headings. Without loss of generality, the OG task can be viewed as a novel structured summarization task. To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings. The first one is the foundation for section identification, while the latter two are critical for consistent heading generation. In this work, we formulate the OG task as a hierarchical structured prediction problem, i.e., to first predict a sequence of section boundaries and then a sequence of section headings accordingly. We propose a novel hierarchical structured neural generation model, named HiStGen, for the task. Our model attempts to capture the three-level coherence via the following ways. First, we introduce a Markov paragraph dependency mechanism between context paragraphs for section identification. Second, we employ a section-aware attention mechanism to ensure the semantic coherence between a section and its heading. Finally, we leverage a Markov heading dependency mechanism and a review mechanism between context headings to improve the consistency and eliminate duplication between section headings. Besides, we build a novel WIKIOG dataset, a public collection which consists of over 1.75 million document-outline pairs for research on the OG task. Experimental results on our benchmark dataset demonstrate that our model can significantly outperform several state-of-the-art sequential generation models for the OG task. …

If you did not already know

L2-Nonexpansive Neural Network google
This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. We develop the known methodology of controlling Lipschitz constants to realize its full potential in maximizing robustness: our linear and convolution layers subsume those in the previous Parseval networks as a special case and allow greater degrees of freedom; aggregation, pooling, splitting and other operators are adapted in new ways, and a new loss function is proposed, all for the purpose of improving robustness. With MNIST and CIFAR-10 classifiers, we demonstrate a number of advantages. Without needing any adversarial training, the proposed classifiers exceed the state of the art in robustness against white-box L2-bounded adversarial attacks. Their outputs are quantitatively more meaningful than ordinary networks and indicate levels of confidence. They are also free of exploding gradients, among other desirable properties. …

Jeffries-Matusita Distance google
Jeffries-Matusita Distance calculates the separability of a pair of probability distributions. This can be particularly meaningful for evaluating the results of Maximum Likelihood classifications. …

Area Attention google
Existing attention mechanisms, are mostly item-based in that a model is designed to attend to a single item in a collection of items (the memory). Intuitively, an area in the memory that may contain multiple items can be worth attending to as a whole. We propose area attention: a way to attend to an area of the memory, where each area contains a group of items that are either spatially adjacent when the memory has a 2-dimensional structure, such as images, or temporally adjacent for 1-dimensional memory, such as natural language sentences. Importantly, the size of an area, i.e., the number of items in an area, can vary depending on the learned coherence of the adjacent items. By giving the model the option to attend to an area of items, instead of only a single item, we hope attention mechanisms can better capture the nature of the task. Area attention can work along multi-head attention for attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation and image captioning, and improve upon strong (state-of-the-art) baselines in both cases. These improvements are obtainable with a basic form of area attention that is parameter free. In addition to proposing the novel concept of area attention, we contribute an efficient way for computing it by leveraging the technique of summed area tables. …

Discriminator Rejection Sampling (DRS) google
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm – called Discriminator Rejection Sampling (DRS) – that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the state of the art SAGAN model. On ImageNet, we train an improved baseline that increases the best published Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75. …

If you did not already know

ART google
Transfer learning aims to solve the data sparsity for a target domain by applying information of the source domain. Given a sequence (e.g. a natural language sentence), the transfer learning, usually enabled by recurrent neural network (RNN), represents the sequential information transfer. RNN uses a chain of repeating cells to model the sequence data. However, previous studies of neural network based transfer learning simply represents the whole sentence by a single vector, which is unfeasible for seq2seq and sequence labeling. Meanwhile, such layer-wise transfer learning mechanisms lose the fine-grained cell-level information from the source domain. In this paper, we proposed the aligned recurrent transfer, ART, to achieve cell-level information transfer. ART is under the pre-training framework. Each cell attentively accepts transferred information from a set of positions in the source domain. Therefore, ART learns the cross-domain word collocations in a more flexible way. We conducted extensive experiments on both sequence labeling tasks (POS tagging, NER) and sentence classification (sentiment analysis). ART outperforms the state-of-the-arts over all experiments. …

ReDMark google
Due to the rapid growth of machine learning tools and specifically deep networks in various computer vision and image processing areas, application of Convolutional Neural Networks for watermarking have recently emerged. In this paper, we propose a deep end-to-end diffusion watermarking framework (ReDMark) which can be adapted for any desired transform space. The framework is composed of two Fully Convolutional Neural Networks with the residual structure for embedding and extraction. The whole deep network is trained end-to-end to conduct a blind secure watermarking. The framework is customizable for the level of robustness vs. imperceptibility. It is also adjustable for the trade-off between capacity and robustness. The proposed framework simulates various attacks as a differentiable network layer to facilitate end-to-end training. For JPEG attack, a differentiable approximation is utilized, which drastically improves the watermarking robustness to this attack. Another important characteristic of the proposed framework, which leads to improved security and robustness, is its capability to diffuse watermark information among a relatively wide area of the image. Comparative results versus recent state-of-the-art researches highlight the superiority of the proposed framework in terms of imperceptibility and robustness. …

Optical Neural Network (ONN) google
We develop a novel optical neural network (ONN) framework which introduces a degree of scalar invariance to image classification estimation. Taking a hint from the human eye, which has higher resolution near the center of the retina, images are broken out into multiple levels of varying zoom based on a focal point. Each level is passed through an identical convolutional neural network (CNN) in a Siamese fashion, and the results are recombined to produce a high accuracy estimate of the object class. ONNs act as a wrapper around existing CNNs, and can thus be applied to many existing algorithms to produce notable accuracy improvements without having to change the underlying architecture. …

Google AI Platform google
AI Platform makes it easy for machine learning developers, data scientists, and dataengineers to take their ML projects from ideation to production and deployment, quicklyand cost-effectively. From data engineering to ‘no lock-in’ flexibility,AI Platform’s integrated tool chain helps you build and run your own machinelearning applications. AI Platform supports Kubeflow, Google’s open-source platform, which lets you buildportable ML pipelines that you can run on-premises or on Google Cloud withoutsignificant code changes. And you’ll have access to cutting-edge Google AItechnology like TensorFlow, TPUs, and TFX tools as you deploy your AI applications toproduction. …

If you did not already know

Ridge Regularized Linear Models (RRLM) google
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. …

Data Shared Adaptive Bootstrap Aggregated (AdaBag) google
In this article we propose a new supervised ensemble learning method called Data Shared Adaptive Bootstrap Aggregated (AdaBag) Lasso for capturing low dimensional useful features for word based sentiment analysis and mining problems. The literature on ensemble methods is very rich in both statistics and machine learning. The algorithm is a substantial upgrade of the Data Shared Lasso uplift algorithm. The most significant conceptual addition to the existing literature lies in the final selection of bag of predictors through a special bootstrap aggregation scheme. We apply the algorithm to one simulated data and perform dimension reduction in grouped IMDb data (drama, comedy and horror) to extract reduced set of word features for predicting sentiment ratings of movie reviews demonstrating different aspects. We also compare the performance of the present method with the classical Principal Components with associated Linear Discrimination (PCA-LD) as baseline. There are few limitations in the algorithm. Firstly, the algorithm workflow does not incorporate online sequential data acquisition and it does not use sentence based models which are common in ANN algorithms . Our results produce slightly higher error rate compare to the reported state-of-the-art as a consequence. …

Target Set Selection in a Social Network (TSS Problem) google
Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and more popularly, Social Influence Maximization Problem (SIM Problem). …

Kernel Wasserstein Distance google
The Wasserstein distance is a powerful metric based on the theory of optimal transport. It gives a natural measure of the distance between two distributions with a wide range of applications. In contrast to a number of the common divergences on distributions such as Kullback-Leibler or Jensen-Shannon, it is (weakly) continuous, and thus ideal for analyzing corrupted data. To date, however, no kernel methods for dealing with nonlinear data have been proposed via the Wasserstein distance. In this work, we develop a novel method to compute the L2-Wasserstein distance in a kernel space implemented using the kernel trick. The latter is a general method in machine learning employed to handle data in a nonlinear manner. We evaluate the proposed approach in identifying computerized tomography (CT) slices with dental artifacts in head and neck cancer, performing unsupervised hierarchical clustering on the resulting Wasserstein distance matrix that is computed on imaging texture features extracted from each CT slice. Our experiments show that the kernel approach outperforms classical non-kernel approaches in identifying CT slices with artifacts. …

If you did not already know

Hyperbolic Bayesian Personalized Ranking (HyperBPR) google
Many well-established recommender systems are based on representation learning in Euclidean space. In these models, matching functions such as the Euclidean distance or inner product are typically used for computing similarity scores between user and item embeddings. This paper investigates the notion of learning user and item representations in Hyperbolic space. In this paper, we argue that Hyperbolic space is more suitable for learning user-item embeddings in the recommendation domain. Unlike Euclidean spaces, Hyperbolic spaces are intrinsically equipped to handle hierarchical structure, encouraged by its property of exponentially increasing distances away from origin. We propose HyperBPR (Hyperbolic Bayesian Personalized Ranking), a conceptually simple but highly effective model for the task at hand. Our proposed HyperBPR not only outperforms their Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in Hyperbolic space. …

SAMSA google
Current measures for evaluating text simplification systems focus on evaluating lexical text aspects, neglecting its structural aspects. In this paper we propose the first measure to address structural aspects of text simplification, called SAMSA. It leverages recent advances in semantic parsing to assess simplification quality by decomposing the input based on its semantic structure and comparing it to the output. SAMSA provides a reference-less automatic evaluation procedure, avoiding the problems that reference-based methods face due to the vast space of valid simplifications for a given sentence. Our human evaluation experiments show both SAMSA’s substantial correlation with human judgments, as well as the deficiency of existing reference-based measures in evaluating structural simplification. …

VERIFAI google
We present VERIFAI, a software toolkit for the formal design and analysis of systems that include artificial intelligence (AI) and machine learning (ML) components. VERIFAI particularly seeks to address challenges with applying formal methods to perception and ML components, including those based on neural networks, and to model and analyze system behavior in the presence of environment uncertainty. We describe the initial version of VERIFAI which centers on simulation guided by formal models and specifications. Several use cases are illustrated with examples, including temporal-logic falsification, model-based systematic fuzz testing, parameter synthesis, counterexample analysis, and data set augmentation. …

Linguistic Descriptions of Complex Phenomena (LDCP) google
Linguistic Descriptions of Complex Phenomena (LDCP) is an architecture and methodology that allows us to model complex phenomena, interpreting input data, and generating automatic text reports customized to the user needs (see <doi:10.1016/j.ins.2016.11.002> and <doi:10.1007/s00500-016-2430-5> ). …

If you did not already know

Beta Regression google
How should we one perform a regression analysis in which the dependent variable is restricted to the standard unit interval such as rates and proportions? Ferrari and Cribari-Neto, 2004 proposed a regression model for continuous variates that assume values in the standard unit interval, e.g., rates, proportions, or concentrations indices. The model is based on the assumption that the response is beta-distributed, they called their model the beta regression model. The regression parameters are interpretable in terms of the mean of y (the variable of interest) and the model is naturally heteroskedastic and easily accommodates asymmetries. A variant of the beta regression model that allows for nonlinearities and variable dispersion was proposed by Simas et al., 2010.
zoib: An R package for Bayesian Inference for Beta Regression and Zero/One Inflated Beta Regression
A Short Course in Beta Regression Models


Best Subsets google
Best Subsets Regression is a method used to help determine which predictor (independent) variables should be included in a multiple regression model. This method involves examining all of the models created from all possible combination of predictor variables. Best Subsets Regression uses R2 to check for the best model. It would not be fun or fast to compute this method without the use of a statistical software program. First, all models that have only one predictor variable included are checked and the two models with the highest R2 are selected. Then all models that have only two predictor variables included are checked and the two models with the highest R2 are chosen, again. This process continues until all combinations of all predictors variables have been taken into account.
Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms


AttriGuard google
Users in various web and mobile applications are vulnerable to attribute inference attacks, in which an attacker leverages a machine learning classifier to infer a target user’s private attributes (e.g., location, sexual orientation, political view) from its public data (e.g., rating scores, page likes). Existing defenses leverage game theory or heuristics based on correlations between the public data and attributes. These defenses are not practical. Specifically, game-theoretic defenses require solving intractable optimization problems, while correlation-based defenses incur large utility loss of users’ public data. In this paper, we present AttriGuard, a practical defense against attribute inference attacks. AttriGuard is computationally tractable and has small utility loss. Our AttriGuard works in two phases. Suppose we aim to protect a user’s private attribute. In Phase I, for each value of the attribute, we find a minimum noise such that if we add the noise to the user’s public data, then the attacker’s classifier is very likely to infer the attribute value for the user. We find the minimum noise via adapting existing evasion attacks in adversarial machine learning. In Phase II, we sample one attribute value according to a certain probability distribution and add the corresponding noise found in Phase I to the user’s public data. We formulate finding the probability distribution as solving a constrained convex optimization problem. We extensively evaluate AttriGuard and compare it with existing methods using a real-world dataset. Our results show that AttriGuard substantially outperforms existing methods. Our work is the first one that shows evasion attacks can be used as defensive techniques for privacy protection. …

Hierarchical Block Sparse Neural Network (HBsNN) google
Sparse deep neural networks(DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in computation of sparse DNNs, their efficiencies are much lower than that of dense DNNs on general purpose hardwares. This leads to poor/no performance benefits for sparse DNNs. Performance issue for sparse DNNs can be alleviated by bringing structure to the sparsity and leveraging it for improving runtime efficiency. But such structural constraints often lead to sparse models with suboptimal accuracies. In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of neural networks called HBsNN ( Hierarchical Block Sparse Neural Networks). …

If you did not already know

Layered Tree-Based Pipeline Optimization Tool (Layered TPOT) google
With the demand for machine learning increasing, so does the demand for tools which make it easier to use. Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT) which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines equally good as the original, but in significantly less time. This approach evaluates candidate pipelines on increasingly large subsets of the data according to their fitness, using a modified evolutionary algorithm to allow for separate competition between pipelines trained on different sample sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered TPOT indeed finds better models faster.
“Tree-Based Pipeline Optimization Tool”


jLDADMM google
In this technical report, we present jLDADMM—an easy-to-use Java toolkit for conventional topic models. jLDADMM is released to provide alternatives for topic modeling on normal or short texts. It provides implementations of the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. mixture of unigrams), using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. jLDADMM is open-source and available to download at: https://…/jLDADMM

Time Reversibility from Ordinal Patterns (TiROP) google
Time irreversibility is a common signature of nonlinear processes, and a fundamental property of non-equilibrium systems driven by non-conservative forces. A time series is said to be reversible if its statistical properties are invariant regardless of the direction of time. Here we propose the Time Reversibility from Ordinal Patterns method (TiROP) to assess time-reversibility from an observed finite time series. TiROP captures the information of scalar observations in time forward, as well as its time-reversed counterpart by means of ordinal patterns. The method compares both underlying information contents by quantifying its (dis)-similarity via Jensen-Shannon divergence. The statistic is contrasted with a population of divergences coming from a set of surrogates to unveil the temporal nature and its involved time scales. We tested TiROP in different synthetic and real, linear and non linear time series, juxtaposed with results from the classical Ramsey’s time reversibility test. Our results depict a novel, fast-computation, and fully data-driven methodology to assess time-reversibility at different time scales with no further assumptions over data. This approach adds new insights about the current non-linear analysis techniques, and also could shed light on determining new physiological biomarkers of high reliability and computational efficiency. …

Tensor Product Generation Network (TPGN) google
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM based state-of-the-art baseline with a significant margin. Further, we show that our caption generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation. …

If you did not already know

Zeno++ google
We propose Zeno++, a new robust asynchronous synchronous Stochastic Gradient Descent~(SGD) under a general Byzantine failure model with unbounded number of Byzantine workers. …

TensorFlow Ranking (TF-Ranking) google
TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform. It contains the following components:
• Commonly used loss functions including pointwise, pairwise, and listwise losses.
• Commonly used ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG).
• Multi-item (also known as groupwise) scoring functions.
• LambdaLoss implementation for direct ranking metric optimization.
• Unbiased Learning-to-Rank from biased feedback data.
We envision that this library will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research as well as industrial applications. …


Discovery Testing Concept Activation Vectors (DTCAV) google
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Due to it’s complexity, i For high-stakes domains such as medical, providing intuitive explanations that can be consumed by domain experts without ML expertise becomes crucial. To this demand, concept-based methods (e.g., TCAV) were introduced to provide explanations using user-chosen high-level concepts rather than individual input features. While these methods successfully leverage rich representations learned by the networks to reveal how human-defined concepts are related to the prediction, they require users to select concepts of their choice and collect labeled examples of those concepts. In this work, we introduce DTCAV (Discovery TCAV) a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept’s estimated importance for a deep neural network’s predictions. We validate that discovered concepts are as coherent to humans as hand-labeled concepts. We also show that the discovered concepts carry significant signal for prediction by analyzing a network’s performance with stitched/added/deleted concepts. DTCAV results revealed a number of undesirable correlations (e.g., a basketball player’s jersey was a more important concept for predicting the basketball class than the ball itself) and show the potential shallow reasoning of these networks. …

Weighted Entropy google
The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function. …

If you did not already know

QuickSel google
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today’s databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes. Since frequent scans are costly, these statistics are often stale and lead to poor selectivity estimates. As an alternative to scans, query-driven histograms have been proposed, which refine the histograms based on the actual selectivities of the observed queries. Unfortunately, these approaches are either too costly to use in practice—i.e., require an exponential number of buckets—or quickly lose their advantage as they observe more queries. For example, the state-of-the-art technique requires 318,936 buckets (and over 8 seconds of refinement overhead per query) after observing only 300 queries. In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms. Instead, it builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries). This fast refinement allows QuickSel to continuously learn from each query and yield increasingly more accurate selectivity estimates over time. Unlike query-driven histograms, QuickSel relies on a mixture model and a new optimization algorithm for training its model. Our extensive experiments on two real-world datasets confirm that, given the same target accuracy, QuickSel is on average 254.6x faster than state-of-the-art query-driven histograms, including ISOMER and STHoles. Further, given the same space budget, QuickSel is on average 57.3% and 91.1% more accurate than periodically-updated histograms and samples, respectively. …

Successive Over Relaxation (SOR) google
In a discounted reward Markov Decision Process (MDP) the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation and fixed point iteration scheme known as value iteration is utilized to obtain the solution. In [1], a successive over relaxation value iteration scheme is proposed to speed up the computation of the optimal value function. They propose a modified Bellman equation and prove the faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain optimal policy and value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for optimal Q-values based on [1] and utilize the stochastic approximation scheme to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the convergence of the SOR Q-Learning to optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster compared to the Q-Learning algorithm. …

FML-kNN google
Efficient management and analysis of large volumes of data is a demanding task of increasing scientific and industrial importance, as the ubiquitous generation of information governs more and more aspects of human life. In this article, we introduce FML-kNN, a novel distributed processing framework for Big Data that performs probabilistic classification and regression, implemented in Apache Flink. The framework’s core is consisted of a k-nearest neighbor joins algorithm which, contrary to similar approaches, is executed in a single distributed session and is able to operate on very large volumes of data of variable granularity and dimensionality. We assess FML-kNN’s performance and scalability in a detailed experimental evaluation, in which it is compared to similar methods implemented in Apache Hadoop, Spark, and Flink distributed processing engines. The results indicate an overall superiority of our framework in all the performed comparisons. Further, we apply FML-kNN in two motivating uses cases for water demand management, against real-world domestic water consumption data. In particular, we focus on forecasting water consumption using 1-h smart meter data, and extracting consumer characteristics from water use data in the shower. We further discuss on the obtained results, demonstrating the framework’s potential in useful knowledge extraction. …

Split Batch Normalization (Split-BN) google
Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and the ImageNet datasets, and with many other forms of domain shifts applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve it by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples. …

If you did not already know

AutoSpearman google
The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated to defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2%pts. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly-used feature selection techniques. …

KloakDB google
A private data federation enables data owners to pool their information for querying without disclosing their secret tuples to one another. Here, a client queries the union of the records of all data owners. The data owners work together to answer the query using privacy-preserving algorithms that prevent them from learning unauthorized information about the inputs of their peers. Only the client, and a federation coordinator, learn the query’s output. KloakDB is a private data federation that uses trusted hardware to process SQL queries over the inputs of two or more parties. Currently private data federations compute their queries fully-obliviously, guaranteeing that no information is revealed about the sensitive inputs of a data owner to their peers by observing the query’s instruction traces and memory access patterns. Oblivious querying almost always exacts multiple orders of magnitude slowdown in query runtimes compared to plaintext execution, making it impractical for many applications. KloakDB offers a semi-oblivious computing framework, $k$-anonymous query processing. We make the query’s observable transcript $k$-anonymous because it is a popular standard for data release in many domains including medicine, educational research, and government data. KloakDB’s queries run such that each data owner may deduce information about no fewer than $k$ individuals in the data of their peers. In addition, stakeholders set $k$, creating a novel trade-off between privacy and performance. Our results show that KloakDB enjoys speedups of up to $117$X using k-anonymous query processing over full-oblivious evaluation. …

Unsupervised Recurrent Neural Network Grammars (URNNG) google
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms. …

Return Decomposition for Delayed Rewards (RUDDER) google
We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, biases of temporal difference (TD) estimates are proved to be corrected only exponentially slowly in the number of delay steps. Furthermore, variances of Monte Carlo (MC) estimates are proved to increase the variance of other estimates, the number of which can exponentially grow in the number of delay steps. We introduce RUDDER, a return decomposition method, which creates a new MDP with same optimal policies as the original MDP but with redistributed rewards that have largely reduced delays. If the return decomposition is optimal, then the new MDP does not have delayed rewards and TD estimates are unbiased. In this case, the rewards track Q-values so that the future expected reward is always zero. We experimentally confirm our theoretical results on bias and variance of TD and MC estimates. On artificial tasks with different lengths of reward delays, we show that RUDDER is exponentially faster than TD, MC, and MC Tree Search (MCTS). RUDDER outperforms rainbow, A3C, DDQN, Distributional DQN, Dueling DDQN, Noisy DQN, and Prioritized DDQN on the delayed reward Atari game Venture in only a fraction of the learning time. RUDDER considerably improves the state-of-the-art on the delayed reward Atari game Bowling in much less learning time. Source code is available at https://…/baselines-rudder, with demonstration videos at https://goo.gl/EQerZV.