AnalytiXon

~ Broaden your Horizon

Category Archives: What is …

If you did not already know

17 Wednesday Feb 2021

Posted by Michael Laux in What is ...


Layered Tree-Based Pipeline Optimization Tool (Layered TPOT) google
With the demand for machine learning increasing, so does the demand for tools which make it easier to use. Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT), which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines as good as those of the original, but in significantly less time. This approach evaluates candidate pipelines on increasingly large subsets of the data according to their fitness, using a modified evolutionary algorithm to allow for separate competition between pipelines trained on different sample sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered TPOT indeed finds better models faster.
➘ “Tree-Based Pipeline Optimization Tool” …
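
As a rough illustration of the layered evaluation idea, here is a hypothetical scikit-learn-style sketch (not TPOT's actual implementation; function and parameter names are invented): candidates are scored on a small sample first, and only the fitter fraction advances to the next, larger layer.

    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import cross_val_score

    def layered_evaluation(pipelines, X, y, fractions=(0.125, 0.25, 0.5, 1.0), keep=0.5):
        # Score candidates on growing subsets of (X, y); only the fitter
        # fraction advances to the next (larger, hence costlier) layer.
        rng = np.random.default_rng(0)
        survivors = list(pipelines)
        for frac in fractions:
            idx = rng.choice(len(X), size=max(3, int(frac * len(X))), replace=False)
            scored = [(cross_val_score(clone(p), X[idx], y[idx], cv=3).mean(), p)
                      for p in survivors]
            scored.sort(key=lambda t: t[0], reverse=True)
            survivors = [p for _, p in scored[:max(1, int(keep * len(scored)))]]
        return survivors  # pipelines that survived every layer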


jLDADMM google
In this technical report, we present jLDADMM—an easy-to-use Java toolkit for conventional topic models. jLDADMM is released to provide alternatives for topic modeling on normal or short texts. It provides implementations of the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. mixture of unigrams), using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. jLDADMM is open-source and available to download at: https://…/jLDADMM …

Time Reversibility from Ordinal Patterns (TiROP) google
Time irreversibility is a common signature of nonlinear processes, and a fundamental property of non-equilibrium systems driven by non-conservative forces. A time series is said to be reversible if its statistical properties are invariant regardless of the direction of time. Here we propose the Time Reversibility from Ordinal Patterns method (TiROP) to assess time-reversibility from an observed finite time series. TiROP captures the information of scalar observations in forward time, as well as its time-reversed counterpart, by means of ordinal patterns. The method compares both underlying information contents by quantifying their (dis)similarity via Jensen-Shannon divergence. The statistic is contrasted with a population of divergences coming from a set of surrogates to unveil the temporal nature and the time scales involved. We tested TiROP on different synthetic and real, linear and non-linear time series, juxtaposed with results from the classical Ramsey time-reversibility test. Our results depict a novel, computationally fast, and fully data-driven methodology to assess time-reversibility at different time scales with no further assumptions over the data. This approach adds new insights to current non-linear analysis techniques, and could also shed light on determining new physiological biomarkers of high reliability and computational efficiency. …
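
A minimal sketch of the core statistic, assuming ordinal patterns of order m and omitting the surrogate-based significance test the abstract describes:

    import numpy as np
    from itertools import permutations

    def ordinal_distribution(x, m=3):
        # Empirical distribution of order-m ordinal patterns in the series x.
        counts = {p: 0 for p in permutations(range(m))}
        for i in range(len(x) - m + 1):
            counts[tuple(np.argsort(x[i:i + m]))] += 1
        freq = np.array(list(counts.values()), dtype=float)
        return freq / freq.sum()

    def jensen_shannon(p, q):
        # JS divergence between two discrete distributions on the same support.
        def kl(a, b):
            mask = a > 0
            return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
        m = 0.5 * (p + q)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def tirop_statistic(x, m=3):
        # (Dis)similarity between the series and its time reversal.
        x = np.asarray(x, dtype=float)
        return jensen_shannon(ordinal_distribution(x, m),
                              ordinal_distribution(x[::-1], m))

In the method proper, this divergence would then be compared against a population of divergences computed on surrogate series.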

Tensor Product Generation Network (TPGN) google
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM-based state-of-the-art baseline by a significant margin. Further, we show that our caption generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation. …

If you did not already know

16 Tuesday Feb 2021

Posted by Michael Laux in What is ...


Zeno++ google
We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure under a general Byzantine failure model with an unbounded number of Byzantine workers. …

TensorFlow Ranking (TF-Ranking) google
TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform. It contains the following components:
• Commonly used loss functions including pointwise, pairwise, and listwise losses (a pairwise example is sketched after this list).
• Commonly used ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG).
• Multi-item (also known as groupwise) scoring functions.
• LambdaLoss implementation for direct ranking metric optimization.
• Unbiased Learning-to-Rank from biased feedback data.
We envision that this library will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research as well as industrial applications. …
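
To make the pairwise-loss bullet concrete, here is a minimal plain-numpy sketch of a pairwise logistic loss. This illustrates the concept only and does not reproduce TF-Ranking's actual API:

    import numpy as np

    def pairwise_logistic_loss(scores, labels):
        # For every pair (i, j) where item i is more relevant than item j,
        # penalize the model when it does not score i above j.
        loss, n_pairs = 0.0, 0
        for i in range(len(scores)):
            for j in range(len(scores)):
                if labels[i] > labels[j]:
                    loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                    n_pairs += 1
        return loss / max(n_pairs, 1)

    print(pairwise_logistic_loss(np.array([2.0, 1.0, 0.1]), np.array([2, 1, 0])))  # small: correct order
    print(pairwise_logistic_loss(np.array([0.1, 1.0, 2.0]), np.array([2, 1, 0])))  # large: inverted order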


Discovery Testing Concept Activation Vectors (DTCAV) google
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. For high-stakes domains such as medicine, providing intuitive explanations that can be consumed by domain experts without ML expertise becomes crucial. To meet this demand, concept-based methods (e.g., TCAV) were introduced to provide explanations using user-chosen high-level concepts rather than individual input features. While these methods successfully leverage rich representations learned by the networks to reveal how human-defined concepts are related to the prediction, they require users to select concepts of their choice and collect labeled examples of those concepts. In this work, we introduce DTCAV (Discovery TCAV), a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept’s estimated importance for a deep neural network’s predictions. We validate that discovered concepts are as coherent to humans as hand-labeled concepts. We also show that the discovered concepts carry significant signal for prediction by analyzing a network’s performance with stitched/added/deleted concepts. DTCAV results revealed a number of undesirable correlations (e.g., a basketball player’s jersey was a more important concept for predicting the basketball class than the ball itself) and show the potential shallow reasoning of these networks. …

Weighted Entropy google
The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function. …
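
Concretely, for a weight function w over outcomes, the weighted entropy is H_w(p) = −Σ_i w(x_i) p(x_i) log p(x_i); with w ≡ 1 it reduces to ordinary Shannon entropy. A minimal numpy sketch:

    import numpy as np

    def weighted_entropy(p, w):
        # H_w = -sum_i w_i * p_i * log(p_i); w encodes the value of each outcome.
        p, w = np.asarray(p, dtype=float), np.asarray(w, dtype=float)
        mask = p > 0                      # convention: 0 * log 0 = 0
        return float(-np.sum(w[mask] * p[mask] * np.log(p[mask])))

    print(weighted_entropy([0.5, 0.5], [1.0, 1.0]))  # log 2 ~ 0.693 (Shannon case)
    print(weighted_entropy([0.5, 0.5], [2.0, 0.5]))  # same distribution, different outcome values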

If you did not already know

16 Tuesday Feb 2021

Posted by Michael Laux in What is ...


QuickSel google
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today’s databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes. Since frequent scans are costly, these statistics are often stale and lead to poor selectivity estimates. As an alternative to scans, query-driven histograms have been proposed, which refine the histograms based on the actual selectivities of the observed queries. Unfortunately, these approaches are either too costly to use in practice—i.e., require an exponential number of buckets—or quickly lose their advantage as they observe more queries. For example, the state-of-the-art technique requires 318,936 buckets (and over 8 seconds of refinement overhead per query) after observing only 300 queries. In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms. Instead, it builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries). This fast refinement allows QuickSel to continuously learn from each query and yield increasingly more accurate selectivity estimates over time. Unlike query-driven histograms, QuickSel relies on a mixture model and a new optimization algorithm for training its model. Our extensive experiments on two real-world datasets confirm that, given the same target accuracy, QuickSel is on average 254.6x faster than state-of-the-art query-driven histograms, including ISOMER and STHoles. Further, given the same space budget, QuickSel is on average 57.3% and 91.1% more accurate than periodically-updated histograms and samples, respectively. …

Successive Over Relaxation (SOR) google
In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed point iteration scheme known as value iteration is utilized to obtain the solution. In [1], a successive over relaxation value iteration scheme is proposed to speed up the computation of the optimal value function. They propose a modified Bellman equation and prove faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain the optimal policy and value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for the optimal Q-values based on [1] and utilize the stochastic approximation scheme to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the convergence of SOR Q-Learning to the optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster than the Q-Learning algorithm. …
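
A hedged sketch of what such an update can look like, following the general successive over-relaxation form with a relaxation factor w (the paper's exact scheme is derived from [1] and may differ in detail); w = 1 recovers the ordinary Q-Learning update:

    import numpy as np

    def sor_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, w=1.2):
        # The relaxed target mixes the usual Bellman backup with the current
        # state's own max-Q value, weighted by the relaxation factor w.
        target = w * (r + gamma * np.max(Q[s_next])) + (1.0 - w) * np.max(Q[s])
        Q[s, a] += alpha * (target - Q[s, a])
        return Q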

FML-kNN google
Efficient management and analysis of large volumes of data is a demanding task of increasing scientific and industrial importance, as the ubiquitous generation of information governs more and more aspects of human life. In this article, we introduce FML-kNN, a novel distributed processing framework for Big Data that performs probabilistic classification and regression, implemented in Apache Flink. The framework’s core consists of a k-nearest neighbor join algorithm which, contrary to similar approaches, is executed in a single distributed session and is able to operate on very large volumes of data of variable granularity and dimensionality. We assess FML-kNN’s performance and scalability in a detailed experimental evaluation, in which it is compared to similar methods implemented for the Apache Hadoop, Spark, and Flink distributed processing engines. The results indicate an overall superiority of our framework in all the performed comparisons. Further, we apply FML-kNN to two motivating use cases for water demand management, against real-world domestic water consumption data. In particular, we focus on forecasting water consumption using 1-hour smart meter data, and extracting consumer characteristics from water use data in the shower. We further discuss the obtained results, demonstrating the framework’s potential in useful knowledge extraction. …

Split Batch Normalization (Split-BN) google
Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and the ImageNet datasets, and with many other forms of domain shift applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve this by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples. …
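
A minimal PyTorch-style sketch of the idea, assuming each batch is laid out as labeled examples followed by unlabeled ones (class and argument names are illustrative):

    import torch
    import torch.nn as nn

    class SplitBatchNorm2d(nn.Module):
        # Separate BatchNorm statistics for the labeled and unlabeled
        # portions of a batch, in the spirit of Split-BN.
        def __init__(self, num_features):
            super().__init__()
            self.bn_labeled = nn.BatchNorm2d(num_features)
            self.bn_unlabeled = nn.BatchNorm2d(num_features)

        def forward(self, x, n_labeled):
            if not self.training:              # at test time, use labeled-data statistics
                return self.bn_labeled(x)
            out_l = self.bn_labeled(x[:n_labeled])
            out_u = self.bn_unlabeled(x[n_labeled:])
            return torch.cat([out_l, out_u], dim=0)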

If you did not already know

15 Monday Feb 2021

Posted by Michael Laux in What is ...


AutoSpearman google
The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated with one another. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by only 1-2 percentage points. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly-used feature selection techniques. …

KloakDB google
A private data federation enables data owners to pool their information for querying without disclosing their secret tuples to one another. Here, a client queries the union of the records of all data owners. The data owners work together to answer the query using privacy-preserving algorithms that prevent them from learning unauthorized information about the inputs of their peers. Only the client, and a federation coordinator, learn the query’s output. KloakDB is a private data federation that uses trusted hardware to process SQL queries over the inputs of two or more parties. Currently, private data federations compute their queries fully obliviously, guaranteeing that no information is revealed about the sensitive inputs of a data owner to their peers by observing the query’s instruction traces and memory access patterns. Oblivious querying almost always exacts multiple orders of magnitude slowdown in query runtimes compared to plaintext execution, making it impractical for many applications. KloakDB offers a semi-oblivious computing framework, k-anonymous query processing. We make the query’s observable transcript k-anonymous because k-anonymity is a popular standard for data release in many domains, including medicine, educational research, and government data. KloakDB’s queries run such that each data owner may deduce information about no fewer than k individuals in the data of their peers. In addition, stakeholders set k, creating a novel trade-off between privacy and performance. Our results show that KloakDB enjoys speedups of up to 117X using k-anonymous query processing over fully-oblivious evaluation. …

Unsupervised Recurrent Neural Network Grammars (URNNG) google
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms. …

Return Decomposition for Delayed Rewards (RUDDER) google
We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, biases of temporal difference (TD) estimates are proved to be corrected only exponentially slowly in the number of delay steps. Furthermore, variances of Monte Carlo (MC) estimates are proved to increase the variance of other estimates, the number of which can grow exponentially in the number of delay steps. We introduce RUDDER, a return decomposition method, which creates a new MDP with the same optimal policies as the original MDP but with redistributed rewards that have largely reduced delays. If the return decomposition is optimal, then the new MDP does not have delayed rewards and TD estimates are unbiased. In this case, the rewards track Q-values so that the future expected reward is always zero. We experimentally confirm our theoretical results on bias and variance of TD and MC estimates. On artificial tasks with different lengths of reward delays, we show that RUDDER is exponentially faster than TD, MC, and MC Tree Search (MCTS). RUDDER outperforms Rainbow, A3C, DDQN, Distributional DQN, Dueling DDQN, Noisy DQN, and Prioritized DDQN on the delayed reward Atari game Venture in only a fraction of the learning time. RUDDER considerably improves the state-of-the-art on the delayed reward Atari game Bowling in much less learning time. Source code is available at https://…/baselines-rudder, with demonstration videos at https://goo.gl/EQerZV. …
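
As a toy sketch of the return-decomposition principle (not the paper's LSTM-based contribution analysis): given a model's running predictions g_t of the episode return, redistributed rewards are the stepwise differences, with a small correction so the episode return is unchanged:

    import numpy as np

    def redistribute_rewards(return_predictions, total_return):
        # r_t = g_t - g_{t-1}: each step receives the change in predicted return.
        g = np.asarray(return_predictions, dtype=float)
        new_r = np.diff(g, prepend=0.0)
        new_r += (total_return - new_r.sum()) / len(new_r)  # preserve the episode return
        return new_r

    # A reward earned at t=1 but paid out at episode end is moved back to t=1.
    print(redistribute_rewards([0.0, 1.0, 1.0, 1.0], total_return=1.0))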

If you did not already know

13 Saturday Feb 2021

Posted by Michael Laux in What is ...


People + AI Research (PAIR) google
The past few years have seen rapid advances in machine learning, with new technologies achieving dramatic improvements in technical performance. But we can go beyond optimizing objective functions. By building AI systems with users in mind from the ground up, we open up entire new areas of design and interaction. PAIR is devoted to advancing the research and design of people-centric AI systems. We’re interested in the full spectrum of human interaction with machine intelligence, from supporting engineers to understanding everyday experiences with AI. Our goal is to do fundamental research, invent new technology, and create frameworks for design in order to drive a human-centered approach to artificial intelligence. And we want to be as open as possible: we’re building open source tools that everyone can use, hosting public events, and supporting academics in advancing the state of the art. …

Differentiable Subset Sampling google
Many machine learning tasks require sampling a subset of items from a collection. Due to the non-differentiability of subset sampling, the procedure is usually not included in end-to-end deep learning models. We show that through a connection to weighted reservoir sampling, the Gumbel-max trick can be extended to produce exact subset samples, and that a recently proposed top-k relaxation can be used to differentiate through the subset sampling procedure. We test our method on end-to-end tasks requiring subset sampling, including a differentiable k-nearest neighbors task and an instance-wise feature selection task for model interpretability. …
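
A sketch of both halves of the claim: exact subset sampling via Gumbel perturbation plus hard top-k, and a softmax-based relaxation in the spirit of the top-k relaxation the abstract refers to (details are illustrative):

    import numpy as np

    def gumbel_subset_sample(log_w, k, rng):
        # Exact size-k subset sample: perturb log-weights with Gumbel noise
        # and keep the top-k (the weighted-reservoir-sampling connection).
        g = rng.gumbel(size=log_w.shape)
        return np.argsort(-(log_w + g))[:k]

    def relaxed_topk(log_w, k, rng, tau=1.0):
        # Differentiable surrogate: k successive softmaxes, softly removing
        # the mass already selected after each round.
        scores = log_w + rng.gumbel(size=log_w.shape)
        khot = np.zeros_like(scores)
        for _ in range(k):
            p = np.exp(scores / tau)
            p /= p.sum()
            khot += p
            scores = scores + np.log1p(-np.clip(p, None, 1 - 1e-9))
        return khot  # entries near 1 mark the (soft) selected items

    rng = np.random.default_rng(0)
    w = np.log(np.array([5.0, 3.0, 1.0, 1.0]))
    print(gumbel_subset_sample(w, k=2, rng=rng))
    print(relaxed_topk(w, k=2, rng=rng).round(2))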

Masked Convolutional Generative Flow (MaCow) google
Flow-based generative models are conceptually attractive due to the tractability of both exact log-likelihood computation and latent-variable inference, and due to the efficiency of both training and sampling; they have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, the density estimation performance of flow-based generative models still falls significantly behind that of state-of-the-art autoregressive models. In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution. By restricting the local connectivity to a small kernel, MaCow enjoys the properties of fast and stable training and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models. …

FIRE-DES++ google
Despite being very effective in several classification tasks, Dynamic Ensemble Selection (DES) techniques can select classifiers that classify all samples in the region of competence as being from the same class. The Frienemy Indecision REgion DES (FIRE-DES) tackles this problem by pre-selecting classifiers that correctly classify at least one pair of samples from different classes in the region of competence of the test sample. However, FIRE-DES applies the pre-selection for the classification of a test sample if and only if its region of competence is composed of samples from different classes (indecision region), even though this criterion is not reliable for determining if a test sample is located close to the borders of classes (true indecision region) when the region of competence is obtained using the classical nearest neighbors approach. Because of this, FIRE-DES mistakes noisy regions for true indecision regions, leading to the pre-selection of incompetent classifiers, and mistakes true indecision regions for safe regions, leaving samples in such regions without any pre-selection. To tackle these issues, we propose FIRE-DES++, an enhanced FIRE-DES that removes noise and reduces the overlap of classes in the validation set, and defines the region of competence using an equal number of samples of each class, avoiding selecting a region of competence with samples of a single class. Experiments are conducted using FIRE-DES++ with 8 different dynamic selection techniques on 64 classification datasets. Experimental results show that FIRE-DES++ increases the classification performance of all DES techniques considered in this work, outperforming FIRE-DES with 7 out of the 8 DES techniques, and outperforming state-of-the-art DES frameworks. …

If you did not already know

12 Friday Feb 2021

Posted by Michael Laux in What is ...


LAHA google
Extreme multi-label text classification (XMTC) aims at tagging a document with the most relevant labels from an extremely large-scale label set. It is a challenging problem, especially for the tail labels, because there are only a few training documents from which to build a classifier. This paper is motivated to better explore the semantic relationship between each document and the extreme labels by taking advantage of both document content and label correlation. Our objective is to establish an explicit label-aware representation for each document with a hybrid attention deep neural network model (LAHA). LAHA consists of three parts. The first part adopts a multi-label self-attention mechanism to detect the contribution of each word to labels. The second part exploits the label structure and document content to determine the semantic connection between words and labels in the same latent space. An adaptive fusion strategy is designed in the third part to obtain the final label-aware document representation, so that the essence of the previous two parts can be sufficiently integrated. Extensive experiments have been conducted on six benchmark datasets by comparing with the state-of-the-art methods. The results show the superiority of our proposed LAHA method, especially on the tail labels. …

PUBSUB-SGX google
This paper presents PUBSUB-SGX, a content-based publish-subscribe system that exploits trusted execution environments (TEEs), such as Intel SGX, to guarantee confidentiality and integrity of data as well as anonymity and privacy of publishers and subscribers. We describe the technical details of our Python implementation, as well as the required system support introduced to deploy our system in a container-based runtime. Our evaluation results show that our approach is sound, while at the same time highlighting the performance and scalability trade-offs. In particular, by supporting just-in-time compilation inside of TEEs, Python programs inside of TEEs are in general faster than when executed natively using standard CPython. …

Deep Contextual Multi-armed Bandits google
Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling by exploiting the connection between inference time dropout and sampling from the posterior over the weights of a Bayesian neural network. In order to adjust the level of exploration automatically as more data is made available to the model, the dropout rate is learned rather than considered a hyperparameter. We demonstrate that our approach substantially reduces regret on two tasks (the UCI Mushroom task and the Casino Parity task) when compared to 1) non-contextual bandits, 2) epsilon-greedy deep contextual bandits, and 3) fixed dropout rate deep contextual bandits. Our approach is currently being applied to marketing optimization problems at HubSpot. …
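
A minimal PyTorch-style sketch of the exploration mechanism (here with a fixed dropout rate for brevity; in the approach described above the rate is learned rather than fixed):

    import torch
    import torch.nn as nn

    class DropoutBandit(nn.Module):
        # One stochastic forward pass with dropout active approximates one
        # sample from the posterior over arm rewards (Thompson sampling).
        def __init__(self, ctx_dim, n_arms, p_drop=0.2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(ctx_dim, 64), nn.ReLU(),
                nn.Dropout(p_drop),            # kept active at decision time
                nn.Linear(64, n_arms),
            )

        def act(self, context):
            self.train()                       # keep dropout ON while acting
            with torch.no_grad():
                sampled_rewards = self.net(context)
            return int(sampled_rewards.argmax())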

Task-Free Continual Learning google
Methods proposed in the literature towards continual deep learning typically operate in a task-based sequential learning setup. A sequence of tasks is learned, one at a time, with all data of the current task available but not of previous or future tasks. Task boundaries and identities are known at all times. This setup, however, is rarely encountered in practical applications. Therefore, we investigate how to transform continual learning into an online setup. We develop a system that keeps on learning over time in a streaming fashion, with data distributions gradually changing and without the notion of separate tasks. To this end, we build on the work on Memory Aware Synapses, and show how this method can be made online by providing a protocol to decide i) when to update the importance weights, ii) which data to use to update them, and iii) how to accumulate the importance weights at each update step. Experimental results show the validity of the approach in the context of two applications: (self-)supervised learning of a face recognition model by watching soap series, and teaching a robot to avoid collisions. …

If you did not already know

11 Thursday Feb 2021

Posted by Michael Laux in What is ...


Bach2Bach google
A model of music needs to have the ability to recall past details and have a clear, coherent understanding of musical structure. Detailed in the paper is a deep reinforcement learning architecture that predicts and generates polyphonic music aligned with musical rules. The probabilistic model presented is a Bi-axial LSTM trained with a pseudo-kernel reminiscent of a convolutional kernel. To encourage exploration and impose greater global coherence on the generated music, a deep reinforcement learning approach, DQN, is adopted. When analyzed quantitatively and qualitatively, this approach performs well in composing polyphonic music. …

Staircase Network google
A language recognition system is typically trained directly to optimize classification error on the target language labels, without using external, or meta-, information in the estimation of the model parameters. However, labels are not independent of each other; there is a dependency, enforced by, for example, the language family, which negatively affects classification. Other external information sources (e.g. audio encoding, telephony or video speech) can also decrease classification accuracy. In this paper, we attempt to solve these issues by constructing a deep hierarchical neural network, where different levels of meta-information are encapsulated by attentive prediction units and also embedded into the training progress. The proposed method learns auxiliary tasks to obtain robust internal representations and to construct a variant of attentive units within the hierarchical model. The final result is the structural prediction of the target language and a closely related language family. The algorithm reflects a ‘staircase’ way of learning in both its architecture and training, advancing from the fundamental audio encoding to the language-family level and finally to the target-language level. This process not only improves generalization but also tackles the issues of imbalanced class priors and channel variability in the deep neural network model. Our experimental findings show that the proposed architecture outperforms state-of-the-art i-vector approaches on both small and big language corpora by a significant margin. …

PHOENICS google
In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case-study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce qualitatively and quantitatively the target dynamic behavior of this nonlinear reaction dynamics. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box unknown objective functions. …

T-PRISM google
We introduce a new logic programming language T-PRISM based on tensor embeddings. Our embedding scheme is a modification of the distribution semantics in PRISM, one of the state-of-the-art probabilistic logic programming languages, by replacing distribution functions with multidimensional arrays, i.e., tensors. T-PRISM consists of two parts: a logic programming part and a numerical computation part. The former provides flexible and interpretable modeling at the level of first order logic, and the latter provides scalable computation utilizing parallelization and hardware acceleration with GPUs. Combining these two parts provides a remarkably wide range of high-level declarative modeling, from symbolic reasoning to deep learning. To embody this programming language, we also introduce a new semantics, termed tensorized semantics, which combines the traditional least model semantics in logic programming with the embeddings of tensors. In T-PRISM, we first derive a set of equations related to tensors from a given program using logical inference, i.e., Prolog execution in a symbolic space, and then solve the derived equations in a continuous space by TensorFlow. Using our preliminary implementation of T-PRISM, we have successfully dealt with a wide range of modeling tasks, including real large-scale data in declarative modeling. This paper presents a DistMult model for knowledge graphs using the FB15k and WN18 datasets. …

If you did not already know

10 Wednesday Feb 2021

Posted by Michael Laux in What is ...


Control Toolbox (CT) google
We introduce the Control Toolbox (CT), an open-source C++ library for efficient modelling, control, estimation, trajectory optimization and model predictive control. The CT is applicable to a broad class of dynamic systems, but features additional modelling tools specially designed for robotics. This paper outlines its general concept, its major building blocks and highlights selected application examples. The CT was designed for intuitive modelling of systems governed by ordinary differential or difference equations. It supports rapid prototyping of cost functions and constraints and provides common interfaces for different optimal control solvers. To date, we support Single Shooting, the iterative Linear-Quadratic Regulator, Gauss-Newton Multiple Shooting and classical Direct Multiple Shooting. We provide interfaces to different NLP and linear-quadratic solvers, such as IPOPT, SNOPT, HPIPM, or a custom Riccati solver. The CT was designed with performance for online control in mind and makes it possible to solve large-scale optimal control problems highly efficiently. Some of the key features enabling fast run-time performance are full support for Automatic Differentiation, derivative code generation and thorough multi-threading. For robotics problems, we offer an interface to a fully auto-differentiable rigid-body dynamics modelling engine. In combination with derivative code generation, this allows for an unprecedented performance in solving optimal control problems for complex articulated robotic systems. …

TransATT google
Attribute acquisition for classes is a key step in ontology construction, which is often carried out manually by community members. This paper investigates an attention-based automatic paradigm called TransATT for attribute acquisition, by learning the representation of hierarchical classes and attributes in a Chinese ontology. The attributes of an entity can be acquired by merely inspecting its classes, because the entity can be regarded as an instance of its classes and inherits their attributes. To describe the class of an entity explicitly and unambiguously, we propose class-paths to represent the hierarchical classes in the ontology, instead of the terminal class word of the hypernym-hyponym relation (i.e., is-a relation) based hierarchy. The high performance of TransATT on attribute acquisition indicates the promising ability of the learned representations of class-paths and attributes. Moreover, we construct a dataset named BigCilin11k. To the best of our knowledge, this is the first Chinese dataset with abundant hierarchical classes and entities with attributes. …

Lindy Effect google
The Lindy effect is a theory of the life expectancy of non-perishable things that posits for a certain class of nonperishables, like a technology or an idea, every additional day may imply a longer (remaining) life expectancy: the mortality rate decreases with time. This contrasts with living creatures and mechanical things, which instead follow a bathtub curve, where every additional day in its life translates into a shorter additional life expectancy (though longer overall life expectancy, due to surviving this far): after childhood, the mortality rate increases with time. …
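
A standard numerical illustration (assuming a Pareto lifetime model, which has the Lindy property): with survival function S(t) = t^(−α) for t ≥ 1 and α > 1, the expected remaining life given survival to age t is t/(α − 1), which grows linearly with age:

    # Pareto lifetime with tail index alpha > 1:
    # E[T - t | T > t] = t / (alpha - 1) -> remaining life grows with age.
    alpha = 2.0
    for t in [1, 10, 100]:
        print(f"survived to t={t:>3} -> expected remaining life {t / (alpha - 1):.0f}")

A bathtub-curve lifetime shows the opposite behaviour: past childhood, the same conditional expectation shrinks as age increases.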

Facetize google
There is a plethora of datasets in various formats which are usually stored in files, hosted in catalogs, or accessed through SPARQL endpoints. In most cases, these datasets cannot be straightforwardly explored by end users, for satisfying recall-oriented information needs. To fill this gap, in this paper we present the design and implementation of Facetize, an editor that allows users to transform (in an interactive manner) datasets, either static (i.e. stored in files), or dynamic (i.e. being the results of SPARQL queries), to datasets that can be directly explored effectively by themselves or other users. The latter (exploration) is achieved through the familiar interaction paradigm of Faceted Search (and Preference-enriched Faceted Search). Specifically, in this paper we describe the requirements, we introduce the required set of transformations, and then we detail the functionality and the implementation of the editor Facetize that realizes these transformations. The supported operations cover a wide range of tasks (selection, visibility, deletions, edits, definition of hierarchies, intervals, derived attributes, and others) and Facetize enables the user to carry them out in a user-friendly and guided manner, without presupposing any technical background (regarding data representation or query languages). Finally, we present the results of an evaluation with users. To the best of our knowledge, this is the first editor for this kind of tasks. …

If you did not already know

09 Tuesday Feb 2021

Posted by Michael Laux in What is ...


Staleness google
Most distributed machine learning (ML) systems store a copy of the model parameters locally on each machine to minimize network communication. In practice, in order to reduce synchronization waiting time, these copies of the model are not necessarily updated in lock-step, and can become stale. Despite much development in large-scale ML, the effect of staleness on the learning efficiency is inconclusive, mainly because it is challenging to control or monitor the staleness in complex distributed environments. In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms, and offer insights into seemingly contradictory reports in the literature. The empirical findings also inspire a new convergence analysis of SGD in non-convex optimization under staleness, matching the best known convergence rate. …

Distributional Reinforcement Learning (Distributional RL) google
In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman’s equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
A Comparative Analysis of Expected and Distributional Reinforcement Learning …
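
The best-known instantiation of this idea maintains a categorical distribution over returns on a fixed support of atoms; the key step is projecting the Bellman-updated atoms back onto that support. A minimal sketch of that categorical projection (an illustration with simplified shapes, not the full algorithm):

    import numpy as np

    def categorical_bellman_target(p_next, r, gamma, z):
        # Project the distributional Bellman target r + gamma * z onto the
        # fixed support z, splitting each atom's mass between its neighbours.
        v_min, v_max, dz = z[0], z[-1], z[1] - z[0]
        m = np.zeros_like(p_next)
        for p_j, z_j in zip(p_next, z):
            tz = np.clip(r + gamma * z_j, v_min, v_max)  # shifted/contracted atom
            b = (tz - v_min) / dz                        # fractional atom index
            l, u = int(np.floor(b)), int(np.ceil(b))
            if l == u:
                m[l] += p_j                              # landed exactly on an atom
            else:
                m[l] += p_j * (u - b)
                m[u] += p_j * (b - l)
        return m  # target for a cross-entropy loss against the predicted distribution

    z = np.linspace(0.0, 10.0, 11)                       # 11 atoms on [0, 10]
    p = np.full(11, 1.0 / 11)
    print(categorical_bellman_target(p, r=1.0, gamma=0.9, z=z).sum())  # mass preserved: 1.0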


Positional Cartesian Genetic Programming (Positional CGP) google
Cartesian Genetic Programming (CGP) has many modifications across a variety of implementations, such as recursive connections and node weights. Alternative genetic operators have also been proposed for CGP, but have not been fully studied. In this work, we present a new form of genetic programming based on a floating point representation. In this new form of CGP, called Positional CGP, node positions are evolved. This allows for the evaluation of many different genetic operators while allowing for previous CGP improvements like recurrency. Using nine benchmark problems from three different classes, we evaluate the optimal parameters for CGP and PCGP, including novel genetic operators. …

ETNLP google
In this paper, we introduce a comprehensive toolkit, ETNLP, which can evaluate, extract, and visualize multiple sets of pre-trained word embeddings. First, for evaluation, ETNLP analyses the quality of pre-trained embeddings based on an input word analogy list. Second, for extraction, ETNLP provides a subset of the embeddings to be used in downstream NLP tasks. Finally, ETNLP has a visualization module for exploring the embedded words interactively. We demonstrate the effectiveness of ETNLP on our pre-trained word embeddings in Vietnamese. Specifically, we create a large Vietnamese word analogy list to evaluate the embeddings. We then utilize the pre-trained embeddings for the named entity recognition (NER) task in Vietnamese and achieve new state-of-the-art results on a benchmark dataset for the NER task. A video demonstration of ETNLP is available at https://…/317599106. The source code and data are available at https://github.com/vietnlp/etnlp. …

If you did not already know

07 Sunday Feb 2021

Posted by Michael Laux in What is ...


SocialGCN google
Collaborative Filtering (CF) is one of the most successful approaches for recommender systems. With the emergence of online social networks, social recommendation has become a popular research direction. Most of these social recommendation models utilized each user’s local neighbors’ preferences to alleviate the data sparsity issue in CF. However, they only considered the local neighbors of each user and neglected the process that users’ preferences are influenced as information diffuses in the social network. Recently, Graph Convolutional Networks (GCN) have shown promising results by modeling the information diffusion process in graphs that leverage both graph structure and node feature information. To this end, in this paper, we propose an effective graph convolutional neural network based model for social recommendation. Based on a classical CF model, the key idea of our proposed model is that we borrow the strengths of GCNs to capture how users’ preferences are influenced by the social diffusion process in social networks. The diffusion of users’ preferences is built on a layer-wise diffusion manner, with the initial user embedding as a function of the current user’s features and a free base user latent vector that is not contained in the user feature. Similarly, each item’s latent vector is also a combination of the item’s free latent vector, as well as its feature representation. Furthermore, we show that our proposed model is flexible when user and item features are not available. Finally, extensive experimental results on two real-world datasets clearly show the effectiveness of our proposed model. …

SubGram google
Skip-gram (word2vec) is a recent method for creating vector representations of words (‘distributed word representations’) using a neural network. The representation gained popularity in various areas of natural language processing, because it seems to capture syntactic and semantic information about words without any explicit supervision in this respect. We propose SubGram, a refinement of the Skip-gram model to consider also the word structure during the training process, achieving large gains on the Skip-gram original test set. …

Incremental Cascade Regression (ICR) google
Traditional face alignment based on machine learning usually tracks the localizations of facial landmarks employing a static model trained offline, where all of the training data is available in advance. When new training samples arrive, the static model must be retrained from scratch, which is excessively time-consuming and memory-consuming. In many real-time applications, the training data is obtained one sample or one batch at a time. As a result, a static model’s performance is limited on sequential images with extensive variations. Therefore, the most critical and challenging aspect in this field is dynamically updating the tracker’s models to continuously enhance predictive and generalization capabilities. In order to address this question, we develop a fast and accurate online learning algorithm for face alignment. Particularly, we incorporate an on-line sequential extreme learning machine into a parallel cascaded regression framework, coined incremental cascade regression (ICR). To the best of our knowledge, this is the first incremental cascaded framework with a non-linear regressor. One main advantage of ICR is that the tracker model can be updated quickly in an incremental way, without the entire retraining process, when a new input arrives. Experimental results demonstrate that the proposed ICR is more accurate and efficient on still or sequential images compared with recent state-of-the-art cascade approaches. Furthermore, the incremental learning proposed in this paper can update the trained model in real time. …

Custodes google
We present Custodes: a new approach to solving the complex issue of preventing ‘p-hacking’ in scientific studies. The novel protocol provides a concrete and publicly auditable method for controlling false discoveries and eliminates any potential for data dredging on the part of researchers during the data-analysis phase. Custodes provides provable guarantees on the validity of each hypothesis test performed on a dataset by using cryptographic techniques to certify the outcomes of statistical tests. Custodes achieves this using a decentralized authority and a tamper-proof ledger, which enables auditing of the hypothesis testing process. We present a construction of Custodes which we implement and evaluate using both real and synthetic datasets on common statistical tests, demonstrating the effectiveness and practicality of Custodes in the real world. …
