Affect in Tweets Using Experts Model

Estimating the intensity of emotion has gained significance as modern textual inputs in potential applications like social media, e-retail markets, psychology, advertisements etc., carry a lot of emotions, feelings, expressions along with its meaning. However, the approaches of traditional sentiment analysis primarily focuses on classifying the sentiment in general (positive or negative) or at an aspect level (very positive, low negative, etc.) and cannot exploit the intensity information. Moreover, automatically identifying emotions like anger, fear, joy, sadness, disgust etc., from text introduces challenging scenarios where single tweet may contain multiple emotions with different intensities and some emotions may even co-occur in some of the tweets. In this paper, we propose an architecture, Experts Model, inspired from the standard Mixture of Experts (MoE) model. The key idea here is each expert learns different sets of features from the feature vector which helps in better emotion detection from the tweet. We compared the results of our Experts Model with both baseline results and top five performers of SemEval-2018 Task-1, Affect in Tweets (AIT). The experimental results show that our proposed approach deals with the emotion detection problem and stands at top-5 results.

DSL: Discriminative Subgraph Learning via Sparse Self-Representation

The goal in network state prediction (NSP) is to classify the global state (label) associated with features embedded in a graph. This graph structure encoding feature relationships is the key distinctive aspect of NSP compared to classical supervised learning. NSP arises in various applications: gene expression samples embedded in a protein-protein interaction (PPI) network, temporal snapshots of infrastructure or sensor networks, and fMRI coherence network samples from multiple subjects to name a few. Instances from these domains are typically “wide” (more features than samples), and thus, feature sub-selection is required for robust and generalizable prediction. How to best employ the network structure in order to learn succinct connected subgraphs encompassing the most discriminative features becomes a central challenge in NSP. Prior work employs connected subgraph sampling or graph smoothing within optimization frameworks, resulting in either large variance of quality or weak control over the connectivity of selected subgraphs. In this work we propose an optimization framework for discriminative subgraph learning (DSL) which simultaneously enforces (i) sparsity, (ii) connectivity and (iii) high discriminative power of the resulting subgraphs of features. Our optimization algorithm is a single-step solution for the NSP and the associated feature selection problem. It is rooted in the rich literature on maximal-margin optimization, spectral graph methods and sparse subspace self-representation. DSL simultaneously ensures solution interpretability and superior predictive power (up to 16% improvement in challenging instances compared to baselines), with execution times up to an hour for large instances.

Question Embeddings Based on Shannon Entropy: Solving intent classification task in goal-oriented dialogue system

Question-answering systems and voice assistants are becoming major part of client service departments of many organizations, helping them to reduce the labor costs of staff. In many such systems, there is always natural language understanding module that solves intent classification task. This task is complicated because of its case-dependency – every subject area has its own semantic kernel. The state of art approaches for intent classification are different machine learning and deep learning methods that use text vector representations as input. The basic vector representation models such as Bag of words and TF-IDF generate sparse matrixes, which are becoming very big as the amount of input data grows. Modern methods such as word2vec and FastText use neural networks to evaluate word embeddings with fixed dimension size. As we are developing a question-answering system for students and enrollees of the Perm National Research Polytechnic University, we have faced the problem of user’s intent detection. The subject area of our system is very specific, that is why there is a lack of training data. This aspect makes intent classification task more challenging for using state of the art deep learning methods. In this paper, we propose an approach of the questions embeddings representation based on calculation of Shannon entropy.The goal of the approach is to produce low dimensional question vectors as neural approaches do and to outperform related methods, described above in condition of small dataset. We evaluate and compare our model with existing ones using logistic regression and dataset that contains questions asked by students and enrollees. The data is labeled into six classes. Experimental comparison of proposed approach and other models revealed that proposed model performed better in the given task.

Improved Dynamic Time Warping (DTW) Approach for Online Signature Verification

Online signature verification is the process of verifying time series signature data which is generally obtained from the tablet-based device. Unlike offline signature images, the online signature image data consists of points that are arranged in a sequence of time. The aim of this research is to develop an improved approach to map the strokes in both test and reference signatures. Current methods make use of the Dynamic Time Warping (DTW) algorithm and its variant to segment them before comparing each of its data dimension. This paper presents a modified DTW algorithm with the proposed Lost Box Recovery Algorithm aims to improve the mapping performance for online signature verification

A Convolutional Neural Network for Language-Agnostic Source Code Summarization

Descriptive comments play a crucial role in the software engineering process. They decrease development time, enable better bug detection, and facilitate the reuse of previously written code. However, comments are commonly the last of a software developer’s priorities and are thus either insufficient or missing entirely. Automatic source code summarization may therefore have the ability to significantly improve the software development process. We introduce a novel encoder-decoder model that summarizes source code, effectively writing a comment to describe the code’s functionality. We make two primary innovations beyond current source code summarization models. First, our encoder is fully language-agnostic and requires no complex input preprocessing. Second, our decoder has an open vocabulary, enabling it to predict any word, even ones not seen in training. We demonstrate results comparable to state-of-the-art methods on a single-language data set and provide the first results on a data set consisting of multiple programming languages.

Interpreting Black Box Models with Statistical Guarantees

While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover ‘important’ features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly-sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.

Query the model: precomputations for efficient inference with Bayesian Networks

We consider a setting where a Bayesian network has been built over a relational database to represent the joint distribution of data attributes and is used to perform approximate query processing or answer predictive queries. We explain how careful precomputation of certain probabilistic quantities can lead to large efficiency gains in response time and provide algorithms for optimal precomputation.

Keyphrase Generation: A Text Summarization Struggle

Authors’ keyphrases assigned to scientific articles are essential for recognizing content and topic aspects. Most of the proposed supervised and unsupervised methods for keyphrase generation are unable to produce terms that are valuable but do not appear in the text. In this paper, we explore the possibility of considering the keyphrase string as an abstractive summary of the title and the abstract. First, we collect, process and release a large dataset of scientific paper metadata that contains 2.2 million records. Then we experiment with popular text summarization neural architectures. Despite using advanced deep learning models, large quantities of data and many days of computation, our systematic evaluation on four test datasets reveals that the explored text summarization methods could not produce better keyphrases than the simpler unsupervised methods, or the existing supervised ones.

Robust Subspace Recovery Layer for Unsupervised Anomaly Detection

We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and remove outliers that lie away from this subspace. It is used together with an encoder and a decoder. The encoder maps the data into the latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a “manifold’ close to the original data. We illustrate algorithmic choices and performance for artificial data with corrupted manifold structure. We also demonstrate competitive precision and recall for image datasets.

Asymptotic nonparametric statistical analysis of stationary time series

Stationarity is a very general, qualitative assumption, that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general qualitative assumptions, such as independence or finite memory, clearly fail. However, it has long been considered too general to allow for statistical inference to be made. One of the reasons for this is that rates of convergence, even of frequencies to the mean, are not available under this assumption alone. Recently, it has been shown that, while some natural and simple problems such as homogeneity, are indeed provably impossible to solve if one only assumes that the data is stationary (or stationary ergodic), many others can be solved using rather simple and intuitive algorithms. The latter problems include clustering and change point estimation. In this volume I summarize these results. The emphasis is on asymptotic consistency, since this the strongest property one can obtain assuming stationarity alone. While for most of the problems for which a solution is found this solution is algorithmically realizable, the main objective in this area of research, the objective which is only partially attained, is to understand what is possible and what is not possible to do for stationary time series. The considered problems include homogeneity testing, clustering with respect to distribution, clustering with respect to independence, change-point estimation, identity testing, and the general question of composite hypotheses testing. For the latter problem, a topological criterion for the existence of a consistent test is presented. In addition, several open questions are discussed.

Nonparametric Density Estimation for High-Dimensional Data – Algorithms and Applications

Density Estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization, and machine learning. Density Estimation is widely adopted in the domain of unsupervised learning especially for the application of clustering. As big data become pervasive in almost every area of data sciences, analyzing high-dimensional data that have many features and variables appears to be a major focus in both academia and industry. High-dimensional data pose challenges not only from the theoretical aspects of statistical inference, but also from the algorithmic/computational considerations of machine learning and data analytics. This paper reviews a collection of selected nonparametric density estimation algorithms for high-dimensional data, some of them are recently published and provide interesting mathematical insights. The important application domain of nonparametric density estimation, such as { modal clustering}, are also included in this paper. Several research directions related to density estimation and high-dimensional data analysis are suggested by the authors.

Modeling Drug-Disease Relations with Linguistic and Knowledge Graph Constraints

FDA drug labels are rich sources of information about drugs and drug-disease relations, but their complexity makes them challenging texts to analyze in isolation. To overcome this, we situate these labels in two health knowledge graphs: one built from precise structured information about drugs and diseases, and another built entirely from a database of clinical narrative texts using simple heuristic methods. We show that Probabilistic Soft Logic models defined over these graphs are superior to text-only and relation-only variants, and that the clinical narratives graph delivers exceptional results with little manual effort. Finally, we release a new dataset of drug labels with annotations for five distinct drug-disease relations.

A New Expert Questioning Approach to More Efficient Fault Localization in Ontologies

When ontologies reach a certain size and complexity, faults such as inconsistencies, unsatisfiable classes or wrong entailments are hardly avoidable. Locating the incorrect axioms that cause these faults is a hard and time-consuming task. Addressing this issue, several techniques for semi-automatic fault localization in ontologies have been proposed. Often, these approaches involve a human expert who provides answers to system-generated questions about the intended (correct) ontology in order to reduce the possible fault locations. To suggest as informative questions as possible, existing methods draw on various algorithmic optimizations as well as heuristics. However, these computations are often based on certain assumptions about the interacting user. In this work, we characterize and discuss different user types and show that existing approaches do not achieve optimal efficiency for all of them. As a remedy, we suggest a new type of expert question which aims at fitting the answering behavior of all analyzed experts. Moreover, we present an algorithm to optimize this new query type which is fully compatible with the (tried and tested) heuristics used in the field. Experiments on faulty real-world ontologies show the potential of the new querying method for minimizing the expert consultation time, independent of the expert type. Besides, the gained insights can inform the design of interactive debugging tools towards better meeting their users’ needs.

Fully Learnable Group Convolution for Acceleration of Deep Neural Networks

Benefitted from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g. smartphone, embedded devices, etc. To reduce the high computational and memory cost, in this work, we propose a fully learnable group convolution module (FLGC for short) which is quite efficient and can be embedded into any deep neural networks for acceleration. Specifically, our proposed method automatically learns the group structure in the training stage in a fully end-to-end manner, leading to a better structure than the existing pre-defined, two-steps, or iterative strategies. Moreover, our method can be further combined with depthwise separable convolution, resulting in 5 times acceleration than the vanilla Resnet50 on single CPU. An additional advantage is that in our FLGC the number of groups can be set as any value, but not necessarily 2^k as in most existing methods, meaning better tradeoff between accuracy and speed. As evaluated in our experiments, our method achieves better performance than existing learnable group convolution and standard group convolution when using the same number of groups.