Automatic voice-controlled systems have changed the way humans interact with a computer. Voice or speech recognition systems allow a user to make a hands-free request to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and developments in machine learning and artificial intelligence, today voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-to-human and human-to-computer interactions. The state-of-the-art e-commerce applications with the help of web technologies offer interactive and user-friendly interfaces. However, there are some instances where people, especially with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance user experience and can provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios such as government service kiosks and enable analytics of the converted text data for scenarios such as medical diagnosis at the clinics.
In recent years, machine learning researchers have focused on methods to construct flexible and interpretable prediction models. However, the interpretability evaluation, the relationship between the generalization performance and the interpretability of the model and the method for improving the interpretability are very important factors to consider. In this paper, the quantitative index of the interpretability is proposed and its rationality is given, and the relationship between the interpretability and the generalization performance is analyzed. For traditional supervised kernel machine learning problem, a universal learning framework is put forward to solve the equilibrium problem between the two performances. The uniqueness of solution of the problem is proved and condition of unique solution is obtained. Probability upper bound of the sum of the two performances is analyzed.
Emotion recognition and classification is a very active area of research. In this paper, we present a first approach to emotion classification using persistent entropy and support vector machines. A topology-based model is applied to obtain a single real number from each raw signal. These data are used as input of a support vector machine to classify signals into 8 different emotions (calm, happy, sad, angry, fearful, disgust and surprised).
It is widely recognized that the deeper networks or networks with more feature maps have better performance. Existing studies mainly focus on extending the network depth and increasing the feature maps of networks. At the same time, horizontal expansion network (e.g. Inception Model) as an alternative way to improve network performance has not been fully investigated. Accordingly, we proposed NeuroTreeNet (NTN), as a new horizontal extension network through the combination of random forest and Inception Model. Based on the tree structure, in which each branch represents a network and the root node features are shared to child nodes, network parameters are effectively reduced. By combining all features of leaf nodes, even less feature maps achieved better performance. In addition, the relationship between tree structure and the performance of NTN was investigated in depth. Comparing to other networks (e.g. VDSR\_5) with equal magnitude parameters, our model showed preferable performance in super resolution reconstruction task.
Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class which has a large number of instances. Ensemble of classifiers have been reported to yield promising results. However, the majority of ensemble methods applied to imbalanced learning are static ones. Moreover, they only deal with binary imbalanced problems. Hence, this paper presents an empirical analysis of dynamic selection techniques and data preprocessing methods for dealing with multi-class imbalanced problems. We considered five variations of preprocessing methods and fourteen dynamic selection schemes. Our experiments conducted on 26 multi-class imbalanced problems show that the dynamic ensemble improves the AUC and the G-mean as compared to the static ensemble. Moreover, data preprocessing plays an important role in such cases.
The basic goal of computer engineering is the analysis of data. Such data are often large data sets distributed according to various distribution models. In this manuscript we focus on the analysis of non-Gaussian distributed data. In the case of univariate data analysis we discuss stochastic processes with auto-correlated increments and univariate distributions derived from specific stochastic processes, i.e. Levy and Tsallis distributions. Deep investigation of multivariate non-Gaussian distributions requires the copula approach. A copula is an component of multivariate distribution that models the mutual interdependence between marginals. There are many copula families characterised by various measures of the dependence between marginals. Importantly, one of those are `tail’ dependencies that model the simultaneous appearance of extreme values in many marginals. Those extreme events may reflect a crisis given financial data, outliers in machine learning, or a traffic congestion. Next we discuss higher order multivariate cumulants that are non-zero if multivariate distribution is non-Gaussian. However, the relation between cumulants and copulas is not straight forward and rather complicated. We discuss the application of those cumulants to extract information about non-Gaussian multivariate distributions, such that information about non-Gaussian copulas. The use of higher order multivariate cumulants in computer science is inspired by financial data analysis, especially by the safe investment portfolio evaluation. There are many other applications of higher order multivariate cumulants in data engineering, especially in: signal processing, non-linear system identification, blind sources separation, and direction finding algorithms of multi-source signals.
Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-party ML, to construct TorMentor: an anonymous hidden service that supports private multi-party ML. We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with new defenses for these attacks. We show that TorMentor effectively protects data providers against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy. We evaluate TorMentor with local and geo-distributed deployments on Azure/Tor. In an experiment with 200 clients and 14 MB of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65s.
There has been significant interest of late in generating behavior of agents that is interpretable to the human (observer) in the loop. However, the work in this area has typically lacked coherence on the topic, with proposed solutions for ‘explicable’, ‘legible’, ‘predictable’ and ‘transparent’ planning with overlapping, and sometimes conflicting, semantics all aimed at some notion of understanding what intentions the observer will ascribe to an agent by observing its behavior. This is also true for the recent works on ‘security’ and ‘privacy’ of plans which are also trying to answer the same question, but from the opposite point of view — i.e. when the agent is trying to hide instead of revealing its intentions. This paper attempts to provide a workable taxonomy of relevant concepts in this exciting and emerging field of inquiry.
Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees, and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal — suffering from ‘cold start’ latency. A major cause of such inefficiency is the need to move large amount of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement.
The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform frameworks, models, and system stacks but lacks standard tools to facilitate the evaluation and measurement of model. Due to the absence of such tools, the current practice for evaluating and comparing the benefits of proposed AI innovations (be it hardware or software) on end-to-end AI pipelines is both arduous and error prone — stifling the adoption of the innovations. We propose MLModelScope — a hardware/software agnostic platform to facilitate the evaluation, measurement, and introspection of ML models within AI pipelines. MLModelScope aids application developers in discovering and experimenting with models, data scientists developers in replicating and evaluating for publishing models, and system architects in understanding the performance of AI workloads. We describe the design and implementation of MLModelScope and show how it is able to give users a holistic view into the execution of models within AI pipelines. Using AlexNet as a case study, we demonstrate how MLModelScope aids in identifying deviation in accuracy, helps in pin pointing the source of system bottlenecks, and automates the evaluation and performance aggregation of models across frameworks and systems.
When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely hurt the target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment. This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. Stemming from this analysis, a novel technique is proposed to circumvent negative transfer by filtering out unrelated source data. Based on adversarial networks, the technique is highly generic and can be applied to a wide range of transfer learning algorithms. The proposed approach is evaluated on six state-of-the-art deep transfer methods via experiments on four benchmark datasets with varying levels of difficulty. Empirically, the proposed method consistently improves the performance of all baseline methods and largely avoids negative transfer, even when the source data is degenerate.
The article describes the new approach for quality improvement of automated dialogue systems for customer support service. Analysis produced in the paper demonstrates the dependency of the quality of the retrieval-based dialogue system quality on the choice of negative responses. The proposed approach implies choosing the negative samples according to the distribution of responses in the train set. In this implementation the negative samples are randomly chosen from the original response distribution and from the ‘artificial’ distribution of negative responses, such as uniform distribution or the distribution obtained by transformation of the original one. The results obtained for the implemented systems and reported in this paper confirm the significant improvement of automated dialogue systems quality in case of using the negative responses from transformed distribution.
Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems. This paper proposes a recurrently controlled recurrent network (RCRN) for expressive and powerful sequence encoding. More concretely, the key idea behind our approach is to learn the recurrent gating functions using recurrent networks. Our architecture is split into two components – a controller cell and a listener cell whereby the recurrent controller actively influences the compositionality of the listener cell. We conduct extensive experiments on a myriad of tasks in the NLP domain such as sentiment analysis (SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading comprehension (NarrativeQA). Across all 26 datasets, our results demonstrate that RCRN not only consistently outperforms BiLSTMs but also stacked BiLSTMs, suggesting that our controller architecture might be a suitable replacement for the widely adopted stacked architecture.
Many prediction tasks, especially in computer vision, are often inherently ambiguous. For example, the output of semantic segmentation may depend on the scale one is looking at, and image saliency or video summarization is often user or context dependent. Arguably, in such scenarios, exploiting instance specific evidence, such as scale or user context, can help resolve the underlying ambiguity leading to the improved predictions. While existing literature has considered incorporating such evidence in classical models such as probabilistic graphical models (PGMs), there is limited (or no) prior work looking at this problem in the context of deep neural network (DNN) models. In this paper, we present a generic multi task learning (MTL) based framework which handles the evidence as the output of one or more secondary tasks, while modeling the original problem as the primary task of interest. Our training phase is identical to the one used by standard MTL architectures. During prediction, we back-propagate the loss on secondary task(s) such that network weights are re-adjusted to match the evidence. An early stopping or two norm based regularizer ensures weights do not deviate significantly from the ones learned originally. Implementation in two specific scenarios (a) predicting semantic segmentation given the image level tags (b) predicting instance level segmentation given the text description of the image, clearly demonstrates the effectiveness of our proposed approach.
We present a collection of algorithms to filter a stream of documents in such a way that the filtered documents will cover as well as possible the interest of a person, keeping in mind that, at any given time, the offered documents should not only be relevant, but should also be diversified, in the sense not only of avoiding nearly identical documents, but also of covering as well as possible all the interests of the person. We use a modification of the WEBSOM algorithm, with limited architectural adaptation, to create a user model (which we call the ‘user context’ or simply the ‘context’) based on a network of units laid out in the word space and trained using a collection of documents representative of the context. We introduce the concepts of novelty and coverage. Novelty is related to, but not identical to, the homonymous information retrieval concept: a document is novel it it belongs to a semantic area of interest to a person for which no documents have been seen in the recent past. A group of documents has coverage to the extent to which it is a good representation of all the interests of a person. In order to increase coverage, we introduce an ‘interest’ (or ‘urgency’) factor for each unit of the user model, modulated by the scores of the incoming documents: the interest of a unit is decreased drastically when a document arrives that belongs to its semantic area and slowly recovers its initial value if no documents from that semantic area are displayed. Our tests show that these algorithms can effectively increase the coverage of the documents that are shown to the user without overly affecting precision.
This paper presents a method called One-class Classification using Length statistics of Emerging Patterns Plus (OCLEP+).
How many bits of information are revealed by a learning algorithm for a concept class of VC-dimension $d$? Previous works have shown that even for $d=1$ the amount of information may be unbounded (tend to $\infty$ with the universe size). Can it be that all concepts in the class require leaking a large amount of information? We show that typically concepts do not require leakage. There exists a proper learning algorithm that reveals $O(d)$ bits of information for most concepts in the class. This result is a special case of a more general phenomenon we explore. If there is a low information learner when the algorithm {\em knows} the underlying distribution on inputs, then there is a learner that reveals little information on an average concept {\em without knowing} the distribution on inputs.
Group fairness is an important concern for machine learning researchers, developers, and regulators. However, the strictness to which models must be constrained to be considered fair is still under debate. The focus of this work is on constraining the expected outcome of subpopulations in kernel regression and, in particular, decision tree regression, with application to random forests, boosted trees and other ensemble models. While individual constraints were previously addressed, this work addresses concerns about incorporating multiple constraints simultaneously. The proposed solution does not affect the order of computational or memory complexity of the decision trees and is easily integrated into models post training.
Recently, graph Convolutional Neural Networks (graph CNNs) have been widely used for graph data representation and semi-supervised learning tasks. However, existing graph CNNs generally use a fixed graph which may be not optimal for semi-supervised learning tasks. In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. The aim of GLCN is to learn an optimal graph structure that best serves graph CNNs for semi-supervised learning by integrating both graph learning and graph convolution together in a unified network architecture. The main advantage is that in GLCN, both given labels and the estimated labels are incorporated and thus can provide useful ‘weakly’ supervised information to refine (or learn) the graph construction and also to facilitate the graph convolution operation in GLCN for unknown label estimation. Experimental results on seven benchmarks demonstrate that GLCN significantly outperforms state-of-the-art traditional fixed structure based graph CNNs.