This paper introduces a novel low-cost device prototype for the automatic diagnosis of diseases from input symptoms and personal background. The engineering goal is to address limited access to healthcare with a single device. Diagnosing diseases automatically is an immense challenge, owing to their variable properties and symptoms. Neural networks, however, have become a powerful machine learning tool and have shown considerable promise for computing diagnoses even when the variables are inconsistent. In this research, a low-cost device was built to enable straightforward diagnosis and treatment of human diseases. By utilizing Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), outfitted on a Raspberry Pi Zero processor ($latex $
The Long Short-Term Memory (LSTM) layer is an important advancement in neural networks and machine learning, allowing for effective training and impressive inference performance. LSTM-based neural networks have been successfully employed in applications such as speech processing and language translation. The LSTM layer can be simplified by removing certain components, potentially speeding up training and runtime with limited change in performance. In particular, recently introduced variants called SLIM LSTMs have supported this view in initial experiments. Here, we computationally compare the validation accuracy of a convolutional-plus-recurrent neural network architecture that uses either the standard LSTM layer or one of three SLIM LSTM layers. We find that some realizations of the SLIM LSTM layers can potentially perform as well as the standard LSTM layer in the architecture considered.
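To make "removing certain components" concrete, the NumPy sketch below implements one LSTM time step and three gate simplifications. The variant definitions are an assumption for illustration and may not match the exact SLIM LSTM layers evaluated here: `slim1` drives the input, forget and output gates from the recurrent state plus a bias, `slim2` from the recurrent state only, and `slim3` from the bias only. The candidate cell update keeps its full form in every variant. The parameter names `W`, `U`, `b` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params, variant="standard"):
    """One LSTM time step; the i, f, o gates are simplified according to `variant`."""
    W, U, b = params["W"], params["U"], params["b"]  # dicts keyed by gate name

    def gate(name):
        if variant == "standard":
            z = W[name] @ x + U[name] @ h + b[name]
        elif variant == "slim1":      # assumed: drop the input term
            z = U[name] @ h + b[name]
        elif variant == "slim2":      # assumed: drop the input term and the bias
            z = U[name] @ h
        elif variant == "slim3":      # assumed: keep only the bias
            z = b[name]
        else:
            raise ValueError(f"unknown variant {variant!r}")
        return sigmoid(z)

    i, f, o = gate("i"), gate("f"), gate("o")
    # The candidate cell input is not simplified in any variant.
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def count_gate_params(n_in, n_hidden, variant):
    """Trainable parameters in the three gates (i, f, o) for each variant."""
    per_gate = {
        "standard": n_hidden * n_in + n_hidden * n_hidden + n_hidden,
        "slim1": n_hidden * n_hidden + n_hidden,
        "slim2": n_hidden * n_hidden,
        "slim3": n_hidden,
    }[variant]
    return 3 * per_gate
```

The parameter counts show where the potential speed-up comes from: each simplification removes whole matrix-vector products from every gate at every time step.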
Statistical inference involves the estimation of model parameters from observations. Building on the recently proposed Equilibrium Expectation approach and Persistent Contrastive Divergence, we derive a simple and fast Markov chain Monte Carlo algorithm for maximum likelihood estimation (MLE) of the parameters of exponential family distributions. The algorithm scales well and is suitable for Monte Carlo inference on large network data with billions of tie variables. We demonstrate its performance on Markov random fields, conditional random fields, exponential random graph models and Boltzmann machines.
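For an exponential family p(x | θ) ∝ exp(θᵀs(x)), the MLE matches the expected sufficient statistic to the observed one, which suggests the stochastic-approximation update θ ← θ + a (s(x_obs) − s(x_sim)), with x_sim drawn from a Markov chain that persists across iterations. The sketch below applies this Persistent Contrastive Divergence-style update to a deliberately trivial model, n independent binary variables sharing one natural parameter, so the answer can be checked against the closed form logit(k/n). It illustrates the general idea under these assumptions and is not the Equilibrium Expectation algorithm itself; the step size, iteration count and burn-in are arbitrary choices.

```python
import numpy as np

def pcd_mle_bernoulli(x_obs, step=0.002, n_iter=5000, burn_in=3000, seed=0):
    """Estimate theta in p(x) ∝ exp(theta * sum(x)), x in {0,1}^n, via a PCD-style update."""
    rng = np.random.default_rng(seed)
    s_obs = x_obs.sum()                            # observed sufficient statistic
    theta = 0.0
    x_sim = rng.integers(0, 2, size=x_obs.size)    # persistent chain, never reset
    trace = []
    for _ in range(n_iter):
        p_on = 1.0 / (1.0 + np.exp(-theta))
        # One Gibbs sweep. In this independent toy model each full conditional
        # ignores the other bits; in an MRF or ERGM it would depend on x_sim.
        for i in range(x_sim.size):
            x_sim[i] = int(rng.random() < p_on)
        # Move theta so the simulated statistic tracks the observed one.
        theta += step * (s_obs - x_sim.sum())
        trace.append(theta)
    return float(np.mean(trace[burn_in:]))         # average iterates after burn-in
```

For 30 ones out of 50 bits the estimate should land near the exact MLE log(30/20) ≈ 0.405; averaging the iterates after burn-in reduces the noise left by the constant step size.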
For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a counterpart to the visualization tools developed for classification networks, but applies to problem domains better suited to similarity learning. The visualization shows how fine-tuned similarity networks learn to focus on different features. We also generalize our approach to embedding networks that use different pooling strategies, and provide a simple mechanism to support image similarity searches on objects or sub-regions in the query image.
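In the special case where the embedding is a global average pool of a convolutional feature map and images are compared with an unnormalized dot product, the similarity splits exactly into per-location terms: s = ⟨(1/HW) Σᵢ fᵢ, ḡ⟩ = Σᵢ ⟨fᵢ, ḡ⟩ / HW. Each term can be drawn as a heat map over the spatial grid. The sketch assumes that case (feature maps of shape H×W×C, hypothetical function name); other pooling strategies or normalized embeddings need a different decomposition.

```python
import numpy as np

def similarity_heatmaps(feat_a, feat_b):
    """Per-location contributions to the dot-product similarity of two
    globally average-pooled feature maps of shape (H, W, C)."""
    ha, wa, _ = feat_a.shape
    hb, wb, _ = feat_b.shape
    emb_a = feat_a.mean(axis=(0, 1))    # GAP embedding of image A, shape (C,)
    emb_b = feat_b.mean(axis=(0, 1))    # GAP embedding of image B, shape (C,)
    # Location i of A contributes <f_i, emb_b> / (H*W); likewise for B.
    heat_a = feat_a @ emb_b / (ha * wa)
    heat_b = feat_b @ emb_a / (hb * wb)
    similarity = float(emb_a @ emb_b)
    return heat_a, heat_b, similarity
```

Because each heat map sums exactly to the similarity score, the highlighted regions are a faithful accounting of the score rather than a post-hoc approximation.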
This paper proposes a novel discriminative regression method for classification, called adaptive locality preserving regression (ALPR). In particular, ALPR aims to learn a more flexible and discriminative projection that not only preserves the intrinsic structure of the data, but also possesses the properties of feature selection and interpretability. To this end, we introduce a target learning technique that adaptively learns a more discriminative and flexible target matrix, rather than using the pre-defined strict zero-one label matrix for regression. A locality preserving constraint, regularized by the adaptively learned weights, is then introduced to guide the projection learning, which helps to learn a more discriminative projection and avoid overfitting. Moreover, we replace the conventional Frobenius norm with the special l2,1 norm to constrain the projection, which enables the method to adaptively select the most important features from the original high-dimensional data for feature extraction. In this way, the negative influence of redundant features and noise in the original data can be greatly reduced. Besides, the proposed method offers good interpretability of features owing to the row-sparsity property of the l2,1 norm. Extensive experiments on a synthetic database with manifold structure and many real-world databases demonstrate the effectiveness of the proposed method.
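The l21 (ℓ2,1) norm of a projection matrix W with one row per feature is the sum of the Euclidean norms of its rows, ||W||₂,₁ = Σᵢ ||wⁱ||₂. Penalizing it drives entire rows to zero, and a feature whose row vanishes is effectively discarded, which is the row-sparsity property invoked above. The sketch shows the norm, its proximal operator (row-wise soft thresholding, a standard building block of ℓ2,1-regularized solvers, not necessarily the optimizer ALPR uses), and a feature ranking by row norms; the function names are illustrative.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return float(np.linalg.norm(W, axis=1).sum())

def prox_l21(W, tau):
    """Proximal operator of tau * ||W||_{2,1}: shrink each row's norm by tau,
    setting rows whose norm is <= tau exactly to zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

def rank_features(W):
    """Feature indices sorted by decreasing row norm (importance)."""
    return np.argsort(-np.linalg.norm(W, axis=1))
```

Unlike the Frobenius norm, which shrinks all entries smoothly, the proximal step removes a feature only as a whole row, so the surviving rows directly identify the selected features.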
Event detection from social media streams requires a set of informative features with strong signals that need minimal preprocessing and are highly associated with events of interest. Identifying such features as keywords from Twitter is challenging because people use informal language to express their thoughts and feelings, including acronyms, misspelled words, synonyms, transliterations and ambiguous terms. In this paper, we propose an efficient method to select keywords that are frequently used on Twitter and most strongly associated with events of interest such as protests. The volume of these keywords is tracked in real time to identify events of interest in a binary classification scheme. We use keywords within word-pairs to capture context. The proposed method binarizes the vector of daily counts for each word-pair by applying a spike detection temporal filter, then uses the Jaccard metric to measure the similarity between each word-pair's binary vector and the binary vector describing event occurrence. The top n word-pairs are used as features to classify each day as an event or non-event day. The selected features are tested with multiple classifiers, including Naive Bayes, SVM, Logistic Regression, KNN and decision trees, which achieve ROC AUC scores of up to 0.91 and F1 scores of up to 0.79. The experiments use English in several cities, including Melbourne, Sydney and Brisbane, and Indonesian in Jakarta. The two experiments, covering different languages and locations, yielded similar results.
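The selection pipeline can be sketched as follows. The abstract does not specify the spike detection filter, so the sketch assumes a simple one: a day counts as a spike when its count exceeds the mean of the preceding `window` days by more than `k` standard deviations. The word-pairs, window and threshold are hypothetical; only the Jaccard ranking against the event-occurrence vector follows directly from the description.

```python
import numpy as np

def spike_binarize(counts, window=7, k=2.0):
    """1 where a daily count exceeds mean + k*std of the previous `window` days (assumed filter)."""
    counts = np.asarray(counts, dtype=float)
    out = np.zeros(len(counts), dtype=int)
    for t in range(window, len(counts)):
        past = counts[t - window:t]
        out[t] = int(counts[t] > past.mean() + k * past.std())
    return out

def jaccard(a, b):
    """Jaccard similarity of two binary vectors (0.0 if both are all zero)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def top_word_pairs(pair_counts, event_days, n=2, **filter_kwargs):
    """Rank word-pairs by the Jaccard similarity between their spike vector
    and the binary event-occurrence vector; return the top n."""
    scores = {pair: jaccard(spike_binarize(c, **filter_kwargs), event_days)
              for pair, c in pair_counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

For example, with event days 5 and 10 in a 14-day window, a pair whose counts jump on exactly those days scores 1.0, a pair that spikes on day 5 and an unrelated day scores 1/3, and a pair with flat daily counts scores 0.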
We present a comprehensive language-theoretic causality analysis framework for explaining safety property violations in concurrent reactive systems. Our framework allows us to uniformly express a number of causality notions studied in artificial intelligence and formal methods, as well as to define new ones of potential interest in these areas. Furthermore, our formalization provides a means to reason about the relationships between individual notions, which have mostly been considered independently in prior work, and to judge the appropriateness of the different definitions for various applications in system design. In particular, we consider causality analysis notions for debugging, error resilience, and liability resolution in concurrent reactive systems. Finally, we present automata-based algorithms for computing various causal sets based on our language-theoretic encoding, and derive their complexities.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in Euclidean space. However, there is an increasing number of applications in which data are generated from non-Euclidean domains and represented as graphs with complex relationships and interdependencies between objects. The complexity of graph data poses significant challenges for existing machine learning algorithms. Recently, many studies extending deep learning approaches to graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning. We propose a new taxonomy that divides state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss applications of graph neural networks across various domains and summarize the open-source code and benchmarks of existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.
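As a concrete instance of the graph convolutional networks the survey focuses on, the sketch below implements one propagation step of the widely used symmetric-normalized graph convolution H' = ReLU(D̂^{-1/2} (A + I) D̂^{-1/2} H W), where D̂ is the degree matrix of A + I. It uses a dense adjacency matrix for readability; practical implementations use sparse operations, and other GNN families in the taxonomy aggregate neighbours differently.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: symmetric-normalized neighbourhood aggregation
    (with self-loops), followed by a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)                      # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # D^-1/2 (A + I) D^-1/2
    return np.maximum(0.0, A_norm @ H @ W)
```

On a three-node path graph with all-ones features, the end node receives 1/2 from itself and 1/√6 from its neighbour, so the normalization accounts for the degrees of both endpoints of each edge.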
Sparse learning aims to learn the sparse structure of the true target function from collected data, and plays a crucial role in high-dimensional data analysis. This article proposes a unified and universal method for learning the sparsity of M-estimators within a rich family of loss functions in a reproducing kernel Hilbert space (RKHS). The family of loss functions considered is very rich, including most of those commonly used in the literature. More importantly, the proposed method is motivated by useful properties of the induced RKHS, is computationally efficient for large-scale data, and can be further accelerated through parallel computing. The asymptotic estimation and selection consistencies of the proposed method are established for a general loss function under mild conditions. In short, the method works for general loss functions, admits general dependence structures, allows for efficient computation, and comes with theoretical guarantees. The superior performance of the proposed method is also supported by a variety of simulated examples and a real application in a human breast cancer study (GSE20194).
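One way to read sparsity off a function estimated in an RKHS is through its partial derivatives: a variable is irrelevant when ∂f/∂x_l vanishes, so the empirical norm of each partial derivative can be ranked or thresholded. The sketch below illustrates this with the squared loss (kernel ridge regression) and a Gaussian kernel, both assumptions chosen for brevity; the method described above covers a much broader family of losses, and the bandwidth `sigma` and penalty `lam` here are arbitrary.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def gradient_norms(X, y, sigma=1.0, lam=1e-3):
    """Fit f(x) = sum_i alpha_i K(x, x_i) by kernel ridge regression (squared loss),
    then return the empirical L2 norm of each partial derivative df/dx_l."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # dK(x, x_i)/dx_l = -K(x, x_i) * (x_l - x_il) / sigma^2, evaluated at x = x_j.
    diff = X[:, None, :] - X[None, :, :]                  # (n, n, p): x_j - x_i
    grads = -(K[:, :, None] * diff / sigma ** 2 * alpha[None, :, None]).sum(axis=1)
    return np.sqrt((grads ** 2).mean(axis=0))             # one norm per variable
```

With y depending only on the first two of five inputs, those two variables should receive the largest gradient norms, while the remaining three stay near zero.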
For very large datasets, random projections (RP) have become the tool of choice for dimensionality reduction because principal component analysis is computationally expensive at this scale. However, the recent development of randomized principal component analysis (RPCA) makes it possible to obtain approximate principal components on very large datasets. In this paper, we compare the performance of RPCA and RP in dimensionality reduction for supervised learning. In Experiment 1, we study a malware classification task on a dataset with over 10 million samples, almost 100,000 features, and over 25 billion non-zero values, with the goal of reducing the dimensionality to a compressed representation of 5,000 features. In order to apply RPCA to this dataset, we develop a new algorithm called large sample RPCA (LS-RPCA), which extends the RPCA algorithm to work on datasets with arbitrarily many samples. We find that classification performance is much higher when using LS-RPCA for dimensionality reduction than when using random projections. In particular, across a range of target dimensionalities, using LS-RPCA reduces classification error by 37% to 54%. Experiment 2 generalizes the phenomenon to multiple datasets, feature representations, and classifiers. These findings have implications for the many research projects in which random projections were used as a preprocessing step for dimensionality reduction. As long as accuracy is at a premium and the target dimensionality is sufficiently less than the numeric rank of the dataset, randomized PCA may be a superior choice. Moreover, if the dataset has a large number of samples, LS-RPCA provides a method for obtaining the approximate principal components.
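Randomized PCA is commonly computed with a randomized range finder: multiply the centered data by a Gaussian test matrix, apply a few power iterations, orthonormalize, and take an exact SVD of the small projected matrix. The NumPy sketch below contrasts this with a plain Gaussian random projection. It assumes the data fit in memory and centers them densely, so it is not the LS-RPCA algorithm, which extends RPCA to arbitrarily many samples; `oversample` and `n_power_iter` are illustrative defaults.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X onto k random Gaussian directions."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def randomized_pca(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k principal components with a randomized range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                          # center the features
    omega = rng.normal(size=(Xc.shape[1], k + oversample))
    Y = Xc @ omega                                   # sample the range of Xc
    for _ in range(n_power_iter):                    # sharpen the spectral decay
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                           # orthonormal basis for that range
    B = Q.T @ Xc                                     # small (k + oversample) x d matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:k]                              # approximate principal axes
    return Xc @ components.T, components, s[:k]
```

Both functions cost one pass of matrix multiplication over the data per step, but only randomized PCA adapts its directions to the data, which is consistent with the accuracy gap reported above.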
Helm has recently been proposed by practitioners as a technology to package and deploy complex software applications on top of Kubernetes-based cloud computing platforms. Despite its growing popularity, little is known about the individual so-called Helm Charts and about the emerging ecosystem of charts around the KubeApps Hub website and decentralised chart repositories. This article contributes the first quantified insights into both the charts and the artefact development community, based on metrics automatically gathered by a proposed quality assessment tool named HelmQA. The work further identifies quality insufficiencies detectable in public charts, proposes a developer-centric, hypothesis-based methodology to systematically improve quality using HelmQA, and finally attempts to empirically validate the methodology, and thus the practical usefulness of the tool, by presenting results of its application over a representative four-month period. Although one of our initial hypotheses does not hold statistically in the experiment, we still infer that using HelmQA regularly in continuous software development would reduce quality issues.
We contribute an ML-based system for interactive labeling of image datasets in TensorBoard Projector to speed up human image annotation. The tool visualizes feature spaces and makes them directly editable through online integration of applied labels, and it serves as a system for verifying and managing machine learning data pertaining to labels. We propose realistic annotation emulation to evaluate the system design of interactive active learning, based on our improved semi-supervised extension of t-SNE dimensionality reduction. Our active learning tool can significantly increase labeling efficiency compared with uncertainty sampling, and we show that fewer than 100 labeling actions are typically sufficient for good classification on a variety of specialized image datasets. Our contribution is unique in that it combines dimensionality reduction, feature space visualization and editing, interactive label propagation, low-complexity active learning, human perceptual modeling, annotation emulation and unsupervised feature extraction for specialized datasets in a production-quality implementation.
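Uncertainty sampling, the baseline mentioned above, queries the unlabeled samples that the current classifier is least sure about. A minimal margin-based version is sketched below, assuming a classifier that outputs class probabilities. The function name and batch size are illustrative, and this is the comparison baseline, not the proposed interactive tool.

```python
import numpy as np

def margin_uncertainty_sampling(probs, labeled_mask, batch_size=5):
    """Indices of unlabeled samples with the smallest gap between the two
    highest predicted class probabilities, most uncertain first."""
    probs = np.asarray(probs, dtype=float)
    labeled = np.asarray(labeled_mask, dtype=bool)
    top2 = np.sort(probs, axis=1)[:, -2:]          # two largest probabilities per row
    margin = top2[:, 1] - top2[:, 0]
    margin[labeled] = np.inf                       # never re-query labeled samples
    order = np.argsort(margin, kind="stable")
    return order[:min(batch_size, int((~labeled).sum()))]
```

Each query needs a full pass of classifier predictions over the unlabeled pool, one reason an interactive tool that lets a human label many points in the feature-space view at once can be more label-efficient.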
The growing popularity of social networks demands highly efficient Personalized PageRank (PPR) updating, because web graphs are enormous and evolve quickly. While current research focuses on PPR updating under link structure modification, efficiently updating PPR when node insertion/deletion is involved remains a challenge. In previous work called Virtual Web (VW), several VW architectures were designed, yielding highly effective initializations that significantly accelerate PageRank updating under both link modification and page insertion/deletion. In this paper, we tackle the fast PPR updating problem under the general scenario of link modification and node insertion/deletion. Specifically, we combine VW with the TrackingPPR method to generate initial values, which are then used by the Gauss-Southwell method for fast PPR updating. We name the resulting algorithm the VWPPR method. In extensive experiments, three real-world datasets are used that contain 1~5.6M nodes and 6.7M~129M links, with a node perturbation of 40k and a link perturbation of 1%. Compared with the more recent LazyForwardUpdate method, which handles the general PPR updating problem, the VWPPR method is 3~6 times faster in running time and 4.4~10 times faster in iteration count.
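Gauss-Southwell solves the PPR linear system x = α e_s + (1 − α) Pᵀx by repeatedly relaxing the coordinate with the largest residual. Because it accepts any starting vector, a good initial estimate (the role VW and TrackingPPR play in VWPPR) directly shrinks the starting residual and hence the work. The plain-Python sketch below works over adjacency lists and assumes conventions not stated in the abstract: α is the teleport probability, P is row-stochastic over out-links, and every node has at least one out-link. It does not reproduce VW or TrackingPPR.

```python
import numpy as np

def gauss_southwell_ppr(out_links, source, alpha=0.15, tol=1e-10, x0=None):
    """Solve x = alpha*e_source + (1-alpha)*P^T x with P[u, v] = 1/outdeg(u) for each
    link u -> v. Starts from x0 (zeros if None); returns (x, number_of_relaxations)."""
    n = len(out_links)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    # Residual r = alpha*e_s - (I - (1-alpha) P^T) x of the starting vector.
    r = -x.copy()
    r[source] += alpha
    for u, nbrs in enumerate(out_links):
        for v in nbrs:
            r[v] += (1 - alpha) * x[u] / len(nbrs)
    steps = 0
    while True:
        u = int(np.argmax(np.abs(r)))          # Gauss-Southwell: largest residual first
        delta = r[u]
        if abs(delta) < tol:
            break
        x[u] += delta                          # relax coordinate u
        r[u] -= delta
        for v in out_links[u]:                 # propagate the change to out-neighbours
            r[v] += (1 - alpha) * delta / len(out_links[u])
        steps += 1
    return x, steps
```

Each relaxation lowers the l1 norm of the residual by at least α|δ|, so the loop terminates. Starting from an exact solution yields no relaxations at all, which is the limiting case of the warm start that VWPPR exploits.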
Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts, to data marketplaces, open data portals and data communities. Google recently released a beta version of a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user's data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward.
Explanation in machine learning and related fields such as artificial intelligence aims to make machine learning models and their decisions understandable to humans. Existing work suggests that personalizing explanations might help to improve understandability. In this work, we derive a conceptualization of personalized explanation by defining and structuring the problem based on prior work on machine learning explanation, personalization (in machine learning), and concepts and techniques from other domains such as privacy and knowledge elicitation. We categorize the explainee information used in the process of personalization and describe means to collect this information. We also identify three key explanation properties that are amenable to personalization: complexity, decision information and presentation. Finally, we extend existing work on explanation by introducing additional desiderata and measures to quantify the quality of personalized explanations.
In this paper, we discuss how operational limitations affect the input-output behaviour of a system. In particular, we provide formulations for passivity and passivity indices of a nonlinear system given operational limitations on the input and state variables. This formulation is presented in the form of local passivity and local passivity indices. We provide an optimisation-based formulation to derive passivity properties of the system through polynomial approximations. Two different approaches are taken to approximate the nonlinear dynamics of a system by polynomial functions, namely Taylor’s theorem and a multivariate generalisation of Bernstein polynomials. For each approach, conditions for stability, dissipativity, and passivity of a system, as well as methods to find its passivity indices, are given. Two methods are also presented to reduce the size of the optimisation problem in the Taylor’s theorem approach. Examples are provided to show the applicability of the results.
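One reason Bernstein polynomials suit optimisation-based stability and passivity conditions is their range-enclosure property: once a polynomial on [0, 1] is written in the Bernstein basis, its values lie between its smallest and largest Bernstein coefficients, so nonnegative coefficients certify a nonnegative polynomial. The univariate sketch below converts power-basis coefficients aᵢ to Bernstein coefficients with the standard formula b_k = Σ_{i≤k} C(k,i)/C(n,i) aᵢ. The multivariate generalisation used in the paper extends the idea to several variables, and the function names here are illustrative.

```python
import numpy as np
from math import comb

def power_to_bernstein(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_i a[i] x^i, degree n = len(a) - 1."""
    n = len(a) - 1
    return np.array([sum(comb(k, i) / comb(n, i) * a[i] for i in range(k + 1))
                     for k in range(n + 1)])

def eval_bernstein(b, x):
    """Evaluate sum_k b[k] * C(n, k) * x^k * (1 - x)^(n - k)."""
    n = len(b) - 1
    x = np.asarray(x, dtype=float)
    return sum(b[k] * comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1))

def bernstein_bounds(a):
    """Lower and upper bounds of p on [0, 1] taken from its Bernstein coefficients."""
    b = power_to_bernstein(a)
    return b.min(), b.max()
```

For p(x) = x² − x + 0.3 the Bernstein coefficients are (0.3, −0.2, 0.3), so the enclosure gives a lower bound of −0.2 even though the true minimum is 0.05 at x = 0.5. The bound is valid but conservative, and raising the degree of the Bernstein representation tightens it.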