A Model Counter’s Guide to Probabilistic Systems

In this paper, we systematize the modeling of probabilistic systems for the purpose of analyzing them with model counting techniques. Starting from unbiased coin flips, we show how to model biased coins, correlated coins, and distributions over finite sets. From there, we continue with modeling sequential systems, such as Markov chains, and revisit the relationship between weighted and unweighted model counting. Thereby, this work provides a conceptual framework for deriving #SAT encodings for probabilistic inference.


Novel Artificial Human Optimization Field Algorithms – The Beginning

New Artificial Human Optimization (AHO) Field Algorithms can be created from scratch or by adding the concept of Artificial Humans into other existing Optimization Algorithms. Particle Swarm Optimization (PSO) has been very popular for solving complex optimization problems due to its simplicity. In this work, new Artificial Human Optimization Field Algorithms are created by modifying existing PSO algorithms with AHO Field Concepts. These Hybrid PSO Algorithms comes under PSO Field as well as AHO Field. There are Hybrid PSO research articles based on Human Behavior, Human Cognition and Human Thinking etc. But there are no Hybrid PSO articles which based on concepts like Human Disease, Human Kindness and Human Relaxation. This paper proposes new AHO Field algorithms based on these research gaps. Some existing Hybrid PSO algorithms are given a new name in this work so that it will be easy for future AHO researchers to find these novel Artificial Human Optimization Field Algorithms. A total of 6 Artificial Human Optimization Field algorithms titled ‘Human Safety Particle Swarm Optimization (HuSaPSO)’, ‘Human Kindness Particle Swarm Optimization (HKPSO)’, ‘Human Relaxation Particle Swarm Optimization (HRPSO)’, ‘Multiple Strategy Human Particle Swarm Optimization (MSHPSO)’, ‘Human Thinking Particle Swarm Optimization (HTPSO)’ and ‘Human Disease Particle Swarm Optimization (HDPSO)’ are tested by applying these novel algorithms on Ackley, Beale, Bohachevsky, Booth and Three-Hump Camel Benchmark Functions. Results obtained are compared with PSO algorithm.


Detecting service provider alliances

We present an algorithm for detecting service provider alliances. To perform this, we modelize a cooperative game-theoretic model for competitor service providers. A choreography (a peer-to-peer service composition model) needs a set of services to fulfill its requirements. Users must choose, for each requirement, which service providers will be used to enact the choreography at lowest cost. Due to the lack of centralization, vendors can form alliances to control the market. We propose a novel algorithm capable of detecting alliances among service providers, based on our findings showing that this game has an empty core, but a non-empty bargaining set.


A Simple Haploid-Diploid Evolutionary Algorithm

It has recently been suggested that evolution exploits a form of fitness landscape smoothing within eukaryotic sex due to the haploid-diploid cycle. This short paper presents a simple modification to the standard evolutionary computing algorithm to similarly benefit from the process. Using the well-known NK model of fitness landscapes it is shown that the benefit emerges as ruggedness is increased.


Multiscale shortest path algorithm for big-size utility networks

This paper proposes a dimension reduction process for computing the Dijkstra’s shortest path algorithm in a complex network. This is done through a novel multiscale (MS) network decomposition into base-elements: links and landmark-nodes. All of them result to be essential for keeping all the network connectivity information and for speeding up the exact computation of the Dijkstra’s shortest path. The multiscale shortest path (MS-SP) algorithm shows to be advantageous when dealing with big-size utility networks in comparison with other shortest-path algorithms: unfeasible for the curse of the dimensionality for traditional approaches or providing approximate solution in other cases. The novel methodology is of high interest when it is computed on urban utility networks as it explodes several of their inherent properties. However, the proposal extends straightforwardly to another kind of networks. MS-SP has been successfully applied for 2 water utility networks (medium and big size). In both cases, MS-SP provides the exact solution that the obtained by applying the Dijkstra’s shortest path while showing its efficiency in terms of computational time.


Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality

As virtually all aspects of our lives are increasingly impacted by algorithmic decision making systems, it is incumbent upon us as a society to ensure such systems do not become instruments of unfair discrimination on the basis of gender, race, ethnicity, religion, etc. We consider the problem of determining whether the decisions made by such systems are discriminatory, through the lens of causal models. We introduce two definitions of group fairness grounded in causality: fair on average causal effect (FACE), and fair on average causal effect on the treated (FACT). We use the Rubin-Neyman potential outcomes framework for the analysis of cause-effect relationships to robustly estimate FACE and FACT. We demonstrate the effectiveness of our proposed approach on synthetic data. Our analyses of two real-world data sets, the Adult income data set from the UCI repository (with gender as the protected attribute), and the NYC Stop and Frisk data set (with race as the protected attribute), show that the evidence of discrimination obtained by FACE and FACT, or lack thereof, is often in agreement with the findings from other studies. We further show that FACT, being somewhat more nuanced compared to FACE, can yield findings of discrimination that differ from those obtained using FACE.


The Semantic Web Rule Language Expressiveness Extensions-A Survey

The Semantic Web Rule Language (SWRL) is a direct extension of OWL 2 DL with a subset of RuleML, and it is designed to be the rule language of the Semantic Web. This paper explores the state-of-the-art of SWRL’s expressiveness extensions proposed over time. As a motivation, the effectiveness of the SWRL/OWL combination in modeling domain facts is discussed while some of the common expressive limitations of the combination are also highlighted. The paper then classifies and presents the relevant language extensions of the SWRL and their added expressive powers to the original SWRL definition. Furthermore, it provides a comparative analysis of the syntax and semantics of the proposed extensions. In conclusion, the decidability requirement and usability of each expressiveness extension are evaluated towards an efficient inclusion into the OWL ontologies.


Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers

We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple and one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the network accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with minimal accuracy drop. By this single pass, we can obtain the optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification. We show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNet-v1 (569M FLOPs). Code and models will be available at: https://…/slimmable_networks


Posteriori Probabilistic Bounds of Convex Scenario Programs with Validation Tests

Scenario programs have established themselves as efficient tools towards decision-making under uncertainty. To assess the quality of scenario-based solutions a posteriori, validation tests based on Bernoulli trials have been widely adopted in practice. However, to reach a theoretically reliable conclusion, one typically needs to collect massive validation samples. In this work, we propose new posteriori bounds for convex scenario programs with validation tests, which are dependent on both realizations of support constraints and performance on out-of-sample validation data. The proposed bounds enjoy wide generality in that many existing theoretical results can be incorporated as particular cases. To facilitate practical use, a systematic approach for parameterizing posteriori probability functions is also developed, which is shown to possess a variety of desirable properties allowing for easy implementations and clear interpretations. By synthesizing information about support constraints and validation tests, less conservative estimates of reliability levels can be attained for randomized solutions in comparison with existing posteriori bounds. Case studies on controller design of aircraft lateral motion are presented to validate the effectiveness of the proposed posteriori bounds.


Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis

Myotonia, which refers to delayed muscle relaxation after contraction, is the main symptom of myotonic dystrophy patients. We propose a hierarchical attention-based temporal convolutional network (HA-TCN) for myotonic dystrohpy diagnosis from handgrip time series data, and introduce mechanisms that enable model explainability. We compare the performance of the HA-TCN model against that of benchmark TCN models, LSTM models with and without attention mechanisms, and SVM approaches with handcrafted features. In terms of classification accuracy and F1 score, we found all deep learning models have similar levels of performance, and they all outperform SVM. Further, the HA-TCN model outperforms its TCN counterpart with regards to computational efficiency regardless of network depth, and in terms of performance particularly when the number of hidden layers is small. Lastly, HA-TCN models can consistently identify relevant time series segments in the relaxation phase of the handgrip time series, and exhibit increased robustness to noise when compared to attention-based LSTM models.


Distributed Algorithms for Fully Personalized PageRank on Large Graphs

Personalized PageRank (PPR) has enormous applications, such as link prediction and recommendation systems for social networks, which often require the fully PPR to be known. Besides, most of real-life graphs are edge-weighted, e.g., the interaction between users on the Facebook network. However, it is computationally difficult to compute the fully PPR, especially on large graphs, not to mention that most existing approaches do not consider the weights of edges. In particular, the existing approach cannot handle graphs with billion edges on a moderate-size cluster. To address this problem, this paper presents a novel study on the computation of fully edge-weighted PPR on large graphs using the distributed computing framework. Specifically, we employ the Monte Carlo approximation that performs a large number of random walks from each node of the graph, and exploits the parallel pipeline framework to reduce the overall running time of the fully PPR. Based on that, we develop several optimization techniques which (i) alleviate the issue of large nodes that could explode the memory space, (ii) pre-compute short walks for small nodes that largely speedup the computation of random walks, and (iii) optimize the amount of random walks to compute in each pipeline that significantly reduces the overhead. With extensive experiments on a variety of real-life graph datasets, we demonstrate that our solution is several orders of magnitude faster than the state-of-the-arts, and meanwhile, largely outperforms the baseline algorithms in terms of accuracy.


ThunderNet: Towards Real-time Generic Object Detection

Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.


Pyramid Mask Text Detector

Scene text detection, an essential step of scene text recognition system, is to locate text instances in natural scene images automatically. Some recent attempts benefiting from Mask R-CNN formulate scene text detection task as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) to handle the scene text detection. Instead of binary text mask generated by the existing Mask R-CNN based methods, our PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. As for the generation of text boxes, PMTD reinterprets the obtained 2D soft mask into 3D space and introduces a novel plane clustering algorithm to derive the optimal text box on the basis of 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gain and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on ICDAR 2017 MLT dataset.


A Survey on Graph Kernels

Graph kernels have become an established and widely-used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as result. Finally, based on our experimental results, we derive a practitioner’s guide to kernel-based graph classification.


Sogou Machine Reading Comprehension Toolkit

Machine reading comprehension have been intensively studied in recent years, and neural network-based models have shown dominant performances. In this paper, we present a Sogou Machine Reading Comprehension (SMRC) toolkit that can be used to provide the fast and efficient development of modern machine comprehension models, including both published models and original prototypes. To achieve this goal, the toolkit provides dataset readers, a flexible preprocessing pipeline, necessary neural network components, and built-in models, which make the whole process of data preparation, model construction, and training easier.


Meta-Learning surrogate models for sequential decision making

Meta-learning methods leverage past experience to learn data-driven inductive biases from related problems, increasing learning efficiency on new tasks. This ability renders them particularly suitable for sequential decision making with limited experience. Within this problem family, we argue for the use of such approaches in the study of model-based approaches to Bayesian Optimisation, contextual bandits and Reinforcement Learning. We approach the problem by learning distributions over functions using Neural Processes (NPs), a recently introduced probabilistic meta-learning method. This allows the treatment of model uncertainty to tackle the exploration/exploitation dilemma. We show that NPs are suitable for sequential decision making on a diverse set of domains, including adversarial task search, recommender systems and model-based reinforcement learning.


Handling Noisy Labels for Robustly Learning from Self-Training Data for Low-Resource Sequence Labeling

In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario, it is less effective due to unreliable annotations created using self-labeling of unlabeled data. We propose to combine self-training with noise handling on the self-labeled data. Directly estimating noise on the combined clean training set and self-labeled data can lead to corruption of the clean data and hence, performs worse. Thus, we propose the Clean and Noisy Label Neural Network which trains on clean and noisy self-labeled data simultaneously by explicitly modelling clean and noisy labels separately. In our experiments on Chunking and NER, this approach performs more robustly than the baselines. Complementary to this explicit approach, noise can also be handled implicitly with the help of an auxiliary learning task. To such a complementary approach, our method is more beneficial than other baseline methods and together provides the best performance overall.


Forecasting model based on information-granulated GA-SVR and ARIMA for producer price index

The accuracy of predicting the Producer Price Index (PPI) plays an indispensable role in government economic work. However, it is difficult to forecast the PPI. In our research, we first propose an unprecedented hybrid model based on fuzzy information granulation that integrates the GA-SVR and ARIMA (Autoregressive Integrated Moving Average Model) models. The fuzzy-information-granulation-based GA-SVR-ARIMA hybrid model is intended to deal with the problem of imprecision in PPI estimation. The proposed model adopts the fuzzy information-granulation algorithm to pre-classification-process monthly training samples of the PPI, and produced three different sequences of fuzzy information granules, whose Support Vector Regression (SVR) machine forecast models were separately established for their Genetic Algorithm (GA) optimization parameters. Finally, the residual errors of the GA-SVR model were rectified through ARIMA modeling, and the PPI estimate was reached. Research shows that the PPI value predicted by this hybrid model is more accurate than that predicted by other models, including ARIMA, GRNN, and GA-SVR, following several comparative experiments. Research also indicates the precision and validation of the PPI prediction of the hybrid model and demonstrates that the model has consistent ability to leverage the forecasting advantage of GA-SVR in non-linear space and of ARIMA in linear space.


Train, Sort, Explain: Learning to Diagnose Translation Models

Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach on how to automatically expose systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.


Multimodal Deep Network Embedding with Integrated Structure and Attribute Information

Network embedding is the process of learning low-dimensional representations for nodes in a network, while preserving node features. Existing studies only leverage network structure information and focus on preserving structural features. However, nodes in real-world networks often have a rich set of attributes providing extra semantic information. It has been demonstrated that both structural and attribute features are important for network analysis tasks. To preserve both features, we investigate the problem of integrating structure and attribute information to perform network embedding and propose a Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear network structures and the complex interactions among structures and attributes, using a deep model consisting of multiple layers of non-linear functions. Since structures and attributes are two different types of information, a multimodal learning method is adopted to pre-process them and help the model to better capture the correlations between node structure and attribute information. We employ both structural proximity and attribute proximity in the loss function to preserve the respective features and the representations are obtained by minimizing the loss function. Results of extensive experiments on four real-world datasets show that the proposed method performs significantly better than baselines on a variety of tasks, which demonstrate the effectiveness and generality of our method.


Towards a Theory of Systems Engineering Processes: A Principal-Agent Model of a One-Shot, Shallow Process

Systems engineering processes coordinate the effort of different individuals to generate a product satisfying certain requirements. As the involved engineers are self-interested agents, the goals at different levels of the systems engineering hierarchy may deviate from the system-level goals which may cause budget and schedule overruns. Therefore, there is a need of a systems engineering theory that accounts for the human behavior in systems design. To this end, the objective of this paper is to develop and analyze a principal-agent model of a one-shot (single iteration), shallow (one level of hierarchy) systems engineering process. We assume that the systems engineer maximizes the expected utility of the system, while the subsystem engineers seek to maximize their expected utilities. Furthermore, the systems engineer is unable to monitor the effort of the subsystem engineer and may not have a complete information about their types or the complexity of the design task. However, the systems engineer can incentivize the subsystem engineers by proposing specific contracts. To obtain an optimal incentive, we pose and solve numerically a bi-level optimization problem. Through extensive simulations, we study the optimal incentives arising from different system-level value functions under various combinations of effort costs, problem-solving skills, and task complexities.


Learning to Weight for Text Classification

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although `supervised term weighting’ approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach \emph{Learning to Weight} (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.


Building Automated Survey Coders via Interactive Machine Learning

Software systems trained via machine learning to automatically classify open-ended answers (a.k.a. verbatims) are by now a reality. Still, their adoption in the survey coding industry has been less widespread than it might have been. Among the factors that have hindered a more massive takeup of this technology are the effort involved in manually coding a sufficient amount of training data, the fact that small studies do not seem to justify this effort, and the fact that the process needs to be repeated anew when brand new coding tasks arise. In this paper we will argue for an approach to building verbatim classifiers that we will call ‘Interactive Learning’, and that addresses all the above problems. We will show that, for the same amount of training effort, interactive learning delivers much better coding accuracy than standard ‘non-interactive’ learning. This is especially true when the amount of data we are willing to manually code is small, which makes this approach attractive also for small-scale studies. Interactive learning also lends itself to reusing previously trained classifiers for dealing with new (albeit related) coding tasks. Interactive learning also integrates better in the daily workflow of the survey specialist, and delivers a better user experience overall.


Many Task Learning with Task Routing

Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, when the number of tasks increases so do the complexity of the architectural adjustments and resource requirements. In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks. To distinguish from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model. Our method dubbed Task Routing (TR) is encapsulated in a layer we call the Task Routing Layer (TRL), which applied in an MaTL scenario successfully fits hundreds of classification tasks in one model. We evaluate our method on 5 datasets against strong baselines and state-of-the-art approaches.


Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

In the natural language processing literature, neural networks are becoming increasingly deeper and complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.