Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods. More specifically, we focus on examining the incorporation of these constraints in decision tree regression when cast as a form of kernel regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that order of complexity of memory and computation is preserved for such models and bounds the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use.
Learning to generate fluent natural language from structured data with neural networks has become an common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as human evaluation.
In this paper, we apply Vuong’s (1989) general approach of model selection to the comparison of both nested and non-nested unidimensional and multidimensional item response theory (IRT) models. This approach is especially useful because it allows for formal statistical tests of non-nested models, and, in the nested case, it offers statistics that are highly competitive with the traditional likelihood ratio test. After summarising the statistical theory underlying the tests, we study the tests’ performance in the context of IRT, using simulation studies and real data. We find that, in the non-nested case, the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, the tests often perform better than the likelihood ratio test.
In this paper, we present a local information theoretic approach to explicitly learn probabilistic clustering of a discrete random variable. Our formulation yields a convex maximization problem for which it is NP-hard to find the global optimum. In order to algorithmically solve this optimization problem, we propose two relaxations that are solved via gradient ascent and alternating maximization. Experiments on the MSR Sentence Completion Challenge, MovieLens 100K, and Reuters21578 datasets demonstrate that our approach is competitive with existing techniques and worthy of further investigation.
We study the problem of learning latent feature models (LFMs) for tensor data commonly observed in science and engineering such as hyperspectral imagery. However, the problem is challenging not only due to the non-convex formulation, the combinatorial nature of the constraints in LFMs, but also the high-order correlations in the data. In this work, we formulate a tensor latent feature learning problem by representing the data as a mixture of high-order latent features and binary codes, which are memory efficient and easy to interpret. To make the learning tractable, we propose a novel optimization procedure, Binary matching pursuit (BMP), that iteratively searches for binary bases via a MAXCUT-like boolean quadratic solver. Such a procedure is guaranteed to achieve an? suboptimal solution in O($1/\epsilon$) greedy steps, resulting in a trade-off between accuracy and sparsity. When evaluated on both synthetic and real datasets, our experiments show superior performance over baseline methods.
Stochastic computing (SC) is an emerging computing technique which offers higher computational density, and lower power over binary-encoded (BE) computation. Unlike BE computation, SC encodes values as probabilistic bitstreams which makes designing new circuits unintuitive. Existing techniques for synthesizing SC circuits are limited to specific classes of functions such as polynomial evaluation or constant scaling. In this paper, we propose using stochastic synthesis, which is originally a program synthesis technique, to automate the task of synthesizing new SC circuits. Our results show stochastic synthesis is more general than past techniques and can synthesize manually designed SC circuits as well as new ones such as an approximate square root unit.
This paper studies finding the K nearest neighbors (KNN) of all points in a dataset. Typical solutions to KNN searches use indexing to prune the search, which reduces the number of candidate points that may be within the set of the nearest $K$ points of each query point. In high dimensionality, index searches degrade, making the KNN self-join a prohibitively expensive operation in some scenarios. Furthermore, there are a significant number of distance calculations needed to determine which points are nearest to each query point. To address these challenges, we propose a hybrid CPU/GPU approach. Since the CPU and GPU are considerably different architectures that are best exploited using different algorithms, we advocate for splitting the work between both architectures based on the characteristic workloads defined by the query points in the dataset. As such, we assign dense regions to the GPU, and sparse regions to the CPU to most efficiently exploit the relative strengths of each architecture. Critically, we find that the relative performance gains over the reference implementation across four real-world datasets are a function of the data properties (size, dimensionality, distribution), and number of neighbors, K.
Nowadays, crowd sensing becomes increasingly more popular due to the ubiquitous usage of mobile devices. However, the quality of such human-generated sensory data varies significantly among different users. To better utilize sensory data, the problem of truth discovery, whose goal is to estimate user quality and infer reliable aggregated results through quality-aware data aggregation, has emerged as a hot topic. Although the existing truth discovery approaches can provide reliable aggregated results, they fail to protect the private information of individual users. Moreover, crowd sensing systems typically involve a large number of participants, making encryption or secure multi-party computation based solutions difficult to deploy. To address these challenges, in this paper, we propose an efficient privacy-preserving truth discovery mechanism with theoretical guarantees of both utility and privacy. The key idea of the proposed mechanism is to perturb data from each user independently and then conduct weighted aggregation among users’ perturbed data. The proposed approach is able to assign user weights based on information quality, and thus the aggregated results will not deviate much from the true results even when large noise is added. We adapt local differential privacy definition to this privacy-preserving task and demonstrate the proposed mechanism can satisfy local differential privacy while preserving high aggregation accuracy. We formally quantify utility and privacy trade-off and further verify the claim by experiments on both synthetic data and a real-world crowd sensing system.
This paper discusses an axiomatic approach for the integration of ontologies, an approach that extends to first order logic a previous approach (Kent 2000) based on information flow. This axiomatic approach is represented in the Information Flow Framework (IFF), a metalevel framework for organizing the information that appears in digital libraries, distributed databases and ontologies (Kent 2001). The paper argues that the integration of ontologies is the two-step process of alignment and unification. Ontological alignment consists of the sharing of common terminology and semantics through a mediating ontology. Ontological unification, concentrated in a virtual ontology of community connections, is fusion of the alignment diagram of participant community ontologies – the quotient of the sum of the participant portals modulo the ontological alignment structure.
In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing results are focused on the case where data comes from a randomized experiment, and further, there are only two possible actions, such as giving a drug to a patient or not. In this paper, we study the offline multi-action policy learning problem with observational data and where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.
User intended actions are widely seen in many areas. Forecasting these actions and taking proactive measures to optimize business outcome is a crucial step towards sustaining the steady business growth. In this work, we focus on pre- dicting attrition, which is one of typical user intended actions. Conventional attrition predictive modeling strategies suffer a few inherent drawbacks. To overcome these limitations, we propose a novel end-to-end learning scheme to keep track of the evolution of attrition patterns for the predictive modeling. It integrates user activity logs, dynamic and static user profiles based on multi-path learning. It exploits historical user records by establishing a decaying multi-snapshot technique. And finally it employs the precedent user intentions via guiding them to the subsequent learning procedure. As a result, it addresses all disadvantages of conventional methods. We evaluate our methodology on two public data repositories and one private user usage dataset provided by Adobe Creative Cloud. The extensive experiments demonstrate that it can offer the appealing performance in comparison with several existing approaches as rated by different popular metrics. Furthermore, we introduce an advanced interpretation and visualization strategy to effectively characterize the periodicity of user activity logs. It can help to pinpoint important factors that are critical to user attrition and retention and thus suggests actionable improvement targets for business practice. Our work will provide useful insights into the prediction and elucidation of other user intended actions as well.
We propose PANDA, an AdaPtive Noise Augmentation technique to regularize estimating and constructing undirected graphical models (UGMs). PANDA iteratively solves MLEs given noise augmented data in the regression-based framework until convergence to achieve the designed regularization effects. The augmented noises can be designed to achieve various regularization effects on graph estimation, including the bridge, elastic net, adaptive lasso, and SCAD penalization; it can also offer group lasso and fused ridge when some nodes belong to the same group. We establish theoretically that the noise-augmented loss functions and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized regression coefficients through PANDA in GLMs, based on which, the inferences for the parameters can be obtained simultaneously with variable selection. Our empirical results suggest the inferences achieve nominal or near-nominal coverage and are far more efficient compared to some existing post-selection procedures. On the algorithm level, PANDA can be easily programmed in any standard software without resorting to complicated optimization techniques. We show the non-inferior performance of PANDA in constructing graphs of different types in simulation studies and also apply PANDA to the autism spectrum disorder data to construct a mixed-node graph.
A surprising property of word vectors is that vector algebra can often be used to solve word analogies. However, it is unclear why — and when — linear operators correspond to non-linear embedding models such as skip-gram with negative sampling (SGNS). We provide a rigorous explanation of this phenomenon without making the strong assumptions that past work has made about the vector space and word distribution. Our theory has several implications. Past work has often conjectured that linear structures exist in vector spaces because relations can be represented as ratios; we prove that this holds for SGNS. We provide novel theoretical justification for the addition of SGNS word vectors by showing that it automatically down-weights the more frequent word, as weighting schemes do ad hoc. Lastly, we offer an information theoretic interpretation of Euclidean distance in vector spaces, providing rigorous justification for its use in capturing word dissimilarity.
Feature Selection (FS) plays an important role in learning and classification tasks. The object of FS is to select the relevant and non-redundant features. Considering the huge amount number of features in real-world applications, FS methods using batch learning technique can’t resolve big data problem especially when data arrive sequentially. In this paper, we propose an online feature selection system which resolves this problem. More specifically, we treat the problem of online supervised feature selection for binary classification as a decision-making problem. A philosophical vision to this problem leads to a hybridization between two important domains: feature selection using online learning technique (OFS) and automated negotiation (AN). The proposed OFS system called MOANOFS (Multi-Objective Automated Negotiation based Online Feature Selection) uses two levels of decision. In the first level, from n learners (or OFS methods), we decide which are the k trustful ones (with high confidence or trust value). These elected k learners will participate in the second level. In this level, we integrate our proposed Multilateral Automated Negotiation based OFS (MANOFS) method to decide finally which is the best solution or which are relevant features. We show that MOANOFS system is applicable to different domains successfully and achieves high accuracy with several real-world applications. Index Terms: Feature selection, online learning, multi-objective automated negotiation, trust, classification, big data.
In-memory databases (IMDBs) are gaining increasing popularity in big data applications, where clients commit updates intensively. Specifically, it is necessary for IMDBs to have efficient snapshot performance to support certain special applications (e.g., consistent checkpoint, HTAP). Formally, the in-memory consistent snapshot problem refers to taking an in-memory consistent time-in-point snapshot with the constraints that 1) clients can read the latest data items and 2) any data item in the snapshot should not be overwritten. Various snapshot algorithms have been proposed in academia to trade off throughput and latency, but industrial IMDBs such as Redis adhere to the simple fork algorithm. To understand this phenomenon, we conduct comprehensive performance evaluations on mainstream snapshot algorithms. Surprisingly, we observe that the simple fork algorithm indeed outperforms the state-of-the-arts in update-intensive workload scenarios. On this basis, we identify the drawbacks of existing research and propose two lightweight improvements. Extensive evaluations on synthetic data and Redis show that our lightweight improvements yield better performance than fork, the current industrial standard, and the representative snapshot algorithms from academia. Finally, we have opensourced the implementation of all the above snapshot algorithms so that practitioners are able to benchmark the performance of each algorithm and select proper methods for different application scenarios.
In this paper, we present sequeval, a software tool capable of performing the offline evaluation of a recommender system designed to suggest a sequence of items. A sequence-based recommender is trained considering the sequences already available in the system and its purpose is to generate a personalized sequence starting from an initial seed. This tool automatically evaluates the sequence-based recommender considering a comprehensive set of eight different metrics adapted to the sequential scenario. sequeval has been developed following the best practices of software extensibility. For this reason, it is possible to easily integrate and evaluate novel recommendation techniques. sequeval is publicly available as an open source tool and it aims to become a focal point for the community to assess sequence-based recommender systems.
Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. However, the results obtained with these tools may be not directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze in the same experimental conditions several algorithms without disclosing their implementation details. For these reasons, we introduce RecLab, an open source software for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In details, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different metric. The results of all experiments are permanently stored and publicly available in order to support accountability and comparative analyses.
Generation expansion planning (GEP) models have been useful aids for long-term planning. Recent growth in intermittent renewable generation has increased the need to represent the capability for non-renewables to respond to rapid changes in daily loads, leading research to bring unit commitment (UC) features into GEPs. Such GEP+UC models usually contain discrete variables which, along with many details, make computation times impractically long for analysts who need to develop, debug, modify and use the GEP for many alternative runs. We propose a GEP with generation aggregated by technology type, and with the minimal UC content necessary to represent the limitations on generation to respond to rapid changes in demand, i.e., ramp-up and ramp-down constraints, with ramp limits estimated from historical data on maximum rates of change of each generation type. We illustrate with data for the province of Ontario in Canada; the GEP is a large scale linear program that solves in less than one hour on modest computing equipment, with credible solutions.
Image translation is a burgeoning field in computer vision where the goal is to learn the mapping between an input image and an output image. However, most recent methods require multiple generators for modeling different domain mappings, which are inefficient and ineffective on some multi-domain image translation tasks. In this paper, we propose a novel method, SingleGAN, to perform multi-domain image-to-image translations with a single generator. We introduce the domain code to explicitly control the different generative tasks and integrate multiple optimization goals to ensure the translation. Experimental results on several unpaired datasets show superior performance of our model in translation between two domains. Besides, we explore variants of SingleGAN for different tasks, including one-to-many domain translation, many-to-many domain translation and one-to-one domain translation with multimodality. The extended experiments show the universality and extensibility of our model.
Humans are experts at high-fidelity imitation — closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
Current measures for evaluating text simplification systems focus on evaluating lexical text aspects, neglecting its structural aspects. In this paper we propose the first measure to address structural aspects of text simplification, called SAMSA. It leverages recent advances in semantic parsing to assess simplification quality by decomposing the input based on its semantic structure and comparing it to the output. SAMSA provides a reference-less automatic evaluation procedure, avoiding the problems that reference-based methods face due to the vast space of valid simplifications for a given sentence. Our human evaluation experiments show both SAMSA’s substantial correlation with human judgments, as well as the deficiency of existing reference-based measures in evaluating structural simplification.
Since the proposal of big data analysis and Graphic Processing Unit (GPU), the deep learning technology has received a great deal of attention and has been widely applied in the field of imaging processing. In this paper, we have an aim to completely review and summarize the deep learning technologies for image denoising proposed in recent years. Morever, we systematically analyze the conventional machine learning methods for image denoising. Finally, we point out some research directions for the deep learning technologies in image denoising.
We present the Tamed Cross Entropy (TCE) loss function, a robust derivative of the standard Cross Entropy (CE) loss used in deep learning for classification tasks. However, unlike other robust losses, the TCE loss is designed to exhibit the same training properties than the CE loss in noiseless scenarios. Therefore, the TCE loss requires no modification on the training regime compared to the CE loss and, in consequence, can be applied in all applications where the CE loss is currently used. We evaluate the TCE loss using the ResNet architecture on four image datasets that we artificially contaminated with various levels of label noise. The TCE loss outperforms the CE loss in every tested scenario.
Statistical physics is the natural framework to model complex networks. In the last twenty years, it has brought novel physical insights on a variety of emergent phenomena, such as self-organisation, scale invariance, mixed distributions and ensemble non-equivalence, which cannot be deduced from the behaviour of the individual constituents. At the same time, thanks to its deep connection with information theory, statistical physics and the principle of maximum entropy have led to the definition of null models reproducing some features of empirical networks, but otherwise as random as possible. We review here the statistical physics approach for complex networks and the null models for the various physical problems, focusing in particular on the analytic frameworks reproducing the local features of the network. We show how these models have been used to detect statistically significant and predictive structural patterns in real-world networks, as well as to reconstruct the network structure in case of incomplete information. We further survey the statistical physics frameworks that reproduce more complex, semi-local network features using Markov chain Monte Carlo sampling, and the models of generalised network structures such as multiplex networks, interacting networks and simplicial complexes.
Identifying influential nodes in a network is a fundamental issue due to its wide applications, such as accelerating information diffusion or halting virus spreading. Many measures based on the network topology have emerged over the years to identify influential nodes such as Betweenness, Closeness, and Eigenvalue centrality. However, although most real-world networks are modular, few measures exploit this property. Recent works have shown that it has a significant effect on the dynamics on networks. In a modular network, a node has two types of influence: a local influence (on the nodes of its community) through its intra-community links and a global influence (on the nodes in other communities) through its inter-community links. Depending of the strength of the community structure, these two components are more or less influential. Based on this idea, we propose to extend all the standard centrality measures defined for networks with no community structure to modular networks. The so-called ‘Modular centrality’ is a two dimensional vector. Its first component quantifies the local influence of a node in its community while the second component quantifies its global influence on the other communities of the network. In order to illustrate the effectiveness of the Modular centrality extensions, comparison with their scalar counterpart are performed in an epidemic process setting. Simulation results using the Susceptible-Infected-Recovered (SIR) model on synthetic networks with controlled community structure allows getting a clear idea about the relation between the strength of the community structure and the major type of influence (global/local). Furthermore, experiments on real-world networks demonstrate the merit of this approach.
Sentence splitting is a major simplification operator. Here we present a simple and efficient splitting algorithm based on an automatic semantic parser. After splitting, the text is amenable for further fine-tuned simplification operations. In particular, we show that neural Machine Translation can be effectively used in this situation. Previous application of Machine Translation for simplification suffers from a considerable disadvantage in that they are over-conservative, often failing to modify the source in any way. Splitting based on semantic parsing, as proposed here, alleviates this issue. Extensive automatic and human evaluation shows that the proposed method compares favorably to the state-of-the-art in combined lexical and structural simplification.
There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance in finite-channel CNNs trained with stochastic gradient descent (SGD) has no corresponding property in the Bayesian treatment of the infinite channel limit – a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally, that while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.