A long line of literature has focused on the problem of selecting a team of individuals from a large pool of candidates, such that certain constraints are respected, and a given objective function is maximized. Even though extant research has successfully considered diverse families of objective functions and constraints, one of the most common limitations is the focus on the single-team paradigm. Despite its well-documented applications in multiple domains, this paradigm is not appropriate when the team-builder needs to partition the entire population into multiple teams. Team-partitioning tasks are very common in an educational setting, in which the teacher has to partition the students in her class into teams for collaborative projects. The task also emerges in the context of organizations, when managers need to partition the workforce into teams with specific properties to tackle relevant projects. In this work, we extend the team formation literature by introducing the Guided Team-Partitioning (GTP) problem, which asks for the partitioning of a population into teams such that the centroid of each team is as close as possible to a given target vector. As we describe in detail in our work, this formulation allows the team-builder to control the composition of the produced teams and has natural applications in practical settings. Algorithms for the GTP need to simultaneously consider the composition of multiple non-overlapping teams that compete for the same population of candidates. This makes the problem considerably more challenging than formulations that focus on the optimization of a single team. In fact, we prove that GTP is NP-hard to solve and even to approximate. The complexity of the problem motivates us to consider efficient algorithmic heuristics, which we evaluate via experiments on both real and synthetic datasets.
This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because of their ability to provide model-agnostic measures that depend only on the pre-trained model output. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to provide an explanation for the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects through various settings where a ground-truth is understood and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features both in variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models do not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods.
In this study we provided a scheduling procedure which is combination of machine learning and mathematical programming. Outpatients who request for appointment in healthcare facilities have different priorities. Determining the priority of outpatients and allocating the capacity based on the priority classes are important concepts that have to be considered in scheduling of outpatients. Two stages are defined for scheduling an incoming patient. In the ?st stage, We applied and compared different clustering methods such as k-mean clustering and agglomerative hierarchical clustering methods to classify outpatients into priority classes and suggested the best pattern to cluster the outpatients. In the second stage, we modeled the scheduling problem as a Markov Decision Process (MDP) problem that tries to decrease waiting time of higher priority outpatients. Due to the curse of dimensionality, we used uid approximation method to estimate the optimal solution of the MDP. We applied our methodology on a dataset of Shaheed Rajaei Medical and Research Center in Iran, and we represented that how our models works in prioritizing and scheduling of outpatients.
The Receiver Operating Characteristic (ROC) curve is a representation of the statistical information discovered in binary classification problems and is a key concept in machine learning and data science. This paper studies the statistical properties of ROC curves and its implication on model selection. We analyze the implications of different models of incentive heterogeneity and information asymmetry on the relation between human decisions and the ROC curves. Our theoretical discussion is illustrated in the context of a large data set of pregnancy outcomes and doctor diagnosis from the Pre-Pregnancy Checkups of reproductive age couples in Henan Province provided by the Chinese Ministry of Health.
Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths. In this work, we train a neural network that serves as a hypernetwork, mapping a latent vector into high-performance (low-loss) weight vectors, generalizing recent findings of mode connectivity to higher dimensional manifolds. We formulate the training objective as a compromise between accuracy and diversity, where the diversity takes into account trivial symmetry transformations of the target network. We demonstrate how to reduce the number of parameters in the hypernetwork by parameter sharing. Once learned, the hypernetwork allows for a computationally efficient, ancestral sampling of neural network weights, which we recruit to form large ensembles. The improvement in classification accuracy obtained by this ensembling indicates that the generated manifold extends in dimensions other than directions implied by trivial symmetries. For computational efficiency, we distill an ensemble into a single classifier while retaining generalization.
Platooning involves a set of vehicles moving in a cooperative fashion at equal inter-vehicular distances. Taking advantage of wireless communication technology, this paper aims to show the impact of network protocols on a platoon using a controller, based on the Cooperative Adaptive Cruise Control (CACC) principles. The network protocols used in this work are DSRC (Dedicated Short Range Communication) and LTE-V2V sidelink (Mode 4). The main focus of this work is to showcase the ability of the controller to maintain platoon stability despite having uncertainties in both, the platoon and the message delivery rates over the network protocols. The controller interacts with all vehicles using messages transmitted over the network protocols. The controller is designed to be responsible for micro-managing every vehicle in the platoon and to ensure that the platoon does not break under any circumstances. SUMO (Simulation of Urban MObility) is used as the simulation platform. Results indicate, that the controller manages to achieve platoon stability in all scenarios, unless a set number of consecutive messages are not transmitted, in which case it leads to collisions. This work also presents certain bottlenecks pertaining to wireless communication with vehicles.
Convolutional Neural Networks have achieved impressive results in various tasks, but interpreting the internal mechanism is a challenging problem. To tackle this problem, we exploit a multi-channel attention mechanism in feature space. Our network architecture allows us to obtain an attention mask for each feature while existing CNN visualization methods provide only a common attention mask for all features. We apply the proposed multi-channel attention mechanism to multi-attribute recognition task. We can obtain different attention mask for each feature and for each attribute. Those analyses give us deeper insight into the feature space of CNNs. The experimental results for the benchmark dataset show that the proposed method gives high interpretability to humans while accurately grasping the attributes of the data.
In this paper, we propose a novel convolutional neural network (CNN) architecture considering both local and global features for image enhancement. Most conventional image enhancement methods, including Retinex-based methods, cannot restore lost pixel values caused by clipping and quantizing. CNN-based methods have recently been proposed to solve the problem, but they still have a limited performance due to network architectures not handling global features. To handle both local and global features, the proposed architecture consists of three networks: a local encoder, a global encoder, and a decoder. In addition, high dynamic range (HDR) images are used for generating training data for our networks. The use of HDR images makes it possible to train CNNs with better-quality images than images directly captured with cameras. Experimental results show that the proposed method can produce higher-quality images than conventional image enhancement methods including CNN-based methods, in terms of various objective quality metrics: TMQI, entropy, NIQE, and BRISQUE.
Keeping up with threat intelligence is a must for a security analyst today. There is a volume of information present in the wild’ that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources, to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better tapped in’ can be more effective. In this paper we present, Cyber-All-Intel an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations like, vector spaces and knowledge graphs in a ‘VKG structure’ to store incoming intelligence. The system also uses neural network models to pro-actively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights.
Estimating statistical uncertainties allows autonomous agents to communicate their confidence during task execution and is important for applications in safety-critical domains such as autonomous driving. In this work, we present the uncertainty-aware imitation learning (UAIL) algorithm for improving end-to-end control systems via data aggregation. UAIL applies Monte Carlo Dropout to estimate uncertainty in the control output of end-to-end systems, using states where it is uncertain to selectively acquire new training data. In contrast to prior data aggregation algorithms that force human experts to visit sub-optimal states at random, UAIL can anticipate its own mistakes and switch control to the expert in order to prevent visiting a series of sub-optimal states. Our experimental results from simulated driving tasks demonstrate that our proposed uncertainty estimation method can be leveraged to reliably predict infractions. Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.
Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source* subset of Atari environments re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments. *https://…/Toybox
This paper studies accelerations in Q-learning algorithms. We propose an accelerated target update scheme by incorporating the historical iterates of Q functions. The idea is conceptually inspired by the momentum-based accelerated methods in the optimization theory. Conditions under which the proposed accelerated algorithms converge are established. The algorithms are validated using commonly adopted testing problems in reinforcement learning, including the FrozenLake grid world game, two discrete-time LQR problems from the Deepmind Control Suite, and the Atari 2600 games. Simulation results show that the proposed accelerated algorithms can improve the convergence performance compared with the vanilla Q-learning algorithm.
Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. The reason for this requirement is that the raw data are complex and difficult to process without extracting or selecting appropriate features beforehand. This paper reviews theory and motivation of different common methods of feature selection and extraction and introduces some of their applications. Some numerical implementations are also shown for these methods. Finally, the methods in feature selection and extraction are compared.
Frequently Asked Question (FAQ) retrieval is an important task where the objective is to retrieve the appropriate Question-Answer (QA) pair from a database based on the user’s query. In this study, we propose a FAQ retrieval system that considers the similarity between a user’s query and a question computed by a traditional unsupervised information retrieval system, as well as the relevance between the query and an answer computed by the recently-proposed BERT model. By combining the rule-based approach and the flexible neural approach, the proposed system realizes robust FAQ retrieval. A common approach to FAQ retrieval is to construct labeled data for training, which takes a lot of costs. However, a FAQ database generally contains a too small number of QA pairs to train a model. To surmount this problem, we leverage FAQ sets that are similar to the one in question. We construct localgovFAQ dataset based on FAQ pages of administrative municipalities throughout Japan. In this research, we evaluate our approach on two datasets, localgovFAQ dataset and StackExchange dataset, and demonstrate that our proposed method works effectively.
This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections.
Artificial intelligence (AI) researchers claim that they have made great achievements’ in clinical realms. However, clinicians point out the so-called achievements’ have no ability to implement into natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of the real-world clinical tasks, hence qualifies itself for guiding the development of AI systems, promoting the implementation of AI in real-world clinical practice.
Emotion is intrinsic to humans and consequently emotion understanding is a key part of human-like artificial intelligence (AI). Emotion recognition in conversation (ERC) is becoming increasingly popular as a new research frontier in natural language processing (NLP) due to its ability to mine opinions from the plethora of publicly available conversational data in platforms such as Facebook, Youtube, Reddit, Twitter, and others. Moreover, it has potential applications in health-care systems (as a tool for psychological analysis), education (understanding student frustration) and more. Additionally, ERC is also extremely important for generating emotion-aware dialogues that require an understanding of the user’s emotions. Catering to these needs calls for effective and scalable conversational emotion-recognition algorithms. However, it is a strenuous problem to solve because of several research challenges. In this paper, we discuss these challenges and shed light on the recent research in this field. We also describe the drawbacks of these approaches and discuss the reasons why they fail to successfully overcome the research challenges in ERC.
Vanilla convolutional neural networks are known to provide superior performance not only in image recognition tasks but also in natural language processing and time series analysis. One of the strengths of convolutional layers is the ability to learn features about spatial relations in the input domain using various parameterized convolutional kernels. However, in time series analysis learning such spatial relations is not necessarily required nor effective. In such cases, kernels which model temporal dependencies or kernels with broader spatial resolutions are recommended for more efficient training as proposed by dilation kernels. However, the dilation has to be fixed a priori which limits the flexibility of the kernels. We propose generalized dilation networks which generalize the initial dilations in two aspects. First we derive an end-to-end learnable architecture for dilation layers where also the dilation rate can be learned. Second we break up the strict dilation structure, in that we develop kernels operating independently in the input space.
A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkages, often used in Finance and previously studied for outlier detection in multivariate data. A thorough simulation study is conducted to investigate: the efficiency with normal and heavy-tailed errors, the robustness under contamination, the computational times, the affine equivariance and breakdown value of the regression estimator. Two classical data-sets often used in the literature and a real socio-economic data-set about the Living Environment Deprivation of areas in Liverpool (UK), are studied. The results from the simulations and the real data examples show the advantages of the proposed robust estimator in regression.
This paper proposes a new extension to Deep Evolutionary Network Structured Evolution (DENSER), called Fast-DENSER++ (F-DENSER++). The vast majority of NeuroEvolution methods that optimise Deep Artificial Neural Networks (DANNs) only evaluate the candidate solutions for a fixed amount of epochs; this makes it difficult to effectively assess the learning strategy, and requires the best generated network to be further trained after evolution. F-DENSER++ enables the training time of the candidate solutions to grow continuously as necessary, i.e., in the initial generations the candidate solutions are trained for shorter times, and as generations proceed it is expected that longer training cycles enable better performances. Consequently, the models discovered by F-DENSER++ are fully-trained DANNs, and are ready for deployment after evolution, without the need for further training. The results demonstrate the ability of F-DENSER++ to effectively generate fully-trained DANNs; by the end of evolution, whilst the average performance of the models generated by F-DENSER++ is of 88.73%, the performance of the models generated by the previous version of DENSER (Fast-DENSER) is 86.91% (statistically significant), which increases to 87.76% when allowed to train for longer.
In recent years, Signal Temporal Logic (STL) has gained traction as a practical and expressive means of encoding control objectives for robotic and cyber-physical systems. The state-of-the-art in STL trajectory synthesis is to formulate the problem as a Mixed Integer Linear Program (MILP). The MILP approach is sound and complete for bounded specifications, but such strong correctness guarantees come at the price of exponential complexity in the number of predicates and the time bound of the specification. In this work, we propose an alternative synthesis paradigm that relies on Bayesian optimization rather than mixed integer programming. This relaxes the completeness guarantee to probabilistic completeness, but is significantly more efficient: our approach scales polynomially in the STL time-bound and linearly in the number of predicates. We prove that our approach is sound and probabilistically complete, and demonstrate its scalability with a nontrivial example.
Clinical diagnostic decision making and population-based studies often rely on multi-modal data which is noisy and incomplete. Recently, several works proposed geometric deep learning approaches to solve disease classification, by modeling patients as nodes in a graph, along with graph signal processing of multi-modal features. Many of these approaches are limited by assuming modality- and feature-completeness, and by transductive inference, which requires re-training of the entire model for each new test sample. In this work, we propose a novel inductive graph-based approach that can generalize to out-of-sample patients, despite missing features from entire modalities per patient. We propose multi-modal graph fusion which is trained end-to-end towards node-level classification. We demonstrate the fundamental working principle of this method on a simplified MNIST toy dataset. In experiments on medical data, our method outperforms single static graph approach in multi-modal disease classification.
We present state-of-the-art automatic speech recognition (ASR) systems employing a standard hybrid DNN\/HMM architecture compared to an attention-based encoder-decoder design for the LibriSpeech task. Detailed descriptions of the system development, including model design, pretraining schemes, training schedules, and optimization approaches are provided for both system architectures. Both hybrid DNN/HMM and attention-based systems employ bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we employ both LSTM and Transformer based architectures. All our systems are built using RWTHs open-source toolkits RASR and RETURNN. To the best knowledge of the authors, the results obtained when training on the full LibriSpeech training set, are the best published currently, both for the hybrid DNN/HMM and the attention-based systems. Our single hybrid system even outperforms previous results obtained from combining eight single systems. Our comparison shows that on the LibriSpeech 960h task, the hybrid DNN/HMM system outperforms the attention-based system by 15% relative on the clean and 40% relative on the other test sets in terms of word error rate. Moreover, experiments on a reduced 100h-subset of the LibriSpeech training corpus even show a more pronounced margin between the hybrid DNN/HMM and attention-based architectures.
This paper proposes a method for machine learning from unlabeled data in the form of a time-series. The mapping that is learned is shown to extract slowly evolving information that would be useful for control applications, while efficiently filtering out unwanted, higher-frequency noise. The method consists of training a feedforward artificial neural network with backpropagation using two opposing objectives. The first of these is to minimize the squared changes in activations between time steps of each unit in the network. This ‘temporal smoothing’ has the effect of correlating inputs that occur close in time with outputs that are close in the L2-norm. The second objective is to maximize the log determinant of the covariance matrix of activations in each layer of the network. This objective ensures that information from each layer is passed through to the next. This second objective acts as a balance to the first, which on its own would result in a network with all input weights equal to zero.
We give parallel and distributed algorithms for the housing allocation problem. In this problem, there is a set of agents and a set of houses. Each agent has a strict preference list for a subset of houses. We need to find a matching such that some criterion is optimized. One such criterion is Pareto Optimality. A matching is Pareto optimal if no coalition of agents can be strictly better off by exchanging houses among themselves. We also study the housing market problem, a variant of the housing allocation problem, where each agent initially owns a house. In addition to Pareto optimality, we are also interested in finding the core of a housing market. A matching is in the core if there is no coalition of agents that can be better off by breaking away from other agents and switching houses only among themselves. In the first part of this work, we show that computing a Pareto optimal matching of a house allocation is in {\bf CC} and computing the core of a housing market is {\bf CC}-hard. Given a matching, we also show that verifying whether it is in the core can be done in {\bf NC}. We then give an algorithm to show that computing a maximum Pareto optimal matching for the housing allocation problem is in {\bf RNC}^2 and quasi-{\bf NC}^2. In the second part of this work, we present a distributed version of the top trading cycle algorithm for finding the core of a housing market. To that end, we first present two algorithms for finding all the disjoint cycles in a functional graph: a Las Vegas algorithm which terminates in $O(\log l)$ rounds with high probability, where $l$ is the length of the longest cycle, and a deterministic algorithm which terminates in $O(\log^* n \log l)$ rounds, where $n$ is the number of nodes in the graph. Both algorithms work in the synchronous distributed model and use messages of size $O(\log n)$.