The increasingly fast development and update cycle of online course content, together with the diverse demographics of students in each online classroom, makes real-time student performance prediction (before the course finishes) an interesting topic for both industrial research and practical needs. We tackle the problem of real-time student performance prediction for on-going courses in a domain adaptation framework, that is, a system trained on students’ labeled outcomes from one previous course but meant to be deployed on another. In particular, we first review the recently developed GritNet architecture, which is the current state of the art for the student performance prediction problem, and introduce a new unsupervised domain adaptation method to transfer a GritNet trained on a past course to a new course without any (students’ outcome) label. Our results for real Udacity students’ graduation predictions show that the GritNet not only generalizes well from one course to another across different Nanodegree programs, but also enhances real-time predictions, particularly in the first few weeks when accurate predictions are most challenging.
Modern vision-based reinforcement learning techniques often use convolutional neural networks (CNN) as universal function approximators to choose which action to take for a given visual input. Until recently, CNNs have been treated like black-box functions, but this mindset is especially dangerous when used for control in safety-critical settings. In this paper, we present our extensions of CNN visualization algorithms to the domain of vision-based reinforcement learning. We use a simulated drone environment as an example scenario. These visualization algorithms are an important tool for behavior introspection and provide insight into the qualities and flaws of trained policies when interacting with the physical world. A video may be seen at https://…/drlvisual.
The primary focus of this paper is on designing inexact first-order methods for solving large-scale constrained nonlinear optimization problems. By controlling the inexactness of the subproblem solution, we can significantly reduce the computational cost needed for each iteration. A penalty parameter updating strategy during the subproblem solve enables the algorithm to automatically detect infeasibility. Global convergence is proved for both the feasible and infeasible cases. A complexity analysis for the KKT residual is also derived under loose assumptions. Numerical experiments exhibit the ability of the proposed algorithm to rapidly find inexact optimal solutions at low computational cost.
We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., ‘networks’ used in the context of artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADE2 and iDocNADE2. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.
Sparse reward problems are one of the biggest challenges in Reinforcement Learning. Goal-directed tasks are one such sparse reward problem, where a reward signal is received only when the goal is reached. One promising way to train an agent to perform goal-directed tasks is to use Hindsight Learning approaches. In these approaches, even when an agent fails to reach the desired goal, the agent learns to reach the goal it achieved instead. By doing this over multiple trajectories while generalizing the policy learned from the achieved goals, the agent learns a goal-conditioned policy to reach any goal. One such approach is Hindsight Experience Replay, which uses an off-policy Reinforcement Learning algorithm to learn a goal-conditioned policy. In this approach, past transitions are replayed in a uniformly random fashion. Another approach is to use a hindsight version of policy gradients to directly learn a policy. In this work, we discuss different ways to replay past transitions to improve learning in Hindsight Experience Replay, focusing on prioritized variants in particular. We also implement hindsight policy gradient methods for robotic tasks.
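The hindsight relabeling idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation studied in the abstract: the names `Transition`, `relabel_with_final_state`, and `sample_prioritized` are hypothetical, the goal space is assumed to coincide with the state space, the reward is assumed to be 0 on reaching the goal and -1 otherwise, and the priorities are arbitrary non-negative weights rather than any particular prioritization scheme.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple

State = Tuple[int, ...]


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    # Assumed reward convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: Sequence[Transition]) -> List[Transition]:
    """Copy the episode, pretending the state reached at the end was the goal."""
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


def sample_prioritized(buffer, priorities, k, rng=None):
    """Draw k transitions with probability proportional to their priority."""
    rng = rng or random
    return rng.choices(buffer, weights=priorities, k=k)


# A failed episode on a 1-D line: the agent wanted (5,) but stopped at (2,).
goal = (5,)
episode = [
    Transition((i,), 1, (i + 1,), goal, sparse_reward((i + 1,), goal))
    for i in range(2)
]
hindsight = relabel_with_final_state(episode)
buffer = list(episode) + hindsight
```

Every original transition carries reward -1, while the relabeled copy of the last transition carries reward 0, so the replay buffer now contains a success signal. Passing uniform priorities to `sample_prioritized` corresponds to uniformly random replay; non-uniform priorities give a prioritized variant.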
Open data refers to data that is freely available for reuse. Although there has been a rapid increase in the availability of open data to the public in the last decade, this has not translated into better decision-support tools for them. We propose intelligent conversation generators as a grand challenge that would automatically create data-driven conversation interfaces (CIs), also known as chatbots or dialog systems, from open data and deliver personalized analytical insights to users based on their contextual needs. Such generators will not only help bring Artificial Intelligence (AI)-based solutions for important societal problems to the masses but also advance AI by providing an integrative testbed for human-centric AI and filling gaps in the state of the art towards this aim.
Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of ‘membership inference’, where the goal is to determine if an image was used during training. We consider it under three complementary angles. We show how to detect which dataset was used to train a model, and in particular whether some validation images were used at train time. We then analyze explicit memorization and extend classical random label experiments to the problem of learning a model that predicts if an image belongs to an arbitrary set. Finally, we propose a new approach to infer membership when a few of the top layers are not available or have been fine-tuned, and show that lower layers still carry information about the training samples. To support our findings, we conduct large-scale experiments on Imagenet and subsets of YFCC-100M with modern architectures such as VGG and Resnet.
Learning interpretable features from complex multilayer networks is a challenging and important problem. The need for such representations is particularly evident in multilayer networks of the brain, where nodal characteristics may help model and differentiate regions of the brain according to individual, cognitive task, or disease. Motivated by this problem, we introduce the multi-node2vec algorithm, an efficient and scalable feature engineering method that automatically learns continuous node feature representations from multilayer networks. Multi-node2vec relies upon a second-order random walk sampling procedure that efficiently explores the inter- and intra-layer ties of the observed multilayer network to identify multilayer neighborhoods. Maximum likelihood estimators of the nodal features are identified through the use of the Skip-gram neural network model on the collection of sampled neighborhoods. We investigate the conditions under which multi-node2vec is an approximation of a closed-form matrix factorization problem. We demonstrate the efficacy of multi-node2vec on a multilayer functional brain network from resting state fMRI scans over a group of 74 healthy individuals. We find that multi-node2vec outperforms contemporary methods on complex networks, and that multi-node2vec identifies nodal characteristics that closely associate with the functional organization of the brain.
In this paper we first present a class of algorithms for training multi-level neural networks with a quadratic cost function one layer at a time, starting from the input layer. The algorithm is based on the fact that for any layer to be trained, the effect of a direct connection to an optimized linear output layer can be computed without the connection being made. Thus, starting from the input layer, we can train each layer in succession in isolation from the other layers. Once trained, the weights are kept fixed and the outputs of the trained layer then serve as the inputs to the next layer to be trained. The result is a very fast algorithm. The simplicity of this training arrangement allows the activation function and step size in weight adjustment to be adaptive and self-adjusting. Furthermore, the stability of the training process allows relatively large steps to be taken, thereby achieving even greater speed. Finally, in our context configuring the network means determining the number of outputs for each layer. By decomposing the overall cost function into separate components related to approximation and estimation, we obtain an optimization formula for determining the number of outputs for each layer. With the ability to self-configure and set parameters, we now have not just a fast training algorithm but the ability to automatically build a fully trained deep neural network starting with nothing more than data.
In critical applications of anomaly detection including computer security and fraud prevention, the anomaly detector must be configurable by the analyst to minimize the effort on false positives. One important way to configure the anomaly detector is by providing true labels for a few instances. We study the problem of label-efficient active learning to automatically tune anomaly detection ensembles and make four main contributions. First, we present an important insight into how anomaly detector ensembles are naturally suited for active learning. This insight allows us to relate the greedy querying strategy to uncertainty sampling, with implications for label-efficiency. Second, we present a novel formalism called compact description to describe the discovered anomalies and show that it can also be employed to improve the diversity of the instances presented to the analyst without loss in the anomaly discovery rate. Third, we present a novel data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the detector in a principled manner. Fourth, we present extensive experiments to evaluate our insights and algorithms in both batch and streaming settings. Our results show that in addition to discovering significantly more anomalies than state-of-the-art unsupervised baselines, our active learning algorithms under the streaming-data setup are competitive with the batch setup.
Emerging applications in autonomy require control techniques that take into account uncertain environments, communication and sensing constraints, while satisfying high-level mission specifications. Motivated by this need, we consider a class of Markov decision processes (MDPs), along with a transfer entropy cost function. In this context, we study high-level mission specifications as co-safe linear temporal logic (LTL) formulae. We provide a method to synthesize a policy that minimizes the weighted sum of the transfer entropy and the probability of failure to satisfy the specification. We derive a set of coupled non-linear equations that an optimal policy must satisfy. We then use a modified Arimoto-Blahut algorithm to solve the non-linear equations. Finally, we demonstrate the proposed method on a navigation and path planning scenario of a Mars rover.
In a variety of applications, an agent’s success depends on the knowledge that an adversarial observer has or can gather about the agent’s decisions. It is therefore desirable for the agent to achieve a task while reducing the ability of an observer to infer the agent’s policy. We consider the task of the agent as a reachability problem in a Markov decision process and study the synthesis of policies that minimize the observer’s ability to infer the transition probabilities of the agent between the states of the Markov decision process. We introduce a metric that is based on the Fisher information as a proxy for the information leaked to the observer and using this metric formulate a problem that minimizes expected total information subject to the reachability constraint. We proceed to solve the problem using convex optimization methods. To verify the proposed method, we analyze the relationship between the expected total information and the estimation error of the observer, and show that, for a particular class of Markov decision processes, these two values are inversely proportional.
Widespread online misinformation can cause public panic and serious economic damage. The misinformation containment problem aims at limiting the spread of misinformation in online social networks by launching competing campaigns. Motivated by realistic scenarios, we present the first analysis of the misinformation containment problem for the case when an arbitrary number of cascades are allowed. This paper makes four contributions. First, we provide a formal model for multi-cascade diffusion and introduce an important concept called cascade priority. Second, we show that the misinformation containment problem cannot be approximated within a factor of $\Omega(2^{\log^{1-\epsilon}n^4})$ in polynomial time unless $NP \subseteq DTIME(n^{\polylog{n}})$. Third, we introduce several types of cascade priority that are frequently seen in real social networks. Finally, we design novel algorithms for solving the misinformation containment problem. The effectiveness of the proposed algorithms is supported by encouraging experimental results.
Adversarial machine learning in the context of image processing and related applications has received a large amount of attention. However, adversarial machine learning, especially adversarial deep learning, in the context of malware detection has received much less attention despite its apparent importance. In this paper, we present a framework for enhancing the robustness of Deep Neural Networks (DNNs) against adversarial malware samples, dubbed Hashing Transformation Deep Neural Networks (HashTran-DNN). The core idea is to use hash functions with a certain locality-preserving property to transform samples to enhance the robustness of DNNs in malware classification. The framework further uses a Denoising Auto-Encoder (DAE) regularizer to reconstruct the hash representations of samples, making the resulting DNN classifiers capable of attaining the locality information in the latent space. We experiment with two concrete instantiations of the HashTran-DNN framework to classify Android malware. Experimental results show that four known attacks can render standard DNNs useless in classifying Android malware, that known defenses can defend against at most three of the four attacks, and that HashTran-DNN can effectively defend against all four attacks.
Sample entropy ($SampEn$) has been accepted as an alternate, and sometimes a replacement, measure to approximate entropy ($ApEn$) for characterizing temporal complexity of time series. However, it still suffers from issues such as inconsistency over short-length signals and its tolerance parameter $r$, susceptibility to signal amplitude changes and insensitivity to self-similarity of time series. We propose modifications to the $ApEn$ and $SampEn$ measures which are defined for $0<r<1$, are more robust to signal amplitude changes and are sensitive to the self-similarity property of time series. We modified $ApEn$ and $SampEn$ by redefining the distance function used originally in their definitions. We then evaluated the new entropy measures, called range entropies ($RangeEn$), using different random processes and nonlinear deterministic signals. We further applied the proposed entropies to normal and epileptic electroencephalographic (EEG) signals under different states. Our results suggest that, unlike $ApEn$ and $SampEn$, $RangeEn$ measures are robust to stationary and nonstationary signal amplitude variations and that their trajectories in the tolerance $r$-plane are constrained between 0 (maximum entropy) and 1 (minimum entropy). We also showed that $RangeEn$ measures have direct relationships with the Hurst exponent, suggesting that the new definitions are sensitive to self-similarity structures of signals. $RangeEn$ analysis of epileptic EEG data showed distinct behaviours in the $r$-domain for extracranial versus intracranial recordings as well as different states of epileptic EEG data. The constrained trajectory of $RangeEn$ in the $r$-plane makes them a good candidate for studying complex biological signals such as EEG during seizure and non-seizure states. The Python package used to generate the results shown in this paper is publicly available at: https://…/RangeEn.
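To make the role of the distance function concrete, the following is a minimal sketch of the classical $SampEn$ computation with the distance passed in as a parameter. It is not the RangeEn package referenced above and does not reproduce the modified distance proposed there; the function names, the use of the Chebyshev distance as the default, the non-strict comparison `<= r`, and the choice to return infinity when no matches are found are all assumptions made for illustration.

```python
import math


def chebyshev(u, v):
    # Maximum absolute coordinate difference, the distance assumed here
    # for the classical definition.
    return max(abs(a - b) for a, b in zip(u, v))


def count_matches(x, length, n_templates, r, dist):
    """Count template pairs (i < j, so no self-matches) within tolerance r."""
    templates = [x[i:i + length] for i in range(n_templates)]
    return sum(
        1
        for i in range(n_templates)
        for j in range(i + 1, n_templates)
        if dist(templates[i], templates[j]) <= r
    )


def sample_entropy(x, m, r, dist=chebyshev):
    """ln(B / A), where B and A count matching templates of length m and m + 1."""
    n_templates = len(x) - m  # same number of templates for both lengths
    b = count_matches(x, m, n_templates, r, dist)
    a = count_matches(x, m + 1, n_templates, r, dist)
    if a == 0 or b == 0:
        return math.inf  # no matches: the estimate is undefined
    return math.log(b / a)


# A perfectly periodic signal is fully predictable: every length-2 match
# extends to a length-3 match, so A equals B.
periodic = [0, 1] * 20
print(sample_entropy(periodic, m=2, r=0.1))  # → 0.0
```

Because `r` is compared against raw amplitude differences, rescaling the signal without rescaling `r` changes which templates match. A different distance can be tried by passing another callable as `dist`.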
Despite its simplicity, the bag-of-n-grams sentence representation has been found to excel in some NLP tasks. However, it has not received much attention in recent years, and further analysis of its properties is necessary. We propose a framework to investigate the amount and type of information captured in a general-purpose bag-of-n-grams sentence representation. We first use sentence reconstruction as a tool to obtain a bag-of-n-grams representation that contains general information about the sentence. We then run prediction tasks (sentence length, word content, phrase content and word order) using the obtained representation to look into the specific type of information captured in the representation. Our analysis demonstrates that the bag-of-n-grams representation does contain sentence structure level information. However, incorporating n-grams with higher order n empirically helps little with encoding more information in general, except for phrase content information.
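As a small illustration of what a bag of n-grams retains, the sketch below builds a count-based bag for two sentences that contain the same words in a different order. This is an assumption-laden toy: the representation in the abstract is obtained through sentence reconstruction, whereas `bag_of_ngrams` here is a hypothetical helper that just counts n-grams.

```python
from collections import Counter


def bag_of_ngrams(tokens, max_n=2):
    """Count every contiguous n-gram of the token list for n = 1..max_n."""
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            bag[tuple(tokens[i:i + n])] += 1
    return bag


a = "dog bites man".split()
b = "man bites dog".split()

# Unigrams alone cannot tell the two sentences apart.
print(bag_of_ngrams(a, max_n=1) == bag_of_ngrams(b, max_n=1))  # → True
# Adding bigrams recovers some word-order information.
print(bag_of_ngrams(a, max_n=2) == bag_of_ngrams(b, max_n=2))  # → False
```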
Classification models are often used to make decisions that affect humans: whether to approve a loan application, extend a job offer, or provide insurance. In such applications, individuals should have the ability to change the decision of the model. When a person is denied a loan by a credit scoring model, for example, they should be able to change the input variables of the model in a way that will guarantee approval. Otherwise, this person will be denied the loan so long as the model is deployed, and — more importantly — will lack agency over a decision that affects their livelihood. In this paper, we propose to audit a linear classification model in terms of recourse, which we define as the ability of a person to change the decision of the model through actionable input variables (e.g., income vs. gender, age, or marital status). We present an integer programming toolkit to: (i) measure the feasibility and difficulty of recourse in a target population; and (ii) generate a list of actionable changes for an individual to obtain a desired outcome. We demonstrate how our tools can inform practitioners, policymakers, and consumers by auditing credit scoring models built using real-world datasets. Our results illustrate how recourse can be significantly impacted by common modeling practices, and motivate the need to guarantee recourse as a policy objective for regulation in algorithmic decision-making.
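The notion of recourse for a linear classifier can be sketched on a toy example. The code below is a brute-force search over a small grid of allowed changes, not the integer programming toolkit described above; the function names, the feature layout, the coefficients, the approval rule `score >= 0`, and the unit-weight cost (sum of absolute changes) are all hypothetical choices for illustration.

```python
from itertools import product


def score(w, b, x):
    # Linear score; the toy model approves when the score is non-negative.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def find_recourse(w, b, x, action_grid):
    """Return the cheapest set of changes that flips the decision, or None.

    action_grid maps the index of each actionable feature to the list of
    changes allowed for it; features absent from the grid are immutable.
    """
    idxs = sorted(action_grid)
    choices = [[0] + list(action_grid[i]) for i in idxs]
    best, best_cost = None, float("inf")
    for deltas in product(*choices):
        x_new = list(x)
        for i, d in zip(idxs, deltas):
            x_new[i] += d
        cost = sum(abs(d) for d in deltas)
        if score(w, b, x_new) >= 0 and cost < best_cost:
            best = {i: d for i, d in zip(idxs, deltas) if d != 0}
            best_cost = cost
    return best


# Hypothetical features: [income, number of credit cards, is_married].
w, b = [0.5, -1.0, 2.0], -4.0
x = [4, 2, 0]  # score is -4.0, so this person is denied
grid = {0: [1, 2, 3, 4, 5, 6], 1: [-1, -2]}  # is_married is not actionable
print(find_recourse(w, b, x, grid))  # → {0: 4, 1: -2}
```

If only the second feature were actionable, no allowed change would reach a non-negative score and `find_recourse` would return `None`, which is the situation the abstract describes as lacking recourse.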
Black box discrete optimization (BBDO) appears in a wide range of engineering tasks. Evolutionary or other BBDO approaches have been applied, aiming at automating the necessary tuning of system parameters, such as hyperparameter tuning of machine learning based systems when being installed for a specific task. However, automation is often jeopardized by the need for strategy parameter tuning of the BBDO algorithms themselves. An expert with the domain knowledge must undergo time-consuming strategy parameter tuning. This paper proposes a parameterless BBDO algorithm based on information geometric optimization, a recent framework for black box optimization using stochastic natural gradient. Inspired by some theoretical implications, we develop an adaptation mechanism for strategy parameters of the stochastic natural gradient method for discrete search domains. The proposed algorithm is evaluated on commonly used test problems. It is further extended to two examples of simultaneous optimization of the hyperparameters and the connection weights of deep learning models, leading to faster optimization than the existing approaches without any parameter tuning effort.
R (Version 3.5.1 patched) has two issues with its random sampling functionality. First, it uses a version of the Mersenne Twister known to have a seeding problem, which was corrected by the authors of the Mersenne Twister in 2002. Updated C source code is available at http://…/mt19937ar.c. Second, R generates random integers between $1$ and $m$ by multiplying random floats by $m$, taking the floor, and adding $1$ to the result. Well-known quantization effects in this approach result in a non-uniform distribution on $\{ 1, \ldots, m\}$. The difference, which depends on $m$, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by $m$. That is the strategy taken in Python’s numpy.random.randint() function, among others. Example source code in Python is available at https://…/cryptorandom.py (see functions getrandbits() and randbelow_from_randbits()).
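The quantization effect and the bit-based alternative can be illustrated with a short sketch. It makes several assumptions that are not in the text above: the float generator is a toy with only 3 bits of precision so the bias can be seen by enumeration, `floor_multiply_counts` and `randbelow_by_rejection` are hypothetical names (the latter is not the function from cryptorandom.py), and `secrets.randbits` is used only as a convenient source of random bits.

```python
import secrets
from collections import Counter


def floor_multiply_counts(m, bits):
    """Enumerate all equally likely floats u = k / 2**bits and count how
    often floor(u * m) + 1 produces each integer in 1..m."""
    n = 2 ** bits
    return Counter((k * m) // n + 1 for k in range(n))


def randbelow_by_rejection(m, getrandbits=secrets.randbits):
    """Uniform integer in 0..m-1 built directly from random bits.

    Draw just enough bits to cover m and retry whenever the draw falls
    outside the range, so every accepted value is equally likely.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    k = m.bit_length()
    r = getrandbits(k)
    while r >= m:
        r = getrandbits(k)
    return r


# With 8 possible floats and m = 6, two outcomes are twice as likely.
print(sorted(floor_multiply_counts(6, 3).items()))
# → [(1, 2), (2, 1), (3, 1), (4, 2), (5, 1), (6, 1)]

# A draw from 1..6 using the rejection approach.
print(randbelow_by_rejection(6) + 1)
```

With realistic float precision the imbalance per value is far smaller than in this toy, but it has the same cause: the number of available floats is not a multiple of $m$.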