Jaccard Index  The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets. Scalable Jaccard similarity using MinHash and Spark 
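The definition above (intersection size divided by union size) can be sketched in a few lines of Python. The function name `jaccard_index` and the convention that two empty sets have similarity 1.0 are assumptions made for this illustration, not part of the original definition.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two finite collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Two of the four distinct elements are shared.
print(jaccard_index({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

The corresponding Jaccard distance, a measure of dissimilarity, is one minus this value.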
Jack the Reader (Jack) 
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping by component reuse, evaluation of new models on existing datasets as well as integrating new datasets and applying them on a growing set of implemented baseline models. Jack currently supports (but is not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse.
Jackknife Regression  Jackknife logistic and linear regression for clustering and predictions. Our goal is to produce a regression tool that can be used as a black box, is very robust and parameter-free, and is usable and easy to interpret by non-statisticians. It is part of a bigger project: automating many fundamental data science tasks, to make them easy, scalable and cheap for data consumers, not just for data experts.
Jackknife Resampling  In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the estimate on the remaining observations, and then averaging these calculations. Given a sample of size N, the jackknife estimate is found by aggregating the estimates from each of the N subsamples of size N - 1. The jackknife technique was developed by Quenouille (1949, 1956). Tukey (1958) expanded on the technique and proposed the name “jackknife” since, like a Boy Scout’s jackknife, it is a “rough and ready” tool that can solve a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool. The jackknife represents a linear approximation of the bootstrap.
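A minimal sketch of the leave-one-out procedure described above, using only the Python standard library. The function name `jackknife` and its return layout are assumptions for illustration; the bias and standard-error expressions are the standard jackknife formulas.

```python
import math


def jackknife(data, estimator):
    """Return (jackknife mean, bias estimate, standard error) for `estimator`."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    # Recompute the estimate n times, leaving out one observation each time.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    full = estimator(data)
    bias = (n - 1) * (loo_mean - full)
    variance = (n - 1) / n * sum((e - loo_mean) ** 2 for e in loo)
    return loo_mean, bias, math.sqrt(variance)


def sample_mean(xs):
    return sum(xs) / len(xs)


estimate, bias, std_error = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean)
print(estimate, bias, std_error)
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which makes it a convenient sanity check.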
jackknife+  This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross-validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections.
JacSketch  We develop a new family of variance-reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the stochastic average gradient (SAGA) method, and several of its existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al (2015). The rates we obtain for minibatch SAGA are also superior to existing rates.
James-Stein Estimator  The James-Stein estimator is a biased estimator of the mean of Gaussian random vectors. It can be shown that, when three or more means are estimated simultaneously, the James-Stein estimator dominates the ‘ordinary’ least squares approach, i.e., it has lower mean squared error on average. It is the best-known example of Stein’s phenomenon. An earlier version of the estimator was developed by Charles Stein in 1956, and is sometimes referred to as Stein’s estimator. The result was improved by Willard James and Charles Stein in 1961.
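As an illustration, the sketch below applies the classic form of the estimator to a single observed vector y drawn from a Gaussian with unknown mean and known, common variance. It shrinks y toward the origin by the factor 1 - (p - 2)·sigma²/||y||². The function name `james_stein`, the choice of the origin as shrinkage target, and the assumption of a known variance `sigma2` are simplifications made for this example.

```python
def james_stein(y, sigma2=1.0):
    """Shrink an observed vector y toward the origin (James-Stein form).

    Assumes y ~ N(theta, sigma2 * I) with known sigma2 and dimension p >= 3.
    """
    p = len(y)
    if p < 3:
        raise ValueError("the estimator needs at least three dimensions")
    norm_sq = sum(v * v for v in y)
    if norm_sq == 0:
        # Nothing to shrink: the observation is already at the origin.
        return list(y)
    shrink = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrink * v for v in y]


print(james_stein([3.0, 4.0, 0.0]))
```

Note that the shrinkage factor can become negative when ||y||² is small; the positive-part variant, which clips the factor at zero, is a common refinement not shown here.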
Jamovi  The jamovi project was founded to develop a free and open statistical platform which is intuitive to use, and can provide the latest developments in statistical methodology. At the core of the jamovi philosophy is the idea that scientific software should be ‘community driven’, where anyone can develop and publish analyses, and make them available to a wide audience. jamovi for R: Easy but Controversial
JAMUL  Browsing news articles on multiple devices is now possible. The lengths of news article headlines have precise upper bounds, dictated by the size of the display of the relevant device or interface. Therefore, controlling the length of headlines is essential when applying the task of headline generation to news production. However, because there is no corpus of headlines of multiple lengths for a given article, prior research on controlling output length in headline generation has not discussed whether an evaluation setting that uses a single-length reference can appropriately evaluate outputs of multiple lengths. In this paper, we introduce two corpora (JNC and JAMUL) to confirm the validity of prior experimental settings and provide for the next step toward the goal of controlling output length in headline generation. The JNC provides common supervision data for headline generation. The JAMUL is a large-scale evaluation dataset for headlines of three different lengths composed by professional editors. We report new findings on these corpora; for example, while the longest length reference summary can appropriately evaluate the existing methods controlling output length, the methods do not manage the selection of words according to length constraint.
JANUS  The rapid evolution of deep neural networks is demanding deep learning (DL) frameworks not only to satisfy the traditional requirement of quickly executing large computations, but also to support straightforward programming models for quickly implementing and experimenting with complex network structures. However, existing frameworks fail to excel in both departments simultaneously, leading to diverged efforts for optimizing performance and improving usability. This paper presents JANUS, a system that combines the advantages from both sides by transparently converting an imperative DL program written in Python, the de facto scripting language for DL, into an efficiently executable symbolic dataflow graph. JANUS can convert various dynamic features of Python, including dynamic control flow, dynamic types, and impure functions, into elements of a symbolic dataflow graph. Experiments demonstrate that JANUS can achieve fast DL training by exploiting the techniques imposed by symbolic graph-based DL frameworks, while maintaining the simple and flexible programmability of imperative DL frameworks at the same time.
Jaro-Winkler Distance  In computer science and statistics, the Jaro-Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995), a type of string edit distance, and mainly used in the area of record linkage (duplicate detection). The higher the Jaro-Winkler distance for two strings is, the more similar the strings are. The Jaro-Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.
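The sketch below shows one common way to compute the score in plain Python. It first computes the Jaro similarity from matching characters and transpositions, then adds Winkler's bonus for a shared prefix. The function names, the prefix scaling factor `p=0.1` and the prefix cap of four characters are the commonly used conventions, assumed here rather than taken from the entry above.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    seq1 = [s1[i] for i in range(len1) if matched1[i]]
    seq2 = [s2[j] for j in range(len2) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for strings sharing a common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * p * (1.0 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```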
JASP  JASP is a free and open-source graphical program for statistical analysis, designed to be easy to use, and familiar to users of SPSS. Additionally, it provides many Bayesian statistical methods. JASP generally produces APA style results tables and plots to ease publication. It promotes open science by integration with the Open Science Framework and reproducibility by integrating the analysis settings into the results. The development of JASP is financially supported by several universities and research funds. JASP offers frequentist inference and Bayesian inference on the same statistical models. Frequentist inference uses p-values and confidence intervals to control error rates in the limit of infinite perfect replications. Bayesian inference uses credible intervals and Bayes factors to estimate credible parameter values and model evidence given the available data and prior knowledge. A Comparative Review of the JASP Statistical Software
Java Class Library for Evolutionary Computation (JCLEC) 
JCLEC is a software system for Evolutionary Computation (EC) research, developed in the Java programming language. It provides a high-level software framework for implementing any kind of Evolutionary Algorithm (EA), with support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza’s style, strongly typed, and grammar based) and evolutionary programming.
Java Data Mining (JDM) 
Java Data Mining (JDM) is a standard Java API for developing data mining applications and tools. JDM defines an object model and Java API for data mining objects and processes. JDM enables applications to integrate data mining technology for developing predictive analytics applications and tools. The JDM 1.0 standard was developed under the Java Community Process as JSR 73. In 2006, the JDM 2.0 specification was being developed under JSR 247, but it was withdrawn in 2011 without standardization. Various data mining functions and techniques like statistical classification and association, regression analysis, data clustering, and attribute importance are covered by the 1.0 release of this standard.
JAVA PMML (jpmml) 
jpmml is billed as the world’s leading open-source PMML scoring engine, used to rapidly deploy predictive models into production.
JavaScript 3D Library (three.js) 
The aim of the project is to create a lightweight 3D library with a very low level of complexity. The library provides <canvas>, <svg>, CSS3D and WebGL renderers. threejs 
JavaScript Object Notation (JSON) 
JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
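A short illustration of the attribute-value structure, using Python's standard `json` module to serialize a dictionary to JSON text and parse it back. The record contents are made up for the example.

```python
import json

# A hypothetical record made of attribute-value pairs.
record = {"name": "Ada", "languages": ["R", "Python"], "active": True}

# Serialize to human-readable JSON text, then parse it back.
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored == record)
```

Note that Python's `True` is written as the JSON literal `true` in the serialized text, and is converted back on parsing.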
Javelin  In this work, we present a new scalable incomplete LU factorization framework called Javelin to be used as a preconditioner for solving sparse linear systems with iterative methods. Javelin allows for improved parallel factorization on shared-memory many-core systems, while packaging the coefficient matrix into a format that allows for high performance sparse matrix-vector multiplication and sparse triangular solves with minimal overheads. The framework achieves these goals by using a collection of traditional permutations, point-to-point thread synchronizations, tasking, and segmented prefix scans in a conventional compressed sparse row format. Using these changes, traditional fill-in and drop tolerance methods can be used, while still achieving observed speedups of up to ~42x on 68 Intel Knights Landing cores and ~12x on 14 Intel Haswell cores.
Jaya Optimisation Algorithm  An Efficient Multicore Implementation of the Jaya Optimisation Algorithm 
Jazz  Jazz is a lightweight modular data processing framework, including a web server. It provides data persistence and computation capabilities accessible from R and Python and also through a REST API. rjazz 
jblas  jblas is a fast linear algebra library for Java. jblas is based on BLAS and LAPACK, the de facto industry standard for matrix computations, and uses state-of-the-art implementations like ATLAS for all its computational routines, making jblas very fast. jblas is essentially a lightweight wrapper around the BLAS and LAPACK routines. These packages originated in the Fortran community, which explains their often archaic API. On the other hand, modern implementations are hard to beat performance-wise. jblas aims to make this functionality available to Java programmers such that they do not have to worry about writing JNI interfaces and calling conventions of Fortran code. jblas depends on an implementation of the LAPACK and BLAS routines. Currently it is tested with ATLAS (http://mathatlas.sourceforge.net ) and BLAS/LAPACK (http://…/lapack)
JEDI  With the increasing demand for large amounts of labeled data, crowdsourcing has been used in many large-scale data mining applications. However, most existing works in crowdsourcing mainly focus on label inference and incentive design. In this paper, we address a different problem of adaptive crowd teaching, which is a sub-area of machine teaching in the context of crowdsourcing. Compared with machines, human beings are extremely good at learning a specific target concept (e.g., classifying the images into given categories) and they can also easily transfer the learned concepts into similar learning tasks. Therefore, a more effective way of utilizing crowdsourcing is by supervising the crowd to label in the form of teaching. In order to perform the teaching and expertise estimation simultaneously, we propose an adaptive teaching framework named JEDI to construct the personalized optimal teaching set for the crowdsourcing workers. In JEDI teaching, the teacher assumes that each learner has an exponentially decayed memory. Furthermore, it ensures comprehensiveness in the learning process by carefully balancing teaching diversity and learner’s accurate learning in terms of teaching usefulness. Finally, we validate the effectiveness and efficacy of JEDI teaching in comparison with the state-of-the-art techniques on multiple data sets with both synthetic learners and real crowdsourcing workers.
Jeffreys-Lindley Paradox (JLP)
Lindley’s paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys’ 1939 textbook; it became known as Lindley’s paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper. 
Jeffries-Matusita Distance  Jeffries-Matusita Distance calculates the separability of a pair of probability distributions. This can be particularly meaningful for evaluating the results of Maximum Likelihood classifications. varSel
Jensen  This paper introduces Jensen, an easily extensible and scalable toolkit for production-level machine learning and convex optimization. Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family of machine learning classifiers and regressors (Logistic Regression, SVMs, Least Square Regression, etc.). This framework makes it possible to deploy and train models with a few lines of code, and also extend and build upon this by integrating new loss functions and optimization algorithms.
Jensen-Shannon Distance (JSD) 
In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback-Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it is always a finite value. The square root of the Jensen-Shannon divergence is a metric often referred to as Jensen-Shannon distance.
Jensen-Shannon Divergence  In probability theory and statistics, the Jensen-Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback-Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it always has a finite value. The square root of the Jensen-Shannon divergence is a metric often referred to as Jensen-Shannon distance.
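A minimal sketch of the divergence and the derived distance for discrete distributions given as equal-length lists of probabilities. It averages the Kullback-Leibler divergences of each distribution from their mixture. The function names and the choice of base-2 logarithms (which bounds the divergence by 1) are assumptions made for this illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    # The mixture is positive wherever p or q is, so the KL terms stay finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the divergence, which is a metric."""
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, js_divergence(p, q)))


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js_divergence(p, q), js_distance(p, q))
```

Unlike the Kullback-Leibler divergence, the result is the same when `p` and `q` are swapped, and it remains finite even when the two distributions have different supports.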
Jericho  A learning environment for Interactive Fiction games. 
JGraphT  Mathematical software and graph-theoretical algorithmic packages to efficiently model, analyze and query graphs are crucial in an era where large-scale spatial, societal and economic network data are abundantly available. One such package is JGraphT, a programming library which contains very efficient and generic graph data-structures along with a large collection of state-of-the-art algorithms. The library is written in Java with stability, interoperability and performance in mind. A distinctive feature of this library is the ability to model vertices and edges as arbitrary objects, thereby permitting natural representations of many common networks including transportation, social and biological networks. Besides classic graph algorithms such as shortest-paths and spanning-tree algorithms, the library contains numerous advanced algorithms: graph and subgraph isomorphism; matching and flow problems; approximation algorithms for NP-hard problems such as independent set and TSP; and several more exotic algorithms such as Berge graph detection. Due to its versatility and generic design, JGraphT is currently used in large-scale commercial, non-commercial and academic research projects. In this work we describe in detail the design and underlying structure of the library, and discuss its most important features and algorithms. A computational study is conducted to evaluate the performance of JGraphT versus a number of similar libraries. Experiments on a large number of graphs over a variety of popular algorithms show that JGraphT is highly competitive with other established libraries such as NetworkX or the BGL.
JigsawNet  This paper proposes a novel algorithm to reassemble an arbitrarily shredded image to its original status. Existing reassembly pipelines commonly consist of a local matching stage and a global composition stage. In the local stage, a key challenge in fragment reassembly is to reliably compute and identify correct pairwise matching, for which most existing algorithms use handcrafted features and hence cannot reliably handle complicated puzzles. We build a deep convolutional neural network to detect the compatibility of a pairwise stitching, and use it to prune computed pairwise matches. To improve the network efficiency and accuracy, we transfer the calculation of CNN to the stitching region and apply a boost training strategy. In the global composition stage, we modify the commonly adopted greedy edge selection strategies to two new loop closure based searching algorithms. Extensive experiments show that our algorithm significantly outperforms existing methods on solving various puzzles, especially challenging ones with many fragment pieces.
jLDADMM  In this technical report, we present jLDADMM, an easy-to-use Java toolkit for conventional topic models. jLDADMM is released to provide alternatives for topic modeling on normal or short texts. It provides implementations of the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. mixture of unigrams), using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. jLDADMM is open-source and available to download at: https://…/jLDADMM
JMP  SAS created JMP in 1989 to empower scientists and engineers to explore data visually. Since then, JMP has grown from a single product into a family of statistical discovery tools, each one tailored to meet specific needs. All of our software is visual, interactive, comprehensive and extensible. 
JNC  Browsing news articles on multiple devices is now possible. The lengths of news article headlines have precise upper bounds, dictated by the size of the display of the relevant device or interface. Therefore, controlling the length of headlines is essential when applying the task of headline generation to news production. However, because there is no corpus of headlines of multiple lengths for a given article, prior research on controlling output length in headline generation has not discussed whether an evaluation setting that uses a single-length reference can appropriately evaluate outputs of multiple lengths. In this paper, we introduce two corpora (JNC and JAMUL) to confirm the validity of prior experimental settings and provide for the next step toward the goal of controlling output length in headline generation. The JNC provides common supervision data for headline generation. The JAMUL is a large-scale evaluation dataset for headlines of three different lengths composed by professional editors. We report new findings on these corpora; for example, while the longest length reference summary can appropriately evaluate the existing methods controlling output length, the methods do not manage the selection of words according to length constraint.
Job Safety Analysis (JSA) 
A Job Safety Analysis (JSA) is one of the risk assessment tools used to identify and control workplace hazards. A JSA is a second tier risk assessment with the aim of preventing personal injury to a person, or their colleagues, and any other person passing or working adjacent, above or below. JSAs are also known as Activity Hazard Analysis (AHA), Job Hazard Analysis (JHA) and Task Hazard Analysis (THA). 
Joint Accuracy and Latency-Aware Deep Structure Decoupling (JALAD) 
Recent years have witnessed a rapid growth of deep-network based services and applications. A practical and critical problem thus has emerged: how to effectively deploy the deep neural network models such that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data center servers, causing large latency because a significant amount of data has to be transferred from the edge of network to the data center. In this paper, we propose JALAD, a joint accuracy and latency-aware execution framework, which decouples a deep neural network so that a part of it will run at edge devices and the other part inside the conventional cloud, while only a minimum amount of data has to be transferred between them. Though the idea seems straightforward, we are facing challenges including i) how to find the best partition of a deep structure; ii) how to deploy the component at an edge device that only has limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) A normalization based in-layer data compression strategy by jointly considering compression rate and model accuracy; 2) A latency-aware deep decoupling strategy to minimize the overall execution latency; and 3) An edge-cloud structure adaptation strategy that dynamically changes the decoupling for different network conditions. Experiments demonstrate that our solution can significantly reduce the execution latency: it speeds up the overall inference execution with a guaranteed model accuracy loss.
Joint and Individual Variation Explained (JIVE) 
Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types. r.jive
Joint and Progressive Learning strAtegY (J-Play) 
Despite the fact that nonlinear subspace learning techniques (e.g. manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-samples), and cost-effectiveness (linearization). To this end, a novel linearized subspace learning technique is developed in a joint and progressive way, called joint and progressive learning strategy (J-Play), with its application to multi-label classification. J-Play learns high-level and semantically meaningful feature representation from high-dimensional data by 1) jointly performing multiple subspace learning and classification to find a latent subspace where samples are expected to be better classified; 2) progressively learning multi-coupled projections to linearly approach the optimal mapping bridging the original space with the most discriminative subspace; 3) locally embedding manifold structure in each learnable latent subspace. Extensive experiments are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
Joint Approximate Diagonalization of Eigenmatrices (JADE) 

Joint Association and Classification Analysis (JACA) 
Multi-view data, that is matched sets of measurements on the same subjects, have become increasingly common with technological advances in genomics and other fields. Often, the subjects are separated into known classes, and it is of interest to find associations between the views that are related to the class membership. Existing classification methods can either be applied to each view separately, or to the concatenated matrix of all views without taking into account between-views associations. On the other hand, existing association methods cannot directly incorporate class information. In this work we propose a framework for Joint Association and Classification Analysis of multi-view data (JACA). We support the methodology with theoretical guarantees for estimation consistency in high-dimensional settings, and numerical comparisons with existing methods. In addition to the joint learning framework, a distinct advantage of our approach is its ability to use partial information: it can be applied both in the settings with missing class labels, and in the settings with missing subsets of views. We apply JACA to colorectal cancer data from The Cancer Genome Atlas project, and quantify the association between RNA-seq and miRNA views with respect to consensus molecular subtypes of colorectal cancer.
Joint Greedy Equivalence Search (jointGES) 
We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. We prove that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose jointGES consisting of Greedy Equivalence Search (GES) to estimate the union of all DAG models followed by variable selection using lasso to obtain the different DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data.
Joint Influence Model (JIM) 
Previous work has shown that popular trending events are important external factors which pose significant influence on user search behavior and also provided a way to computationally model this influence. However, their problem formulation was based on the strong assumption that each event poses its influence independently. This assumption is unrealistic as there are many correlated events in the real world which influence each other and thus would pose a joint influence on the user search behavior rather than posing influence independently. In this paper, we study this novel problem of Modeling the Joint Influences posed by multiple correlated events on user search behavior. We propose a Joint Influence Model based on the Multivariate Hawkes Process which captures the inter-dependency among multiple events in terms of their influence upon user search behavior. We evaluate the proposed Joint Influence Model using two months of query-log data from https://…/. Experimental results show that the model can indeed capture the temporal dynamics of the joint influence over time and also achieves superior performance over different baseline methods when applied to solve various interesting prediction problems as well as real-world application scenarios, e.g., query auto-completion.
Joint Matrix Factorization  Nonnegative matrix factorization (NMF) is a powerful tool in exploratory data analysis for discovering the hidden features and part-based patterns in high-dimensional data. NMF and its variants have been successfully applied in diverse fields such as pattern recognition, signal processing, data mining, bioinformatics and so on. Recently, NMF has been extended to analyze multiple matrices simultaneously. However, a unified framework is still lacking. In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration. Next, we present four update algorithms to solve this framework. The merits and demerits of these algorithms are systematically explored. Furthermore, extensive computational experiments using both synthetic data and real data demonstrate the effectiveness of the JMF framework and related algorithms on pattern recognition and data mining.
Joint Maximum Likelihood Estimation (JMLE) 
JMLE ‘Joint Maximum Likelihood Estimation’ is also called UCON, ‘Unconditional maximum likelihood estimation’. It was devised by Wright & Panchapakesan, www.rasch.org/memo46.htm. In this formulation, the estimate of the Rasch parameter (for which the observed data are most likely, assuming those data fit the Rasch model) occurs when the observed raw score for the parameter matches the expected raw score. ‘Joint’ means that the estimates for the persons (rows) and items (columns) and rating scale structures (if any) of the data matrix are obtained simultaneously. ➘ “Rasch Model” 
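The "observed raw score matches expected raw score" condition can be written out for the dichotomous Rasch model as a sketch. The symbols $\theta_n$ (person ability), $\delta_i$ (item difficulty), $r_n$ and $s_i$ (observed raw scores) are notation assumed here for illustration; rating scale structures are omitted.

```latex
P(X_{ni} = 1) \;=\; \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

r_n \;=\; \sum_{i} x_{ni} \;=\; \sum_{i} P(X_{ni} = 1) \quad \text{for each person } n,
\qquad
s_i \;=\; \sum_{n} x_{ni} \;=\; \sum_{n} P(X_{ni} = 1) \quad \text{for each item } i
```

JMLE solves both sets of equations together, which is the sense in which person and item estimates are obtained simultaneously.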
Joint Neural Architecture Search and Quantization (JASQ) 
Designing neural architectures is a fundamental step in deep learning applications. As a partner technique, model compression of neural networks has been widely investigated to meet the need to run deep learning algorithms with the limited computation resources of mobile devices. Currently, both architecture design and model compression require expert tricks and tedious trials. In this paper, we integrate these two tasks into one unified framework, which enables joint architecture search with quantization (compression) policies for neural networks. This method is named JASQ. Our goal is to automatically find a compact, high-performance neural network model that is suitable for mobile devices. Technically, a multi-objective evolutionary search algorithm is introduced to search for models that balance model size against accuracy. In experiments, we find that our approach outperforms methods that search only for architectures or only for quantization policies. 1) Specifically, given existing networks, our approach can provide them with learning-based quantization policies, and outperforms their 2-bit, 4-bit, 8-bit, and 16-bit counterparts. It can yield higher accuracies than the float models, for example, over 1.02% higher accuracy on MobileNet-v1. 2) Moreover, under the balance between model size and accuracy, two models are obtained by joint search of architectures and quantization policies: a high-accuracy model, JASQNet, and a small model, JASQNet-Small, which achieves a 2.97% error rate with 0.9 MB on CIFAR-10. 
Joint Probability Distribution  In the study of probability, given at least two random variables X, Y, …, that are defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. 
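A small discrete example may make the definition concrete. The sketch below stores a bivariate joint distribution as a table, recovers each marginal by summing out the other variable, and evaluates the probability of an event defined on both variables. The probabilities and the helper names `marginal` and `prob` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical joint pmf of X in {0, 1} and Y in {0, 1, 2}; keys are (x, y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20,
}

def marginal(joint, axis):
    """Sum the joint pmf over every variable except the one at `axis`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)

def prob(joint, event):
    """Probability that the outcome (x, y) satisfies the predicate `event`."""
    return sum(p for outcome, p in joint.items() if event(*outcome))

p_x = marginal(joint, 0)   # distribution of X alone
p_y = marginal(joint, 1)   # distribution of Y alone
p_event = prob(joint, lambda x, y: x == 1 and y >= 1)   # P(X = 1, Y >= 1)
```

The same table-plus-summation idea extends to more than two variables by using longer tuples as keys.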
Joint Pyramid Upsampling (JPU) 
Modern approaches for semantic segmentation usually employ dilated convolutions in the backbone to extract high-resolution feature maps, which incurs heavy computational complexity and a large memory footprint. To replace the time- and memory-consuming dilated convolutions, we propose a novel joint upsampling module named Joint Pyramid Upsampling (JPU) by formulating the task of extracting high-resolution feature maps as a joint upsampling problem. With the proposed JPU, our method reduces the computational complexity by more than three times without performance loss. Experiments show that JPU is superior to other upsampling modules and can be plugged into many existing approaches to reduce computational complexity and improve performance. By replacing dilated convolutions with the proposed JPU module, our method achieves state-of-the-art performance on the Pascal Context dataset (mIoU of 53.13%) and the ADE20K dataset (final score of 0.5584) while running 3 times faster. 
Joint Random Forest (JRF) 
JRF 
Joint Sequence Fusion (JSFusion) 
We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pair of multimodal sequence data (e.g. a video clip and a language sentence). Our multimodal matching network consists of two key components. First, the Joint Semantic Tensor composes a dense pairwise representation of two sequence data into a 3D tensor. Then, the Convolutional Hierarchical Decoder computes their similarity score by discovering hidden hierarchical matches between the two sequence modalities. Both modules leverage hierarchical attention mechanisms that learn to promote well-matched representation patterns while pruning out misaligned ones in a bottom-up manner. Although JSFusion is a universal model applicable to any multimodal sequence data, this work focuses on video-language tasks including multimodal retrieval and video QA. We evaluate the JSFusion model on three retrieval and VQA tasks in LSMDC, for which our model achieves the best performance reported so far. We also perform multiple-choice and movie retrieval tasks on the MSR-VTT dataset, on which our approach outperforms many state-of-the-art methods. 
Joint Spectrum  We introduce the notion of joint spectrum of a compact set of matrices $S \subset$ GL$_{d}(\mathbb{C})$, which is a multidimensional generalization of the joint spectral radius. In the irreducible case we describe its properties and examine how it relates to the set of eigenvalues of elements in the semigroup generated by $S$. We also make connections with the theory of random products of matrices. 
JointDNN  Deep neural networks are among the most influential architectures of deep learning algorithms, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services, often employ either simple local models or complex remote models in the cloud. Mobile-only and cloud-only computations are currently the status quo approaches. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves up to 18X and 32X reductions in the latency and mobile energy consumption of querying DNNs, respectively. 
JointGAN  A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, which only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains). This is achieved by learning to sample from conditional distributions between the domains, while simultaneously learning to sample from the marginals of each individual domain. The proposed framework consists of multiple generators and a single softmax-based critic, all jointly trained via adversarial learning. From a simple noise source, the proposed framework allows synthesis of draws from the marginals, conditional draws given observations from a subset of random variables, or complete draws from the full joint distribution. Most examples considered are for joint analysis of two domains, with examples for three domains also presented. 
Jointly Multiple Events Extraction (JMEE) 
Event extraction is of practical utility in natural language processing. In the real world, it is common for multiple events to exist in the same sentence, and extracting them is more difficult than extracting a single event. Previous works that model the associations between events with sequential modeling methods suffer from low efficiency in capturing very long-range dependencies. In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. The experimental results demonstrate that our proposed framework achieves competitive results compared with state-of-the-art methods. 
Joint-Policy Correlation  To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents’ policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker. 
Joint-Sparse Optimization From Bootstrap Samples (JOBS) 
Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted or missing, or they might arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurement vectors are generated from the original pool of measurements via bootstrapping, and then a joint-sparse constraint is enforced to ensure support consistency among multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of the number of bootstrap samples $L$ and the number of estimates $K$ are analyzed theoretically. Simulation results validate some of the theoretical analysis and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and a few other existing bootstrap-based techniques in the challenging case of low levels of measurements. It is also preferable to other bagging-based methods in the streaming setting, since it performs better with small $K$ and $L$ for large datasets. 
Joyplot  joyplot: a series of histograms, density plots or time series for a number of data segments, all aligned to the same horizontal scale and presented with a slight overlap. 
jQuery  jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. With a combination of versatility and extensibility, jQuery has changed the way that millions of people write JavaScript. 
JSON-stat  JSON-stat is a simple lightweight JSON dissemination format best suited for data visualization, mobile apps or open data initiatives, that has been designed for all kinds of disseminators. JSON-stat also proposes an HTML microdata schema to enrich HTML tables and put the JSON-stat vocabulary in the browser. Fortunately, there are already tools that ease the use of JSON-stat, like the JSON-stat JavaScript Toolkit, a library to process JSON-stat responses. 
Jubatus  Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities: · Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering · Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction · Framework for Distributed Online Machine Learning with Fault Tolerance 
JuliaReach  We present JuliaReach, a toolbox for set-based reachability analysis of dynamical systems. JuliaReach consists of two main packages: Reachability, containing implementations of reachability algorithms for continuous and hybrid systems, and LazySets, a standalone library that implements state-of-the-art algorithms for calculus with convex sets. The library offers both concrete and lazy set representations, where the latter stands for the ability to delay set computations until they are needed. The choice of the programming language Julia and the accompanying documentation of our toolbox allow researchers to easily translate set-based algorithms from mathematics to software in a platform-independent way, while achieving runtime performance that is comparable to statically compiled languages. Combining lazy operations in high dimensions and explicit computations in low dimensions, JuliaReach can be applied to solve complex, large-scale problems. 
Jump Process  A jump process is a type of stochastic process that has discrete movements, called jumps, with random arrival times, rather than continuous movement, typically modelled as a simple or compound Poisson process. In finance, various stochastic models are used to model the price movements of financial instruments; for example, the Black-Scholes model for pricing options assumes that the underlying instrument follows a traditional diffusion process, with continuous, random movements at all scales, no matter how small. John Carrington Cox and Stephen Ross proposed that prices actually follow a ‘jump process’. Robert C. Merton extended this approach to a hybrid model known as jump diffusion, which states that prices have large jumps interspersed with small continuous movements. 
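To illustrate the compound Poisson case mentioned above, the following sketch simulates one path: waiting times between jumps are exponential with a given rate, and each jump size is drawn from a user-supplied distribution. The function name, the rate, the horizon and the Gaussian jump sizes are illustrative assumptions, not a calibrated financial model.

```python
import random

def simulate_compound_poisson(rate, horizon, jump_sampler, seed=None):
    """Return [(time, level), ...] for one path on [0, horizon].

    Jumps arrive as a Poisson process with intensity `rate`;
    `jump_sampler(rng)` returns the size of a single jump.
    """
    rng = random.Random(seed)
    t, level = 0.0, 0.0
    path = [(0.0, 0.0)]
    while True:
        t += rng.expovariate(rate)      # exponential waiting time to the next jump
        if t > horizon:
            break
        level += jump_sampler(rng)      # the path is constant between jumps
        path.append((t, level))
    return path

path = simulate_compound_poisson(
    rate=2.0,
    horizon=10.0,
    jump_sampler=lambda rng: rng.gauss(0.0, 1.0),
    seed=42,
)
```

Adding a continuous Brownian component between the jumps would turn this into a simple jump-diffusion path in the spirit of Merton's hybrid model.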
JUMPER  In early years, text classification was typically accomplished by feature-based machine learning models; recently, deep neural networks, as powerful learning machines, have made it possible to work with raw input as the text stands. However, existing end-to-end neural networks lack explicit interpretation of the prediction. In this paper, we propose a novel framework, JUMPER, inspired by the cognitive process of text reading, that models text classification as a sequential decision process. Basically, JUMPER is a neural system that scans a piece of text sequentially and makes classification decisions at the time it wishes. Both the classification result and when to make the classification are part of the decision process, which is controlled by a policy network and trained with reinforcement learning. Experimental results show that a properly trained JUMPER has the following properties: (1) It can make decisions whenever the evidence is enough, therefore reducing total text reading by 30-40% and often finding the key rationale of prediction. (2) It achieves classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets. 
Jumping Knowledge Network (JKN) 
Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome their limitations. In particular, the range of ‘neighboring’ nodes that a node’s representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture, jumping knowledge (JK) networks, that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware representation. In a number of experiments on social, bioinformatics and citation networks, we demonstrate that our model achieves state-of-the-art performance. Furthermore, combining the JK framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models’ performance. 
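The idea of letting each node draw on several neighborhood ranges can be sketched as follows. Untrained mean aggregation stands in for learned graph layers, and the per-layer outputs are combined by concatenation or element-wise max. The function names, the toy path graph and the absence of weights are simplifying assumptions; this is not the authors' implementation.

```python
import numpy as np

def mean_aggregate(adj, features, num_layers):
    """Return the node representations after each of `num_layers` rounds of
    mean aggregation over a node's neighbors and itself (no learned weights)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    outputs, h = [], features
    for _ in range(num_layers):
        h = a_hat @ h          # layer k mixes information from k hops away
        outputs.append(h)
    return outputs

def jk_combine(layer_outputs, mode="concat"):
    """Combine per-layer representations so every range stays available."""
    if mode == "concat":
        return np.concatenate(layer_outputs, axis=1)
    if mode == "max":
        return np.max(np.stack(layer_outputs, axis=0), axis=0)
    raise ValueError(f"unknown mode: {mode}")

# Toy path graph 0 - 1 - 2 - 3 with one-hot node features (illustrative only).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.eye(4)
layers = mean_aggregate(adj, features, num_layers=3)
jk_concat = jk_combine(layers, "concat")   # shape (4, 12)
jk_max = jk_combine(layers, "max")         # shape (4, 4)
```

Concatenation keeps every range and leaves the weighting to a later layer, while element-wise max selects, per feature, whichever range responds most strongly for that node.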
JumpReLU  It has been demonstrated that very simple attacks can fool highly sophisticated neural network architectures. In particular, so-called adversarial examples, constructed from perturbations of input data that are small or imperceptible to humans but lead to different predictions, may pose an enormous risk in certain critical applications. In light of this, there has been a great deal of work on developing adversarial training strategies to improve model robustness. These training strategies are very expensive, in both human and computational time. To complement these approaches, we propose a very simple and inexpensive strategy which can be used to “retrofit” a previously trained network to improve its resilience to adversarial attacks. More concretely, we propose a new activation function, the JumpReLU, which, when used in place of a ReLU in an already-trained model, leads to a trade-off between predictive accuracy and robustness. This trade-off is controlled by the jump size, a hyperparameter which can be tuned during the validation stage. Our empirical results demonstrate that this increases model robustness, protecting against adversarial attacks with substantially increased levels of perturbations. This is accomplished simply by retrofitting existing networks with our JumpReLU activation function, without the need for retraining the model. Additionally, we demonstrate that adversarially trained (robust) models can greatly benefit from retrofitting. 
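One reading of the activation described above is a ReLU whose output stays at zero until the input exceeds the jump size, then passes the input through unchanged. The sketch below encodes that reading; the exact threshold convention (strict inequality) and the parameter name `jump` are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    """Standard ReLU, for comparison."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)

def jump_relu(x, jump=0.5):
    """Pass x through only where it exceeds `jump`; output 0 elsewhere.

    With jump = 0 this coincides with the standard ReLU.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x > jump, x, 0.0)

x = np.array([-1.0, 0.3, 0.5, 2.0])
y_relu = relu(x)                 # small positive activations survive
y_jump = jump_relu(x, jump=0.5)  # activations at or below the jump are suppressed
```

Since the function has no trainable parameters, it can replace ReLU in a trained network without retraining, and `jump` can then be tuned on validation data to trade accuracy against robustness.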
Juniper  Nonconvex mixed-integer nonlinear programs (MINLPs) represent a challenging class of optimization problems that often arise in engineering and scientific applications. Because of nonconvexities, these programs are typically solved with global optimization algorithms, which have limited scalability. However, nonlinear branch-and-bound has recently been shown to be an effective heuristic for quickly finding high-quality solutions to large-scale nonconvex MINLPs, such as those arising in infrastructure network optimization. This work proposes Juniper, a Julia-based open-source solver for nonlinear branch-and-bound. Leveraging the high-level Julia programming language makes it easy to modify Juniper’s algorithm and explore extensions, such as branching heuristics, feasibility pumps, and parallelization. Detailed numerical experiments demonstrate that the initial release of Juniper is comparable with other nonlinear branch-and-bound solvers, such as Bonmin, Minotaur, and Knitro, illustrating that Juniper provides a strong foundation for further exploration in utilizing nonlinear branch-and-bound algorithms as heuristics for nonconvex MINLPs. 
Jupyter  The Jupyter Notebook is a web application for interactive data science and scientific computing. It allows users to author documents that combine live code with narrative text, equations, images, video and visualizations. These documents encode a complete and reproducible record of a computation that can be shared with others on GitHub, Dropbox and the Jupyter Notebook Viewer. 
Jupytext  You’ve always wanted to • edit Jupyter notebooks as e.g. plain Python scripts in your favorite editor? • do version control of Jupyter notebooks with clear and meaningful diffs? • collaborate on Jupyter notebooks using standard (text oriented) merge tools? Jupytext can convert notebooks to and from • Julia, Python and R scripts (extensions .jl, .py and .R), • Markdown documents (extension .md), • R Markdown documents (extension .Rmd). 
Just Another Gibbs Sampler (JAGS) 
Just another Gibbs sampler (JAGS) is a program for simulation from Bayesian hierarchical models using Markov chain Monte Carlo (MCMC), developed by Martyn Plummer. JAGS has been employed for statistical work in many fields, for example ecology, management, and genetics. JAGS aims for compatibility with WinBUGS/OpenBUGS through the use of a dialect of the same modeling language (informally, BUGS), but it provides no GUI for model building and MCMC sample post-processing, which must therefore be handled in a separate program (for example, calling JAGS from R through a library such as rjags and post-processing MCMC output in R). The main advantage of JAGS in comparison to the members of the original BUGS family (WinBUGS and OpenBUGS) is its platform independence. It is written in C++, while the BUGS family is written in Component Pascal, a less widely known programming language. In addition, JAGS is already part of many repositories of Linux distributions such as Ubuntu. It can also be compiled as a 64-bit application on 64-bit platforms, thus making all the addressable space available to BUGS models. JAGS can be used via the command line or run in batch mode through script files. This means that there is no need to redo the settings with every run and that the program can be called and controlled from within another program (e.g. from R via rjags as outlined above). JAGS is licensed under the GNU General Public License. 