C Math Library (CML) 
➘ “C Numerical Library” 
C Numerical Library (CNL) 
The IMSL C Numerical Library provides advanced mathematical and statistical functionality for programmers to embed in their existing or new applications. Written in standard C, the IMSL C Library can be embedded into any C or C++ application as well as any existing application that can reference a C library. 
C++ Based Probabilistic Programming Library (CPProb) 
We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, lowlevel mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically nondifferentiable, which poses challenges for traditional approaches to inference. We extend previous work in ‘inference compilation’, which combines universal probabilistic programming and deep learning methods, to largescale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large codebase used in particle physics. Here we describe the technical innovations realized and planned for this library. 
C4.5  C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. 
Cabinet Tree  Treemaps are wellknown for visualizing hierarchical data. Most related approaches have been focused on layout algorithms and paid little attention to other display properties and interactions. Furthermore, the structural information in conventional Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a spaceoptimized layout for leaves and maximizes the space utilization, iii) uses coloring and labeling strategies to clearly reveal patterns and contrast different attributes intuitively. We also apply the continuous node selection and detail window techniques to support user interaction with different levels of the hierarchies. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and big datasets. 
CACE Principle (CACE) 
Machine learning systems mix signals together, entangling them and making isolation of improvements impossible. For instance, consider a system that uses features x1, …xn in a model. If we change the input distribution of values in x1, the importance, weights, or use of the remaining n – 1 features may all change. This is true whether the model is retrained fully in a batch style or allowed to adapt in an online fashion. Adding a new feature xn+1 can cause similar changes, as can removing any feature xj . No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything. CACE applies not only to input signals, but also to hyperparameters, learning settings, sampling methods, convergence thresholds, data selection, and essentially every other possible tweak. 
Cache Telepathy  Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN’s architecture (i.e., its hyperparameters) broadly determines the DNN’s accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN’s architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN’s architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16. 
CacheDiff  We present a sampling method called, CacheDiff, that has both time and space complexity of O(k) to randomly select k items from a pool of N items, in which N is known. 
CactusNet  Deep neural networks trained over large datasets learn features that are both generic to the whole dataset, and specific to individual classes in the dataset. Learned features tend towards generic in the lower layers and specific in the higher layers of a network. Methods like finetuning are made possible because of the ability for one filter to apply to multiple target classes. Much like the human brain this behavior, can also be used to cluster and separate classes. However, to the best of our knowledge there is no metric for how applicable learned features are to specific classes. In this paper we propose a definition and metric for measuring the applicability of learned features to individual classes, and use this applicability metric to estimate input applicability and produce a new method of unsupervised learning we call the CactusNet. 
CADDeLaG  Random walk based distance measures for graphs such as commutetime distance are useful in a variety of graph algorithms, such as clustering, anomaly detection, and creating low dimensional embeddings. Since such measures hinge on the spectral decomposition of the graph, the computation becomes a bottleneck for large graphs and do not scale easily to graphs that cannot be loaded in memory. Most existing graph mining libraries for large graphs either resort to sampling or exploit the sparsity structure of such graphs for spectral analysis. However, such methods do not work for dense graphs constructed for studying pairwise relationships among entities in a data set. Examples of such studies include analyzing pairwise locations in gridded climate data for discovering long distance climate phenomena. These graphs representations are fully connected by construction and cannot be sparsified without loss of meaningful information. In this paper we describe CADDeLaG, a framework for scalable computation of commutetime distance based anomaly detection in large dense graphs without the need to load the entire graph in memory. The framework relies on Apache Spark’s memorycentric clustercomputing infrastructure and consists of two building blocks: a decomposable algorithm for commute time distance computation and a distributed linear system solver. We illustrate the scalability of CADDeLaG and its dependency on various factors using both synthetic and real world data sets. We demonstrate the usefulness of CADDeLaG in identifying anomalies in a climate graph sequence, that have been historically missed due to ad hoc graph sparsification and on an election donation data set. 
Caffe  Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2Clause license. http://…/neuralnetworkswithcaffeonthegpu Github 
Cakewalk Sampling  Combinatorial optimization is a common theme in computer science which underlies a considerable variety of problems. In contrast to the continuous setting, combinatorial problems require special solution strategies, and it’s hard to come by generic schemes like gradient methods for continuous domains. We follow a standard construction of a parametric sampling distribution that transforms the problem to the continuous domain, allowing us to optimize the expectation of a given objective using estimates of the gradient. In spite of the apparent generality, such constructions are known to suffer from highly variable gradient estimates, and thus require careful tuning that is done in a problem specific manner. We show that a simple trick of converting the objective values to their cumulative probabilities fixes the distribution of the objective, allowing us to derive an online optimization algorithm that can be applied in a generic fashion. As an experimental benchmark we use the task of finding cliques in undirected graphs, and we show that our method, even when blindly applied, consistently outperforms related methods. Notably, on the DIMACS clique benchmark, our method approaches the performance of the best clique finding algorithms without access to the graph structure, and only through objective function evaluations, thus providing significant evidence to the generality and effectivity of our method. 
Calamari  Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and L\’udeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and pretraining have shown to be very efficient (Reul et al., 2018a, Reul et al., 2018b). Calamari is a new open source OCR line recognition software that both uses stateofthe art Deep Neural Networks (DNNs) implemented in Tensorflow and giving native support for techniques such as pretraining and voting. The customizable network architectures constructed of Convolutional Neural Networks (CNNS) and LongShortTermMemory (LSTM) layers are trained by the socalled Connectionist Temporal Classification (CTC) algorithm of Graves et al. (2006). Optional usage of a GPU drastically reduces the computation times for both training and prediction. We use two different datasets to compare the performance of Calamari to OCRopy, OCRopus3, and Tesseract 4. Calamari reaches a Character Error Rate (CER) of 0.11% on the UW3 dataset written in modern English and 0.18% on the DTA19 dataset written in German Fraktur, which considerably outperforms the results of the existing softwares. 
Calibrated BoostingForest  Excellent ranking power along with well calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated BoostingForest that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or classification model, Calibrated BoostingForest is able to preserve well calibrated posterior probabilities. Along with these benefits, we provide an alternative to the tedious step of tuning gradient boosting machines. We demonstrate that tuning Calibrated BoostingForests can be reduced to a simple hyperparameter selection. We further establish that increasing this hyperparameter improves the ranking performance under a diminishing return. We examine the effectiveness of Calibrated BoostingForest on ligandbased virtual screening where both continuous and binary labels are available and compare the performance of Calibrated BoostingForest with logistic regression, gradient boosting machine and deep learning. Calibrated BoostingForest achieved an approximately 4% improvement compared to a stateofart deep learning model and has the potential to achieve an 8% improvement after tuning the single hyperparameter. Moreover, it achieved around 98% improvement on probability quality measurement compared to the best individual gradient boosting machine. Calibrated BoostingForest offers a benchmark demonstration that in the field of ligandbased virtual screening, deep learning is not the universally dominant machine learning model and good calibrated probabilities can better facilitate virtual screening process. 
CaliCo  In this article, we present a recently released R package for Bayesian calibration. Many industrial fields are facing unfeasible or costly field experiments. These experiments are replaced with numerical/computer experiments which are realized by running a numerical code. Bayesian calibration intends to estimate, through a posterior distribution, input parameters of the code in order to make the code outputs close to the available experimental data. The code can be time consuming while the Bayesian calibration implies a lot of code calls which makes studies too burdensome. A discrepancy might also appear between the numerical code and the physical system when facing incompatibility between experimental data and numerical code outputs. The package CaliCo deals with these issues through four statistical models which deal with a time consuming code or not and with discrepancy or not. A guideline for users is provided in order to illustrate the main functions and their arguments. Eventually, a toy example is detailed using CaliCo. This example (based on a real physical system) is in five dimensions and uses simulated data. 
Canberra Distance  The Canberra distance is a numerical measure of the distance between pairs of points in a vector space, introduced in 1966 and refined in 1967 by G. N. Lance and W. T. Williams. It is a weighted version of L1 (Manhattan) distance. The Canberra distance has been used as a metric for comparing ranked lists and for intrusion detection in computer security. 
CannistraiAlanisRavai Index (CAR) 
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed CAR (CannistraiAlanisRavai) index shows the power of local link/triangle information in improving linkprediction accuracy. 
Canonical Correlated AutoEncoder (C2AE) 
Multilabel classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural networks (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of labelcorrelation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows endtoend learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against stateoftheart methods for multilabel classification. 
Canonical Correlation Analysis (CCA,CANCOR) 
Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be nonlinear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a handson tool for applying canonical correlation methods in data analysis. QRFCCA 
Canonical Correspondence Analysis (CCA) 
In applied statistics, canonical correspondence analysis (CCA) is a multivariate constrained ordination technique that extracts major gradients among combinations of explanatory variables in a dataset. The requirements of a CCA are that the samples are random and independent and that the independent variables are consistent within the sample site and errorfree. 
Canonical Divergence Analysis (CDA) 
We aim to analyze the relation between two random vectors that may potentially have both different number of attributes as well as realizations, and which may even not have a joint distribution. This problem arises in many practical domains, including biology and architecture. Existing techniques assume the vectors to have the same domain or to be jointly distributed, and hence are not applicable. To address this, we propose Canonical Divergence Analysis (CDA). 
Canonical Tensor Decomposition (CP) 

Canonical Variate Regression (CVR) 
CVR 
Canopy Clustering Algorithm  The canopy clustering algorithm is an unsupervised preclustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as preprocessing step for the Kmeans algorithm or the Hierarchical clustering algorithm. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set. The algorithm proceeds as follows, using two thresholds T_1 (the loose distance) and T_2 (the tight distance), where T_1 > T_2 . 1. Begin with the set of data points to be clustered. 2. Remove a point from the set, beginning a new ‘canopy’. 3. For each point left in the set, assign it to the new canopy if the distance less than the loose distance T_1. 4. If the distance of the point is additionally less than the tight distance T_2, remove it from the original set. 5. Repeat from step 2 until there are no more data points in the set to cluster. 6. These relatively cheaply clustered canopies can be subclustered using a more expensive but accurate algorithm. An important note is that individual data points may be part of several canopies. As an additional speedup, an approximate and fast distance metric can be used for 3, where a more accurate and slow distance metric can be used for step 4. Since the algorithm uses distance functions and requires the specification of distance thresholds, its applicability for highdimensional data is limited by the curse of dimensionality. Only when a cheap and approximative – lowdimensional – distance function is available, the produced canopies will preserve the clusters produced by Kmeans. 
CaosDB  Here we present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments during the complete research data lifecycle. An RDMS for this domain faces particular challenges: Research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around workflows of the scientists and practices and thus support changes in workflow and data structure. Nevertheless it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of the CaosDB Server, its data model and its easytolearn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it. 
CapsE  In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples \textit{(subject, relation, object)}. Our CapsE represents each triple as a 3column matrix where each column vector represents the embedding of an element in the triple. This 3column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains stateoftheart link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k237, and outperforms strong search personalization baselines on SEARCH17 dataset. 
Capsule Network (CapsNet) 
A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higherlevel capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discrimininatively trained, multilayer capsule system achieves stateoftheart performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routingbyagreement mechanism: A lowerlevel capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lowerlevel capsule. Text classification using capsules What is a CapsNet or Capsule Network? 
Capsule Projection Network (CapProNet) 
In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. Only a small negligible computing overhead is incurred to train the network in lowdimensional capsule subspaces or through an alternative hyperpower iteration to estimate the normalization matrix. Experiment results on image datasets show the presented model can greatly improve the performance of stateoftheart ResNet backbones by $1020\%$ at the same level of computing and memory costs. 
CAPTheorem (Brewer’s theorem) 
In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: · Consistency (all nodes see the same data at the same time) · Availability (a guarantee that every request receives a response about whether it was successful or failed) · Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) 
CaptureMarkRecapture Analysis  Mark and recapture is a method commonly used in ecology to estimate an animal population’s size. A portion of the population is captured, marked, and released. Later, another portion is captured and the number of marked individuals within the sample is counted. Since the number of marked individuals within the second sample should be proportional to the number of marked individuals in the whole population, an estimate of the total population size can be obtained by dividing the number of marked individuals by the proportion of marked individuals in the second sample. The method is most useful when it is not practical to count all the individuals in the population. Other names for this method, or closely related methods, include capturerecapture, capturemarkrecapture, markrecapture, sightresight, markreleaserecapture, multiple systems estimation, band recovery, the Petersen method and the Lincoln method. Another major application for these methods is in epidemiology, where they are used to estimate the completeness of ascertainment of disease registers. Typical applications include estimating the number of people needing particular services (i.e. services for children with learning disabilities, services for medically frail elderly living in the community), or with particular conditions(i.e. illegal drug addicts, people infected with HIV, etc.). 
Cartogram  A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. There are two main types of cartograms: area and distance cartograms. Cartograms have a fairly long history, with examples from the mid1800s. 
Cascade Attribute Learning Network (CALNet) 
We propose the cascade attribute learning network (CALNet), which can learn attributes in a control task separately and assemble them together. Our contribution is twofold: first we propose attribute learning in reinforcement learning (RL). Attributes used to be modeled using constraint functions or terms in the objective function, making it hard to transfer. Attribute learning, on the other hand, models these task properties as modules in the policy network. We also propose using novel cascading compensative networks in the CALNet to learn and assemble attributes. Using the CALNet, one can zero shoot an unseen task by separately learning all its attributes, and assembling the attribute modules. We have validated the capacity of our model on a wide variety of control problems with attributes in time, position, velocity and acceleration phases. 
Cascade Clustering and Reference Point Incremental Learning Based Interactive Algorithm (CLIA) 
Researches have shown difficulties in obtaining proximity while maintaining diversity for solving manyobjective optimization problems (MaOPs). The complexities of the true Pareto Front (PF) also pose serious challenges for the pervasive algorithms for their insufficient ability to adapt to the characteristics of the true PF with no priori. This paper proposes a cascade Clustering and reference point incremental Learning based Interactive Algorithm (CLIA) for manyobjective optimization. In the cascade clustering process, using reference lines provided by the learning process, individuals are clustered and intraclassly sorted in a bilevel cascade style for better proximity and diversity. In the reference point incremental learning process, using the feedbacks from the clustering process, the proper generation of reference points is gradually obtained by incremental learning and the reference lines are accordingly repositioned. The advantages of the proposed interactive algorithm CLIA lie not only in the proximity obtainment and diversity maintenance but also in the versatility for the diverse PFs which uses only the interactions between the two processes without incurring extra evaluations. The experimental studies on the CEC’2018 MaOP benchmark functions have shown that the proposed algorithm CLIA has satisfactory covering of the true PFs, and is competitive, stable and efficient compared with the stateoftheart algorithms. 
Cascade RCNN  In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade with increasing the IoU thresholds. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inferencetime mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multistage object detection architecture, the Cascade RCNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, to be sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next higher quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade RCNN is shown to surpass all singlemodel object detectors on the challenging COCO dataset. Experiments also show that the Cascade RCNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code will be made available at https://…/cascadercnn. 
Cascade Residual Learning  Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate highquality disparities for the inherently illposed regions. To tackle this problem, we propose a novel cascade CNN architecture composing of two stages. The first stage advances the recently proposed DispNet by equipping it with extra upconvolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the firststage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides stateoftheart performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin. 
CascadeCNN  This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform highthroughput inference. A twostage architecture tailored for any given CNNFPGA pair is generated, consisting of a low and highprecision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively lowprecision unit and forward them to the highprecision unit for reprocessing. Experiments demonstrate that the proposed toolflow can achieve a performance boost up to 55% for VGG16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need of retraining the model or accessing the training data. 
Cascaded MultiScale Cross Network  The deep convolutional neural networks have achieved significant improvements in accuracy and speed for single image superresolution. However, as the depth of network grows, the information flow is weakened and the training becomes harder and harder. On the other hand, most of the models adopt a singlestream structure with which integrating complementary contextual information under different receptive fields is difficult. To improve information flow and to capture sufficient knowledge for reconstructing the highfrequency details, we propose a cascaded multiscale cross network (CMSC) in which a sequence of subnetworks is cascaded to infer high resolution features in a coarsetofine manner. In each cascaded subnetwork, we stack multiple multiscale cross (MSC) modules to fuse complementary multiscale information in an efficient way as well as to improve information flow across the layers. Meanwhile, by introducing residualfeatures learning in each stage, the relative information between highresolution and lowresolution features is fully utilized to further boost reconstruction performance. We train the proposed network with cascadedsupervision and then assemble the intermediate predictions of the cascade to achieve high quality image reconstruction. Extensive quantitative and qualitative evaluations on benchmark datasets illustrate the superiority of our proposed method over stateoftheart superresolution methods. 
CaseBased Reasoning (CBR) 
Casebased reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using casebased reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using casebased reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry), is treating nature as a database of solutions to problems. Casebased reasoning is a prominent type of analogy solution making. It has been argued that casebased reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; or, more radically, that all reasoning is based on past cases personally experienced. This view is related to prototype theory, which is most deeply explored in cognitive science. 
CaseControl Study  A casecontrol study is a type of study design used widely, originally developed in epidemiology, although its use has also been advocated for the social sciences. It is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Casecontrol studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have that condition/disease (the “cases”) with patients who do not have the condition/disease but are otherwise similar (the “controls”). They require fewer resources but provide less evidence for causal inference than a randomized controlled trial. 
CASED  We introduce CASED, a novel curriculum sampling algorithm that facilitates the optimization of deep learning segmentation or detection models on data sets with extreme class imbalance. We evaluate the CASED learning framework on the task of lung nodule detection in chest CT. In contrast to twostage solutions, wherein nodule candidates are first proposed by a segmentation model and refined by a second detection stage, CASED improves the training of deep nodule segmentation models (e.g. UNet) to the point where state of the art results are achieved using only a trivial detection stage. CASED improves the optimization of deep segmentation models by allowing them to first learn how to distinguish nodules from their immediate surroundings, while continuously adding a greater proportion of difficulttoclassify global context, until uniformly sampling from the empirical data distribution. Using CASED during training yields a minimalist proposal to the lung nodule detection problem that tops the LUNA16 nodule detection benchmark with an average sensitivity score of 88.35%. Furthermore, we find that models trained using CASED are robust to nodule annotation quality by showing that comparable results can be achieved when only a point and radius for each ground truth nodule are provided during training. Finally, the CASED learning framework makes no assumptions with regard to imaging modality or segmentation target and should generalize to other medical imaging problems where class imbalance is a persistent problem. 
Catalan Number  In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursivelydefined objects. They are named after the Belgian mathematician Eugène Charles Catalan (18141894). Modular Catalan Numbers 
CatandMouse Markov Chain  The socalled CatandMouse Markov chain, studied earlier by Litvak and Robert (2012), is a 2dimensional Markov chain on the lattice $\mathbb{Z}^2$, where the first component is a simple random walk and the second component changes when the components meet. 
Catastrophe Modeling  Catastrophe modeling (also known as cat modeling) is the process of using computerassisted calculations to estimate the losses that could be sustained due to a catastrophic event such as a hurricane or earthquake. Cat modeling is especially applicable to analyzing risks in the insurance industry and is at the confluence of actuarial science, engineering, meteorology, and seismology. 
CatBoost  CatBoost delivers bestinclass accuracy unmatched by other gradient boosting algorithms today. It is an outofthebox solution that significantly improves data scientists’ ability to create predictive models using a variety of data sources, such as sensory, historical and transactional data. While most competing gradient boosting algorithms need to convert data descriptors to numerical form, CatBoost’s ability to support categorical data directly saves businesses time while increasing accuracy and efficiency. 
Categorical Cross Entropy  
Categorical Distributional Reinforcement Learning (CDRL) 
Categorical Distributional Reinforcement Learning (CDRL) [Bellemare et al., 2017]. 
Categorical Response Model  
Causal Additive Model (CAM) 
We develop estimation for potentially highdimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding the causal structure. We show that the former can be done with nonregularized (restricted) maximum likelihood estimation while the latter can be efficiently addressed using sparse regression techniques. Thus, we substantially simplify the problem of structure search and estimation for an important class of causal models. We establish consistency of the (restricted) maximum likelihood estimator for low and highdimensional scenarios, and we also allow for misspecification of the error distribution. Furthermore, we develop an efficient computational algorithm which can deal with many variables, and the new method’s accuracy and performance is illustrated on simulated and real data. 
Causal Falling Rule List (CFRL) 
A causal falling rule list (CFRL) is a sequence of ifthen rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list. A given CFRL parameterizes a hierarchical bayesian regression model in which the treatment effects are incorporated as parameters, and assumed constant within modelspecific subgroups. 
Causal Generative Neural Network (CGNN) 
We introduce CGNN, a framework to learn functional causal models as generative neural networks. These networks are trained using backpropagation to minimize the maximum mean discrepancy to the observed data. Unlike previous approaches, CGNN leverages both conditional independences and distributional asymmetries to seamlessly discover bivariate and multivariate causal structures, with or without hidden variables. CGNN does not only estimate the causal structure, but a full and differentiable generative model of the data. Throughout an extensive variety of experiments, we illustrate the competitive results of CGNN w.r.t stateoftheart alternatives in observational causal discovery on both simulated and real data, in the tasks of causeeffect inference, vstructure identification, and multivariate causal discovery. 
Causal Inference  Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. http://…mp;uid=2&uid=4&sid=21104618644387 
Causal Inference Benchmarking Framework  CausalityBenchmark is a library developed by IBM Research Haifa for benchmarking algorithms that estimate the causal effect of a treatment on some outcome. The framework includes unlabeled data, labeled data, code for scoring algorithm predictions based on both novel and established metrics. It can benchmark predictions of both population effect size and individual effect size. 
Causal Information Theory  Causal Information Theory – Formal Introduction of Key Concepts 
Causal Loglinear Model  ➘ “LogLinear Model” 
Causal Markov Condition  The causal Markov condition states that all dependences among the variables that are observed are due to the causal structure. This rules out time series that become dependent because of a common deterministic trend. 
Causal Model  A causal model is an abstract model that describes the causal mechanisms of a system. The model must express more than correlation because correlation does not imply causation. Judea Pearl defines a causal model as an ordered triple <U,V,E> , where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V. 
Causal Modeling Framework of Modular Structural Causal Models (mSCM) 
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and nonlinearities. We introduce {\sigma}connection graphs ({\sigma}CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of {\sigma}separation, the appropriate generalization of the wellknown notion of dseparation in this setting, to apply to {\sigma}CGs. We prove the closedness of {\sigma}separation under marginalisation and conditioning and exploit this to implement a test of {\sigma}separation on a {\sigma}CG. This then leads us to the first causal discovery algorithm that can handle nonlinear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models. 
Causal Network  A causal network is a Bayesian network with an explicit requirement that the relationships be causal. The additional semantics of the causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X=x)), then the probability density function changes to the one of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x. Using these semantics, one can predict the impact of external interventions from data obtained prior to intervention. ➚ “Bayesian Network” 
Causal Prediction  
Causal Rule Sets (CRS) 
We introduce a novel generative model for interpretable subgroup analysis for causal inference applications, Causal Rule Sets (CRS). A CRS model uses a small set of short rules to capture a subgroup where the average treatment effect is elevated compared to the entire population. We present a Bayesian framework for learning a causal rule set. The Bayesian framework consists of a prior that favors simpler models and a Bayesian logistic regression that characterizes the relation between outcomes, attributes and subgroup membership. We find maximum a posteriori models using discrete Monte Carlo steps in the joint solution space of rules sets and parameters. We provide theoretically grounded heuristics and bounding strategies to improve search efficiency. Experiments show that the search algorithm can efficiently recover a true underlying subgroup and CRS shows consistently competitive performance compared to other stateoftheart baseline methods. 
Causal Transfer Learning  An important goal in both transfer learning and causal inference is to make accurate predictions when the distribution of the test set and the training set(s) differ. Such a distribution shift may happen as a result of an external intervention on the data generating process, causing certain aspects of the distribution to change, and others to remain invariant. We consider a class of causal transfer learning problems, where multiple training sets are given that correspond to different external interventions, and the task is to predict the distribution of a target variable given measurements of other variables for a new (yet unseen) intervention on the system. We propose a method for solving these problems that exploits causal reasoning but does neither rely on prior knowledge of the causal graph, nor on the the type of interventions and their targets. We evaluate the method on simulated and real world data and find that it outperforms a standard prediction method that ignores the distribution shift. 
CausalGAN  We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a twostage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset. 
Causality  Causality (also referred to as causation, or cause and effect) is what connects one process (the cause) with another process or state (the effect), where the first is partly responsible for the second, and the second is partly dependent on the first. In general, a process has many causes, which are said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Causality is metaphysically prior to notions of time and space. Causality is an abstraction that indicates how the world progresses, so basic a concept that it is more apt as an explanation of other concepts of progression than as something to be explained by others more basic. The concept is like those of agency and efficacy. For this reason, a leap of intuition may be needed to grasp it. Accordingly, causality is implicit in the logic and structure of ordinary language. Aristotelian philosophy uses the word ’cause’ to mean ‘explanation’ or ‘answer to a why question’, including Aristotle’s material, formal, efficient, and final ’causes’; then the ’cause’ is the explanans for the explanandum. In this case, failure to recognize that different kinds of ’cause’ are being considered can lead to futile debate. Of Aristotle’s four explanatory modes, the one nearest to the concerns of the present article is the ‘efficient’ one. The topic of causality remains a staple in contemporary philosophy. 
CausalSpartan  Causal consistency is an intermediate consistency model that can be achieved together with high availability and highperformance requirements even in presence of network partitions. In the context of partitioned data stores, it has been shown that implicit dependency tracking using clocks is more efficient than explicit dependency tracking by sending dependency check messages. Existing clockbased solutions depend on monotonic psychical clocks that are closely synchronized. These requirements make current protocols vulnerable to clock anomalies. In this paper, we propose a new clockbased algorithm, CausalSpartan, that instead of physical clocks, utilizes Hybrid Logical Clocks (HLCs). We show that using HLCs, without any overhead, we make the system robust on physical clock anomalies. This improvement is more significant in the context of query amplification, where a single query results in multiple GET/PUT operations.We also show that CausalSpartan decreases the visibility latency for a given data item comparing to existing clockbased approaches. In turn, this reduces the completion time of collaborative applications where two clients accessing two different replicas edit same items of the data store. Like previous protocols, CausalSpartan assumes that a given client does not access more than one replica. We show that in presence of network partitions, this assumption (made in several other works) is essential if one were to provide causal consistency as well as immediate availability to local updates. 
Cautious Deep Learning  Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(yx)$ where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a hubristic bias: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets [vovk2005algorithmic] which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1\alpha$. Our construction is based on $p(xy)$ rather than $p(yx)$ which results in a classifier that is very cautious: it outputs the null set – meaning `I don’t know’ — when the object does not resemble the training examples. An important property of our approach is that classes can be added or removed without having to retrain the classifier. We demonstrate the performance on the ImageNet ILSVRC dataset using high dimensional features obtained from state of the art convolutional neural networks. 
Cavs  Recent deep learning (DL) models have moved beyond static network architectures to dynamic ones, handling data where the network structure changes every example, such as sequences of variable lengths, trees, and graphs. Existing dataflowbased programming models for DL—both static and dynamic declaration—either cannot readily express these dynamic models, or are inefficient due to repeated dataflow graph construction and processing, and difficulties in batched execution. We present Cavs, a vertexcentric programming interface and optimized system implementation for dynamic DL models. Cavs represents dynamic network structure as a static vertex function $\mathcal{F}$ and a dynamic instancespecific graph $\mathcal{G}$, and performs backpropagation by scheduling the execution of $\mathcal{F}$ following the dependencies in $\mathcal{G}$. Cavs bypasses expensive graph construction and preprocessing overhead, allows for the use of static graph optimization techniques on predefined operations in $\mathcal{F}$, and naturally exposes batched execution opportunities over different graphs. Experiments comparing Cavs to two stateoftheart frameworks for dynamic NNs (TensorFlow Fold and DyNet) demonstrate the efficacy of this approach: Cavs achieves a near one order of magnitude speedup on training of various dynamic NN architectures, and ablations demonstrate the contribution of our proposed batching and memory management strategies. 
CDF2PDF  CDF2PDF is a method of PDF estimation by approximating CDF. The original idea of it was previously proposed in [1] called SIC. However, SIC requires additional hyperparameter tunning, and no algorithms for computing higher order derivative from a trained NN are provided in [1]. CDF2PDF improves SIC by avoiding the timeconsuming hyperparameter tuning part and enabling higher order derivative computation to be done in polynomial time. Experiments of this method for onedimensional data shows promising results. 
Cell Suppression Problem (CSP) 
Cell suppression is one of the most frequently used techniques to prevent the disclosure of sensitive data in statistical tables. Finding the minimum cost set of nonsensitive entries to suppress, along with the sensitive ones, in order to make a table safe for publication, is a NPhard problem, denoted the cell suppression problem (CSP). 
Censored Time Series Analysis  Imputation method in the presence of censored data. The main message of the imputation method is that we should account for the variability of the censored part of the data by mimicking the complete data. That is, we impute the incomplete part with a conditional random sample rather than the conditional expectation or certain constants. Simulation results suggest that the imputation method reduces the possible biases and has similar standard errors than those from complete data. 
Censoring  In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual’s age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75. Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 pounds (140 kg). If a 350 lb (160 kg) individual is weighed using the scale, the observer would only know that the individual’s weight is at least 300 pounds (140 kg). The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown. Censoring should not be confused with the related idea truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding. 
Centered Autologistic Model  The traditional autologistic model was proposed by Besag (1972). The model is a Markov random field (MRF) model (Kindermann and Snell, 1980) 
Centered Initial Attack (CIA) 
During the last years, a remarkable breakthrough has been made in AI domain thanks to artificial deep neural networks that achieved a great success in many machine learning tasks in computer vision, natural language processing, speech recognition, malware detection and so on. However, they are highly vulnerable to easily crafted adversarial examples. Many investigations have pointed out this fact and different approaches have been proposed to generate attacks while adding a limited perturbation to the original data. The most robust known method so far is the so called C&W attack [1]. Nonetheless, a countermeasure known as feature squeezing coupled with ensemble defense showed that most of these attacks can be destroyed [6]. In this paper, we present a new method we call Centered Initial Attack (CIA) whose advantage is twofold : first, it insures by construction the maximum perturbation to be smaller than a threshold fixed beforehand, without the clipping process that degrades the quality of attacks. Second, it is robust against recently introduced defenses such as feature squeezing, JPEG encoding and even against a voting ensemble of defenses. While its application is not limited to images, we illustrate this using five of the current best classifiers on ImageNet dataset among which two are adversarialy retrained on purpose to be robust against attacks. With a fixed maximum perturbation of only 1.5% on any pixel, around 80% of attacks (targeted) fool the voting ensemble defense and nearly 100% when the perturbation is only 6%. While this shows how it is difficult to defend against CIA attacks, the last section of the paper gives some guidelines to limit their impact. 
Central Network (CentralNet) 
This paper proposes a novel multimodal fusion approach, aiming to produce best possible decisions by integrating information coming from multiple media. While most of the past multimodal approaches either work by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separated deep convolutional network, allowing to take decisions independently from each modality, we introduce a central network linking the modality specific networks. This central network not only provides a common feature embedding but also regularizes the modality specific networks through the use of multitask learning. The proposed approach is validated on 4 different computer vision tasks on which it consistently improves the accuracy of existing multimodal fusion approaches. 
Centrality  In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and superspreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.[1] They should not be confused with node influence metrics, which seek to quantify the influence of every node in the network. 
Centralized Coordinate Learning (CCL) 
Owe to the rapid development of deep neural network (DNN) techniques and the emergence of large scale face databases, face recognition has achieved a great success in recent years. During the training process of DNN, the face features and classification vectors to be learned will interact with each other, while the distribution of face features will largely affect the convergence status of network and the face similarity computing in test stage. In this work, we formulate jointly the learning of face features and classification vectors, and propose a simple yet effective centralized coordinate learning (CCL) method, which enforces the features to be dispersedly spanned in the coordinate space while ensuring the classification vectors to lie on a hypersphere. An adaptive angular margin is further proposed to enhance the discrimination capability of face features. Extensive experiments are conducted on six face benchmarks, including those have large age gap and hard negative samples. Trained only on the smallscale CASIA Webface dataset with 460K face images from about 10K subjects, our CCL model demonstrates high effectiveness and generality, showing consistently competitive performance across all the six benchmark databases. 
Cerioli Outlier Detection  “Cerioli Outlier Dectection” is an iterated RMCD method of Cerioli (2010) for multivariate outlier detection via robust Mahalanobis distances. 
Certified Program Model  Production distributed systems are challenging to formally verify, in particular when they are based on distributed protocols that are not rigorously described or fully understood. In this paper, we derive models and properties for two core distributed protocols used in eventually consistent production keyvalue stores such as Riak and Cassandra. We propose a novel modeling called certified program models, where complete distributed systems are captured as programs written in traditional systems languages such as concurrent C. Specifically, we model the readrepair and hintedhandoff recovery protocols as concurrent C programs, test them for conformance with real systems, and then verify that they guarantee eventual consistency, modeling precisely the specification as well as the failure assumptions under which the results hold. 
CF4CF  Automatic solutions which enable the selection of the best algorithms for a new problem are commonly found in the literature. One research area which has recently received considerable efforts is Collaborative Filtering. Existing work includes several approaches using Metalearning, which relate the characteristics of datasets with the performance of the algorithms. This work explores an alternative approach to tackle this problem. Since, in essence, both are recommendation problems, this work uses Collaborative Filtering algorithms to select Collaborative Filtering algorithms. Our approach integrates subsampling landmarkers, which are a data characterization approach commonly used in Metalearning, with a standard Collaborative Filtering method. The experimental results show that CF4CF competes with standard Metalearning strategies in the problem of Collaborative Filtering algorithm selection. 
Chain Event Graph (CEG) 
ceg 
ChainerCV  Despite significant progress of deep learning in the field of computer vision, there has not been a software library that covers these methods in a unifying manner. We introduce ChainerCV, a software library that is intended to fill this gap. ChainerCV supports numerous neural network models as well as software components needed to conduct research in computer vision. These implementations emphasize simplicity, flexibility and good software engineering practices. The library is designed to perform on par with the results reported in published papers and its tools can be used as a baseline for future research in computer vision. Our implementation includes sophisticated models like Faster RCNN and SSD, and covers tasks such as object detection and semantic segmentation. 
Chameleon  We present Chameleon, a novel hybrid (mixedprotocol) framework for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs. Chameleon combines the best aspects of generic SFE protocols with the ones that are based upon additive secret sharing. In particular, the framework performs linear operations in the ring $\mathbb{Z}_{2^l}$ using additively secret shared values and nonlinear operations using Yao’s Garbled Circuits or the GoldreichMicaliWigderson protocol. Chameleon departs from the common assumption of additive or linear secret sharing models where three or more parties need to communicate in the online phase: the framework allows two parties with private inputs to communicate in the online phase under the assumption of a third node generating correlated randomness in an offline phase. Almost all of the heavy cryptographic operations are precomputed in an offline phase which substantially reduces the communication overhead. Chameleon is both scalable and significantly more efficient than the ABY framework (NDSS’15) it is based on. Our framework supports signed fixedpoint numbers. In particular, Chameleon’s vector dot product of signed fixedpoint numbers improves the efficiency of mining and classification of encrypted data for algorithms based upon heavy matrix multiplications. Our evaluation of Chameleon on a 5 layer convolutional deep neural network shows 133x and 4.2x faster executions than Microsoft CryptoNets (ICML’16) and MiniONN (CCS’17), respectively. 
ChanDarwiche Distance  We propose a distance measure between two probability distributions, which allows one to bound the amount of belief change that occurs when moving from one distribution to another. We contrast the proposed measure with some well known measures, including KLdivergence, showing some theoretical properties on its ability to bound belief changes. We then present two practical applications of the proposed distance measure: sensitivity analysis in belief networks and probabilistic belief revision. We show how the distance measure can be easily computed in these applications, and then use it to bound global belief changes that result from either the perturbation of local conditional beliefs or the accommodation of soft evidence. Finally, we show that two well known techniques in sensitivity analysis and belief revision correspond to the minimization of our proposed distance measure and, hence, can be shown to be optimal from that viewpoint. 
Change Point Analysis (CPA) 
Changepoint analysis is a powerful new tool for determining whether a change has taken place. It is capable of detecting subtle changes missed by control charts. Further, it better characterizes the changes detected by providing confidence levels and confidence intervals. When collecting online data, a changepoint analysis is not a replacement for control charting. But, because a changepoint analysis can provide further information, the two methods can be used in a complementary fashion. When analyzing historical data, especially when dealing with large data sets, changepoint analysis is preferable to control charting. A changepoint analysis is more powerful, better characterizes the changes, controls the overall error rate, is robust to outliers, is more flexible and is simpler to use. CPA aims at detecting any change in the mean of a process in historical data. Example questions to be answered by performing CPA: · Did a change occur? · Did more than one change occur? · When did the changes occur? · How confident are we that they are real changes? http://…/changepoint.html 
Change Point Detection  In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes. Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally change detection also includes the detection of anomalous behavior: anomaly detection. 
ChangePoint Detection Procedure via VIF Regression (VIFCP) 

Channel Gating Neural Network  Employing deep neural networks to obtain stateoftheart performance on computer vision tasks can consume billions of floating point operations and several Joules of energy per evaluation. Network pruning, which statically removes unnecessary features and weights, has emerged as a promising way to reduce this computation cost. In this paper, we propose channel gating, a dynamic, finegrained, trainingbased computationcostreduction scheme. Channel gating works by identifying the regions in the features which contribute less to the classification result and turning off a subset of the channels for computing the pixels within these uninteresting regions. Unlike static network pruning, the channel gating optimizes computations exploiting characteristics specific to each input at runtime. We show experimentally that applying channel gating in stateoftheart networks can achieve 66% and 60% reduction in FLOPs with 0.22% and 0.29% accuracy loss on the CIFAR10 and CIFAR100 datasets, respectively. 
Channel Matching (CM) 
A group of transition probability functions form a Shannon’s channel whereas a group of truth functions form a semantic channel. Label learning is to let semantic channels match Shannon’s channels and label selection is to let Shannon’s channels match semantic channels. The Channel Matching (CM) algorithm is provided for multilabel classification. This algorithm adheres to maximum semantic information criterion which is compatible with maximum likelihood criterion and regularized least squares criterion. If samples are very large, we can directly convert Shannon’s channels into semantic channels by the third kind of Bayes’ theorem; otherwise, we can train truth functions with parameters by sampling distributions. A label may be a Boolean function of some atomic labels. For simplifying learning, we may only obtain the truth functions of some atomic label. For a given label, instances are divided into three kinds (positive, negative, and unclear) instead of two kinds as in popular studies so that the problem with binary relevance is avoided. For each instance, the classifier selects a compound label with most semantic information or richest connotation. As a predictive model, the semantic channel does not change with the prior probability distribution (source) of instances. It still works when the source is changed. The classifier changes with the source, and hence can overcome classimbalance problem. It is shown that the old population’s increasing will change the classifier for label ‘Old’ and has been impelling the semantic evolution of ‘Old’. The CM iteration algorithm for unseen instance classification is introduced. 
ChannelRecurrent Variational Autoencoders (CRVAE) 
Variational Autoencoder (VAE) is an efficient framework in modeling natural images with probabilistic latent spaces. However, when the input spaces become complex, VAE becomes less effective, potentially due to the oversimplification of its latent space construction. In this paper, we propose to integrate recurrent connections across channels to both inference and generation steps of VAE. Sequentially building up the complexity of highlevel features in this way allows us to capture globaltolocal and coarsetofine structures of the input data spaces. We show that our channelrecurrent VAE improves existing approaches in multiple aspects: (1) it attains lower negative loglikelihood than standard VAE on MNIST; when trained adversarially, (2) it generates face and bird images with substantially higher visual quality than the stateoftheart VAEGAN and (3) channelrecurrency allows learning more interpretable representations; finally (4) it achieves competitive classification results on STL10 in a semisupervised setup. 
Chaos Monkey  Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on nonholiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond. 
Charged String Tensor Networks  Tensor network methods provide an intuitive graphical language to describe quantum states, channels, open quantum systems and a class of numerical approximation methods that efficiently simulate certain manybody states in one spatial dimension. There are two fundamental types of tensor networks in wide use today. The most common is similar to quantum circuits. The second is the braided class of tensor networks, used in topological quantum computing. Recently a third class of tensor networks was discovered by Jaffe, Liu and Wozniakowski—the JLWmodel—notably, the wires carry charge excitations. The rules in which network components can be moved, merged and manipulated in a graphical form of reasoning take an elegant form. For instance the relative charge locations on wires carries precise meaning and changing the ordering modifies a connected network specifically by a complex number. The type of isotopy discovered in the topological JLWmodel provides an alternative means to reason about quantum information, computation and protocols. Here we recall the tensornetwork building blocks used in a controlledNOT gate. Some open problems related to the JLWmodel are given. 
Charikar’s Algorithm  To detect nearduplicates this software uses the Charikar’s fingerprinting technique, this means characterizing each document with a unique 64bit vector, like a fingerprint. To determine whether two documents are Nearduplicates, we have to compare their fingerprints. To do this we use two algorithms, the algorithm developed by Moses Charikar and the Hamming distance algorithm, which allows us to measure the similarity between two vectors of n bits. What is Charikar’s algorithm? · Characterization of the document · Apply hash functions to the characteristics · Obtain fingerprint · Apply vector comparison function: Are (Doc1, doc2) nearduplicate? Hammingdistance (fingerprint (doc1), fingerprint (doc2)) = k GitXiv 
Chebyshev Distance  In mathematics, Chebyshev distance (or Tchebychev distance), maximum metric, or L8 metric is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension. It is named after Pafnuty Chebyshev. It is also known as chessboard distance, since in the game of chess the minimum number of moves needed by a king to go from one square on a chessboard to another equals the Chebyshev distance between the centers of the squares, if the squares have side length one, as represented in 2D spatial coordinates with axes aligned to the edges of the board. For example, the Chebyshev distance between f6 and e2 equals 4. 
CheckpointRestart for Unified Memory (CRUM) 
Unified Virtual Memory (UVM) was recently introduced on recent NVIDIA GPUs. Through software and hardware support, UVM provides a coherent shared memory across the entire heterogeneous node, migrating data as appropriate. The older CUDA programming style is akin to older largememory UNIX applications which used to directly load and unload memory segments. Newer CUDA programs have started taking advantage of UVM for the same reasons of superior programmability that UNIX applications long ago switched to assuming the presence of virtual memory. Therefore, checkpointing of UVM will become increasingly important, especially as NVIDIA CUDA continues to gain wider popularity: 87 of the top 500 supercomputers in the latest listings are GPUaccelerated, with a current trend of ten additional GPUbased supercomputers each year. A new scalable checkpointing mechanism, CRUM (CheckpointRestart for Unified Memory), is demonstrated for hybrid CUDA/MPI computations across multiple computer nodes. CRUM supports a fast, forked checkpointing, which mostly overlaps the CUDA computation with storage of the checkpoint image in stable storage. The runtime overhead of using CRUM is 6% on average, and the time for forked checkpointing is seen to be a factor of up to 40 times less than traditional, synchronous checkpointing. 
Chernoff Faces  Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose represent values of the variables by their shape, size, placement and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty. Chernoff faces handle each variable differently. Because the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen (e.g. eye size and eyebrowslant have been found to carry significant weight). 
Chernoff Information  Chernoff information upper bounds the probability of error of the optimal Bayesian decision rule for 2 class classification problems. However, it turns out that in practice the Chernoff bound is hard to calculate or even approximate. In statistics, many usual distributions, such as Gaussians, Poissons or frequency histograms called multinomials, can be handled in the unified framework of exponential families. In this note, we prove that the Chernoff information for members of the same exponential family can be either derived analytically in closed form, or efficiently approximated using a simple geodesic bisection optimization technique based on an exact geometric characterization of the ‘Chernoff point’ on the underlying statistical manifold. 
Chinese Restaurant Process  In probability theory, the Chinese restaurant process is a discretetime stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 is seated at an unoccupied table with probability 1. At time n + 1, a new customer chooses uniformly at random to sit at one of the following n + 1 places: directly to the left of one of the n customers already sitting at an occupied table, or at a new, unoccupied table. David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book. At time n, the value of the process is a partition of the set of n customers, where the tables are the blocks of the partition. Mathematicians are interested in the probability distribution of this random partition. 
ChiSquare Test  A chisquared test, also referred to as test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chisquared distribution when the null hypothesis is true. Also considered a chisquared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chisquared distribution as closely as desired by making the sample size large enough. The chisquare (I) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Do the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference? 
CHisquared Automatic Interaction Detection (CHAID) 
CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis, this version of CHAID being originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID stands for CHisquared Automatic Interaction Detection, based upon a formal extension of the US AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 70s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s. In practice, CHAID is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the field of medical and psychiatric research. Like other decision trees, CHAID’s advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively, since with small sample sizes the respondent groups can quickly become too small for reliable analysis. One important advantage of CHAID over alternatives such as multiple regression is that it is nonparametric. CHAID and R — When you need explanation 
Choice Modeling  Choice modelling attempts to model the decision process of an individual or segment in a particular context. Choice modelling may be used to estimate nonmarket environmental benefits and costs. Many alternative models exist in econometrics, marketing, sociometrics and other fields, including utility maximization, optimization applied to consumer theory, and a plethora of other identification strategies which may be more or less accurate depending on the data, sample, hypothesis and the particular decision being modelled. In addition Choice Modelling is regarded as the most suitable method for estimating consumers’ willingness to pay for quality improvements in multiple dimensions. Neuroscience Suggests Choice Model Misspecification 
ChoiceNet  In this paper, we focus on the supervised learning problem with corrupted training data. We assume that the training dataset is generated from a mixture of a target distribution and other unknown distributions. We estimate the quality of each data by revealing the correlation between the generated distribution and the target distribution. To this end, we present a novel framework referred to here as ChoiceNet that can robustly infer the target distribution in the presence of inconsistent data. We demonstrate that the proposed framework is applicable to both classification and regression tasks. ChoiceNet is extensively evaluated in comprehensive experiments, where we show that it constantly outperforms existing baseline methods in the handling of noisy data. Particularly, ChoiceNet is successfully applied to autonomous driving tasks where it learns a safe driving policy from a dataset with mixed qualities. In the classification task, we apply the proposed method to the CIFAR10 dataset and it shows superior performances in terms of robustness to noisy labels. 
Cholesky Decomposition  In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positivedefinite matrix into the product of a lower triangular matrix and its conjugate transpose, useful for efficient numerical solutions and Monte Carlo simulations. It was discovered by AndréLouis Cholesky for real matrices. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations. 
Chopthin Resampler  Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. A simulation study shows that the chopthin algorithm consistently outperforms standard resampling methods. The algorithms chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented very efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN) and for Matlab are available. chopthin 
Choquet Integral  A Choquet integral is a subadditive or superadditive integral created by the French mathematician Gustave Choquet in 1953. It was initially used in statistical mechanics and potential theory, but found its way into decision theory in the 1980s, where it is used as a way of measuring the expected utility of an uncertain event. It is applied specifically to membership functions and capacities. In imprecise probability theory, the Choquet integral is also used to calculate the lower expectation induced by a 2monotone lower probability, or the upper expectation induced by a 2alternating upper probability. Using the Choquet integral to denote the expected utility of belief functions measured with capacities is a way to reconcile the Ellsberg paradox and the Allais paradox. http://…/Ayub_Khan_2009.pdf 
Choropleth Map  A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or percapita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region. A special type of choropleth map is a prism map, a threedimensional map in which a given region’s height on the map is proportional to the statistical variable’s value for that region. 
ChowLiu Tree  In probability theory and statistics ChowLiu tree is an efficient method for constructing a secondorder product approximation of a joint probability distribution, first described in a paper by Chow & Liu (1968). The goals of such a decomposition, as with such Bayesian networks in general, may be either data compression or inference. Structure Learning in Bayesian Networks 
Christoffel Function  
Chronohorogram  
Chumbley Score  A statistical analysis and computational algorithm for comparing pairs of tool marks via profilometry data is described. Empirical validation of the method is established through experiments based on tool marks made at selected fixed angles from 50 sequentially manufactured screwdriver tips. Results obtained from three different comparison scenarios are presented and are in agreement with experiential knowledge possessed by practicing examiners. Further comparisons between scores produced by the algorithm and visual assessments of the same tool mark pairs by professional tool mark examiners in a blind study in general show good agreement between the algorithm and human experts. In specific instances where the algorithm had difficulty in assessing a particular comparison pair, results obtained during the collaborative study with professional examiners suggest ways in which algorithm performance may be improved. It is concluded that the addition of contextual information when inputting data into the algorithm should result in better performance. toolmaRk 
CINEX  Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, ‘California is divided into 58 counties’. Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first fullfledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) nonmaximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five humanevaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a largescale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations. 
CIoTA  Due to their rapid growth and deployment, Internet of things (IoT) devices have become a central aspect of our daily lives. However, they tend to have many vulnerabilities which can be exploited by an attacker. Unsupervised techniques, such as anomaly detection, can help us secure the IoT devices. However, an anomaly detection model must be trained for a long time in order to capture all benign behaviors. This approach is vulnerable to adversarial attacks since all observations are assumed to be benign while training the anomaly detection model. In this paper, we propose CIoTA, a lightweight framework that utilizes the blockchain concept to perform distributed and collaborative anomaly detection for devices with limited resources. CIoTA uses blockchain to incrementally update a trusted anomaly detection model via selfattestation and consensus among IoT devices. We evaluate CIoTA on our own distributed IoT simulation platform, which consists of 48 Raspberry Pis, to demonstrate CIoTA’s ability to enhance the security of each device and the security of the network as a whole. 
Circular Plot / Circos  Circos is a software package for visualizing data and information. It visualizes data in a circular layout – this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive. 
Circular Statistics  ➘ “Directional Statistics” 
Citizen Data Scientist (CDS) 
A citizen data scientist is a role that analyzes data and creates data and business models for their companies with the help of big data tools and technologies. Citizen data scientists do not necessarily need to be data science or business intelligence experts. This role is given to employees in an organization who can use the big data tools and technology to create data models. Back in 2016, Gartner coined the term, meaning a person ‘who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.’ Citizen Data Scientists – Are we there yet? 
ClariNet  In this work, we propose an alternative solution for parallel wave generation by WaveNet. In contrast to parallel WaveNet (Oord et al., 2018), we distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet by minimizing a novel regularized KL divergence between their highlypeaked output distributions. Our method computes the KL divergence in closedform, which simplifies the training algorithm and provides very efficient distillation. In addition, we propose the first texttowave neural architecture for speech synthesis, which is fully convolutional and enables fast endtoend training from scratch. It significantly outperforms the previous pipeline that connects a texttospectrogram model to a separately trained WaveNet (Ping et al., 2017). We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this endtoend model. 
Class Label Autoencoder  Existing zeroshot learning (ZSL) methods usually learn a projection function between a feature space and a semantic embedding space(text or attribute space) in the training seen classes or testing unseen classes. However, the projection function cannot be used between the feature space and multisemantic embedding spaces, which have the diversity characteristic for describing the different semantic information of the same class. To deal with this issue, we present a novel method to ZSL based on learning class label autoencoder (CLA). CLA can not only build a uniform framework for adapting to multisemantic embedding spaces, but also construct the encoderdecoder mechanism for constraining the bidirectional projection between the feature space and the class label space. Moreover, CLA can jointly consider the relationship of feature classes and the relevance of the semantic classes for improving zeroshot classification. The CLA solution can provide both unseen class labels and the relation of the different classes representation(feature or semantic information) that can encode the intrinsic structure of classes. Extensive experiments demonstrate the CLA outperforms stateofart methods on four benchmark datasets, which are AwA, CUB, Dogs and ImNet2. 
Classical Test Theory (CTT) 
Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of testtakers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests. Classical test theory may be regarded as roughly synonymous with true score theory. The term ‘classical’ refers not only to the chronology of these models but also contrasts with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation ‘modern’ as in ‘modern latent trait theory’. Classical test theory as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002). The description of classical test theory below follows these seminal publications. 
Classification Accuracy (CA) 
In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words precision and accuracy can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method. A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. The result would be a consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic error improves accuracy but does not change precision. A measurement system is considered valid if it is both accurate and precise. Related terms include bias (nonrandom or directed effects caused by a factor or factors unrelated to the independent variable) and error (random variability). The terminology is also applied to indirect measurements – that is, values obtained by a computational procedure from observed data. In addition to accuracy and precision, measurements may also have a measurement resolution, which is the smallest change in the underlying physical quantity that produces a response in the measurement. In numerical analysis, accuracy is also the nearness of a calculation to the true value; while precision is the resolution of the representation, typically defined by the number of decimal or binary digits. http://…/accuracy.htm 
Classification Based on Associations (CBA) 
Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of discovery is not predetermined, while for classification rule mining there is one and only one predetermined target. In this paper, we propose to integrate these two mining techniques. The integration is done by focusing on mining a special subset of association rules, called class association rules (CARs). An efficient algorithm is also given for building a classifier based on the set of discovered CARs. Experimental results show that the classifier built this way is, in general, more accurate than that produced by the stateoftheart classification system C4.5. In addition, this integration helps to solve a number of problems that exist in the current classification systems. rCBA 
Classification Based Preselection (CPS) 
In evolutionary algorithms, a preselection operator aims to select the promising offspring solutions from a candidate offspring set. It is usually based on the estimated or real objective values of the candidate offspring solutions. In a sense, the preselection can be treated as a classification procedure, which classifies the candidate offspring solutions into promising ones and unpromising ones. Following this idea, we propose a classification based preselection (CPS) strategy for evolutionary multiobjective optimization. When applying classification based preselection, an evolutionary algorithm maintains two external populations (training data set) that consist of some selected good and bad solutions found so far; then it trains a classifier based on the training data set in each generation. Finally it uses the classifier to filter the unpromising candidate offspring solutions and choose a promising one from the generated candidate offspring set for each parent solution. In such cases, it is not necessary to estimate or evaluate the objective values of the candidate offspring solutions. The classification based preselection is applied to three stateoftheart multiobjective evolutionary algorithms (MOEAs) and is empirically studied on two sets of test instances. The experimental results suggest that classification based preselection can successfully improve the performance of these MOEAs. 
Classification Rule  Given a population whose members can be potentially separated into a number of different sets or classes, a classification rule is a procedure in which the elements of the population set are each assigned to one of the classes. A perfect test is such that every element in the population is assigned to the class it really belongs. An imperfect test is such that some errors appear, and then statistical analysis must be applied to analyse the classification. 
Classification Without Labels  Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truthlevel information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fullysupervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark versus gluoninitiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available. 
Cleverhans  cleverhans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models’ performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. 
Clickstream Analytics  A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing or using another software application. As the user clicks anywhere in the webpage or application, the action is logged on a client or inside the web server, as well as possibly the web browser, router, proxy server or ad server. Clickstream analysis is useful for web activity analysis, software testing, market research, and for analyzing employee productivity. 
ClickThrough Rate (CTR) 
Clickthrough rate (CTR) is a way of measuring the success of an online advertising campaign for a particular website as well as the effectiveness of an email campaign by the number of users that clicked on a specific link. 
client2vec  The workflow of data scientists normally involves potentially inefficient processes such as data mining, feature engineering and model selection. Recent research has focused on automating this workflow, partly or in its entirety, to improve productivity. We choose the former approach and in this paper share our experience in designing the client2vec: an internal library to rapidly build baselines for banking applications. Client2vec uses marginalized stacked denoising autoencoders on current account transactions data to create vector embeddings which represent the behaviors of our clients. These representations can then be used in, and optimized against, a variety of tasks such as client segmentation, profiling and targeting. Here we detail how we selected the algorithmic machinery of client2vec and the data it works on and present experimental results on several business cases. 
CLINIcal Question Answering system (CLINIQA) 
The recent developments in the field of biomedicine have made large volumes of biomedical literature available to the medical practitioners. Due to the large size and lack of efficient searching strategies, medical practitioners struggle to obtain necessary information available in the biomedical literature. Moreover, the most sophisticated search engines of age are not intelligent enough to interpret the clinicians’ questions. These facts reflect the urgent need of an information retrieval system that accepts the queries from medical practitioners’ in natural language and returns the answers quickly and efficiently. In this paper, we present an implementation of a machine intelligence based CLINIcal Question Answering system (CLINIQA) to answer medical practitioner’s questions. The system was rigorously evaluated on different text mining algorithms and the best components for the system were selected. The system makes use of Unified Medical Language System for semantic analysis of both questions and medical documents. In addition, the system employs supervised machine learning algorithms for classification of the documents, identifying the focus of the question and answer selection. Effective domainspecific heuristics are designed for answer ranking. The performance evaluation on hundred clinical questions shows the effectiveness of our approach. 
Clipper  Machine learning is being deployed in a growing number of applications which demand realtime, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, the first generalpurpose lowlatency prediction serving system. Interposing between enduser applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate comparable prediction throughput and latency on a range of models while enabling new functionality, improved accuracy, and robustness. 
Closest Pair Problem  The closest pair of points problem or closest pair problem is a problem of computational geometry: given n points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms. A naive algorithm of finding distances between all pairs of points in a space of dimension d and selecting the minimum requires O(n2) time. It turns out that the problem may be solved in O(n log n) time in a Euclidean space or Lp space of fixed dimension d. In the algebraic decision tree model of computation, the O(n log n) algorithm is optimal, by a reduction from the element uniqueness problem. In the computational model that assumes that the floor function is computable in constant time the problem can be solved in O(n log log n) time. If we allow randomization to be used together with the floor function, the problem can be solved in O(n) time. A New Algorithm for Finding Closest Pair of Vectors 
Cloud AutoML (AutoML) 
Train highquality custom machine learning models with minimum effort and machine learning expertise. • Train custom machine learning models: Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train highquality models specific to their business needs, by leveraging Google´s stateoftheart transfer learning, and Neural Architecture Search technology. • Stateoftheart performance: Use Cloud AutoML to leverage Google´s proprietary technology, which offers fast performance and accurate predictions. AutoML puts more than 10 years of Google Research technology in the hands of our users. • Get up and running fast: Cloud AutoML provides a simple graphical user interface (GUI) for you to train, evaluate, improve, and deploy models based on your own data. You´re only a few minutes away from your own custom machine learning model. • Generate highquality training data: You can use Google´s human labeling service to have reallife people annotate or clean your labels to make sure your models are being trained on highquality data. 
Cloud Data  The Difference Between Big Data and Cloud Data: New technologies are required for the emergence and standardization of cloud data to take hold. Big data was meant as a holding cell for large amounts of data that could be sorted effectively only by specialized data scientists (this is becoming easier with OLAP on Hadoop type tools). The protocols for big data rely upon simple, standard protocols and can’t be adjusted easily to meet the demands of complex operations. Big data takes time to sort through and analyze, whereas cloud data is immediate and happens in the background using the tremendous resources of cloud servers. Cloud data requires a significantly higher number of resources since it must connect to databases in several geographically distributed services. Since cloud data must flexibly interact with several unique interfaces and security models, the mechanisms used for big data won’t work for cloud data. 
Cloud SELENE (cSELENE) 
While working in collaborative team elsewhere sometimes the federated (huge) data are from heterogeneous cloud vendors. It is not only about the data privacy concern but also about how can those federated data can be querying from cloud directly in fast and securely way. Previous solution offered hybrid cloud between public and trusted private cloud. Another previous solution used encryption on MapReduce framework. But the challenge is we are working on heterogeneous clouds. In this paper, we present a novel technique for querying with privacy concern. Since we take execution time into account, our basic idea is to use the data mining model by partitioning the federated databases in order to reduce the search and query time. By using model of the database it means we use only the summary or the very characteristic patterns of the database. Modeling is the Preserving Privacy Stage I, since by modeling the data is being symbolized. We implement encryption on the database as preserving privacy Stage II. Our system, called ‘cSELENE’ (stands for ‘cloud SELENE’), is designed to handle federated data on heterogeneous clouds: AWS, Microsoft Azure, and Google Cloud Platform with MapReduce technique. In this paper we discuss preservingprivacy system and threat model, the format of federated data, the parallel programming (GPU programming and shared/memory systems), the parallel and secure algorithm for data mining model in distributed cloud, the cloud infrastructure/architecture, and the UIX design of the cSELENE system. Other issues such as incremental method and the secure design of cloud architecture system (Virtual Machines across platform design) are still open to discuss. Our experiments should demonstrate the validity and practicality of the proposed high performance computing scheme. 
CLSTM  Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called CLSTM for sentence representation and text classification. CLSTM utilizes CNN to extract a sequence of higherlevel phrase representations, and are fed into a long shortterm memory recurrent neural network (LSTM) to obtain the sentence representation. CLSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the CLSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks. 
Clued Recurrent Attention Model (CRAM) 
To overcome the poor scalability of convolutional neural network, recurrent attention model(RAM) selectively choose what and where to look on the image. By directing recurrent attention model how to look the image, RAM can be even more successful in that the given clue narrow down the scope of the possible focus zone. In this perspective, this work proposes clued recurrent attention model (CRAM) which add clue or constraint on the RAM better problem solving. CRAM follows encoderdecoder framework, encoder utilizes recurrent attention model with spatial transformer network and decoder which varies depending on the task. To ensure the performance, CRAM tackles two computer vision task. One is the image classification task, with clue given as the binary image saliency which indicates the approximate location of object. The other is the inpainting task, with clue given as binary mask which indicates the occluded part. In both tasks, CRAM shows better performance than existing methods showing the successful extension of RAM. 
Cluster Validation  There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; ‘relative cluster validation’ is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method ran with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various different characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low withincluster distances and high betweencluster separation. 
Clustered Latent Dirichlet Allocation (CLDA) 
The allrelevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and metainformation simultaneously. Here, we are proposing an efficient, scalable feature extraction algorithm, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable and based on well studied nonparametric hypothesis tests. We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics. 
Clustered Sparrow Algorithm  The clustered Sparrow algorithm 
Clustering / Cluster Analysis  Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multiobjective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multiobjective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties. 
Clustering Using REpresentatives (CURE) 
CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases that is more robust to outliers and identifies clusters having nonspherical shapes and wide variances in size. 
Clustering Validation Indices  The purpose of clustering is to determine the intrinsic grouping in a set of unlabeled data, where the objects in each group are indistinguishable under some criterion of similarity. Clustering is an unsupervised classification process fundamental to data mining (one of the most important tasks in data analysis). It has applications in several fields like bioinformatics, web data analysis, text mining and scientific data exploration. Clustering refers to unsupervised learning and, for that reason it has no a priori data set information. However, to get good results, the clustering algorithm depends on input parameters. For instance, kmeans and CURE algorithms require a number of clusters (k) to be created. In this sense, the question is: What is the optimal number of clusters? Currently, cluster validity indexes research has drawn attention as a means to give a solution. Many different cluster validity methods have been proposed without any a priori class information. Clustering validation is a technique to find a set of clusters that best fits natural partitions (number of clusters) without any class information. Generally speaking, there are two types of clustering techniques, which are based on external criteria and internal criteria. · External validation: Based on previous knowledge about data. · Internal validation: Based on the information intrinsic to the data alone. If we consider these two types of cluster validation to determine the correct number of groups from a dataset, one option is to use external validation indexes for which a priori knowledge of dataset information is required, but it is hard to say if they can be used in real problems (usually, real problems do not have prior information of the dataset in question). Another option is to use internal validity indexes which do not require a priori information from dataset. 
ClusterNet  Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or their dependence on large set of labeled data samples. In this paper, we propose ClusterNet that uses pairwise semantic constraints from very few labeled data samples (< 5% of total data) and exploits the abundant unlabeled data to drive the clustering approach. We define a new loss function that uses pairwise semantic similarity between objects combined with constrained kmeans clus tering to efficiently utilize both labeled and unlabeled data in the same framework. The proposed network uses convolution autoencoder to learn a latent representation that groups data into k specified clusters, while also learning the cluster centers simultaneously. We evaluate and com pare the performance of ClusterNet on several datasets and state of the art deep clustering approaches. 
ClusterWeighted Latent Class Modeling  Usually in Latent Class Analysis (LCA), external predictors are taken to be cluster conditional probability predictors (LC models with covariates), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class specific distribution is of interest in the distal outcome model, when the distribution of the external variable(s) is assumed to dependent on LC membership. In this paper, we consider a more general formulation, typical in clusterweighted models, which embeds both the latent class regression and the distal outcome models. This allows us to test simultaneously both whether the distribution of the covariate(s) differs across classes, and whether there are significant direct effects of the covariate(s) on the indicators, by including most of the information about the covariate(s) – latent variable relationship. We show the advantages of the proposed modeling approach through a set of population studies and an empirical application on assets ownership of Italian households. 
Clusterwise Linear Regression (CLR) 
➚ “ClusterWise Linear Regression” 
ClusterWise Linear Regression (CLR) 
Clusterwise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. 
Clustrophile 2  Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, highdimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The space of possible clusterings for a typical dataset is vast, and navigating in this vast space is also challenging. The absence of groundtruth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfiable clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large space of clusterings so as to improve the effectiveness of exploratory clustering analysis. We introduce \textit{Clustrophile 2}, a new interactive tool for guided clustering analysis. \textit{Clustrophile 2} guides users in clusteringbased exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps quickly reason about differences between clusterings. To this end, \textit{Clustrophile 2} contributes a novel feature, the clustering tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate \textit{Clustrophile 2} through a user study with 12 data scientists, who used our tool to explore and interpret subcohorts in a dataset of Parkinson’s disease patients. Results suggest that \textit{Clustrophile 2} improves the speed and effectiveness of exploratory clustering analysis for both experts and nonexperts. 
CN2 Induction Algorithm  The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3. 
CoarsetoFine Context Memory (CFCM) 
Recent neuralnetworkbased architectures for image segmentation make extensive usage of feature forwarding mechanisms to integrate information from multiple scales. Although yielding good results, even deeper architectures and alternative methods for feature fusion at different resolutions have been scarcely investigated for medical applications. In this work we propose to implement segmentation via an encoderdecoder architecture which differs from any other previously published method since (i) it employs a very deep architecture based on residual learning and (ii) combines features via a convolutional Long Short Term Memory (LSTM), instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarsetofine strategy; hence the name CoarsetoFine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery county lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation. 
COBRA  Clustering is inherently illposed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraintbased clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first overclusters the data by running Kmeans with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance. 
COBRAS  Constraintbased clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a superinstance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such superinstances in a topdown manner to produce highquality results early on in the clustering process, and keeps refining these superinstances as more pairwise queries are given to get more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings at fast run times, making it an excellent candidate for the iterative clustering scenario outlined above. 
COBRASTS  Clustering is ubiquitous in data analysis, including analysis of time series. It is inherently subjective: different users may prefer different clusterings for a particular dataset. Semisupervised clustering addresses this by allowing the user to provide examples of instances that should (not) be in the same cluster. This paper studies semisupervised clustering in the context of time series. We show that COBRAS, a stateoftheart semisupervised clustering method, can be adapted to this setting. We refer to this approach as COBRASTS. An extensive experimental evaluation supports the following claims: (1) COBRASTS far outperforms the current state of the art in semisupervised clustering for time series, and thus presents a new baseline for the field; (2) COBRASTS can identify clusters with separated components; (3) COBRASTS can identify clusters that are characterized by small local patterns; (4) a small amount of semisupervision can greatly improve clustering quality for time series; (5) the choice of the clustering algorithm matters (contrary to earlier claims in the literature). 
CochranArmitage Trend Test  The CochranArmitage test for trend, named for William Cochran and Peter Armitage, is used in categorical data analysis when the aim is to assess for the presence of an association between a variable with two categories and an ordinal variable with k categories. It modifies the Pearson chisquared test to incorporate a suspected ordering in the effects of the k categories of the second variable. For example, doses of a treatment can be ordered as ‘low’, ‘medium’, and ‘high’, and we may suspect that the treatment benefit cannot become smaller as the dose increases. The trend test is often used as a genotypebased test for casecontrol genetic association studies. CATTexact 
CochranMantelHaenszel Statistics  In statistics, the CochranMantelHaenszel statistics are a collection of test statistics used in the analysis of stratified categorical data. They are named after William G. Cochran, Nathan Mantel and William Haenszel. One of these test statistics is the CochranMantelHaenszel (CMH) test, which allows the comparison of two groups on a dichotomous/categorical response. It is used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled. It is often used in observational studies where random assignment of subjects to different treatments cannot be controlled, but influencing covariates can. In the CMH test, the data are arranged in a series of associated 2 × 2 contingency tables, the null hypothesis is that the observed response is independent of the treatment used in any 2 × 2 contingency table. The CMH test’s use of associated 2 × 2 contingency tables increases the ability of the test to detect associations (the power of the test is increased). sensitivity2x2xk 
CochranMantelHaenszel Test  ➘ “CochranMantelHaenszel Statistics” samplesizeCMH 
Cocktail Algorithm  fastcox 
CoCoA  The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a generalpurpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general nonstronglyconvex regularizers, including L1regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling nonstronglyconvex regularizers and nonsmooth loss functions. The resulting framework has markedly improved performance over stateoftheart methods, as we illustrate with an extensive set of experiments on real distributed datasets. 
code2seq  The ability to generate natural language sequences from source code snippets can be used for code summarization, documentation, and retrieval. Sequencetosequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved stateoftheart performance on these tasks by treating source code as a sequence of tokens. We present ${\rm {\scriptsize CODE2SEQ}}$: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as general stateoftheart NMT models. 
Coded Distributed Computing  A New Combinatorial Design of Coded Distributed Computing 
Coded Fast Fourier Transform (Coded FFT) 
We consider the problem of computing the Fourier transform of highdimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs. We show that by exploiting the algebraic structure of the Fourier transform operation and leveraging concepts from coding theory, one can efficiently deal with the straggler effects. In particular, we propose a computation strategy, named as coded FFT, which achieves the optimal recovery threshold, defined as the minimum number of workers that the master node needs to wait for in order to compute the output. This is the first code that achieves the optimum robustness in terms of tolerating stragglers or failures for computing Fourier transforms. Furthermore, the reconstruction process for coded FFT can be mapped to MDS decoding, which can be solved efficiently. Moreover, we extend coded FFT to settings including computing general $n$dimensional Fourier transforms, and provide the optimal computing strategy for those settings. 
Coded TeraSort  We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable innetwork coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of CodedTeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with TeraSort, for typical settings of interest. 
CoDeepNEAT  The success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparameters, this method achieves results comparable to best human designs in standard benchmarks in object recognition and language modeling. It also supports building a realworld application of automated image captioning on a magazine website. Given the anticipated increases in available computing power, evolution of deep networks is promising approach to constructing deep learning applications in the future. 
Coefficient of Variation  In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation to the mean. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation (RSD), which is expressed as a percentage. 
Coevolutionary Neural Population Model  We present a method for using neural networks to model evolutionary population dynamics, and draw parallels to recent deep learning advancements in which adversariallytrained neural networks engage in coevolutionary interactions. We conduct experiments which demonstrate that models from evolutionary game theory are capable of describing the behavior of these neural population systems. 
CoffeeScript  CoffeeScript is a little language that compiles into JavaScript. Underneath that awkward Javaesque patina, JavaScript has always had a gorgeous heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way. The golden rule of CoffeeScript is: “It’s just JavaScript”. The code compiles onetoone into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly from CoffeeScript (and viceversa). The compiled output is readable and prettyprinted, will work in every JavaScript runtime, and tends to run as fast or faster than the equivalent handwritten JavaScript. 
Cogniculture  Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI based systems will pose a threat to humanity. People in AI community have diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both human and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents’ life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of usecases of Cogniculture, we discuss the road map ahead for this journey. 
Cognitive Analytics  Cognitive Analytics: A hybrid of several disparate disciplines, methods, and practical technologies. 
Cognitive Architecture  A cognitive architecture can refer to a theory about the structure of the human mind. One of the main goals of a cognitive architecture is to summarize the various results of cognitive psychology in a comprehensive computer model. However, the results need to be in a formalized form so far that they can be the basis of a computer program. The formalized models can be used to further refine a comprehensive theory of cognition, and more immediately, as a commercially usable model. Successful cognitive architectures include ACTR (Adaptive Control of Thought, ACT), SOAR and OpenCog. 
Cognitive Bias  Cognitive biases are tendencies to think in certain ways. Cognitive biases can lead to systematic deviations from a standard of rationality or good judgment, and are often studied in psychology and behavioral economics. 
Cognitive Computing  Cognitive computing refers to the development of computer systems modeled after the human brain. Originally referred to as artificial intelligence, researchers began to use the modern term instead in the 1990s, to indicate that the science was designed to teach computers to think like a human mind, rather than developing an artificial system. This type of computing integrates technology and biology in an attempt to reengineer the brain, one of the most efficient and effective computers on Earth. Cognitive computing is a way of processing data that is neither linear nor deterministic. It uses the ideas behind neuroscience and psychology to augment human reasoning with better pattern matching while determining the optimal information a person needs to make decisions. Cognitive computing is different than other forms of software. Instead of shepherding data through predetermined pathways, it finds the previously unknown paths and patterns through the data. This is ultimately a more scalable model than relying on experts to synthesize data since there are too few experts of any sort available at any one time. Cognitive computing doesn’t try to fit data into an existing model; it looks at the data and figures out what the model is first. Cognitive Computing Cognitive Computing: Solving the Big Data Problem? Cognitive Computing Defined 
Cognitive Database  We propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model captures the hidden inter/intracolumn relationships between database tokens of different types. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQLbased analytics queries called cognitive intelligence (CI) queries. CI queries use the model vectors to enable complex queries such as semantic matching, inductive reasoning queries such as analogies, predictive queries using entities not present in a database, and, more generally, using knowledge from external sources. We demonstrate unique capabilities of Cognitive Databases using an Apache Spark based prototype to execute inductive reasoning CI queries over a multimodal database containing text and images. We believe our firstofakind system exemplifies using AI functionality to endow relational databases with capabilities that were previously very hard to realize in practice. 
Cognitive Representation Learner (CogRL) 
A cognitive model of human learning provides information about skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on wellstructured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in illstructured domains with no data and little to no human knowledge engineering. Our contribution is twofold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several illstructured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate humanauthored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to highly correlate with estimates using student performance data on an Article Selection dataset. 
CogSciK  Computational models of decisionmaking must contend with the variance of context and any number of possible decisions that a defined strategic actor can make at a given time. Relying on cognitive science theory, the authors have created an algorithm that captures the orientation of the actor towards an object and arrays the possible decisions available to that actor based on their given intersubjective orientation. This algorithm, like a traditional Kmeans clustering algorithm, relies on a coreperiphery structure that gives the likelihood of moves as those closest to the cluster’s centroid. The result is an algorithm that enables unsupervised classification of an array of decision points belonging to an actor’s present state and deeply rooted in cognitive science theory. 
Cohort Analysis  Cohort analysis is a subset of behavioral analytics that takes the data from a given eCommerce platform, web application, or online game and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined timespan. Cohort analysis allows a company to ‘see patterns clearly across the lifecycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes.’ By seeing these patterns of time, a company can adapt and tailor its service to those specific cohorts. While cohort analysis is sometimes associated with a cohort study, they are different and should not be viewed as one in the same. Cohort analysis has come to describe specifically the analysis of cohorts in regards to big data and business analytics, while a cohort study is a more general umbrella term that describes a type of study in which data is broken down into similar groups. 
Coincidence Analysis (CNA) 
CNA, a Boolean method of causal analysis presented in Baumgartner (2009a). CNA is a configurationl comparative method for the identification of complex causal dependenciesin particular, causal chains and common cause structuresin configurational data. CNA is related to QCA (Ragin 2008), but contrary to the latter does not minimize sufficient and necessary conditions by means of Quine McCluskey optimization, but based on its own custombuilt optimization algorithm. The latter greatly facilitates the analysis of data featuring chainlike causal dependencies among the conditions of an ultimate outcome. http://…/infer_c.pdf http://…/baumgartnerthiem.pdf cna 
Cointegration  The term cointegration was defined by Granger (1983) as a formulation of the phenomenon that nonstationary processes can have linear combinations that are stationary. It was his investigations of the relation between cointegration and error correction that brought modelling of vector autoregressions with unit roots and cointegration to the center of attention in applied and theoretical econometrics; see Engle and Granger (1987). Cointegration is a statistical property of time series variables. Cointegration has become an important property in contemporary time series analysis. Time series often have trends – either deterministic or stochastic. In a seminal paper, Charles Nelson and Charles Plosser (1982) showed that most time series have stochastic trends – these are also called unit root processes, or processes integrated of order 1I(1). http://…/Cointegration 
COLA  Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. We consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run ondevice, on an arbitrary communication network, without a central coordinator. We propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance. Our framework overcomes many limitations of existing methods, and achieves communication efficiency, scalability, elasticity as well as resilience to changes in data and participating devices. 
coLaboratory Project  coLaboratory Project, a new tool for data science and analysis, designed to make collaborating on data easier. coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over adhoc workflows involving emailing documents back and forth. 
ColdStart Aware Attention (CSAA) 
➘ “Hybrid Contextualized Sentiment Classifier” 
Collaborative Blackbox and RUle Set Hybrid (CoBRUSH) 
Interpretable machine learning models have received increasing interest in recent years, especially in domains where humans are involved in the decisionmaking process. However, the possible loss of the task performance for gaining interpretability is often inevitable. This performance downgrade puts practitioners in a dilemma of choosing between a topperforming blackbox model with no explanations and an interpretable model with unsatisfying task performance. In this work, we propose a novel framework for building a Hybrid Decision Model that integrates an interpretable model with any blackbox model to introduce explanations in the decision making process while preserving or possibly improving the predictive accuracy. We propose a novel metric, explainability, to measure the percentage of data that are sent to the interpretable model for decision. We also design a principled objective function that considers predictive accuracy, model interpretability, and data explainability. Under this framework, we develop Collaborative Blackbox and RUle Set Hybrid (CoBRUSH) model that combines logic rules and any blackbox model into a joint decision model. An input instance is first sent to the rules for decision. If a rule is satisfied, a decision will be directly generated. Otherwise, the blackbox model is activated to decide on the instance. To train a hybrid model, we design an efficient search algorithm that exploits theoretically grounded strategies to reduce computation. Experiments show that CoBRUSH models are able to achieve same or better accuracy than their blackbox collaborator working alone while gaining explainability. They also have smaller model complexity than interpretable baselines. 
Collaborative Compressive Sensing  We propose a collaborative compressive sensing (CCS) framework consisting of a bank of $K$ compressive sensing (CS) systems that share the same sensing matrix but have different sparsifying dictionaries. This CCS system is guaranteed to yield better performance than each individual CS system in a statistical sense, while with the parallel computing strategy, it requires the same time as that needed for each individual CS system to conduct compression and signal recovery. We then provide an approach to designing optimal CCS systems by utilizing a measure that involves both the sensing matrix and dictionaries and hence allows us to simultaneously optimize the sensing matrix and all the $K$ dictionaries under the same scheme. An alternating minimizationbased algorithm is derived for solving the corresponding optimal design problem. We provide a rigorous convergence analysis to show that the proposed algorithm is convergent. Experiments with real images are carried out and show that the proposed CCS system significantly improves on existing CS systems in terms of the signal recovery accuracy. 
Collaborative Cross Network (CoNet) 
The crossdomain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for crossdomain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multilayer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by backpropagation. The proposed model is evaluated on two realworld datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively. 
Collaborative Deep Learning (CDL) 
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CFbased methods use the ratings given to items by users as the sole source of information for learning to make recommendation. However, the ratings are often very sparse in many applications, causing CFbased methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to noni.i.d. (CFbased) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three realworld datasets from different domains show that CDL can significantly advance the state of the art. GitXiv 
Collaborative Deep Reinforcement Learning (CDRL) 
Besides independent learning, human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significant challenges remain, especially when it comes to reinforcement learning, heterogeneous model structures, and different learning tasks. Motivated by human collaborative learning, in this paper we propose a collaborative deep reinforcement learning (CDRL) framework that performs adaptive knowledge transfer among heterogeneous learning agents. Specifically, the proposed CDRL conducts a novel deep knowledge distillation method to address the heterogeneity among different learning tasks with a deep alignment network. Furthermore, we present an efficient collaborative Asynchronous Advantage ActorCritic (cA3C) algorithm to incorporate deep knowledge distillation into the online training of agents, and demonstrate the effectiveness of the CDRL framework using extensive empirical evaluation on OpenAI gym. 
Collaborative Filtering (CF) 
Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (also called “peopletopeople correlation”) 
Collaborative Filtering – Neural Autoregressive Distribution Estimator (CFNADE) 
This paper proposes CFNADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM) based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CFNADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CFNADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CFNADE, which shows superior performance. Finally, CFNADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CFNADE with a single hidden layer beats all previous stateoftheart methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance. 
Collaborative Filtering with UserItem CoAutoregressive Models (CFUIcA) 
Besides the success on object recognition, machine translation and system control in games, (deep) neural networks have achieved stateoftheart results in collaborative filtering (CF) recently. Previous neural approaches for CF are either userbased or itembased, which cannot leverage all relevant information explicitly. We propose CFUIcA, a neural coautoregressive model for CF tasks, which exploit the structural autoregressiveness in the domains of both users and items. Furthermore, we separate the inherent dependence in this structure under a natural assumption and develop an efficient stochastic learning algorithm to handle large scale datasets. We evaluate CFUIcA on two popular benchmarks: MovieLens 1M and Netflix, and achieve stateoftheart predictive performance, which demonstrates the effectiveness of CFUIcA. 
Collaborative HumanAI (CHAI) 
Automated dermoscopic image analysis has witnessed rapid growth in diagnostic performance. Yet adoption faces resistance, in part, because no evidence is provided to support decisions. In this work, an approach for evidencebased classification is presented. A feature embedding is learned with CNNs, tripletloss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors, as well as localized image regions most relevant to measuring distance between query and neighbors. To ensure that results are relevant in terms of both label accuracy and human visual similarity for any skill level, a novel hierarchical triplet logic is implemented to jointly learn an embedding according to disease labels and nonexpert similarity. Results are improved over baselines trained on disease labels alone, as well as standard multiclass loss. Quantitative relevance of results, according to nonexpert similarity, as well as localized image regions, are also significantly improved. 
Collaborative Learning  We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multitask learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediatelevel representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise. 
Collaborative Memory Network (CMN) 
Recommendation systems play a vital role to keep users engaged with personalized content in modern online platforms. Deep learning has revolutionized many research fields and there is a recent surge of interest in applying it to collaborative filtering (CF). However, existing methods compose deep learning architectures with the latent factor model ignoring a major class of CF models, neighborhood or memorybased approaches. We propose Collaborative Memory Networks (CMN), a deep architecture to unify the two classes of CF models capitalizing on the strengths of the global structure of latent factor model and local neighborhoodbased structure in a nonlinear fashion. Motivated by the success of Memory Networks, we fuse a memory component and neural attention mechanism as the neighborhood component. The associative addressing scheme with the user and item memories in the memory module encodes complex useritem relations coupled with the neural attention mechanism to learn a useritem specific neighborhood. Finally, the output module jointly exploits the neighborhood with the user and item memories to produce the ranking score. Stacking multiple memory modules together yield deeper architectures capturing increasingly complex useritem relations. Furthermore, we show strong connections between CMN components, memory networks and the three classes of CF models. Comprehensive experimental results demonstrate the effectiveness of CMN on three public datasets outperforming competitive baselines. Qualitative visualization of the attention weights provide insight into the model’s recommendation process and suggest the presence of higher order interactions. 
Collective Adaptive Resourcesharing Markovian Agents (CARMA) 
In this paper we present CARMA, a language recently defined to support specification and analysis of collective adaptive systems. CARMA is a stochastic process algebra equipped with linguistic constructs specifically developed for modelling and programming systems that can operate in openended and unpredictable environments. This class of systems is typically composed of a huge number of interacting agents that dynamically adjust and combine their behaviour to achieve specific goals. A CARMA model, termed a collective, consists of a set of components, each of which exhibits a set of attributes. To model dynamic aggregations, which are sometimes referred to as ensembles, CARMA provides communication primitives that are based on predicates over the exhibited attributes. These predicates are used to select the participants in a communication. Two communication mechanisms are provided in the CARMA language: multicastbased and unicastbased. 
Collective And Point Anomalies (CAPA) 
The challenge of efficiently identifying anomalies in data sequences is an important statistical problem that now arises in many applications. Whilst there has been substantial work aimed at making statistical analyses robust to outliers, or point anomalies, there has been much less work on detecting anomalous segments, or collective anomalies. By bringing together ideas from changepoint detection and robust statistics, we introduce Collective And Point Anomalies (CAPA), a computationally efficient approach that is suitable when collective anomalies are characterised by either a change in mean, variance, or both, and distinguishes them from point anomalies. Theoretical results establish the consistency of CAPA at detecting collective anomalies and empirical results show that CAPA has close to linear computational cost as well as being more accurate at detecting and locating collective anomalies than other approaches. We demonstrate the utility of CAPA through its ability to detect exoplanets from light curve data from the Kepler telescope. 
Collective Intelligence (COIN) 
Collective Intelligence is shared or group intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making. The term appears in sociobiology, political science and in context of mass peer review and crowdsourcing applications. It may involve consensus, social capital and formalisms such as voting systems, social media and other means of quantifying mass activity. Collective IQ is a measure of collective intelligence, although it is often used interchangeably with the term collective intelligence. (‘Building new conclusions from independent contributors is really what collective intelligence is all about.’) 
Collocation  In corpus linguistics, a collocation is a sequence of words or terms that cooccur more often than would be expected by chance. In phraseology, collocation is a subtype of phraseme. An example of a phraseological collocation, as propounded by Michael Halliday, is the expression strong tea. While the same meaning could be conveyed by the roughly equivalent *powerful tea, this expression is considered incorrect by English speakers. Conversely, the corresponding expression for computer, powerful computers is preferred over *strong computers. Phraseological collocations should not be confused with idioms, where meaning is derived, whereas collocations are mostly compositional. There are about six main types of collocations: adjective+noun, noun+noun (such as collective nouns), verb+noun, adverb+adjective, verbs+prepositional phrase (phrasal verbs), and verb+adverb. Collocation extraction is a task that extracts collocations automatically from a corpus, using computational linguistics. 
Colors of Noise  In audio engineering, electronics, physics, and many other fields, the color of noise refers to the power spectrum of a noise signal (a signal produced by a stochastic process). Different colors of noise have significantly different properties: for example, as audio signals they will sound different to human ears, and as images they will have a visibly different texture. Therefore, each application typically requires noise of a specific color. This sense of ‘color’ for noise signals is similar to the concept of timbre in music (which is also called ‘tone color’); however the latter is almost always used for sound, and may consider very detailed features of the spectrum. The practice of naming kinds of noise after colors started with white noise, a signal whose spectrum has equal power within any equal interval of frequencies. That name was given by analogy with white light, which was (incorrectly) assumed to have such a flat power spectrum over the visible range. Other color names, like pink, red, and blue were then given to noise with other spectral profiles, often (but not always) in reference to the color of light with similar spectra. Some of those names have standard definitions in certain disciplines, while others are very informal and poorly defined. Many of these definitions assume a signal with components at all frequencies, with a power spectral density per unit of bandwidth proportional to 1/f ß and hence they are examples of powerlaw noise. For instance, the spectral density of white noise is flat (ß = 0), while flicker or pink noise has ß = 1, and Brownian noise has ß = 2. 
Columnoriented DBMS  A columnoriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. This columnoriented DBMS has advantages for data warehouses, customer relationship management (CRM) systems, and library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of columnoriented and roworiented organization with any DBMSs. Denoting one as columnoriented refers to both the ease of expression of a columnoriented structure and the focus on optimizations for columnoriented workloads. This approach is in contrast to roworiented or row store databases and with correlation databases, which use a valuebased storage structure. 
Combinations of Mutually Exclusive Alterations (CoMEt) 
Cancer is a heterogeneous disease with different combinations of genetic and epigenetic alterations driving the development of cancer in different individuals. While these alterations are believed to converge on genes in key cellular signaling and regulatory pathways, our knowledge of these pathways remains incomplete, making it difficult to identify driver alterations by their recurrence across genes or known pathways. We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify combinations of alterations de novo, without any prior biological knowledge (e.g. pathways or protein interactions). CoMEt searches for combinations of mutations that exhibit mutual exclusivity, a pattern expected for mutations in pathways. CoMEt has several important feature that distinguish it from existing approaches to analyze mutual exclusivity among alterations. These include: an exact statistical test for mutual exclusivity that is more sensitive in detecting combinations containing rare alterations; simultaneous identification of collections of one or more combinations of mutually exclusive alterations; simultaneous analysis of subtypespecific mutations; and summarization over an ensemble of collections of mutually exclusive alterations. These features enable CoMEt to robustly identify alterations affecting multiple pathways, or hallmarks of cancer. 
Combinatorial Optimization  In applied mathematics and theoretical computer science, combinatorial optimization is a topic that consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not feasible. It operates on the domain of those optimization problems, in which the set of feasible solutions is discrete or can be reduced to discrete, and in which the goal is to find the best solution. Some common problems involving combinatorial optimization are the traveling salesman problem (“TSP”) and the minimum spanning tree problem (“MST”). 
Combined Series Approach (CSA) 
STMotif 
Comet.ml  Comet allows you to track, compare and collaborate on Machine Learning experiments. Use Comet.ml if you need a tool that: · Allows for hyper parameters, metrics, code, stdout tracking · Supports Keras, Tensorflow, PyTorch, scikitlearn out of the box and other libraries with the manual API. · Runs seamlessly on every machine including your laptop, AWS, Azure or company owned machines 
Common Cause Principle (CCP) 
It seems that a correlation between events A and B indicates either that A causes B, or that B causes A, or that A and B have a common cause. It also seems that causes always occur before their effects and, thus, that common causes always occur before the correlated events. Reichenbach was the first to formalize this idea rather precisely. 
Communication In Focus (CIF) 
A NLU technology that is based on a novel approach that does not require the predefinition of these terms. Rather, the design uses Context Discriminants to digest these new subjects based on prior understanding of the based language. Context Discriminant reduces complex documents into snippets of words of semantic neighbors consisting of context and points of views on subjects. Higher order derivatives are achieved by applying CD to the result produced by the prior CD. This approach enables us to refine the contexts on related subjects over distant semantic neighbors and to discover higher order dependent subjects that depict entity relationships between subjects. 
Community Detection  Communities are often defined in terms of the partition of the set of vertices, that is each node is put into one and only one community. This is a useful simplification and most community detection methods find this type of community structure. However in some cases a better representation could be one where vertices are in more than one community. This might happen in a social network where each vertex represents a person, and the communities represent the different groups of friends: one community for family, another community for coworkers, one for friends in the same sports club, and so on. The use of cliques for community detection discussed below is just one example of how such overlapping community structure can be found. ➘ “Complex Network” Community detection algorithms: a comparative analysis A Comparison of Community Detection Algorithms on Artificial Networks 
Community Trees  We introduce the concept of community trees that summarizes topological structures within a network. A community tree is a tree structure representing clique communities from the clique percolation method (CPM). The community tree also generates a persistent diagram. Community trees and persistent diagrams reveal topological structures of the underlying networks and can be used as visualization tools. We study the stability of community trees and derive a quantity called the total star number (TSN) that presents an upper bound on the change of community trees. Our findings provide a topological interpretation for the stability of communities generated by the CPM. 
Compact Trip Representation (CTR) 
We present a new Compact Trip Representation (CTR) that allows us to manage users’ trips (moving objects) over networks. These could be public transportation networks (buses, subway, trains, and so on) where nodes are stations or stops, or road networks where nodes are intersections. CTR represents the sequences of nodes and time instants in users’ trips. The spatial component is handled with a data structure based on the wellknown Compressed Suffix Array (CSA), which provides both a compact representation and interesting indexing capabilities. We also represent the temporal component of the trips, that is, the time instants when users visit nodes in their trips. We create a sequence with these time instants, which are then selfindexed with a balanced Wavelet Matrix (WM). This gives us the ability to solve rangeinterval queries efficiently. We show how CTR can solve relevant spatial and spatiotemporal queries over large sets of trajectories. Finally, we also provide experimental results to show the space requirements and query efficiency of CTR. 
Comparative Opinion Mining  Opinion mining refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyse the public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining that deals with identifying and extracting information that is expressed in a comparative form (e.g.~’paper X is better than the Y’). Comparative opinion mining plays a very important role when ones tries to evaluate something, as it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from perspective of techniques and the other from perspective of comparative opinion elements. It also incorporates preprocessing tools as well as dataset that were used by the past researchers that can be useful to the future researchers in the field of comparative opinion mining. 
ComparisonBased Random Forest  Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data. 
Competing Prediction Algorithm  Prediction is a wellstudied machine learning task, and prediction algorithms are core ingredients in online products and services. Despite their centrality in the competition between online companies who offer predictionbased products, the strategic use of prediction algorithms remains unexplored. The goal of this paper is to examine strategic use of prediction algorithms. We introduce a novel gametheoretic setting that is based on the PAC learning framework, where each player (aka a prediction algorithm at competition) seeks to maximize the sum of points for which it produces an accurate prediction and the others do not. We show that algorithms aiming at generalization may wittingly misspredict some points to perform better than others on expectation. We analyze the empirical game, i.e. the game induced on a given sample, prove that it always possesses a pure Nash equilibrium, and show that every betterresponse learning process converges. Moreover, our learningtheoretic analysis suggests that players can, with high probability, learn an approximate pure Nash equilibrium for the whole population using a small number of samples. 
Competing Risks  This form of analysis is known by its use of death certificates. In traditional overall survival analysis the cause of death is irrelevant to the analysis. In a competing risks survival analyses each death certificate is reviewed. If the disease of interest is cancer, and the person/patient dies of a car accident, the patient is labelled as censored at death, instead of being labelled as having died. Issues with this method arise as each hospital and or registry may code for causes of death differently. For example, there exists variability in the way a patient who has cancer and commits suicide is coded/labelled. In addition, if a patient has an eye removed due to an ocular cancer and dies getting hit while crossing the road because he didn’t see the car would often be considered to be censored rather than having died due to the cancer or its subsequent effects. ➘ “Survival Analysis” 
Competitive Analysis  Competitive analysis is a method invented for analyzing online algorithms, in which the performance of an online algorithm (which must satisfy an unpredictable sequence of requests, completing each request without being able to see the future) is compared to the performance of an optimal offline algorithm that can view the sequence of requests in advance. An algorithm is competitive if its competitive ratio – the ratio between its performance and the offline algorithm’s performance – is bounded. Unlike traditional worstcase analysis, where the performance of an algorithm is measured only for ‘hard’ inputs, competitive analysis requires that an algorithm perform well both on hard and easy inputs, where ‘hard’ and ‘easy’ are defined by the performance of the optimal offline algorithm. For many algorithms, performance is dependent not only on the size of the inputs, but also on their values. One such example is the quicksort algorithm, which sorts an array of elements. Such datadependent algorithms are analysed for averagecase and worstcase data. Competitive analysis is a way of doing worst case analysis for online and randomized algorithms, which are typically data dependent. In competitive analysis, one imagines an ‘adversary’ that deliberately chooses difficult data, to maximize the ratio of the cost of the algorithm being studied and some optimal algorithm. Adversaries range in power from the oblivious adversary, which has no knowledge of the random choices made by the algorithm pitted against it, to the adaptive adversary that has full knowledge of how an algorithm works and its internal state at any point during its execution. Note that this distinction is only meaningful for randomized algorithms. For a deterministic algorithm, either adversary can simply compute what state that algorithm must have at any time in the future, and choose difficult data accordingly. For example, the quicksort algorithm chooses one element, called the ‘pivot’, that is, on average, not too far from the center value of the data being sorted. Quicksort then separates the data into two piles, one of which contains all elements with value less than the value of the pivot, and the other containing the rest of the elements. If quicksort chooses the pivot in some deterministic fashion (for instance, always choosing the first element in the list), then it is easy for an adversary to arrange the data beforehand so that quicksort will perform in worstcase time. If, however, quicksort chooses some random element to be the pivot, then an adversary without knowledge of what random numbers are coming up cannot arrange the data to guarantee worstcase execution time for quicksort. The classic online problem first analysed with competitive analysis (Sleator & Tarjan 1985) is the list update problem: Given a list of items and a sequence of requests for the various items, minimize the cost of accessing the list where the elements closer to the front of the list cost less to access. (Typically, the cost of accessing an item is equal to its position in the list.) After an access, the list may be rearranged. Most rearrangements have a cost. The MoveToFront algorithm simply moves the requested item to the front after the access, at no cost. The Transpose algorithm swaps the accessed item with the item immediately before it, also at no cost. Classical methods of analysis showed that Transpose is optimal in certain contexts. In practice, MoveToFront performed much better. Competitive analysis was used to show that an adversary can make Transpose perform arbitrarily badly compared to an optimal algorithm, whereas MoveToFront can never be made to incur more than twice the cost of an optimal algorithm. In the case of online requests from a server, competitive algorithms are used to overcome uncertainties about the future. That is, the algorithm does not ‘know’ the future, while the imaginary adversary (the ‘competitor’) ‘knows’. Similarly, competitive algorithms were developed for distributed systems, where the algorithm has to react to a request arriving at one location, without ‘knowing’ what has just happened in a remote location. This setting was presented in (Awerbuch, Kutten & Peleg 1992). 
Competitive Dense Fully Convolutional Network (CDFNet) 
Increased information sharing through short and longrange skip connections between layers in fully convolutional networks have demonstrated significant improvement in performance for semantic segmentation. In this paper, we propose Competitive Dense Fully Convolutional Networks (CDFNet) by introducing competitive maxout activations in place of naive feature concatenation for inducing competition amongst layers. Within CDFNet, we propose two architectural contributions, namely competitive dense block (CDB) and competitive unpooling block (CUB) to induce competition at local and global scales for short and longrange skip connections respectively. This extension is demonstrated to boost learning of specialized subnetworks targeted at segmenting specific anatomies, which in turn eases the training of complex tasks. We present the proofofconcept on the challenging task of whole body segmentation in the publicly available VISCERAL benchmark and demonstrate improved performance over multiple learning and registration based stateoftheart methods. 
Competitive Intelligence (CI) 
Competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Competitive intelligence essentially means understanding and learning what’s happening in the world outside your business so one can be as competitive as possible. It means learning as much as possibleas soon as possibleabout one’s industry in general, one’s competitors, or even one’s county’s particular zoning rules. In short, it empowers you to anticipate and face challenges head on. A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious. Experts also call this process the early signal analysis. This definition focuses attention on the difference between dissemination of widely available factual information (such as market statistics, financial reports, newspaper clippings) performed by functions such as libraries and information centers, and competitive intelligence which is a perspective on developments and events aimed at yielding a competitive edge. Competitive Intelligence and 6 Tips for Its Effective Use 
Competitive Learning  Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and selforganising maps (Kohonen maps). https://…/handbookch7.html 
Competitive Pathway Network (CoPaNet) 
In the design of deep neural architectures, recent studies have demonstrated the benefits of grouping subnetworks into a larger network. For examples, the Inception architecture integrates multiscale subnetworks and the residual network can be regarded that a residual unit combines a residual subnetwork with an identity shortcut. In this work, we embrace this observation and propose the Competitive Pathway Network (CoPaNet). The CoPaNet comprises a stack of competitive pathway units and each unit contains multiple parallel residualtype subnetworks followed by a max operation for feature competition. This mechanism enhances the model capability by learning a variety of features in subnetworks. The proposed strategy explicitly shows that the features propagate through pathways in various routing patterns, which is referred to as pathway encoding of category information. Moreover, the crossblock shortcut can be added to the CoPaNet to encourage feature reuse. We evaluated the proposed CoPaNet on four object recognition benchmarks: CIFAR10, CIFAR100, SVHN, and ImageNet. CoPaNet obtained the stateoftheart or comparable results using similar amounts of parameters. The code of CoPaNet is available at: https://…/CoPaNet. 
Complementary Recommendations Using Adversarial Feature Transformer (CRAFT) 
Traditional approaches for complementary product recommendations rely on behavioral and nonvisual data such as customer coviews or cobuys. However, certain domains such as fashion are primarily visual. We propose a framework that harnesses visual cues in an unsupervised manner to learn the distribution of cooccurring complementary items in real world images. Our model learns a nonlinear transformation between the two manifolds of source and target complementary item categories (e.g., tops and bottoms in outfits). Given a large dataset of images containing instances of cooccurring object categories, we train a generative transformer network directly on the feature representation space by casting it as an adversarial optimization problem. Such a conditional generative model can produce multiple novel samples of complementary items (in the feature space) for a given query item. The final recommendations are selected from the closest real world examples to the synthesized complementary features. We apply our framework to the task of recommending complementary tops for a given bottom clothing item. The recommendations made by our system are diverse, and are favored by human experts over the baseline approaches. 
Complementary Temporal Action Proposal (CTAP) 
Temporal action proposal generation is an important task, akin to object proposals, temporal action proposals are intended to capture ‘clips’ or temporal intervals in videos that are likely to contain an action. Previous methods can be divided to two groups: sliding window ranking and actionness score grouping. Sliding windows uniformly cover all segments in videos, but the temporal boundaries are imprecise; grouping based method may have more precise boundaries but it may omit some proposals when the quality of actionness score is low. Based on the complementary characteristics of these two methods, we propose a novel Complementary Temporal Action Proposal (CTAP) generator. Specifically, we apply a Proposallevel Actionness Trustworthiness Estimator (PATE) on the sliding windows proposals to generate the probabilities indicating whether the actions can be correctly detected by actionness scores, the windows with high scores are collected. The collected sliding windows and actionness proposals are then processed by a temporal convolutional neural network for proposal ranking and boundary adjustment. CTAP outperforms stateoftheart methods on average recall (AR) by a large margin on THUMOS14 and ActivityNet 1.3 datasets. We further apply CTAP as a proposal generation method in an existing action detector, and show consistent significant improvements. 
Complete Representations GAN (CRGAN) 
Generating multiview images from a singleview input is an essential yet challenging problem. It has broad applications in vision, graphics, and robotics. Our study indicates that the widelyused generative adversarial network (GAN) may learn ‘incomplete’ representations due to the singlepathway framework: an encoderdecoder network followed by a discriminator network. We propose CRGAN to address this problem. In addition to the single reconstruction path, we introduce a generation sideway to maintain the completeness of the learned embedding space. The two learning pathways collaborate and compete in a parametersharing manner, yielding considerably improved generalization ability to ‘unseen’ dataset. More importantly, the twopathway framework makes it possible to combine both labeled and unlabeled data for selfsupervised learning, which further enriches the embedding space for realistic generations. The experimental results prove that CRGAN significantly outperforms stateoftheart methods, especially when generating from ‘unseen’ inputs in wild conditions. 
Complete Spatial Randomness (CSR) 
Complete spatial randomness (CSR) describes a point process whereby point events occur within a given study area in a completely random fashion. It is synonymous with a homogeneous spatial Poisson process. Such a process is modeled using only one parameter \rho, i.e. the density of points within the defined area. The term complete spatial randomness is commonly used in Applied Statistics in the context of examining certain point patterns, whereas in most other statistical contexts it is referred to the concept of a spatial Poisson process. 
Completed Partially Directed Acyclic Graph (CPDAG) 
➘ “Directed Acyclic Graph” 
CompleteLinkage Clustering  Completelinkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up being in the same cluster. At each step, the two clusters separated by the shortest distance are combined. The definition of ‘shortest distance’ is what differentiates between the different agglomerative clustering methods. In completelinkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. The method is also known as farthest neighbour clustering. The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place. 
Complex Adaptive System (CAS) 
Complexity theory is a relatively new field that began in the mid1980s at the Santa Fe Institute in New Mexico. Work at the Santa Fe Institute is usually presented as the study of Complex Adaptive Systems (CAS). The CAS movement is predominantly American, as opposed to the European “natural science” tradition in the area of cybernetics and systems. Like in cybernetics and systems theory, CAS shares the subject of general properties of complex systems across traditional disciplinary boundaries. However, CAS is distinguished by the extensive use of computer simulations as a research tool, and an emphasis on systems, such as markets or ecologies, which are less integrated or “organized” than the ones studied by the older tradition (e.g., organisms, machines and companies). 
Complex Correntropy  Recent studies have demonstrated that correntropy is an efficient tool for analyzing higherorder statistical moments in nonGaussian noise environments. Although correntropy has been used with complex data, no theoretical study was pursued to elucidate its properties, nor how to best use it for optimization. This paper presents a probabilistic interpretation for correntropy using complexvalued data called complex correntropy. A recursive solution for the maximum complex correntropy criterion (MCCC) is introduced based on a fixed point solution. This technique is applied to a simple system identification case study, and the results demonstrate prominent advantages when compared to the complex recursive least squares (RLS) algorithm. By using such probabilistic interpretation, correntropy can be applied to solve several problems involving complex data in a more straightforward way. Keywords: complexvalued data correntropy, maximum complex correntropy criterion, fixedpoint algorithm. 
Complex Event Processing (CEP) 
Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible. 
Complex Network  In the context of network theory, a complex network is a graph (network) with nontrivial topological features – features that do not occur in simple networks such as lattices or random graphs but often occur in graphs modelling real systems. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of realworld networks such as computer networks and social networks. 
Complex Network Classifier (CNC) 
Classifying large scale networks into several categories and distinguishing them according to their fine structures is of great importance with several applications in real life. However, most studies of complex networks focus on properties of a single network but seldom on classification, clustering, and comparison between different networks, in which the network is treated as a whole. Due to the nonEuclidean properties of the data, conventional methods can hardly be applied on networks directly. In this paper, we propose a novel framework of complex network classifier (CNC) by integrating network embedding and convolutional neural network to tackle the problem of network classification. By training the classifiers on synthetic complex network data and real international trade network data, we show CNC can not only classify networks in a high accuracy and robustness, it can also extract the features of the networks automatically. 
Complex Systems  Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment. The equations from which models of complex systems are developed generally derive from statistical physics, information theory and nonlinear dynamics and represent organized but unpredictable behaviors of natural systems that are considered fundamentally complex. The physical manifestations of such systems are difficult to define, so a common choice is to identify ‘the system’ with the mathematical information model rather than referring to the undefined physical subject the model represents. Such systems are used to model processes in computer science, biology, economics, physics, chemistry, and many other fields. It is also called complex systems theory, complexity science, study of complex systems, sciences of complexity, nonequilibrium physics, and historical physics. A variety of abstract theoretical complex systems is studied as a field of mathematics. The key problems of complex systems are difficulties with their formal modelling and simulation. From such a perspective, in different research contexts complex systems are defined on the basis of their different attributes. Since all complex systems have many interconnected components, the science of networks and network theory are important aspects of the study of complex systems. A consensus regarding a single universal definition of complex system does not yet exist. For systems that are less usefully represented with equations various other kinds of narratives and methods for identifying, exploring, designing and interacting with complex systems are used. 
ComplexValued Neural Network (CVNN) 
The complexvalued Neural Network is an extension of a (usual) realvalued neural network, whose input and output signals and parameters such as weights and thresholds are all complex numbers (the activation function is inevitably a complexvalued function). Neural Networks have been applied to various fields such as communication systems, image processing and speech recognition, in which complex numbers are often used through the Fourier Transformation. This indicates that complexvalued neural networks are useful. In addition, in the human brain, an action potential may have different pulse patterns, and the distance between pulses may be different. This suggests that introducing complex numbers representing phase and amplitude into neural networks is appropriate. In these years the complexvalued neural networks expand the application fields in image processing, computer vision, optoelectronic imaging, and communication and so on. The potentially wide applicability yields new aspects of theories required for novel or more effective functions and mechanisms. 
Complier Average Causal Effects (CACE) 
Typically, studies analyze data based on treatment assignment rather than treatment received. This focus on assignment is called an intentiontotreat (ITT) analysis. In a policy environment, the ITT may make a lot of sense; we are answering this specific question: ‘What is the overall effect in the real world where the intervention is made available yet some people take advantage of it while others do not?’ Alternatively, researchers may be interested in different question: ‘What is the causal effect of actually receiving the treatment?’ Now, to answer the second question, there are numerous subtle issues that you need to wrestle with (again, go take the course). But, long story short, we need to (1) identify the folks in the intervention group who actually do what they have been encouraged to do (receive the intervention) but only because they were encouraged, and not because they would have received the intervention anyways had they not been randomized, and compare their outcomes with (2) the folks in the control group who did not seek out the intervention on their own initiative but would have received the intervention had they been encouraged. These two groups are considered to be compliers – they would always do what they are told in the context of the study. And the effect of the intervention that is based on outcomes from this type of patient is called the complier average causal effect (CACE). 
Component Lasso Method  We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connectedcomponents structure of the sample covariance matrix to split the problem into smaller ones. It then applies the lasso to each subproblem separately, obtaining a coefficient vector for each one. Finally, it uses nonnegative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery. The modular structure also lends itself naturally to parallel computation. 
Composable Preprocessing Operators (CPO) 
Toolset that enriches ‘mlr’ with a diverse set of preprocessing operators. Composable Preprocessing Operators (‘CPO’s) are firstclass R objects that can be applied to data.frames and ‘mlr’ ‘Task’s to modify data, can be attached to ‘mlr’ ‘Learner’s to add preprocessing to machine learning algorithms, and can be composed to form preprocessing pipelines. mlrCPO 
Composite Gaussian Process Models (CGP) 
A new type of nonstationary Gaussian process model is devel oped for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models lo cal details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying volatility. Compared to the commonly used stationary Gaus sian process model, the new predictor is numerically more stable and can more accurately approximate complex surfaces when the experi mental design is sparse. In addition, the new model can also improve the prediction intervals by quantifying the change of local variability associated with the response. 
Composite Indicator (COIN) 
A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multidimensional concept that is being measured. A composite indicator measures multidimensional concepts (e.g. competitiveness, etrade or environmental quality) which cannot be captured by a single indicator. Ideally, a composite indicator should be based on a theoretical framework / definition, which allows individual indicators / variables to be selected, combined and weighted in a manner which reflects the dimensions or structure of the phenomena being measured. 
Composite Quantile Regression (CQR) 

Compositional Data  In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information. This definition, given by John Aitchison (1986) has several consequences: · A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted. · As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions. · As usual in mathematics, equivalent classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant kappa. The vector operation assigning the constant sum representative is called closure, where D is the number of parts (components) and denotes a row vector. · Compositional data can be represented by constant sum real vectors with positive components, and this vectors span a simplex. 
Compositional Data Analysis (CoDa) 
Compositional data analysis deals with situations where the relevant information is contained only in the ratios between the measured variables, and not in the reported values. Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. A Concise Guide to Compositional Data Analysis Compositional,compositions 
Compositional GAN  Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism, but are generally modeled to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose to model object composition in a GAN framework as a selfconsistent compositiondecomposition network. Our model is conditioned on the object images from their marginal distributions to generate a realistic image from their joint distribution by explicitly learning the possible interactions. We evaluate our model through qualitative experiments and user evaluations in both the scenarios when either paired or unpaired examples for the individual object images and the joint scenes are given during training. Our results reveal that the learned model captures potential interactions between the two object domains given as input to output new instances of composed scene at test time in a reasonable fashion. 
Compositional Pattern Producing Network (DPPN) 
Compositional patternproducing networks (CPPNs) are a variation of artificial neural networks (ANNs) that differ in their set of activation functions and how they are applied. While ANNs often contain only sigmoid functions and sometimes Gaussian functions, CPPNs can include both types of functions and many others. The choice of functions for the canonical set can be biased toward specific types of patterns and regularities. For example, periodic functions such as sine produce segmented patterns with repetitions, while symmetric functions such as Gaussian produce symmetric patterns. Linear functions can be employed to produce linear or fractallike patterns. Thus, the architect of a CPPNbased genetic art system can bias the types of patterns it generates by deciding the set of canonical functions to include. 
Comprehensive EVent Ontology (CEVO) 
While the general analysis of named entities has received substantial research attention, the analysis of relations over named entities has not. In fact, a review of the literature on unstructured as well as structured data revealed a deficiency in research on the abstract conceptualization required to organize relations. We believe that such an abstract conceptualization can benefit various communities and applications such as natural language processing, information extraction, machine learning and ontology engineering. In this paper, we present CEVO (i.e., a Comprehensive EVent Ontology) built on Levin’s conceptual hierarchy of English verbs that categorizes verbs with the shared meaning and syntactic behavior. We present the fundamental concepts and requirements for this ontology. Furthermore, we present three use cases for demonstrating the benefits of this ontology on annotation tasks: 1) annotating relations in plain text, 2) annotating ontological properties and 3) linking textual relations to ontological properties. 
Compressed Learning (CL) 
In this paper, we provide theoretical results to show that compressed learning, learning directly in the compressed domain, is possible. In Particular, we provide tight bounds demonstrating that the linear kernel SVM’s classifier in the measurement domain, with high probability, has true accuracy close to the accuracy of the best linear threshold classifier in the data domain. We show that this is beneficial both from the compressed sensing and the machine learning points of view. Furthermore, we indicate that for a family of wellknown compressed sensing matrices, compressed learning is universal, in the sense that learning and classification in the measurement domain works provided that the data are sparse in some, even unknown, basis. Moreover, we show that our results are also applicable to a family of smooth manifoldlearning tasks. Finally, we support our claims with experimental results. Compressed Learning: A Deep Neural Network Approach 
Compressed, Complementary, ComputationallyEfficient Adaptive Gradient Online Learning (CompAdaGrad) 
The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method’s fullmatrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is computationally prohibitive and so the simpler diagonal version often is used in practice. We introduce a new method, CompAdaGrad, that navigates the space between these two schemes and show that this method can yield results much better than diagonal AdaGrad while avoiding the (effectively intractable) $O(n^3)$ computational complexity of fullmatrix AdaGrad for dimension $n$. CompAdaGrad essentially performs fullmatrix regularization in a lowdimensional subspace while performing diagonal regularization in the complementary subspace. We derive CompAdaGrad’s updates for composite mirror descent in case of the squared $\ell_2$ norm and the $\ell_1$ norm, demonstrate that its complexity per iteration is linear in the dimension, and establish guarantees for the method independent of the choice of composite regularizer. Finally, we show preliminary results on several datasets. 
Compressive Kmeans (CKM) 
The LloydMax algorithm is a classical approach to perform Kmeans clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of Kmeans (CKM), that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to LloydMax, for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike LloydMax which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of LloydMax, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification. ➘ “LloydMax” 
Compressive Sampling (CS) 
Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the ShannonNyquist sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity which requires the signal to be sparse in some domain. The second one is incoherence which is applied through the isometric property which is sufficient for sparse signals. MRI is a prominent application. A Mathematical Introduction to Compressive Sensing An Introduction To Compressive Sampling Compressive Sensing 
Computation Control Protocol (CCP) 
Cooperative computation is a promising approach for localized data processing for Internet of Things (IoT), where computationally intensive tasks in a device could be divided into subtasks, and offloaded to other devices or servers in close proximity. However, exploiting the potential of cooperative computation is challenging mainly due to the heterogeneous nature of IoT devices. Indeed, IoT devices may have different and timevarying computing power and energy resources, and could be mobile. Coded computation, which advocates mixing data in subtasks by employing erasure codes and offloading these subtasks to other devices for computation, is recently gaining interest, thanks to its higher reliability, smaller delay, and lower communication costs. In this paper, we develop a coded cooperative computation framework, which we name Computation Control Protocol (CCP), by taking into account heterogeneous computing power and energy resources of IoT devices. CCP dynamically allocates subtasks to helpers and is adaptive to timevarying resources. We show that (i) CCP improves task completion delay significantly as compared to baselines, (ii) task completion delay of CCP is very close to its theoretical characterization, and (iii) the efficiency of CCP in terms of resource utilization is higher than 99%, which is significant. 
Computational Intelligence (CI) 
Computational intelligence (CI) is a set of natureinspired computational methodologies and approaches to address complex realworld problems to which traditional approaches, i.e., first principles modeling or explicit statistical modeling, are ineffective or infeasible. Many such reallife problems are not considered to be wellposed problems mathematically, but nature provides many counterexamples of biological systems exhibiting the required function, practically. For instance, the human body has about 200 joints (degrees of freedom), but humans have little problem in executing a target movement of the hand, specified in just three Cartesian dimensions. Even if the torso were mechanically fixed, there is an excess of 7:3 parameters to be controlled for natural arm movement. Traditional models also often fail to handle uncertainty, noise and the presence of an everchanging context. Computational Intelligence provides solutions for such and other complicated problems and inverse problems. It primarily includes artificial neural networks, evolutionary computation and fuzzy logic. In addition, CI also embraces biologically inspired algorithms such as swarm intelligence and artificial immune systems, which can be seen as a part of evolutionary computation, and includes broader fields such as image processing, data mining, and natural language processing. Furthermore other formalisms: DempsterShafer theory, chaos theory and manyvalued logic are used in the construction of computational models. The characteristic of “intelligence” is usually attributed to humans. More recently, many products and items also claim to be “intelligent”. Intelligence is directly linked to the reasoning and decision making. Fuzzy logic was introduced in 1965 as a tool to formalise and represent the reasoning process and fuzzy logic systems which are based on fuzzy logic possess many characteristics attributed to intelligence. Fuzzy logic deals effectively with uncertainty that is common for human reasoning, perception and inference and, contrary to some misconceptions, has a very formal and strict mathematical backbone (‘is quite deterministic in itself yet allowing uncertainties to be effectively represented and manipulated by it’, so to speak). Neural networks, introduced in 1940s (further developed in 1980s) mimic the human brain and represent a computational mechanism based on a simplified mathematical model of the perceptrons (neurons) and signals that they process. Evolutionary computation, introduced in the 1970s and more popular since the 1990s mimics the populationbased sexual evolution through reproduction of generations. It also mimics genetics in so called genetic algorithms. 
Computational Linguistics  Computational linguistics is an interdisciplinary field concerned with the statistical or rulebased modeling of natural language from a computational perspective. Traditionally, computational linguistics was usually performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others. Computational linguistics has theoretical and applied components, where theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use. 
Computational Network Toolkit (CNTK) 
CNTK (http://www.cntk.ai ), the Computational Network Toolkit by Microsoft Research, is a unified deeplearning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows to easily realize and combine popular model types such as feedforward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an opensource license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code. 
Computational Theory of Mind  In philosophy, a computational theory of mind names a view that the human mind or the human brain (or both) is an information processing system and that thinking is a form of computing. The theory was proposed in its modern form by Hilary Putnam in 1961, and developed by the MIT philosopher and cognitive scientist (and Putnam’s PhD student) Jerry Fodor in the 1960s, 1970s and 1980s. Despite being vigorously disputed in analytic philosophy in the 1990s (due to work by Putnam himself, John Searle, and others), the view is common in modern cognitive psychology and is presumed by many theorists of evolutionary psychology; in the 2000s and 2010s the view has resurfaced in analytic philosophy (Scheutz 2003, Edelman 2008). The computational theory of mind holds that the mind is a computation that arises from the brain acting as a computing machine. The theory can be elaborated in many ways, the most popular of which is that the brain is a computer and the mind is the result of the program that the brain runs. A program is the finite description of an algorithm or effective procedure, which prescribes a deterministic sequence of discrete actions that produces outputs based only on inputs and the internal states (memory) of the computing machine. For any admissible input, algorithms terminate in a finite number of steps. So the computational theory of mind is the claim that the mind is a computation of a machine (the brain) that derives output representations of the world from input representations and internal memory in a deterministic (nonrandom) way that is consistent with the theory of computation. Computational theories of mind are often said to require mental representation because ‘input’ into a computation comes in the form of symbols or representations of other objects. A computer cannot compute an actual object, but must interpret and represent the object in some form and then compute the representation. The computational theory of mind is related to the representational theory of mind in that they both require that mental states are representations. However the two theories differ in that the representational theory claims that all mental states are representations while the computational theory leaves open that certain mental states, such as pain or depression, may not be representational and therefore may not be suitable for a computational treatment. These nonrepresentational mental states are known as qualia. In Fodor’s original views, the computational theory of mind is also related to the language of thought. The language of thought theory allows the mind to process more complex representations with the help of semantics. 
Computer Aided Diagnosis  In radiology, computeraided detection (CADe), also called computeraided diagnosis (CADx), are procedures in medicine that assist doctors in the interpretation of medical images. Imaging techniques in Xray, MRI, and Ultrasound diagnostics yield a great deal of information, which the radiologist has to analyze and evaluate comprehensively in a short time. CAD systems help scan digital images, e.g. from computed tomography, for typical appearances and to highlight conspicuous sections, such as possible diseases. 
Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) 
Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) offers tools that assist with qualitative research such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, grounded theory methodology, etc. 
Computer Science  Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded as bits in a computer memory or transcribed in genes and protein structures in a biological cell. An alternate, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems. Its subfields can be divided into a variety of theoretical and practical disciplines. Some fields, such as computational complexity theory (which explores the fundamental properties of computational and intractable problems), are highly abstract, while fields such as computer graphics emphasize realworld visual applications. Still other fields focus on the challenges in implementing computation. For example, programming language theory considers various approaches to the description of computation, while the study of computer programming itself investigates various aspects of the use of programming language and complex systems. Humancomputer interaction considers the challenges in making computers and computations useful, usable, and universally accessible to humans. 
Computer Vision (CV) 
Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, highdimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Computer vision has also been described as the enterprise of automating and integrating a wide range of processes and representations for vision perception. Computer vision is the automatic analysis of images and videos by computers in order to gain some understanding of the world. Computer vision is inspired by the capabilities of the human vision system and, when initially addressed in the 1960s and 1970s, it was thought to be a relatively straightforward problem to solve. However, the reason we think/thought that vision is easy is that we have our own visual system which makes the task seem intuitive to our conscious minds. In fact, the human visual system is very complex and even the estimates of how much of the brain is involved with visual processing vary from 25% up to more than 50%. 
Concept Drift  In predictive analytics and machine learning, the concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. The term concept refers to the quantity to be predicted. More generally, it can also refer to other phenomena of interest besides the target concept, such as an input, but, in the context of concept drift, the term commonly refers to the target variable. 
Concept Interaction Graph  Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering or query document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AIbased news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching, especially targeting long document matching, such as identifying whether two news articles report the same event in the real world, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation for a document, with vertices representing different concepts, each being one or a group of coherent keywords in the document, and with edges representing the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features though Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach based on two labeled news article datasets created at Tencent for its intelligent news products show that the proposed graph approach to long document matching significantly outperforms a wide range of stateoftheart methods. 
Concept Learning  Concept learning, also known as category learning, concept attainment, and concept formation, is defined by Bruner, Goodnow, & Austin (1967) as ‘the search for and listing of attributes that can be used to distinguish exemplars from non exemplars of various categories’. More simply put, concepts are the mental categories that help us classify objects, events, or ideas, building on the understanding that each object, event, or idea has a set of common relevant features. Thus, concept learning is a strategy which requires a learner to compare and contrast groups or categories that contain conceptrelevant features with groups or categories that do not contain conceptrelevant features. Concept learning also refers to a learning task in which a human or machine learner is trained to classify objects by being shown a set of example objects along with their class labels. The learner simplifies what has been observed by condensing it in the form of an example. This simplified version of what has been learned is then applied to future examples. Concept learning may be simple or complex because learning takes place over many areas. When a concept is difficult, it is less likely that the learner will be able to simplify, and therefore will be less likely to learn. Colloquially, the task is known as learning from examples. Most theories of concept learning are based on the storage of exemplars and avoid summarization or overt abstraction of any kind. 
Concept Mask  Existing works on semantic segmentation typically consider a small number of labels, ranging from tens to a few hundreds. With a large number of labels, training and evaluation of such task become extremely challenging due to correlation between labels and lack of datasets with complete annotations. We formulate semantic segmentation as a problem of image segmentation given a semantic concept, and propose a novel system which can potentially handle an unlimited number of concepts, including objects, parts, stuff, and attributes. We achieve this using a weakly and semisupervised framework leveraging multiple datasets with different levels of supervision. We first train a deep neural network on a 6M stock image dataset with only imagelevel labels to learn visualsemantic embedding on 18K concepts. Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts. Finally, we train an attentiondriven class agnostic segmentation network using an 80category fully annotated dataset. We perform extensive experiments to validate that the proposed system performs competitively to the state of the art on fully supervised concepts, and is capable of producing accurate segmentations for weakly learned and unseen concepts. 
Concept Mining  Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. 
Concept2vec  Although there is an emerging trend towards generating embeddings for primarily unstructured data, and recently for structured data, there is not yet any systematic suite for measuring the quality of embeddings. This deficiency is further sensed with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of encoded structure as well as semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, w.r.t. this framework multiple experimental studies were run to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts. 
ConceptCognitive Learning (CCL) 
Conceptcognitive learning (CCL) is a hot topic in recent years, and it has attracted much attention from the communities of formal concept analysis, granular computing and cognitive computing. However, the relationship among cognitive computing (CC), conceptcognitive computing (CCC), and CCL is not clearly described. To this end, we explain the relationship of CC, CCC, and CCL. Then, we propose a generalized CCL from the point of view of machine learning. Finally, experiments on seven data sets are conducted to evaluate concept formation and conceptcognitive processes of the proposed generalized CCL. 
ConceptOriented Deep Learning (CODL) 
Concepts are the foundation of human deep learning, understanding, and knowledge integration and transfer. We propose conceptoriented deep learning (CODL) which extends (machine) deep learning with concept representations and conceptual understanding capability. CODL addresses some of the major limitations of deep learning: interpretability, transferability, contextual adaptation, and requirement for lots of labeled training data. We discuss the major aspects of CODL including concept graph, concept representations, concept exemplars, and concept representation learning systems supporting incremental and continual learning. 
Conceptual Clustering  Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning. http://…/eswc2008PAM.pdf 
Conceptual Expansion  Problems with few examples of a new class of objects prove challenging to most classifiers. One solution to is to reuse existing data through transfer methods such as oneshot learning or domain adaption. However these approaches require an explicit handauthored or learned definition of how reuse can occur. We present an approach called conceptual expansion that learns how to reuse existing machinelearned knowledge when classifying new cases. We evaluate our approach by adding new classes of objects to the CIFAR10 dataset and varying the number of available examples of these new classes. 
CONCODE  Source code is rarely written in isolation. It depends significantly on the programmatic context, such as the class that the code would reside in. To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class. This task is challenging because the desired code can vary greatly depending on the functionality the class provides (e.g., a sort function may or may not be available when we are asked to ‘return the smallest element’ in a particular member variable list). We introduce CONCODE, a new large dataset with over 100,000 examples consisting of Java classes from online code repositories, and develop a new encoderdecoder architecture that models the interaction between the method documentation and the class environment. We also present a detailed error analysis suggesting that there is significant room for future work on this task. 
Concolic Testing  Concolic testing alternates between CONCrete program execution and symbOLIC analysis to explore the execution paths of a software program and to increase code coverage. In this paper, we develop the first concolic testing approach for Deep Neural Networks (DNNs). More specifically, we utilise quantified linear arithmetic over rationals to express test requirements that have been studied in the literature, and then develop a coherent method to perform concolic testing with the aim of better coverage. Our experimental results show the effectiveness of the concolic testing approach in both achieving high coverage and finding adversarial examples. 
Concordance Correlation Coefficient  In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for interrater reliability. 
CondenseNet  Deep neural networks are increasingly used on mobile devices, where computational resources are limited. In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity between layers with a mechanism to remove unused connections. The dense connectivity facilitates feature reuse in the network, whereas learned group convolutions remove connections between layers for which this feature reuse is superfluous. At test time, our model can be implemented using standard grouped convolutions – allowing for efficient computation in practice. Our experiments demonstrate that CondenseNets are much more efficient than stateoftheart compact convolutional networks such as MobileNets and ShuffleNets. 
Condition Monitoring (CM) 
Condition monitoring (or, colloquially, CM) is the process of monitoring a parameter of condition in machinery (vibration, temperature etc.), in order to identify a significant change which is indicative of a developing fault. It is a major component of . The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken to prevent failure and avoid its consequences. Condition monitoring has a unique benefit in that conditions that would shorten normal lifespan can be addressed before they develop into a major failure. Condition monitoring techniques are normally used on rotating equipment and other machinery (pumps, electric motors, internal combustion engines, presses), while periodic inspection using nondestructive testing techniques and fit for service (FFS) evaluation are used for stationary plant equipment such as steam boilers, piping and heat exchangers. http://…/9781466584051 
Conditional Autoregressive Model (CAR) 
The essential idea here is that the probability of values estimated at any given location are conditional on the level of neighboring values. mclcar 
Conditional Extreme Value Models  Extreme value theory (EVT) is often used to model environmental, financial and internet traffic data. Multivariate EVT assumes a multivariate domain of attraction condition for the distribution of a random vector necessitating that each component satisfy a marginal domain of attraction condition. Heffernan and Tawn [2004] and Heffernan and Resnick [2007] developed an approximation to the joint distribution of the random vector by conditioning on one of the components being in an extreme value domain. The usual method of analysis using multivariate extreme value theory often is not helpful either because of asymptotic independence or due to one component of the observation vector not being in a domain of attraction. These defects can be addressed by using the conditional extreme value model. 
Conditional Fiducial Model  The fiducial is not unique in general, but we prove that in a restricted class of models it is uniquely determined by the sampling distribution of the data. It depends in particular not on the choice of a data generating model. The arguments lead to a generalization of the classical formula found by Fisher (1930). The restricted class includes cases with discrete distributions, the case of the shape parameter in the Gamma distribution, and also the case of the correlation coefficient in a bivariate Gaussian model. One of the examples can also be used in a pedagogical context to demonstrate possible difficulties with likelihood, Bayesian, and bootstrapinference. Examples that demonstrate nonuniqueness are also presented. It is explained that they can be seen as cases with restrictions on the parameter space. Motivated by this the concept of a conditional fiducial model is introduced. This class of models includes the common case of iid samples from a oneparameter model investigated by Hannig (2013), the structural group models investigated by Fraser (1968), and also certain models discussed by Fisher (1973) in his final writing on the subject. 
Conditional Information Gain Network  Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these largely parametrized deep models in limited resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow the feed forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and the decision mechanisms inserted in the architecture. These decision mechanisms are trained using cost functions based on differentiable Information Gain, inspired by the training procedures of decision trees. These information gain based decision mechanisms are differentiable and can be trained endtoend using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on MNIST and recently introduced Fashion MNIST datasets and show that our information gain based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines. 
Conditional Linear Regression  Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a kDNF, along with its linear regression fit. 
Conditional Neural Process (CNP) 
Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion. 
Conditional Power (CP) 
Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis. In many clinical trials, a CP computation at a prespecified point in the study, such as midway, is used as the basis for early termination for futility when there is little evidence of a beneficial effect. 
Conditional Preference Network (CPnet) 
Interactive Learning of Acyclic Conditional Preference Networks 
Conditional Random Field (CRF) 
Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to ‘neighboring’ samples, a CRF can take context into account; e.g., the linear chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples. CRFs are a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations. It is often used for labeling or parsing of sequential data, such as natural language text or biological sequences and in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition and gene finding, among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation. Complete tutorial on Text Classification using Conditional Random Fields Model 
Conditional Random Fields as Recurrent Neural Networks (CRFRNN) 
Pixellevel labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixellevel labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields as Recurrent Neural Networks. This network, called CRFRNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network endtoend with the usual backpropagation algorithm, avoiding offline postprocessing methods for object delineation. GitXiv 
ConditionBased Maintenance (CBM) 
Conditionbased maintenance (CBM), shortly described, is maintenance when need arises. This maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating. This concept is applicable to mission critical systems that incorporate active redundancy and fault reporting. It is also applicable to nonmission critical systems that lack redundancy and fault reporting. Conditionbased maintenance was introduced to try to maintain the correct equipment at the right time. CBM is based on using realtime data to prioritize and optimize maintenance resources. Observing the state of the system is known as condition monitoring. Such a system will determine the equipment’s health, and act only when maintenance is actually necessary. Developments in recent years have allowed extensive instrumentation of equipment, and together with better tools for analyzing condition data, the maintenance personnel of today are more than ever able to decide what is the right time to perform maintenance on some piece of equipment. Ideally conditionbased maintenance will allow the maintenance personnel to do only the right things, minimizing spare parts cost, system downtime and time spent on maintenance. http://…/3313ijmnct03.pdf 
CONESTA (CONESTA) 
Highdimensional prediction models are increasingly used to analyze biological data such as neuroimaging of genetic data sets. However, classical penalized algorithms yield to dense solutions that are difficult to interpret without arbitrary thresholding. Alternatives based on sparsityinducing penalties suffer from coefficient instability. Complex structured sparsityinducing penalties are a promising approach to force the solution to adhere to some domainspecific constraints and thus offering new perspectives in biomarker identification. We propose a generic optimization framework that can combine any smooth convex loss function with: (i) penalties whose proximal operator is known and (ii) with a large range of complex, nonsmooth convex structured penalties such as total variation, or overlapping group lasso. Although many papers have addressed a similar goal, few have tackled it in such a generic way and in the context of highdimensional data. The proposed continuation algorithm, called \textit{CONESTA}, dynamically smooths the complex penalties to avoid the computation of proximal operators, that are either not known or expensive to compute. The decreasing sequence of smoothing parameters is dynamically adapted, using the duality gap, in order to maintain the optimal convergence speed towards any globally desired precision with duality gap guarantee. First, we demonstrate, on both simulated data and on experimental MRI data, that CONESTA outperforms the excessive gap method, ADMM, proximal gradient smoothing (without continuation) and inexact FISTA in terms of convergence speed and/or precision of the solution. Second, on the experimental MRI data set, we establish the superiority of structured sparsityinducing penalties ($\ell_1$ and total variation) over nonstructured methods in terms of the recovery of meaningful and stable groups of predictive variables. 
Confidence  Confidence is defined as the probability of seeing the rule’s consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules X→Y and Y→X. Association rules have to satisfy a minimum confidence constraint, conf(X→Y)≥γ. Confidence is not downward closed and was developed together with support by Agrawal et al. (the socalled supportconfidence framework). Support is first used to find frequent (significant) itemsets exploiting its downward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a min. confidence threshold. A problem with confidence is that it is sensitive to the frequency of the consequent Y in the database. Caused by the way confidence is calculated, consequents with higher support will automatically produce higher confidence values even if there exists no association between the items. 
Confidence Interval  In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. 
Confidence Weighting (CW) 
Confidence weighting (CW) is concerned with measuring two variables: (1) what a respondent believes is a correct answer to a question and (2) what degree of certainty the respondent has toward the correctness of this belief. Confidence weighting when applied to a specific answer selection for a particular test or exam question is referred to in the literature from cognitive psychology as itemspecific confidence, a term typically used by researchers who investigate metamemory or metacognition, comprehension monitoring, or feelingofknowing. Itemspecific confidence is defined as calibrating the relationship between an objective performance of accuracy (e.g., a test answer selection) with the subjective measure of confidence, (e.g., a numeric value assigned to the selection). Studies on selfconfidence and metacognition during test taking have used itemspecific confidence as a way to assess the accuracy and confidence underlying knowledge judgments. Researchers outside of the field of cognitive psychology have used confidence weighting as applied to itemspecific judgments in assessing alternative conceptions of difficult concepts in high school biology and physics, developing and evaluating computerized adaptive testing, testing computerized assessments of learning and understanding, and developing and testing formative and summative classroom assessments. Confidence weighting is one of three components of the Risk Inclination Model. 
ConfidenceBased Recommender (CoBaR) 
Neighborhoodbased collaborative filtering algorithms usually adopt a fixed neighborhood size for every user or item, although groups of users or items may have different lengths depending on users’ preferences. In this paper, we propose an extension to a nonpersonalized recommender based on confidence intervals and hierarchical clustering to generate groups of users with optimal sizes. The evaluation shows that the proposed technique outperformed the traditional recommender algorithms in four publicly available datasets. 
ConfidenceWeighted Linear Classification  We introduce confidenceweighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks show that our algorithm improves over other state of the art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training. 
Confident Multiple Choice Learning (CMCL) 
Ensemble methods are arguably the most trustworthy techniques for boosting the performance of machine learning models. Popular independent ensembles (IE) relying on naive averaging/voting scheme have been of typical choice for most applications involving deep neural networks, but they do not consider advanced collaboration among ensemble models. In this paper, we propose new ensemble methods specialized for deep neural networks, called confident multiple choice learning (CMCL): it is a variant of multiple choice learning (MCL) via addressing its overconfidence issue.In particular, the proposed major components of CMCL beyond the original MCL scheme are (i) new loss, i.e., confident oracle loss, (ii) new architecture, i.e., feature sharing and (iii) new training method, i.e., stochastic labeling. We demonstrate the effect of CMCL via experiments on the image classification on CIFAR and SVHN, and the foregroundbackground segmentation on the iCoseg. In particular, CMCL using 5 residual networks provides 14.05% and 6.60% relative reductions in the top1 error rates from the corresponding IE scheme for the classification task on CIFAR and SVHN, respectively. 
Configural Frequency Analysis (CFA) 
Configural frequency analysis (CFA) is a method of exploratory data analysis, introduced by Gustav A. Lienert in 1969. The goal of a configural frequency analysis is to detect patterns in the data that occur significantly more (such patterns are called Types) or significantly less often (such patterns are called Antitypes) than expected by chance. Thus, the idea of a CFA is to provide by the identified types and antitypes some insight into the structure of the data. Types are interpreted as concepts which are constituted by a pattern of variable values. Antitypes are interpreted as patterns of variable values that do in general not occur together. cfa 
Configurational Comparative Methods (CCM) 
Configurational comparative methods (CCMs) subsume techniques for the identification of complex causal dependencies in configurational data using the theoretical framework of Boolean algebra and its various extensions (Rihoux and Ragin, 2009). For example, Qualitative Comparative Analysis (QCA; Ragin, 1987, 2000, 2008)hitherto the most prominent representative of CCMshas been applied in areas as diverse as business administration (e.g., Chung, 2001), environmental science (van Vliet et al., 2013), evaluation (Cragun et al., 2014), political science (Thiem, 2011), public health (Longest and Thoits, 2012) and sociology (Crowley, 2013). Besides three standalone programs based on graphical user interfaces, three R packages for QCA are currently available, each with a different scope of functionality: QCA (Du¸sa and Thiem, 2014; Thiem and Du¸sa, 2013a,c), QCA3 (Huang, 2014) and SetMethods (Quaranta, 2013) (an addon package to Schneider and Wagemann, 2012). 
Confirmatory Analysis  1) Inferential Statistics – Deductive Approach: · Heavy reliance on probability models · Must accept untestable assumptions · Look for definite answers to specific questions · Emphasis on numerical calculations · Hypotheses determined at outset · Hypothesis tests and formal confidence interval estimation. 2) Advantages: · Provide precise information in the right circumstances · Wellestablished theory and methods. 3) Disadvantages: · Misleading impression of precision in less than ideal circumstances · Analysis driven by preconceived ideas · Difficult to notice unexpected results. 
Confirmatory Factor Analysis (CFA) 
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher’s understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959). In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., “Depression” being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others. For some applications, the requirement of “zero loadings” (for indicators not supposed to load on a certain factor) has been regarded as too strict. A newly developed analysis method, “exploratory structural equation modeling”, specifies hypotheses about the relation between observed indicators and their supposed primary latent factors while allowing for estimation of loadings with other latent factors as well. relabeLoadings 
Confirmatory Research  Confirmatory research sets out to test a specific hypothesis to the exclusion of other considerations; whereas discovery research seeks to find out what might be important in understanding a research context, presenting findings as conjectural (e.g., suggestive, indicative) rather than definite’ (Taber, 2013: 45) Exploratory vs Confirmatory Research 
ConflictDriven Clause Learning (CDCL) 
In computer science, ConflictDriven Clause Learning (CDCL) is an algorithm for solving the Boolean satisfiability problem (SAT). Given a Boolean formula, the SAT problem asks for an assignment of variables so that the entire formula evaluates to true. The internal workings of CDCL SAT solvers were inspired by DPLL solvers. 
Conflictfree Asynchronous Machine Learning (CYCLADES) 
We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. CYCLADES is asynchronous during shared model updates, and requires no memory locking mechanisms, similar to HOGWILD!type algorithms. Unlike HOGWILD!, CYCLADES introduces no conflicts during the parallel execution, and offers a blackbox analysis for provable speedups across a large family of algorithms. Due to its inherent conflictfree nature and cache locality, our multicore implementation of CYCLADES consistently outperforms HOGWILD!type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to the HOGWILD! implementation of SGD, and up to 5x gains over asynchronous implementations of variance reduction algorithms. 
ConflictFree Replicated Data Types (CRDT) 
Internetscale distributed systems often replicate data at multiple geographic locations to provide low latency and high availability, despite node and network failures. Georeplicated systems that adopt a weak consistency model allow replicas to temporarily diverge, requiring a mechanism for merging concurrent updates into a common state. Conflictfree Replicated Data Types (CRDT) provide a principled approach to address this problem. This document presents an overview of Conflictfree Replicated Data Types research and practice, organizing the presentation in the aspects relevant for the application developer, the system developer and the CRDT developer. 
Conformable Fractional Accumulation (CFA) 
The fractional order grey models (FGM) have appealed considerable interest of research in recent years due to its higher effectiveness and flexibility than the conventional grey models and other prediction models. However, the definitions of the fractional order accumulation (FOA) and difference (FOD) is computationally complex, which leads to difficulties for the theoretical analysis and applications. In this paper, the new definition of the FOA are proposed based on the definitions of Conformable Fractional Derivative, which is called the Conformable Fractional Accumulation (CFA), along with its inverse operation, the Conformable Fractional Difference (CFD). Then the new Conformable Fractional Grey Model (CFGM) based on CFA and CFD is introduced with detailed modelling procedures. The feasibility and simplicity and the CFGM are shown in the numerical example. And the at last the comprehensive realworld case studies of natural gas production forecasting in 11 countries are presented, and results show that the CFGM is much more effective than the existing FGM model in the 165 subcases. 
Conformable Fractional Grey Model (CFGM) 
The fractional order grey models (FGM) have appealed considerable interest of research in recent years due to its higher effectiveness and flexibility than the conventional grey models and other prediction models. However, the definitions of the fractional order accumulation (FOA) and difference (FOD) is computationally complex, which leads to difficulties for the theoretical analysis and applications. In this paper, the new definition of the FOA are proposed based on the definitions of Conformable Fractional Derivative, which is called the Conformable Fractional Accumulation (CFA), along with its inverse operation, the Conformable Fractional Difference (CFD). Then the new Conformable Fractional Grey Model (CFGM) based on CFA and CFD is introduced with detailed modelling procedures. The feasibility and simplicity and the CFGM are shown in the numerical example. And the at last the comprehensive realworld case studies of natural gas production forecasting in 11 countries are presented, and results show that the CFGM is much more effective than the existing FGM model in the 165 subcases. 
Conformal Prediction  Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability e, together with a method that makes a prediction ˆ y of a label y, it produces a set of labels, typically containing ˆ y, that also contains y with probability 1e. Conformal prediction can be applied to any method for producing ˆ y: a nearestneighbor method, a supportvector machine, ridge regression, etc. Conformal prediction is designed for an online setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1e of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other online compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. 
Confounding  http://…/confounding.html 
Confounding Variable  In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable. A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship, and the presence of misestimation for this reason is termed omittedvariable bias. While specific definitions may vary, in essence a confounding variable fits the following four criteria, here given in a hypothetical situation with variable of interest ‘V’, confounding variable ‘C’ and outcome of interest ‘O’: 1. C is associated (inversely or directly) with O 2. C is associated with O, independent of V 3. C is associated (inversely or directly) with V 4. C is not in the causal pathway of V to O (C is not a direct consequence of V, not a way by which V produces O) The above correlationbased definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept, and as such, cannot be described in terms of correlations nor associations. 
Confusion Matrix  In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix , is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another). 
Congested Scene Recognition Network (CSRNet) 
We propose a network for Congested Scene Recognition called CSRNet to provide a datadriven and deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present highquality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the frontend for 2D feature extraction and a dilated CNN for the backend, which uses dilated kernels to deliver larger reception fields and to replace pooling operations. CSRNet is an easytrained model because of its pure convolutional structure. To our best acknowledge, CSRNet is the first implementation using dilated CNNs for crowd counting tasks. We demonstrate CSRNet on four datasets (ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO’10 dataset, and the UCSD dataset) and we deliver the stateoftheart performance on all the datasets. In the ShanghaiTech Part_B dataset, we significantly achieve the MAE which is 47.3% lower than the previous stateoftheart method. We extend the applications for counting other objects, such as the vehicle in TRANCOS dataset. Results show that CSRNet significantly improves the output quality with 15.4% lower MAE than the previous stateoftheart approach. 
Congruence Class Model (CCM) 
CCMnet 
Congruence Distance  A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where the data items are real numbers or integers. However, for many application scenarios, the data items of a time series are not simple, but highdimensional data points. Motivated by an application scenario dealing with motion gesture recognition, we develop a distance measure (which we call congruence distance) that serves as a model for the approximate congruency of two multidimensional time series. This distance measure generalizes the classical notion of congruence from point sets to multidimensional time series. We show that, given two input time series $S$ and $T$, computing the congruence distance of $S$ and $T$ is NPhard. Afterwards, we present two algorithms that compute an approximation of the congruence distance. We provide theoretical bounds that relate these approximations with the exact congruence distance. 
Conjoint Analysis  Conjoint analysis’ is a survey based statistical technique used in market research that helps determine how people value different attributes (feature, function, benefits) that make up an individual product or service. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most influential on respondent choice or decision making. A controlled set of potential products or services is shown to survey respondents and by analyzing how they make preferences between these products, the implicit valuation of the individual elements making up the product or service can be determined. These implicit valuations (utilities or partworths) can be used to create market models that estimate market share, revenue and even profitability of new designs. Conjoint originated in mathematical psychology and was developed by marketing professor Paul E. Green at the Wharton School of the University of Pennsylvania and Data Chan. Other prominent conjoint analysis pioneers include professor V. ‘Seenu’ Srinivasan of Stanford University who developed a linear programming (LINMAP) procedure for rank ordered data as well as a selfexplicated approach, Richard Johnson who developed the Adaptive Conjoint Analysis technique in the 1980s and Jordan Louviere (University of Iowa) who invented and developed choicebased approaches to conjoint analysis and related techniques such as bestworst scaling. Today it is used in many of the social sciences and applied sciences including marketing, product management, and operations research. It is used frequently in testing customer acceptance of new product designs, in assessing the appeal of advertisements and in service design. It has been used in product positioning, but there are some who raise problems with this application of conjoint analysis. Conjoint analysis techniques may also be referred to as multiattribute compositional modelling, discrete choice modelling, or stated preference research, and is part of a broader set of tradeoff analysis tools used for systematic analysis of decisions. These tools include BrandPrice TradeOff, Simalto, and mathematical approaches such as AHP, evolutionary algorithms or ruledeveloping experimentation. What Is Conjoint Analysis? 
Conjugate Gradient Method (CG) 
In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positivedefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems. The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel. 
Conjugate Prior  In Bayesian probability theory, if the posterior distributions p(thetax) are in the same family as the prior probability distribution p(theta), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. For example, the Gaussian family is conjugate to itself (or selfconjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian. This means that the Gaussian distribution is a conjugate prior for the likelihood which is also Gaussian. 
Connected Scatterplot  The connected scatterplot visualizes two related time series in a scatterplot and connects the points with a line in temporal sequence. News media are increasingly using this technique to present data under the intuition that it is understandable and engaging. To explore these intuitions, we (1) describe how paired time series relationships appear in a connected scatterplot, (2) qualitatively evaluate how well people understand trends depicted in this format, (3) quantitatively measure the types and frequency of misinterpretations, and (4) empirically evaluate whether viewers will preferentially view graphs in this format over the more traditional format. The results suggest that lowcomplexity connected scatterplots can be understood with little explanation, and that viewers are biased towards inspecting connected scatterplots over the more traditional format. We also describe misinterpretations of connected scatterplots and propose further research into mitigating these mistakes for viewers unfamiliar with the technique. 
Connection Analytics  Connection Analytics – an emerging discipline that provides answers to persistent business questions such as identification and influence of thought leaders, impact of external events or players on financial risk, or analysis of network performance based on causal relationships between nodes. It provides a new way of looking at people, products, physical phenomena, or events. Enterprises are using Big Data analytics to complement traditional SQL queries in answering very familiar questions, such as customer retention, marketing attribution, risk mitigation, and operational efficiency which, until now, required enormous compute power, timeconsuming data management and the need for learning highly specialized programming and query languages. 
Connection Scan Algorithm (CSA) 
We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output consist is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. We study several problem variations such as the earliest arrival and profile problems. We present algorithm variants that only optimize the arrival time or additionally optimize the number of transfers in the Pareto sense. An advantage of CSA is that is can easily adjust to changes in the timetable, allowing the easy incorporation of known vehicle delays. We additionally introduce the Minimum Expected Arrival Time (MEAT) problem to handle possible, uncertain, future vehicle delays. We present a solution to the MEAT problem that is based upon CSA. Finally, we extend CSA using the multilevel overlay paradigm to answer complex queries on nationwide integrated timetables with trains and buses. 
Connectionist Temporal Classification (CTC) 
Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in endtoend speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the above problem can be mitigated by jointly training with maximum likelihood and policy gradient. In particular, with policy learning we are able to directly optimize on the (otherwise nondifferentiable) performance metric. We show that joint training improves relative performance by 4% to 13% for our endtoend model as compared to the same model learned through maximum likelihood. The model achieves 5.53% WER on Wall Street Journal dataset, and 5.42% and 14.70% on Librispeech testclean and testother set, respectively. 
ConoverIman Test  
Consilience  We describe an apparently new measure of multivariate goodnessoffit between sets of quantitative results from a model (simulation, analytical, or multiple regression), paired with those observed under corresponding conditions from the system being modeled. Our approach returns a single, integrative measure, even though it can accommodate complex systems that produce responses of M types. For each responsetype, the goodnessoffit measure, which we label ‘Consilience’ (C), is maximally 1, for perfect fit; near 0 for the largesample case (number of pairs, N, more than about 25) in which the modeled series is a random sample from a quasinormal distribution with the same mean and variance as that of the observed series (null model); and, less than 0, toward minusinfinity, for progressively worse fit. In addition, lackoffit for each responsetype can be apportioned between systematic and nonsystematic (unexplained) components of error. Finally, for statistical assessment of models relative to the equivalent null model, we offer provisional estimates of critical C vs. N, and of critical jointC vs. N and M, at various levels of Pr(typeI error). Application of our proposed methodology requires only MS Excel (2003 or later); we provide Excel XLS and XLSX templates that afford semiautomatic computation for systems involving up to M = 5 response types, each represented by up to N = 1000 observedandmodeled result pairs. N need not be equal, nor response pairs in complete overlap, over M. 
Consistent Generative Query Network  Stochastic video prediction is usually framed as an extrapolation problem where the goal is to sample a sequence of consecutive future image frames conditioned on a sequence of observed past frames. For the most part, algorithms for this task generate future video frames sequentially in an autoregressive fashion, which is slow and requires the input and output to be consecutive. We introduce a model that overcomes these drawbacks — it learns to generate a global latent representation from an arbitrary set of frames within a video. This representation can then be used to simultaneously and efficiently sample any number of temporally consistent frames at arbitrary timepoints in the video. We apply our model to synthetic video prediction tasks and achieve results that are comparable to stateoftheart video prediction models. In addition, we demonstrate the flexibility of our model by applying it to 3D scene reconstruction where we condition on location instead of time. To the best of our knowledge, our model is the first to provide flexible and coherent prediction on stochastic video datasets, as well as consistent 3D scene samples. Please check the project website https://bit.ly/2jX7Vyu to view scene reconstructions and videos produced by our model. 
Consistent Sampling  We describe a very simple method for `consistent sampling’ that allows for sampling with replacement. The method extends previous approaches to consistent sampling, which assign a pseudorandom real number to each element, and sample those with the smallest associated numbers. When sampling with replacement, our extension gives the item sampled a new, larger, associated pseudorandom number, and returns it to the pool of items being sampled. 
Constrained CLR  In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of userspecified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLRbased regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression. 
Constrained Optimization  In mathematical optimization, constrained optimization (in some contexts called constraint optimization) is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function, which is to be minimized, or a reward function or utility function, which is to be maximized. Constraints can be either hard constraints, which set conditions for the variables that are required to be satisfied, or soft constraints, which have some variable values that are penalized in the objective function if, and based on the extent that, the conditions on the variables are not satisfied. 
Constrained Optimization By RAdial Basis Function Interpolation (COBRA) 

COnstrained PARAFAC2 (COPA) 
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is jointly modeling treatments across a set of patients with varying number of medical encounters, where the alignment of events in time bears no clinical meaning, and it may also be impossible to align them due to their varying length. Despite recent improvements on scaling up unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and nonnegativity, are needed to be imposed for interpretable temporal modeling and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and nonnegativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and alternating direction method of multiplier (AOADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36x faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. 
Constrained Policy Optimization (CPO) 
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in highdimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first generalpurpose policy search algorithm for constrained reinforcement learning with guarantees for nearconstraint satisfaction at each iteration. Our method allows us to train neural network policies for highdimensional control while making guarantees about policy behavior all throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety. 
Constrained Quantile Regression Averaging (CQRA) 
Probabilistic load forecasts provide comprehensive information about future load uncertainties. In recent years, many methodologies and techniques have been proposed for probabilistic load forecasting. Forecast combination, a widely recognized best practice in point forecasting literature, has never been formally adopted to combine probabilistic load forecasts. This paper proposes a constrained quantile regression averaging (CQRA) method to create an improved ensemble from several individual probabilistic forecasts. We formulate the CQRA parameter estimation problem as a linear program with the objective of minimizing the pinball loss, with the constraints that the parameters are nonnegative and summing up to one. We demonstrate the effectiveness of the proposed method using two publicly available datasets, the ISO New England data and Irish smart meter data. Comparing with the best individual probabilistic forecast, the ensemble can reduce the pinball score by 4.39% on average. The proposed ensemble also demonstrates superior performance over nine other benchmark ensembles. 
Constrained Recurrent Sparse AutoEncoder (CRsAE) 
Given a convolutional dictionary underlying a set of observed signals, can a carefully designed autoencoder recover the dictionary in the presence of noise We introduce an autoencoder architecture, termed constrained recurrent sparse autoencoder (CRsAE), that answers this question in the affirmative. Given an input signal and an approximate dictionary, the encoder finds a sparse approximation using FISTA. The decoder reconstructs the signal by applying the dictionary to the output of the encoder. The encoder and decoder in CRsAE parallel the sparsecoding and dictionary update steps in optimizationbased alternatingminimization schemes for dictionary learning. As such, the parameters of the encoder and decoder are not independent, a constraint which we enforce for the first time. We derive the backpropagation algorithm for CRsAE. CRsAE is a framework for blind source separation that, only knowing the number of sources (dictionary elements), and assuming sparselymany can overlap, is able to separate them. We demonstrate its utility in the context of spike sorting, a source separation problem in computational neuroscience. We demonstrate the ability of CRsAE to recover the underlying dictionary and characterize its sensitivity as a function of SNR. 
Constrained State Estimation  
Constraint Consistent Learning (CCL) 
A Library for Constraint Consistent Learning 
Content Grouping  Content Grouping lets you group content into a logical structure that reflects how you think about your site or app, and then view and compare aggregated metrics by group name in addition to being able to drill down to the individual URL, page title, or screen name. For example, you can see the aggregated number of pageviews for all pages in a group like Men/Shirts, and then drill in to see each URL or page title. You start by creating a Content Group, a collection of content. For example, on an ecommerce site that sells clothing, you might create groups for Men, Women, and Children. Then, within each group, you might create content like Shirts, Pants, Outerwear. This would let you compare aggregated statistics for each type of clothing within a group (e.g., Men’s Shirts vs Men’s Pants vs. Men’s Outerwear). It would also let you drill in to each group to see how individual Shirts pages compare to one another, for example, Men/Shirts/Tshirts/index.html vs Men/Shirts/DressShirts/index.html. 
ContentAware Representation Learning Model (CARL) 
Heterogeneous networks not only present a challenge of heterogeneity in the types of nodes and relations, but also the attributes and content associated with the nodes. While recent works have looked at representation learning on homogeneous and heterogeneous networks, there is no work that has collectively addressed the following challenges: (a) the heterogeneous structural information of the network consisting of multiple types of nodes and relations; (b) the unstructured semantic content (e.g., text) associated with nodes; and (c) online updates due to incoming new nodes in growing network. We address these challenges by developing a ContentAware Representation Learning model (CARL). CARL performs joint optimization of heterogeneous SkipGram and deep semantic encoding for capturing both heterogeneous structural closeness and unstructured semantic relations among all nodes, as function of node content, that exist in the network. Furthermore, an additional online update module is proposed for efficiently learning representations of incoming nodes. Extensive experiments demonstrate that CARL outperforms stateoftheart baselines in various heterogeneous network mining tasks, such as link prediction, document retrieval, node recommendation and relevance search. We also demonstrate the effectiveness of the CARL’s online update module through a category visualization study. 
Context Aware Bandits (CAB) 
In this paper, we present the CAB (Context Aware Bandits). With CAB we attempt to craft a bandit algorithm that can exploit collaborative effects and that can be deployed in a practical recommendation system setting, where the multiarmed bandits have been shown to perform well in particular with respect to the cold start problem. CAB exploits, a contextaware clustering technique augmenting explorationexploitation strategies in a contextual multiarmed bandit settings. CAB dynamically clusters the users based on the content universe under consideration. We demonstrate the efficacy of our approach on extensive realworld datasets, showing the scalability, and more importantly, the significant increased prediction performance compared to related stateoftheart methods. 
Context Awareness  Context awareness is a property of mobile devices that is defined complementarily to location awareness. Whereas location may determine how certain processes in a device operate, context may be applied more flexibly with mobile users, especially with users of smart phones. Context awareness originated as a term from ubiquitous computing or as socalled pervasive computing which sought to deal with linking changes in the environment with computer systems, which are otherwise static. The term has also been applied to business theory in relation to Contextual application design and business process management issues. 
Context Tree  There has been growing interests in recent years from both practical and research perspectives for sessionbased recommendation tasks as longterm user profiles do not often exist in many reallife recommendation applications. In this case, recommendations for user’s immediate next actions need to be generated based on patterns in anonymous short sessions. An often overlooked aspect is that new items with limited observations arrive continuously in many domains (e.g. news and discussion forums). Therefore, recommendations need to be adaptive to such frequent changes. In this paper, we benchmark a new nonparametric method called context tree (CT) against various stateoftheart methods on extensive datasets for sessionbased recommendation task. Apart from the standard static evaluation protocol adopted by previous literatures, we include an adaptive configuration to mimic the situation when new items with limited observations arrives continuously. Our results show that CT outperforms two bestperforming approaches (recurrent neural network; heuristicbased nearest neighbor) in majority of the tested configurations and datasets. We analyze reasons for this and demonstrate that it is because of the better adaptation to changes in the domain, as well as the remarkable capability to learn static sequential patterns. Moreover, our running time analysis illustrates the efficiency of using CT as other nonparametric methods. 
Contextaware Path Ranking (CPR) 
Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performances and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (trains on an exponentially large number of features) problems. This paper proposes a Contextaware Path Ranking (CPR) algorithm to solve these problems by introducing a selective path exploration strategy. CPR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by CPR not only improve predictive performance but also are more interpretable than existing baselines. 
ContextAware Personalized POI Sequence Recommender System (CAPS) 
The revolution of World Wide Web (WWW) and smartphone technologies have been the keyfactor behind remarkable success of social networks. With the ease of availability of checkin data, the locationbased social networks (LBSN) (e.g., Facebook1, etc.) have been heavily explored in the past decade for PointofInterest (POI) recommendation. Though many POI recommenders have been defined, most of them have focused on recommending a single location or an arbitrary list that is not contextually coherent. It has been cumbersome to rely on such systems when one needs a contextually coherent list of locations, that can be used for various daytoday activities, for e.g., itinerary planning. This paper proposes a model termed as CAPS (ContextAware Personalized POI Sequence Recommender System) that generates contextually coherent POI sequences relevant to user preferences. To the best of our knowledge, CAPS is the first attempt to formulate the contextual POI sequence modeling by extending Recurrent Neural Network (RNN) and its variants. CAPS extends RNN by incorporating multiple contexts to the hidden layer and by incorporating global context (sequence features) to the hidden layers and the output layer. It extends the variants of RNN (e.g., Longshort term memory (LSTM)) by incorporating multiple contexts and global features in the gate update relations. The major contributions of this paper are: (i) it models the contextual POI sequence problem by incorporating personalized user preferences through multiple constraints (e.g., categorical, social, temporal, etc.), (ii) it extends RNN to incorporate the contexts of individual item and that of the whole sequence. It also extends the gated functionality of variants of RNN to incorporate the multiple contexts, and (iii) it evaluates the proposed models against two realworld data sets. 
ContextAware Policy reuSe (CAPS) 
Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks. Existing works of policy reuse either focus on only selecting a single best source policy for transfer without considering contexts, or cannot guarantee to learn an optimal policy for a target task. To improve transfer efficiency and guarantee optimality, we develop a novel policy reuse method, called {\em ContextAware Policy reuSe} (CAPS), that enables multipolicy transfer. Our method learns when and which source policy is best for reuse, as well as when to terminate its reuse. CAPS provides theoretical guarantees in convergence and optimality for both source policy selection and target task learning. Empirical results on a gridbased navigation domain and the Pygame Learning Environment demonstrate that CAPS significantly outperforms other stateoftheart policy reuse methods. 
ContextAware Recommender Systems (CARS) 
Interpreting Contextual Effects By Contextual Modeling In Recommender Systems 
Contextaware Sentiment Word Identification (sentiword2vec) 
Traditional sentiment analysis often uses sentiment dictionary to extract sentiment information in text and classify documents. However, emerging informal words and phrases in user generated content call for analysis aware to the context. Usually, they have special meanings in a particular context. Because of its great performance in representing interword relation, we use sentiment word vectors to identify the special words. Based on the distributed language model word2vec, in this paper we represent a novel method about sentiment representation of word under particular context, to be detailed, to identify the words with abnormal sentiment polarity in long answers. Result shows the improved model shows better performance in representing the words with special meaning, while keep doing well in representing special idiomatic pattern. Finally, we will discuss the meaning of vectors representing in the field of sentiment, which may be different from general objectbased conditions. 
ContextNet  Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. Stateoftheart methods are, however, not directly transferable to realtime applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representations to produce competitive semantic segmentation in realtime with low memory requirements. ContextNet combines a deep branch at low resolution that captures global context information efficiently with a shallow branch that focuses on highresolution segmentation details. We analyze our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.2 frames per second at full (1024×2048) resolution. 
Contextual / Common Query Language (CQL) 
Contextual Query Language (CQL), previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information. Based on the semantics of Z39.50, its design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex query languages. 
Contextual Bandit  The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised – in particular a clickornot on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “kArmed Bandits”, often with various relevant embellishments. The kArmed Bandit setting works on a roundbyround basis. On each round: 1. A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). 2. The world reveals the reward ra of the chosen arm (i.e. whether the ad is clicked on). http://…/Multiarmed_bandit#Contextual_Bandit 
Contextual Explanation Networks (CEN) 
We introduce contextual explanation networks (CENs)—a class of models that learn to predict by generating and leveraging intermediate explanations. CENs combine deep networks with contextspecific probabilistic models and construct explanations in the form of locallycorrect hypotheses. Contrary to the existing posthoc modelexplanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid instancespecific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in lowresource settings. We prove that local approximations to the decision boundary of our networks are consistent with the generated explanations. Our results on image and text classification and survival analysis tasks demonstrate that CENs can easily match or outperform the stateoftheart while offering additional insights behind each prediction, valuable for decision support. 
Contextual Graph Markov Model  We introduce the Contextual Graph Markov Model, an approach combining ideas from generative models and neural networks for the processing of graph data. It founds on a constructive methodology to build a deep architecture comprising layers of probabilistic models that learn to encode the structured information in an incremental fashion. Context is diffused in an efficient and scalable way across the graph vertexes and edges. The resulting graph encoding is used in combination with discriminative models to address structure classification benchmarks. 
Contextual Listen, Attend and Spell (CLAS) 
In automatic speech recognition (ASR) what a user says depends on the particular context she is in. Typically, this context is represented as a set of word ngrams. In this work, we present a novel, allneural, endtoend (E2E) ASR sys tem that utilizes such context. Our approach, which we re fer to as Contextual Listen, Attend and Spell (CLAS) jointly optimizes the ASR components along with embeddings of the context ngrams. During inference, the CLAS system can be presented with context phrases which might contain outof vocabulary (OOV) terms not seen during training. We compare our proposed system to a more traditional contextualization approach, which performs shallowfusion between independently trained LAS and contextual ngram models during beam search. Across a number of tasks, we find that the proposed CLAS system outperforms the baseline method by as much as 68% relative WER, indicating the advantage of joint optimization over individually trained components. Index Terms: speech recognition, sequencetosequence models, listen attend and spell, LAS, attention, embedded speech recognition. 
Contextual Memory Tree  We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. We demonstrate the efficacy of CMT by augmenting existing multiclass and multilabel classification algorithms with CMT and observe statistical improvement. We also test CMT learning on several imagecaptioning tasks to demonstrate that it performs computationally better than a simple nearest neighbors memory system while benefitting from reward learning. 
Contextual MultiArmed Bandits  MultiArmed Bandits with side information. 
Contextual Outlier INterpretation (COIN) 
Outlier detection plays an essential role in many datadriven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers does not receive much attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models through providing intrinsic reasons why the certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of contrastive contexts where outliers locate, as well as the relation between outliers and contexts, are usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches. 
Contextual Policy Optimisation (CPO) 
Policy gradient methods have been successfully applied to a variety of reinforcement learning tasks. However, while learning in a simulator, these methods do not utilise the opportunity to improve learning by adjusting certain environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but that are controllable in a simulator. This can lead to slow learning, or convergence to highly suboptimal policies. In this paper, we present contextual policy optimisation (CPO). The central idea is to use Bayesian optimisation to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this Bayesian optimisation practical, we contribute two easytocompute lowdimensional fingerprints of the current policy. We apply CPO to a number of continuous control tasks of varying difficulty and show that CPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observable under random sampling but are key to learning good policies. 
Contextual Regression  Machine learning algorithms such as linear regression, SVM and neural network have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of neural network embedding and dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, the application of our method not only outperformed the state of the art method in terms of accuracy, but also unveiled two previously unfound open chromatin related histone marks. Our method can fill the blank of accurate and interpretable nonlinear modeling in scientific data mining tasks. 
Contextual Stochastic Block Model  We provide the first information theoretic tight analysis for inference of latent community structure given a sparse graph along with high dimensional node covariates, correlated with the same latent communities. Our work bridges recent theoretical breakthroughs in the detection of latent community structure without nodes covariates and a large body of empirical work using diverse heuristics for combining node covariates with graphs for inference. The tightness of our analysis implies in particular, the information theoretical necessity of combining the different sources of information. Our analysis holds for networks of large degrees as well as for a Gaussian version of the model. 
Continued Logarithm (CL) 
Analysis of the Continued Logarithm Algorithm 
Continuous BagofWords (CBOW) 
The ‘continuous bagofwords model’ (CBOW) adds inputs from words within short window to predict the current word. http://…/1301.3781.pdf 
Continuous Computation Language (CCL) 
For Sybase Complex Event Procesing (CEP), developers create CEP applications using the Continuous Computation Language (CCL). Introduced in 2005, CCL was the first commercial, declarative SQLbased CEP language and remains the most extensive SQLbased CEP language on the market. Because the Continuous Computation Language (CCL) is a SQLbased language, it gives programmers a huge head start in creating CEP applications. The Sybase CEP Studio helps manage all aspects of the application development process, further increasing programmer productivity. 
Continuous Semantic Topic Embedding Model (CSTEM) 
This paper proposes the continuous semantic topic embedding model (CSTEM) which finds latent topic variables in documents using continuous semantic distance function between the topics and the words by means of the variational autoencoder(VAE). The semantic distance could be represented by any symmetric bellshaped geometric distance function on the Euclidean space, for which the Mahalanobis distance is used in this paper. In order for the semantic distance to perform more properly, we newly introduce an additional model parameter for each word to take out the global factor from this distance indicating how likely it occurs regardless of its topic. It certainly improves the problem that the Gaussian distribution which is used in previous topic model with continuous word embedding could not explain the semantic relation correctly and helps to obtain the higher topic coherence. Through the experiments with the dataset of 20 Newsgroup, NIPS papers and CNN/Dailymail corpus, the performance of the recent stateoftheart models is accomplished by our model as well as generating topic embedding vectors which makes possible to observe where the topic vectors are embedded with the word vectors in the real Euclidean space and how the topics are related each other semantically. 
Continuous Skipgram (Skipgram) 
The training objective of the Skipgram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w1,w2,w3, … ,wT , the objective of the Skipgram model is to maximize the average log probability, where c is the size of the training context (which can be a function of the center word wt). Larger c results in more training examples and thus can lead to a higher accuracy, at the expense of the 2 training time. http://…/1301.3781.pdf 
Continuous Time Autoregressive Moving Average (CARMA) 
We introduce the class of continuoustime autoregressive movingaverage (CARMA) processes in Hilbert spaces. As driving noises of these processes we consider Levy processes in Hilbert space. We provide the basic definitions, show relevant properties of these processes and establish the equivalents of CARMA processes on the real line. Finally, CARMA processes in Hilbert space are linked to the stochastic wave equation and functional autoregressive processes. Multivariate stochastic delay differential equations and CAR representations of CARMA processes 
Continuous Time Stochastic Modelling (CTSM) 
In probability theory and statistics, a continuoustime stochastic process, or a continuousspacetime stochastic process is a stochastic process for which the index variable takes a continuous set of values, as contrasted with a discretetime process for which the index variable takes only distinct values. An alternative terminology uses continuous parameter as being more inclusive. A more restricted class of processes are the continuous stochastic processes: here the term often (but not always) implies both that the index variable is continuous and that sample paths of the process are continuous. Given the possible confusion, caution is needed. Continuoustime stochastic processes that are constructed from discretetime processes via a waiting time distribution are called continuoustime random walks. ctsmr 
ContinuousTime Fractionally Integrated ARMA (CARFIMA) 
carfima 
Contrast  In statistics, particularly analysis of variance and linear regression, an orthogonal contrast is a linear combination of two or more factor level means (averages) whose coefficients add up to zero. Nonorthogonal contrasts do not necessarily sum to 0. Contrasts should be constructed “to answer specific research questions”, and do not necessarily have to be orthogonal. 
Contrast Analysis  ➚ “Contrast” Contrast Analysis What is a contrast matrix (a term, pertaining to an analysis with categorical predictors)? 
Contrastive Divergence (CD) 
Contrastive Divergence (CD), an approximate MaximumLikelihood (ML) learning algorithm proposed by Geoffrey Hinton. Contrastive Divergence is basically a funky term for “approximate gradient descent”. 
Contrastive Predictive Coding  While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from highdimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments. 
Contrastive Principal Component Analysis (cPCA) 
We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover lowdimensional structure that is unique to a dataset, or enriched in one dataset relative to other data. The technique is a generalization of standard PCA, for the setting where multiple datasets are available — e.g. a treatment and a control group, or a mixed versus a homogeneous population — and the goal is to explore patterns that are specific to one of the datasets. We conduct a wide variety of experiments in which cPCA identifies important datasetspecific patterns that are missed by PCA, demonstrating that it is useful for many applications: subgroup discovery, visualizing trends, feature selection, denoising, and datadependent standardization. We provide geometrical interpretations of cPCA and show that it satisfies desirable theoretical guarantees. We also extend cPCA to nonlinear settings in the form of kernel cPCA. We have released our code as a python package and documentation is on Github. 
Contrastivecenter Loss  The deep convolutional neural network(CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastivecenter loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastivecenter loss simultaneously considers intraclass compactness and interclass separability, by penalizing the contrastive values between: (1)the distances of training samples to their corresponding class centers, and (2)the sum of the distances of training samples to their noncorresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastivecenter loss. 
Control Toolbox (CT) 
We introduce the Control Toolbox (CT), an opensource C++ library for efficient modelling, control, estimation, trajectory optimization and model predictive control. The CT is applicable to a broad class of dynamic systems, but features additional modelling tools specially designed for robotics. This paper outlines its general concept, its major building blocks and highlights selected application examples. The CT was designed for intuitive modelling of systems governed by ordinary differential or difference equations. It supports rapid prototyping of cost functions and constraints and provides common interfaces for different optimal control solvers. To date, we support Single Shooting, the iterative LinearQuadratic Regulator, GaussNewton Multiple Shooting and classical Direct Multiple Shooting. We provide interfaces to different NLP and linearquadratic solvers, such as IPOPT, SNOPT, HPIPM, or a custom Riccati solver. The CT was designed with performance for online control in mind and allows to solve largescale optimal control problems highly efficiently. Some of the key features enabling fast runtime performance are full support for Automatic Differentiation, derivative code generation and thorough multithreading. For robotics problems, the we offer an interface to a fully autodifferentiable rigidbody dynamics modelling engine. In combination with derivative code generation, this allows for an unprecedented performance in solving optimal control problems for complex articulated robotic systems. 
conu  There has been a need for a simple, easytouse handler for writing tests and other code around containers that would implement helpful methods and utilities. For this we introduce conu, a lowlevel Python library. This project has been driven from the start by the requirements of container maintainers and testers. In addition to basic image and container management methods, it provides other often used functions, such as container mount, shortcut methods for getting an IP address, exposed ports, logs, name, image extending using sourcetoimage, and many others. conu aims for stable engineagnostic APIs that would be implemented by several container runtime backends. Switching between two different container engines should require only minimum effort. When used for testing, one set of tests could be executed for multiple backends. 
ConvCSNet  Compressive sensing (CS), aiming to reconstruct an image/signal from a small set of random measurements has attracted considerable attentions in recent years. Due to the high dimensionality of images, previous CS methods mainly work on image blocks to avoid the huge requirements of memory and computation, i.e., image blocks are measured with Gaussian random matrices, and the whole images are recovered from the reconstructed image blocks. Though efficient, such methods suffer from serious blocking artifacts. In this paper, we propose a convolutional CS framework that senses the whole image using a set of convolutional filters. Instead of reconstructing individual blocks, the whole image is reconstructed from the linear convolutional measurements. Specifically, the convolutional CS is implemented based on a convolutional neural network (CNN), which performs both the convolutional CS and nonlinear reconstruction. Through endtoend training, the sensing filters and the reconstruction network can be jointly optimized. To facilitate the design of the CS reconstruction network, a novel twobranch CNN inspired from a sparsitybased CS reconstruction model is developed. Experimental results show that the proposed method substantially outperforms previous stateoftheart CS methods in term of both PSNR and visual quality. 
CONVERGEFASTAUXNET  In this paper, we introduce an innovative method to improve the convergence speed and accuracy of object detection neural networks. Our approach, CONVERGEFASTAUXNET, is based on employing multiple, dependent loss metrics and weighting them optimally using an online trained auxiliary network. Experiments are performed in the wellknown RoboCup@Work challenge environment. A fully convolutional segmentation network is trained on detecting objects’ pickup points. We empirically obtain an approximate measure for the rate of success of a robotic pickup operation based on the accuracy of the object detection network. Our experiments show that adding an optimally weighted Euclidean distance loss to a network trained on the commonly used Intersection over Union (IoU) metric reduces the convergence time by 42.48%. The estimated pickup rate is improved by 39.90%. Compared to stateoftheart task weighting methods, the improvement is 24.5% in convergence, and 15.8% on the estimated pickup rate. 
Convergence Clubs  Clustering Regions that form Convergence Clubs, according to the definition of Phillips and Sul (2009) <doi:10.1002/jae.1080>. ConvergenceClubs 
CONvergence of iterated CORrelations (CONCOR) 
Given an adjacency matrix, or a set of adjacency matrices for different relations, a correlation matrix can be formed by the following procedure. Form a profile vector for a vertex i by concatenating the ith row in every adjacency matrix; the i,jth element of the correlation matrix is the Pearson correlation coefficient of the profile vectors of i and j. This (square, symmetric) matrix is called the first correlation matrix. The procedure can be performed iteratively on the correlation matrix until convergence. Each entry is now 1 or 1. This matrix is used to split the data into two blocks such that members of the same block are positively correlated, members of different blocks are negatively correlated. CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then applied to the separate blocks. At each iteration all blocks are submitted for analysis, however blocks containing two vertices are not split. Consequently npartitions of the binary tree can produce up to 2n blocks. Note that any similarity matrix can be used as input. http://…/concorinr 
Convergence of Random Variables  In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied. The different possible notions of convergence relate to how such a behaviour can be characterised: two readily understood behaviours are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution. http://…ty_theory#Convergence_of_random_variables http://…ofconvergenceinprobabilitytheory.jpg 
Convergent Cross Mapping (CCM) 
Convergent cross mapping (CCM) is a statistical test for a causeandeffect relationship between two time series variables that, like the Granger causality test, seeks to resolve the problem that correlation does not imply causation. While Granger causality is best suited for purely stochastic systems where the influence of the causal variables are separable (independent of each other), CCM is based on the theory of Dynamical systems and can be applied to systems where causal variables have synergistic effects. The test was developed in 2012 by the lab of George Sugihara of the Scripps Institution of Oceanography, La Jolla, California, USA. 
Convex Banding of the Covariance Matrix  We introduce a new sparse estimator of the covariance matrix for highdimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparselybanded, dataadaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonlystudied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactlybanded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. 
Convex Feasibility Problem (CFP) 
The convex feasibility problem (CFP) is to find a feasible point in the intersection of finitely many convex and closed sets. If the intersection is empty then the CFP is inconsistent and a feasible point does not exist. 
Convex Function  In mathematics, a realvalued function f(x) defined on an interval is called convex (or convex downward or concave upward) if the line segment between any two points on the graph of the function lies above the graph, in a Euclidean space (or more generally a vector space) of at least two dimensions. Equivalently, a function is convex if its epigraph (the set of points on or above the graph of the function) is a convex set. Wellknown examples of convex functions are the quadratic function f(x)=x^2 and the exponential function f(x)=e^x for any real number x. Convex functions play an important role in many areas of mathematics. They are especially important in the study of optimization problems where they are distinguished by a number of convenient properties. For instance, a (strictly) convex function on an open set has no more than one minimum. Even in infinitedimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and, as a result, they are the most wellunderstood functionals in the calculus of variations. In probability theory, a convex function applied to the expected value of a random variable is always less than or equal to the expected value of the convex function of the random variable. This result, known as Jensen’s inequality, underlies many important inequalities (including, for instance, the arithmeticgeometric mean inequality and Hölder’s inequality). Exponential growth is a special case of convexity. Exponential growth narrowly means “increasing at a rate proportional to the current value”, while convex growth generally means “increasing at an increasing rate (but not necessarily proportionally to current value)”. 
Convex Hierarchical Testing (CHT) 
We consider the testing of all pairwise interactions in a twoclass problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together. 
Convex Optimization  Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense “easier” than the general case – for example, any local minimum must be a global minimum. 
Convexified Convolutional Neural Networks (CCNN) 
We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel Hilbert space, the CNN parameters can be represented as a lowrank matrix, which can be relaxed to obtain a convex optimization problem. For learning twolayer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. For learning deeper networks, we train CCNNs in a layerwise manner. Empirically, CCNNs achieve performance competitive with CNNs trained by backpropagation, SVMs, fullyconnected neural networks, stacked denoising autoencoders, and other baseline methods. 
ConvFlow  Bayesian posterior inference is prevalent in various machine learning problems. Variational inference provides one way to approximate the posterior distribution, however its expressive power is limited and so is the accuracy of resulting approximation. Recently, there has a trend of using neural networks to approximate the variational posterior distribution due to the flexibility of neural network architecture. One way to construct flexible variational distribution is to warp a simple density into a complex by normalizing flows, where the resulting density can be analytically evaluated. However, there is a tradeoff between the flexibility of normalizing flow and computation cost for efficient transformation. In this paper, we propose a simple yet effective architecture of normalizing flows, ConvFlow, based on convolution over the dimensions of random input vector. Experiments on synthetic and real world posterior inference problems demonstrate the effectiveness and efficiency of the proposed method. 
ConvNetJS  ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you’re training. No software requirements, no compilers, no installations, no GPUs, no sweat. 
Convolution  In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions, giving the area overlap between the two functions as a function of the amount that one of the original functions is translated. Convolution is similar to crosscorrelation. It has applications that include probability, statistics, computer vision, image and signal processing, electrical engineering, and differential equations. 
Convolutional Analysis Operator Learning (CAOL) 
Convolutional operator learning is increasingly gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on socalled local approaches that extract and store many overlapping patches across training signals. Due to memory demands, local approaches have limitations when learning kernels from large datasets — particularly with multilayered structures, e.g., convolutional neural network (CNN) — and/or applying the learned kernels to highdimensional signal recovery problems. The socalled global approach has been studied within the ‘synthesis’ signal model, e.g., convolutional dictionary learning, overcoming the memory problems by careful algorithmic designs. This paper proposes a new convolutional analysis operator learning (CAOL) framework in the global approach, and develops a new convergent Block Proximal Gradient method using a Majorizer (BPGM) to solve the corresponding block multinonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tightframe (TF) filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, for tight majorizers, BPGM significantly accelerates the CAOL convergence rate compared to the stateoftheart method, BPG. Numerical experiments for sparseview computational tomography show that CAOL using TF filters significantly improves reconstruction quality compared to a conventional edgepreserving regularizer. Finally, this paper shows that CAOL can be useful to mathematically model a CNN, and the corresponding updates obtained via BPGM coincide with core modules of the CNN. 
Convolutional Block Attention Module (CBAM) 
We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feedforward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is endtoend trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet1K, MS~COCO detection, and VOC~2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available. 
Convolutional Deep Averaging Network (CDAN) 
Unordered feature sets are a nonstandard data structure that traditional neural networks are incapable of addressing in a principled manner. Providing a concatenation of features in an arbitrary order may lead to the learning of spurious patterns or biases that do not actually exist. Another complication is introduced if the number of features varies between each set. We propose convolutional deep averaging networks (CDANs) for classifying and learning representations of datasets whose instances comprise variablesize, unordered feature sets. CDANs are efficient, permutationinvariant, and capable of accepting sets of arbitrary size. We emphasize the importance of nonlinear feature embeddings for obtaining effective CDAN classifiers and illustrate their advantages in experiments versus linear embeddings and alternative permutationinvariant and equivariant architectures. 
Convolutional Dictionary Learning  Convolutional sparse representations are a form of sparse representation with a dictionary that has a structure that is equivalent to convolution with a set of linear filters. While effective algorithms have recently been developed for the convolutional sparse coding problem, the corresponding dictionary learning problem is substantially more challenging. Furthermore, although a number of different approaches have been proposed, the absence of thorough comparisons between them makes it difficult to determine which of them represents the current state of the art. The present work both addresses this deficiency and proposes some new approaches that outperform existing ones in certain contexts. A thorough set of performance comparisons indicates a very wide range of performance differences among the existing and proposed methods, and clearly identifies those that are the most effective. 
Convolutional Gaussian Processes  We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to highdimensional inputs like images. The main contribution of our work is the construction of an interdomain inducing point approximation that is welltailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models. 
Convolutional Geometric Matrix Completion (CGMC) 
Geometric matrix completion~(GMC) has been proposed for recommendation by integrating the relationship~(link) graphs among users/items into matrix completion~(MC) . Traditional \mbox{GMC} methods typically adopt graph regularization to impose smoothness priors for \mbox{MC}. Recently, geometric deep learning on graphs~(\mbox{GDLG}) is proposed to solve the GMC problem, showing better performance than existing GMC methods including traditional graph regularization based methods. To the best of our knowledge, there exists only one GDLG method for GMC, which is called \mbox{RMGCNN}. RMGCNN combines graph convolutional network~(GCN) and recurrent neural network~(RNN) together for GMC. In the original work of RMGCNN, RMGCNN demonstrates better performance than pure GCNbased method. In this paper, we propose a new \mbox{GMC} method, called \underline{c}onvolutional \underline{g}eometric \underline{m}atrix \underline{c}ompletion~(CGMC), for recommendation with graphs among users/items. CGMC is a pure GCNbased method with a newly designed graph convolutional network. Experimental results on real datasets show that CGMC can outperform other stateoftheart methods including RMGCNN. 
Convolutional Highways  Convolutional highways are deep networks based on multiple stacked convolutional layers for feature preprocessing. 
Convolutional Neural Knowledge Graph Learning  Previous models for learning entity and relationship embeddings of knowledge graphs such as TransE, TransH, and TransR aim to explore new links based on learned representations. However, these models interpret relationships as simple translations on entity embeddings. In this paper, we try to learn more complex connections between entities and relationships. In particular, we use a Convolutional Neural Network (CNN) to learn entity and relationship representations in knowledge graphs. In our model, we treat entities and relationships as onedimensional numerical sequences with the same length. After that, we combine each triplet of head, relationship, and tail together as a matrix with height 3. CNN is applied to the triplets to get confidence scores. Positive and manually corrupted negative triplets are used to train the embeddings and the CNN model simultaneously. Experimental results on public benchmark datasets show that the proposed model outperforms stateoftheart models on exploring unseen relationships, which proves that CNN is effective to learn complex interactive patterns between entities and relationships. 
Convolutional Neural Network  In computer science, a convolutional neural network is a type of feedforward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. Convolutional networks were inspired by biological processes and are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing. They are widely used models for image recognition. http://…oduction_to_Convolutional_Neural_Networks 
Convolutional Neural Network – Support Vector Machine (CNNSVM) 
Convolutional neural networks (CNNs) are similar to ‘ordinary’ neural networks in the sense that they are made up of hidden layers consisting of neurons with ‘learnable’ parameters. These neurons receive inputs, performs a dot product, and then follows it with a nonlinearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data has shown that the CNNSVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNNSoftmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recentlypublished FashionMNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is suppose to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case as CNNSVM reached a test accuracy of ~90.72%, while the CNNSoftmax reached a test accuracy of ~91.86%. The said results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model was a relatively more sophisticated than the one used in this study. 
Convolutional Neural Network with Alternately Updated Clique (CliqueNet) 
Improving information flow in deep networks helps to ease the training difficulties and utilize parameters more efficiently. Here we propose a new convolutional neural network architecture with alternately updated clique (CliqueNet). In contrast to prior networks, there are both forward and backward connections between any two layers in the same block. The layers are constructed as a loop and are updated alternately. The CliqueNet has some unique properties. For each layer, it is both the input and output of any other layer in the same block, so that the information flow among layers is maximized. During propagation, the newly updated layers are concatenated to reupdate previously updated layer, and parameters are reused for multiple times. This recurrent feedback structure is able to bring higher level visual information back to refine lowlevel filters and achieve spatial attention. We analyze the features generated at different stages and observe that using refined features leads to a better result. We adopt a multiscale feature strategy that effectively avoids the progressive growth of parameters. Experiments on image recognition datasets including CIFAR10, CIFAR100, SVHN and ImageNet show that our proposed models achieve the stateoftheart performance with fewer parameters. 
Convolutional Recurrent Neural Network (CRNN) 
This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike traditional singleton neural network, our model incorporates characteraware convolutional neural network (CharCNN) with characteraware recurrent neural network (CharRNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from CharCNN in that only salient features are selected and fed into the integrated CharRNN. CharRNN effectively learns long sequence semantics via sophisticated update mechanism. We compare our framework against the stateoftheart text classification algorithms on four popular benchmarking corpus. For instance, our model achieves competing precision rate, recall ratio, and F1 score on the Googlenews dataset. For twentynewsgroups data stream, our algorithm obtains the optimum on precision rate, recall ratio, and F1 score. For Brown Corpus, our framework obtains the best F1 score and almost equivalent precision rate and recall ratio over the top competitor. For the question classification collection, CRNN produces the optimal recall rate and F1 score and comparable precision rate. We also analyse three different RNN hidden recurrent cells’ impact on performance and their runtime efficiency. We observe that MGU achieves the optimal runtime and comparable performance against GRU and LSTM. For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences. 
Convolutional Spiking Neural Network  Spiking neural networks are motivated from principles of neural systems and may possess unexplored advantages in the context of machine learning. A class of \textit{convolutional spiking neural networks} is introduced, trained to detect image features with an unsupervised, competitive learning mechanism. Image features can be shared within subpopulations of neurons, or each may evolve independently to capture different features in different regions of input space. We analyze the time and memory requirements of learning with and operating such networks. The MNIST dataset is used as an experimental testbed, and comparisons are made between the performance and convergence speed of a baseline spiking neural network. 
ConwayMaxwell Poisson (CMP) 
Count data are a popular outcome in many empirical studies, especially as big data has become available on human and social behavior. The ConwayMaxwell Poisson (CMP) distribution is popularly used for modeling count data due to its ability to handle both overdispersed and underdispersed data. Yet, current methods for estimating CMP regression models are not efficient, especially with highdimensional data. Extant methods use either nonlinear optimization or MCMC methods. We propose a flexible estimation framework for CMP regression based on iterative reweighed least squares (IRLS). Because CMP belongs to the exponential family, convergence is guaranteed and is more efficient. We also extend this framework to allow estimation for additive models with smoothing splines. We illustrate the usefulness of this approach through simulation study and application to real data on speed dating. 
Cook’s Distance  In statistics, Cook’s distance or Cook’s D is a commonly used estimate of the influence of a data point when performing least squares regression analysis. In a practical ordinary least squares analysis, Cook’s distance can be used in several ways: to indicate data points that are particularly worth checking for validity; to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977. 
Cooperative Game Theory  In game theory, a cooperative game is a game where groups of players (‘coalitions’) may enforce cooperative behaviour, hence the game is a competition between coalitions of players, rather than between individual players. An example is a coordination game, when players choose the strategies by a consensus decisionmaking process. Recreational games are rarely cooperative, because they usually lack mechanisms by which coalitions may enforce coordinated behaviour on the members of the coalition. Such mechanisms, however, are abundant in real life situations (e.g. contract law). Cooperative theory starts with a formalization of games that abstracts away altogether from procedures and … concentrates, instead, on the possibilities for agreement. … There are several reasons that explain why cooperative games came to be treated separately. One is that when one does build negotiation and enforcement procedures explicitly into the model, then the results of a noncooperative analysis depend very strongly on the precise form of the procedures, on the order of making offers and counteroffers and so on. This may be appropriate in voting situations in which precise rules of parliamentary order prevail, where a good strategist can indeed carry the day. But problems of negotiation are usually more amorphous; it is difficult to pin down just what the procedures are. More fundamentally, there is a feeling that procedures are not really all that relevant; that it is the possibilities for coalition forming, promising and threatening that are decisive, rather than whose turn it is to speak. … Detail distracts attention from essentials. Some things are seen better from a distance; the Roman camps around Metzada are indiscernible when one is in them, but easily visible from the top of the mountain. 
Cooperative Inverse Reinforcement Learning (CIRL) 
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as {\em cooperative inverse reinforcement learning} (CIRL). A CIRL problem is a cooperative, partialinformation game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm. 
Cooperative Learning  Learning paradigms involving varying levels of supervision have received a lot of interest within the computer vision and machine learning communities. The supervisory information is typically considered to come from a human supervisor — a ‘teacher’ figure. In this paper, we consider an alternate source of supervision — a ‘peer’ — i.e. a different machine. We introduce cooperative learning, where two agents trying to learn the same visual concepts, but in potentially different environments using different sources of data (sensors), communicate their current knowledge of these concepts to each other. Given the distinct sources of data in both agents, the mode of communication between the two agents is not obvious. We propose the use of visual attributes — semantic midlevel visual properties such as furry, wooden, etc.– as the mode of communication between the agents. Our experiments in three domains — objects, scenes, and animals — demonstrate that our proposed cooperative learning approach improves the performance of both agents as compared to their performance if they were to learn in isolation. Our approach is particularly applicable in scenarios where privacy, security and/or bandwidth constraints restrict the amount and type of information the two agents can exchange. 
Cooperative SGD  Stateoftheart distributed machine learning suffers from significant delays due to frequent communication and synchronizing between worker nodes. Emerging communicationefficient SGD algorithms that limit synchronization between locally trained models have been shown to be effective in speedingup distributed SGD. However, a rigorous convergence analysis and comparative study of different communicationreduction strategies remains a largely open problem. This paper presents a new framework called Coooperative SGD that subsumes existing communicationefficient SGD algorithms such as federatedaveraging, elasticaveraging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover this framework enables us to design new communicationefficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence. 
Cooperative Training (CoT) 
We propose Cooperative Training (CoT) for training generative models that measure a tractable density function for target data. CoT coordinately trains a generator $G$ and an auxiliary predictive mediator $M$. The training target of $M$ is to estimate a mixture density of the learned distribution $G$ and the target distribution $P$, and that of $G$ is to minimize the JensenShannon divergence estimated through $M$. CoT achieves independent success without the necessity of pretraining via Maximum Likelihood Estimation or involving highvariance algorithms like REINFORCE. This lowvariance algorithm is theoretically proved to be unbiased for both generative and predictive tasks. We also theoretically and empirically show the superiority of CoT over most previous algorithms, in terms of generative quality and diversity, predictive generalization ability and computational cost. 
Coopetititve Soft Gating Ensemble (CSGE) 
In this article, we proposed the Coopetititve Soft Gating Ensemble or CSGE for general machine learning tasks. The goal of machine learning is to create models which poses a high generalisation capability. But often problems are too complex to be solved by a single model. Therefore, ensemble methods combine predictions of multiple models. The CSGE comprises a comprehensible combination based on three different aspects relating to the overall global historical performance, the local/situationdependent and timedependent performance of its ensemble members. The CSGE can be optimised according to arbitrary loss functions making it accessible for a wider range of problems. We introduce a novel training procedure including a hyperparameter initialisation at its heart. We show that the CSGE approach reaches stateoftheart performance for both classification and regression tasks. Still, the CSGE allows to quantify the influence of all base estimators by means of the three weighting aspects in a comprehensive way. In terms of Organic computing (OC), our CSGE approach combines multiple base models towards a selforganising complex system. Moreover, we provide a scikitlearn compatible implementation. 
CoordConv  Uber uses convolutional neural networks in many domains that could potentially involve coordinate transforms, from designing selfdriving vehicles to automating street sign detection to build maps and maximizing the efficiency of spatial movements in the Uber Marketplace. In deep learning, few ideas have experienced as much impact as convolution. Almost all stateoftheart results in machine vision make use of stacks of convolutional layers as basic building blocks. Since such architectures are widespread, we should expect that they excel at simple tasks like painting a single pixel in a tiny image, right Surprisingly, it turns out that convolution often has difficulty completing seemingly trivial tasks. In our paper, An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, we expose and analyze a generic inability of convolutional neural networks (CNNs) to transform spatial representations between two different types: coordinates in (i, j) Cartesian space and coordinates in onehot pixel space. It´s surprising because the task appears so simple, and it may be important because such coordinate transforms seem to be required to solve many common tasks, like detecting objects in images, training generative models of images, and training reinforcement learning (RL) agents from pixels. It turns out that these tasks may have subtly suffered from this failing of convolution all along, as suggested by performance improvements we demonstrate across several domains when using the solution we propose, a layer called CoordConv. 
Coordinate Descent (CD) 
Coordinate descent is a nonderivative optimization algorithm. To find a local minimum of a function, one does line search along one coordinate direction at the current point in each iteration. One uses different coordinate directions cyclically throughout the procedure. On nonseparable functions the algorithm may fail to find the optimum in a reasonable number of function evaluations. To improve the convergence an appropriate coordinate system can be gradually learned, such that new search coordinates obtained using PCA are as decorrelated as possible with respect to the objective function 
Coordinate Descent Algorithms (CDA) 
This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving largescale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering. To keep the primer uptodate, we intend to publish this monograph only after no additional topics need to be added and we foresee no further major advances in the area. 
copCAR Regression Model (copCAR) 
NonGaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copulabased and employs the areal GLMM#s conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online. copCAR 
Copula  In probability theory and statistics, a copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables. They are named for their resemblance to grammatical copulas in linguistics. 
Copula Statistic (CoS) 
A new index based on empirical copulas, termed the Copula Statistic (CoS), is introduced for assessing the strength of multivariate dependence and for testing statistical independence. New properties of the copulas are proved. They allow us to define the CoS in terms of a relative distance function between the empirical copula, the Fr\’echetHoeffding bounds and the independence copula. Monte Carlo simulations reveal that for large sample sizes, the CoS is approximately normal. This property is utilised to develop a CoSbased statistical test of independence against various noisy functional dependencies. It is shown that this test exhibits higher statistical power than the Total Information Coefficient (TICe), the Distance Correlation (dCor), the Randomized Dependence Coefficient (RDC), and the Copula Correlation (Ccor) for monotonic and circular functional dependencies. Furthermore, the R2equitability of the CoS is investigated for estimating the strength of a collection of functional dependencies with additive Gaussian noise. Finally, the CoS is applied to a real stock market data set from which we infer that a bivariate analysis is insufficient to unveil multivariate dependencies and to two gene expression data sets of the Yeast and of the E. Coli, which allow us to demonstrate the good performance of the CoS. 
CoQA  Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are freeform text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.1%, which is 23.7 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://…/coqa 
Coral Reefs Optimization (CRO) 
This paper presents a novel bioinspired algorithm to tackle complex optimization problems: the coral reefs optimization (CRO) algorithm. The CRO algorithm artificially simulates a coral reef, where different corals (namely, solutions to the optimization problem considered) grow and reproduce in coral colonies, fighting by choking out other corals for space in the reef. This fight for space, along with the specific characteristics of the corals’ reproduction, produces a robust metaheuristic algorithm shown to be powerful for solving hard optimization problems. In this research the CRO algorithm is tested in several continuous and discrete benchmark problems, as well as in practical application scenarios (i.e., optimum mobile network deployment and offshore wind farm design). The obtained results confirm the excellent performance of the proposed algorithm and open line of research for further application of the algorithm to realworld problems. 
Core Conflictual Relationship Theme (CCRT) 
Core Conflictual Relationship: Text Mining to Discover What and When 
Core2Vec  Recent advances in the field of network representation learning are mostly attributed to the application of the skipgram model in the context of graphs. Stateoftheart analogues of skipgram model in graphs define a notion of neighbourhood and aim to find the vector representation for a node, which maximizes the likelihood of preserving this neighborhood. In this paper, we take a drastic departure from the existing notion of neighbourhood of a node by utilizing the idea of coreness. More specifically, we utilize the wellestablished idea that nodes with similar core numbers play equivalent roles in the network and hence induce a novel and an organic notion of neighbourhood. Based on this idea, we propose core2vec, a new algorithmic framework for learning low dimensional continuous feature mapping for a node. Consequently, the nodes having similar core numbers are relatively closer in the vector space that we learn. We further demonstrate the effectiveness of core2vec by comparing word similarity scores obtained by our method where the node representations are drawn from standard word association graphs against scores computed by other stateoftheart network representation techniques like node2vec, DeepWalk and LINE. Our results always outperform these existing methods 
CoresetBased Neural Network Compression  We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters. We exploit the redundancies extant in the space of CNN weights and neuronal activations (across samples) in order to obtain compression. Our method requires no retraining, is easy to implement, and obtains stateoftheart compression performance across a wide variety of CNN architectures. Coupled with quantization and Huffman coding, we create networks that provide AlexNetlike accuracy, with a memory footprint that is $832\times$ smaller than the original AlexNet, while also introducing significant reductions in inference time as well. Additionally these compressed networks when finetuned, successfully generalize to other domains as well. 
CornerNet  We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the topleft corner and the bottomright corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior singlestage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.1% AP on MS COCO, outperforming all existing onestage detectors. 
Corpora Agnostic Word Vectorization Method (WordNet2Vec) 
A complex nature of big data resources demands new methods for structuring especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language as its good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordNet2Vec is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate general position – role of a given word towards all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its component within the whole language. Such word representation can be easily applied to many analytic tasks like classification or clustering. 
Corpus Linguistics  Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process. Corpus linguistics adherents believe that reliable language analysis best occurs on fieldcollected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair advocating minimal annotation and allowing texts to ‘speak for themselves’, to others, such as the Survey of English Usage team (based in University College, London) advocating annotation as a path to greater linguistic understanding and rigour. 
Correct Classification Percentage (CCP) 
Correct Classification Percentage (CCP) described in the paper: Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca 
Correct Eventual Consistency Tool (CEC) 
This tool helps programmers verify whether their distributed application design is safe. 
Correlated Components Analysis  How does one find data dimensions that are reliably expressed across repetitions? For example, in neuroscience one may want to identify combinations of brain signals that are reliably activated across multiple trials or subjects. For a clinical assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. The approach proposed here — ‘correlated components analysis’ — is to identify components that maximally correlate between repetitions (e.g. trials, subjects, raters). This can be expressed as the maximization of the ratio of betweenrepetition to withinrepetition covariance, resulting in a generalized eigenvalue problem. We show that covariances can be computed efficiently without explicitly considering all pairs of repetitions, that the result is equivalent to multiclass linear discriminant analysis for unbiased signals, and that the approach also maximize reliability, defined as the mean divided by the deviation across repetitions. We also extend the method to nonlinear components using kernels, discuss regularization to improve numerical stability, present parametric and nonparametric tests to establish statistical significance, and provide code. 
Correlated Topic Model (CTM) 
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than xray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a meanfield variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets. 
CORrelation ALignment (CORAL) 
In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the secondorder statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lowerdimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORALLDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves stateoftheart performance on standard benchmark datasets. Our code is available at:~\url{https://…/CORAL} 
CORrelation Differences (CORD) 
Given a zero mean random vector X=:(X1,…,Xp) ∈ R^p, we consider the problem of defining and estimating a partition G of {1,…,p} such that the components of X with indices in the same group of the partition have a similar, communitylike behavior. We introduce a new model, the Gexchangeable model, to define group similarity. This model is a natural extension of the more commonly used Glatent model, for which the partition G is generally not identifiable, without additional restrictions on X. In contrast, we show that for any random vector X there exists an identifiable partition G according to which X is Gexchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the Gblock covariance model, which generalizes the Gexchangeable model, and can be of interest in its own right for defining group similarity. We discuss connections between the three types of Gmodels. We exploit the connection with Gblock covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which X is Gexchangeable with a Gblock copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the Glatent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove to be minimax optimal, via lower bound calculations. Our procedure is fast and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two realdata examples. 
CorrelationAdjusted Regression Survival Scores (CARS) 
Contains functions to estimate the CorrelationAdjusted Regression Survival (CARS) Scores. The method is described in Welchowski, T. and Zuber, V. and Schmid, M., (2018), CorrelationAdjusted Regression Survival Scores for HighDimensional Variable Selection, <arXiv:1802.08178>. carSurv 
Correntropy  Correntropy is a nonlinear similarity measure between two random variables. Learning with the Maximum Correntropy Criterion Induced Losses for Regression 
Correspondence Analysis (CA) 
Correspondence analysis (CA) is a multivariate statistical technique proposed by Hirschfeld and later developed by JeanPaul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. In a similar manner to principal component analysis, it provides a means of displaying or summarising a set of data in twodimensional graphical form. ➘ “Principal Component Analysis” 
CorrRNN  In this paper, we study automatic keyphrase generation. Although conventional approaches to this task show promising results, they neglect correlation among keyphrases, resulting in duplication and coverage issues. To solve these problems, we propose a new sequencetosequence architecture for keyphrase generation named CorrRNN, which captures correlation among multiple keyphrases in two ways. First, we employ a coverage vector to indicate whether the word in the source document has been summarized by previous phrases to improve the coverage for keyphrases. Second, preceding phrases are taken into account to eliminate duplicate phrases and improve result coherence. Experiment results show that our model significantly outperforms the stateoftheart method on benchmark datasets in terms of both accuracy and diversity. 
Cortana Analytics  Cortana Analytics is a fully managed big data and advanced analytics suite that enables you to transform your data into intelligent action. 
Cortex Neural Network (CrtxNN) 
Neural Network has been successfully applied to many realworld problems, such as image recognition and machine translation. However, for the current architecture of neural networks, it is hard to perform complex cognitive tasks, for example, to process the image and audio inputs together. Cortex, as an important architecture in the brain, is important for animals to perform the complex cognitive task. We view the architecture of Cortex in the brain as a missing part in the design of the current artificial neural network. In this paper, we purpose Cortex Neural Network (CrtxNN). The Cortex Neural Network is an upper architecture of neural networks which motivated from cerebral cortex in the brain to handle different tasks in the same learning system. It is able to identify different tasks and solve them with different methods. In our implementation, the Cortex Neural Network is able to process different cognitive tasks and perform reflection to get a higher accuracy. We provide a series of experiments to examine the capability of the cortex architecture on traditional neural networks. Our experiments proved its ability on the Cortex Neural Network can reach accuracy by 98.32% on MNIST and 62% on CIFAR10 at the same time, which can promisingly reduce the loss by 40%. 
CortexNet  In the past five years we have observed the rise of incredibly well performing feedforward neural networks trained supervisedly for vision related tasks. These models have achieved superhuman performance on object recognition, localisation, and detection in still images. However, there is a need to identify the best strategy to employ these networks with temporal visual inputs and obtain a robust and stable representation of video data. Inspired by the human visual system, we propose a deep neural network family, CortexNet, which features not only bottomup feedforward connections, but also it models the abundant topdown feedback and lateral connections, which are present in our visual cortex. We introduce two training schemes – the unsupervised MatchNet and weakly supervised TempoNet modes – where a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://…/. 
Cosine Distance  ➘ “Cosine Similarity” 
Cosine Similarity  Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of 1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in. Note that these bounds apply for any number of dimensions, and cosine similarity is most commonly used in highdimensional positive spaces. For example, in information retrieval and text mining, each term is notionally assigned a different dimension and a document is characterised by a vector where the value of each dimension corresponds to the number of times that term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. The technique is also used to measure cohesion within clusters in the field of data mining. 
Cosinor Analysis  Cosinor analysis uses the least squares method to fit a sine wave to a time series. Cosinor analysis is often used in the analysis of biologic time series that demonstrate predictible rhythms. This method can be used with an unequally spaced time series. 
Costaware Cascading Upper Confidence Bound (CCUCB) 
In this paper, we propose a costaware cascading bandits model, a new variant of multiarmed ban dits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until certain stopping condition is satisfied. Our objective is then to max imize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost in curred in examining the items, by deciding the or dered list of items, as well as when to stop examina tion. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the of fline setting, we show that the Unit Cost Ranking with Threshold 1 (UCRT1) policy is optimal. For the online setting, we propose a Costaware Cascading Upper Confidence Bound (CCUCB) algorithm, and show that the cumulative regret scales in O(log T ). We also provide a lower bound for all {\alpha}consistent policies, which scales in {\Omega}(log T ) and matches our upper bound. The performance of the CCUCB algorithm is evaluated with both synthetic and realworld data. 
CostSensitive Dynamic Principal Projection (CSDPP) 
We study multilabel classification (MLC) with three important realworld issues: online updating, label space dimensional reduction (LSDR), and costsensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, costsensitive dynamic principal projection (CSDPP) that resolves all three issues. The foundation of CSDPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CSDPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefullydesigned online regression learner. In addition, CSDPP embeds the cost information into label weights to achieve costsensitivity along with theoretical guarantees. Experimental results verify that CSDPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously. 
Counterfactual Fairness  Machine learning has matured to the point to where it is now being considered to automate decisions in loan lending, employee hiring, and predictive policing. In many of these scenarios however, previous decisions have been made that are unfairly biased against certain subpopulations (e.g., those of a particular race, gender, or sexual orientation). Because this past data is often biased, machine learning predictors must account for this to avoid perpetuating discriminatory practices (or incidentally making new ones). In this paper, we develop a framework for modeling fairness in any dataset using tools from counterfactual inference. We propose a definition called counterfactual fairness that captures the intuition that a decision is fair towards an individual if it gives the same predictions in (a) the observed world and (b) a world where the individual had always belonged to a different demographic group, other background causes of the outcome being equal. We demonstrate our framework on two realworld problems: fair prediction of law school success, and fair modeling of an individual’s criminality in policing data. 
Counterfactual Inference  
CountMin Sketch  In computing, the countmin sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sublinear space, at the expense of overcounting some events due to collisions. The countmin sketch was invented in 2003 by Graham Countmin sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multiset policy are conceptually isomorphic to the countmin sketch. 
Coupled Sparse Asymmetric Least Squares (COSALES) 
SALES 
Coupled UNet (CUNet) 
We design a new connectivity pattern for the UNet architecture. Given several stacked UNets, we couple each UNet pair through the connections of their semantic blocks, resulting in the coupled UNets (CUNet). The coupling connections could make the information flow more efficiently across UNets. The feature reuse across UNets makes each UNet very parameter efficient. We evaluate the coupled UNets on two benchmark datasets of human pose estimation. Both the accuracy and model parameter number are compared. The CUNet obtains comparable accuracy as stateoftheart methods. However, it only has at least 60% fewer parameters than other approaches. 
Covariance Matrix Adaptation Evolution Strategy (CMAES) 
CMAES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies (ES) are stochastic, derivativefree methods for numerical optimization of nonlinear or nonconvex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). Like this, over the generation sequence, individuals with better and better fvalues are generated. In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in the R^n. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise dependencies between the variables in the distribution are represented by a covariance matrix. The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this distribution. This is particularly useful, if the function f is illconditioned. Adaptation of the covariance matrix amounts to learning a second order model of the underlying objective function similar to the approximation of the inverse Hessian matrix in the QuasiNewton method in classical optimization. In contrast to most classical methods, fewer assumptions on the nature of the underlying objective function are made. Only the ranking between candidate solutions is exploited for learning the sample distribution and neither derivatives nor even the function values themselves are required by the method. 
Covariant Compositional Network (CCN) 
Most existing neural networks for learning graphs address permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call Covariant Compositional Networks (CCNs). Here, covariance means that the activation of each neuron must transform in a specific way under permutations, similarly to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on standard graph learning benchmarks. 
Covariate Adaptive Clustering  predkmeans 
Covariate Balancing Propensity Score (CBPS) 
Implements the covariate balancing propensity score (CBPS) proposed by Imai and Ratkovic (2014) <DOI:10.1111/rssb.12027>. The propensity score is estimated such that it maximizes the resulting covariate balance as well as the prediction of treatment assignment. The method, therefore, avoids an iteration between model fitting and balance checking. The package also implements several extensions of the CBPS beyond the crosssectional, binary treatment setting. The current version implements the CBPS for longitudinal settings so that it can be used in conjunction with marginal structural models from Imai and Ratkovic (2015) <DOI:10.1080/01621459.2014.956872>, treatments with three and four valued treatment variables, continuousvalued treatments from Fong, Hazlett, and Imai (2015) <http://…/CBGPS.pdf>, and the situation with multiple distinct binary treatments administered simultaneously. In the future it will be extended to other settings including the generalization of experimental and instrumental variable estimates. Recently we have added the optimal CBPS which chooses the optimal balancing function and results in doubly robust and efficient estimator for the treatment effect as well as high dimensional CBPS when a large number of covariates exist. CBPS 
Coverage Probability  In statistics, the coverage probability of a confidence interval is the proportion of the time that the interval contains the true value of interest. For example, suppose our interest is in the mean number of months that people with a particular type of cancer remain in remission following successful treatment with chemotherapy. The confidence interval aims to contain the unknown mean remission duration with a given probability. This is the “confidence level” or “confidence coefficient” of the constructed interval which is effectively the “nominal coverage probability” of the procedure for constructing confidence intervals. The “nominal coverage probability” is often set at 0.95. The coverage probability is the actual probability that the interval contains the true mean remission duration in this example. 
Cox ProportionalHazards Regression  Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h0(t). The hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is a loglinear regression. 
Cox Regression  The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include timedependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model. 
Coxcomb Plot / Polar Area Diagram  The polar area diagram is similar to a usual pie chart, except sectors are equal angles and differ rather in how far each sector extends from the center of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month). For example, if the count of deaths in each month for a year are to be plotted then there will be 12 sectors (one per month) all with the same angle of 30 degrees each. The radius of each sector would be proportional to the square root of the death count for the month, so the area of a sector represents the number of deaths in a month. If the death count in each month is subdivided by cause of death, it is possible to make multiple comparisons on one diagram, as is seen in the polar area diagram famously developed by Florence Nightingale. 
Cozy  Cozy is a tool that automatically writes data structure implementations from highlevel specifications. The source code is available on GitHub. Documentation on using Cozy for your own projects can be found there as well. 
Credibility incorporating Semantic analysis and Temporal factor (CredSaT) 
The widespread use of big social data has pointed the research community in several significant directions. In particular, the notion of social trust has attracted a great deal of attention from information processors  computer scientists and information consumers  formal organizations. This is evident in various applications such as recommendation systems, viral marketing and expertise retrieval. Hence, it is essential to have frameworks that can temporally measure users credibility in all domains categorised under big social data. This paper presents CredSaT (Credibility incorporating Semantic analysis and Temporal factor): a finegrained users credibility analysis framework for big social data. A novel metric that includes both new and current features, as well as the temporal factor, is harnessed to establish the credibility ranking of users. Experiments on realworld dataset demonstrate the effectiveness and applicability of our model to indicate highly domainbased trustworthy users. Further, CredSaT shows the capacity in capturing spammers and other anomalous users. 
Credible Interval  In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis; Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. For example, in an experiment that determines the uncertainty distribution of parameter t, if the probability that t lies between 35 and 45 is 0.95, then 35 <= t <= 45 is a 95% credible interval. 
Credible Interval / Credibility Interval  In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics. For example, in an experiment that determines the uncertainty distribution of parameter , if the probability that lies between 35 and 45 is 0.95, then is a 95% credible interval. 
CrescendoNet  We introduce a new deep convolutional neural network, CrescendoNet, by stacking simple building blocks without residual connections. Each Crescendo block contains independent convolution paths with increased depths. The numbers of convolution layers and parameters are only increased linearly in Crescendo blocks. In experiments, CrescendoNet with only 15 layers outperforms almost all networks without residual connections on benchmark datasets, CIFAR10, CIFAR100, and SVHN. Given sufficient amount of data as in SVHN dataset, CrescendoNet with 15 layers and 4.1M parameters can match the performance of DenseNetBC with 250 layers and 15.3M parameters. CrescendoNet provides a new way to construct high performance deep convolutional neural networks without residual connections. Moreover, through investigating the behavior and performance of subnetworks in CrescendoNet, we note that the high performance of CrescendoNet may come from its implicit ensemble behavior, which differs from the FractalNet that is also a deep convolutional neural network without residual connections. Furthermore, the independence between paths in CrescendoNet allows us to introduce a new pathwise training procedure, which can reduce the memory needed for training. 
Critical Line Algorithm (CLA) 
The critical line method developed by the Nobel Prize winner H. Markowitz is a classical technique for the construction of a minimumvariance frontier within the paradigm of ‘the expected returnrisk’ (meanvariance) and finding minimum portfolios. Considerable interest has recently been attracted to the development of a fast algorithm for the construction of the minimumvariance frontier. In some works, such algorithms have been used to find statistically stable optimal portfoli.o An OpenSource Implementation of the CriticalLine Algorithm for Portfolio Optimization The Constrained Critical Line Algorithm The Critical Line Method Applying Markowitz’s Critical Line Algorithm 
Cromwell  Cromwell is a Workflow Management System geared towards scientific workflows. cromwellDashboard 
Cross Entropy  In information theory, the cross entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an ‘unnatural’ probability distribution q, rather than the ‘true’ distribution p. 
Cross Industry Standard Process for Data Mining (CRISPDM) 
CRISPDM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. Polls conducted in 2002, 2004, and 2007 show that it is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, 34 times as many people reported using CRISPDM. A review and critique of data mining process models in 2009 called the CRISPDM the “de facto standard for developing data mining and knowledge discovery projects.” Other reviews of CRISPDM and data mining process models include Kurgan and Musilek’s 2006 review, and Azevedo and Santos’ 2008 comparison of CRISPDM and SEMMA. 
Cross Validation  Crossvalidation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It is worth highlighting that in a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (testing dataset). The goal of cross validation is to define a dataset to “test” the model in the training phase (i.e., the validation dataset), in order to limit problems like overfitting, give an insight on how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem), etc. 
CrossCat  CrossCat is a domaingeneral, Bayesian method for analyzing highdimensional data tables. CrossCat estimates the full joint distribution over the variables in the table from the data, via approximate inference in a hierarchical, nonparametric Bayesian model, and provides efficient samplers for every conditional distribution. CrossCat combines strengths of nonparametric mixture modeling and Bayesian network structure learning: it can model any joint distribution given enough data by positing latent variables, but also discovers independencies between the observable variables. A range of exploratory analysis and predictive modeling tasks can be addressed via CrossCat, including detecting predictive relationships between variables, finding multiple overlapping clusterings, imputing missing values, and simultaneously selecting features and classifying rows. Research on CrossCat has shown that it is suitable for analysis of realworld tables of up to 10 million cells, including hospital cost and quality measures, voting records, handwritten digits, and statelevel unemployment time series. 
CrossDomain Adversarial AutoEncoder (CDAAE) 
In this paper, we propose the CrossDomain Adversarial AutoEncoder (CDAAE) to address the problem of crossdomain image inference, generation and transformation. We make the assumption that images from different domains share the same latent code space for content, while having separate latent code space for style. The proposed framework can map crossdomain data to a latent code vector consisting of a content part and a style part. The latent code vector is matched with a prior distribution so that we can generate meaningful samples from any part of the prior space. Consequently, given a sample of one domain, our framework can generate various samples of the other domain with the same content of the input. This makes the proposed framework different from the current work of crossdomain transformation. Besides, the proposed framework can be trained with both labeled and unlabeled data, which makes it also suitable for domain adaptation. Experimental results on data sets SVHN, MNIST and CASIA show the proposed framework achieved visually appealing performance for image generation task. Besides, we also demonstrate the proposed method achieved superior results for domain adaptation. Code of our experiments is available in https://…/CDAAE. 
CrossDomain Collaborative Learning (CDCL) 
This paper introduces a novel heterogenous domain adaptation (HDA) method for hyperspectral image classification with a limited amount of labeled samples in both domains. The method is achieved in the way of crossdomain collaborative learning (CDCL), which is addressed via cluster canonical correlation analysis (CCCA) and random walker (RW) algorithms. To be specific, the proposed CDCL method is an iterative process of three main stages, i.e. twice of RWbased pseudolabeling and cross domain learning via CCCA. Firstly, given the initially labeled target samples as training set ($\mathbf{TS}$), the RWbased pseudolabeling is employed to update $\mathbf{TS}$ and extract target clusters ($\mathbf{TCs}$) by fusing the segmentation results obtained by RW and extended RW (ERW) classifiers. Secondly, cross domain learning via CCCA is applied using labeled source samples and $\mathbf{TCs}$. The unlabeled target samples are then classified with the estimated probability maps using the model trained in the projected correlation subspace. Thirdly, both $\mathbf{TS}$ and estimated probability maps are used for updating $\mathbf{TS}$ again via RWbased pseudolabeling. When the iterative process finishes, the result obtained by the ERW classifier using the final $\mathbf{TS}$ and estimated probability maps is regarded as the final classification map. Experimental results on four real HSIs demonstrate that the proposed method can achieve better performance compared with the stateoftheart HDA and ERW methods. 
CrossDomain Latent Feature Mapping (CDLFM) 
Collaborative Filtering (CF) is a widely adopted technique in recommender systems. Traditional CF models mainly focus on predicting a user’s preference to the items in a single domain such as the movie domain or the music domain. A major challenge for such models is the data sparsity problem, and especially, CF cannot make accurate predictions for the coldstart users who have no ratings at all. Although CrossDomain Collaborative Filtering (CDCF) is proposed for effectively transferring users’ rating preference across different domains, it is still difficult for existing CDCF models to tackle the coldstart users in the target domain due to the extreme data sparsity. In this paper, we propose a CrossDomain Latent Feature Mapping (CDLFM) model for coldstart users in the target domain. Firstly, in order to better characterize users in sparse domains, we take the users’ similarity relationship on rating behaviors into consideration and propose the Matrix Factorization by incorporating User Similarities (MFUS) in which three similarity measures are proposed. Next, to perform knowledge transfer across domains, we propose a neighborhood based gradient boosting trees method to learn the crossdomain user latent feature mapping function. For each coldstart user, we learn his/her feature mapping function based on the latent feature pairs of those linked users who have similar rating behaviors with the coldstart user in the auxiliary domain. And the preference of the coldstart user in the target domain can be predicted based on the mapping function and his/her latent features in the auxiliary domain. Experimental results on two real data sets extracted from Amazon transaction data demonstrate the superiority of our proposed model against other stateoftheart methods. 
CrossEntropy Clustering  We build a general and easily applicable clustering theory, which we call crossentropy clustering (shortly CEC), which joins the advantages of classical kmeans (easy implementation and speed) with those of EM (a ne invariance and ability to adapt to clusters of desired shapes). Moreover, contrary to kmeans and EM, CEC nds the optimal number of clusters by automatically removing groups which have negative information cost. Although CEC, like EM, can be build on an arbitrary family of densities, in the most important case of Gaussian CEC the division into clusters is a ne invariant. <a href="'Introduction” target=”top”>http://…/1508.04559v1 
CrossLingual Text Classification (CLTC) 
Crosslingual text classification(CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. 
CrossModal Generative Adversarial Network (CMGAN) 
It is known that the inconsistent distribution and representation of different modalities, such as image and text, cause the heterogeneity gap that makes it challenging to correlate such heterogeneous data. Generative adversarial networks (GANs) have shown its strong ability of modeling data distribution and learning discriminative representation, existing GANsbased works mainly focus on generative problem to generate new data. We have different goal, aim to correlate heterogeneous data, by utilizing the power of GANs to model crossmodal joint distribution. Thus, we propose Crossmodal GANs to learn discriminative common representation for bridging heterogeneity gap. The main contributions are: (1) Crossmodal GANs architecture is proposed to model joint distribution over data of different modalities. The intermodality and intramodality correlation can be explored simultaneously in generative and discriminative models. Both of them beat each other to promote crossmodal correlation learning. (2) Crossmodal convolutional autoencoders with weightsharing constraint are proposed to form generative model. They can not only exploit crossmodal correlation for learning common representation, but also preserve reconstruction information for capturing semantic consistency within each modality. (3) Crossmodal adversarial mechanism is proposed, which utilizes two kinds of discriminative models to simultaneously conduct intramodality and intermodality discrimination. They can mutually boost to make common representation more discriminative by adversarial training process. To the best of our knowledge, our proposed CMGANs approach is the first to utilize GANs to perform crossmodal common representation learning. Experiments are conducted to verify the performance of our proposed approach on crossmodal retrieval paradigm, compared with 10 methods on 3 crossmodal datasets. 
CrossNets  We propose a novel neural network structure called CrossNets, which considers architectures on directed acyclic graphs. This structure builds on previous generalizations of feed forward models, such as ResNets, by allowing for all forward cross connections between layers (both adjacent and nonadjacent). The addition of cross connections among the network increases information flow across the whole network, leading to better training and testing performances. The superior performance of the network is tested against four benchmark datasets: MNIST, CIFAR10, CIFAR100, and SVHN. We conclude with a proof of convergence for Crossnets to a local minimum for error, where weights for connections are chosen through backpropagation with momentum. 
CrossOver Design  In randomized trials, a crossover design is one in which each subject receives each treatment, in succession. For example, subject 1 first receives treatment A, then treatment B, then treatment C. Subject 2 might receive treatment B, then treatment A, then treatment C. A crossover design has the advantage of eliminating individual subject differences from the overall treatment effect, thus enhancing statistical power. On the other hand, it is important in a crossover study that the underlying condition (say, a disease) not change over time, and that the effects of one treatment disappear before the next is applied. 
Croston Method  It is easier to predict demand when there’s a pattern. But in case of irregular or intermittent demand the simple technique of smoothing does not work. Croston’s forecasting method is a standard approach to deal with intermittent demand. It detects the cyclic pattern of demand and divides the period into 2 time series: 1. Zero demand values 2. Non zero demand values Then demand smoothing is used on both time series separately and demand is forecasted. 
Cubist  Cubist is a powerful tool for generating rulebased models that balance the need for accurate prediction against the requirements of intelligibility. Cubist models generally give better results than those produced by simple techniques such as multivariate linear regression, while also being easier to understand than neural networks. 
Cubist Model  Cubist is a rule{based model that is an extension of Quinlan’s M5 model tree. A tree is grown where the terminal leaves contain linear regression models. These models are based on the predictors used in previous splits. Also, there are intermediate linear models at each step of the tree. A prediction is made using the linear regression model at the terminal node of the tree, but is \smoothed’ by taking into account the prediction from the linear model in the previous node of the tree (which also occurs recursively up the tree). The tree is reduced to a set of rules, which initially are paths from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for simpli cation. This is explained better in Quinlan (1992). Wang and Witten (1997) attempted to recreate this model using a \rational reconstruction’ of Quinlan (1992) that is the basis for the M5P model in Weka (and the R package RWeka). http://…i=10.1.1.34.885&rep=rep1&type=pdf https://…/ensemblelearningwithcubistmodel 
CuCoTrack  This paper introduces CuCoTrack, a cuckoo hash based data structure designed to efficiently implement connection tracking. The proposed scheme exploits the fact that queries always match one existing connection to compress the 5tuple that identifies the connection. This reduces significantly the amount of memory needed to store the connections and also the memory bandwidth needed for lookups. CuCoTrack uses a dynamic fingerprint to avoid collisions thus ensuring that queries are completed in at most two memory accesses and facilitating a hardware implementation. The proposed scheme has been analyzed theoretically and validated by simulation. The results show that using 16 bits for the fingerprint is enough to avoid collisions in practical configurations. 
CUDA Deep Neural Network library (cuDNN) 
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPUaccelerated library of primitives for deep neural networks. It emphasizes performance, easeofuse, and low memory overhead. cuDNN is designed to be integrated into higherlevel machine learning frameworks, such as the popular Caffe, Theano, or Torch software frameworks. The simple, dropin design allows developers to focus on designing and implementing neural net models rather than tuning for performance, while still achieving the high performance delivered by modern parallel computing hardware. If you are a data scientist looking for an interactive deep learning training system, check out the new NVIDIA DIGITS solution, which automatically uses cuDNN. cuDNN is freely available to CUDA Registered Developers. As a registered developer you can download the latest version of cuDNN, access the support forum and file bug reports. 
CUDA RecurREnt Neural Network Toolkit (CURRENNT) 
In this article, we introduce CURRENNT, an opensource parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA’s Computed Uni ed Device Architecture (CUDA). CURRENNT supports uni and bidirectional RNNs with Long ShortTerm Memory (LSTM) memory cells which overcome the vanishing gradient problem. To our knowledge, CURRENNT is the rst publicly available parallel implementation of deep LSTMRNNs. Benchmarks are given on a noisy speech recognition task from the 2013 2nd CHiME Speech Separation and Recognition Challenge, where LSTMRNNs have been shown to deliver best performance. In the result, double digit speedups in bidirectional LSTM training are achieved with respect to a reference singlethreaded CPU implementation. CURRENNT is available under the GNU General Public License. http://…/currennt. 
CuLDA_CGS  Latent Dirichlet Allocation(LDA) is a popular topic model. Given the fact that the input corpus of LDA algorithms consists of millions to billions of tokens, the LDA training process is very timeconsuming, which may prevent the usage of LDA in many scenarios, e.g., online service. GPUs have benefited modern machine learning algorithms and big data analysis as they can provide high memory bandwidth and computation power. Therefore, many frameworks, e.g. TensorFlow, Caffe, CNTK, support to use GPUs for accelerating the popular machine learning dataintensive algorithms. However, we observe that LDA solutions on GPUs are not satisfying. In this paper, we present CuLDA_CGS, a GPUbased efficient and scalable approach to accelerate largescale LDA problems. CuLDA_CGS is designed to efficiently solve LDA problems at high throughput. To it, we first delicately design workload partition and synchronization mechanism to exploit the benefits of mul tiple GPUs. Then, we offload the LDA sampling process to each individual GPU by optimizing from the sampling algorithm, par allelization, and data compression perspectives. Evaluations show that compared with stateoftheart LDA solutions, CuLDA_CGS outperforms them by a large margin (up to 7.3X) on a single GPU. CuLDA_CGS is able to achieve extra 3.0X speedup on 4 GPUs. The source code is publicly available on https://…/cuMF CuLDA_CGS. 
Cumulative Distribution Function (CDF) 
In probability theory and statistics, the cumulative distribution function (CDF), or just distribution function, describes the probability that a realvalued random variable X with a given probability distribution will be found to have a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. 
Cumulative Gains Model Quality Metric  In developing risk models, developers employ a number of graphical and numerical tools to evaluate the quality of candidate models. These traditionally involve numerous measures including the KS statistic or one of many Area Under the Curve (AUC) methodologies on ROC and cumulative Gains charts. Typical employment of these methodologies involves one of two scenarios. The first is as a tool to evaluate one or more models and ascertain the effectiveness of that model. Second however is the inclusion of such a metric in the model building process itself such as the way Ferri et al. proposed to use Area Under the ROC curve in the splitting criterion of a decision tree. However, these methods fail to address situations involving competing models where one model is not strictly above the other. Nor do they address differing values of end points as the magnitudes of these typical measures may vary depending on target definition making standardization difficult. Some of these problems are starting to be addressed. Marcade Chief Technology officer of the software vendor KXEN gives an overview of several metric techniques and proposes a new solution to the problem in data mining techniques. Their software uses two statistics called KI and KR. We will examine the shortfalls he addresses more thoroughly and propose a new metric which can be used as an improvement to the KI and KR statistics. Although useful in a machine learning sense of developing a model, these same issues and solutions apply to evaluating a single model’s performance as related by Siddiqi and Mays with respect to risk scorecards. We will not specifically give examples of each application of the new statistics but rather make the claim that it is useful in most situations where an AUC or model separation statistic (such as KS) is used. 
Cumulative Sum Control Chart (CUSUM) 
In statistical quality control, the CUSUM (or cumulative sum control chart) is a sequential analysis technique developed by E. S. Page of the University of Cambridge. It is typically used for monitoring change detection. CUSUM was announced in Biometrika, in 1954, a few years after the publication of Wald’s SPRT algorithm. Page referred to a ‘quality number’ \theta, by which he meant a parameter of the probability distribution; for example, the mean. He devised CUSUM as a method to determine changes in it, and proposed a criterion for deciding when to take corrective action. When the CUSUM method is applied to changes in mean, it can be used for step detection of a time series. A few years later, George Alfred Barnard developed a visualization method, the Vmask chart, to detect both increases and decreases in \theta. 
Curated Taxonomy  Curated Taxonomy: Topdown hierarchy of topics, where each node is assigned specific positive and negative vocabulary rules. For example, the topic ‘RVs’ would have positive vocabularies including ‘recreational vehicles’, ‘motor homes’ and ‘travel trailers’. The topic ‘Shampoo’ would have negative vocabularies including ‘carpet’, ‘dogs’, and ‘Warren Beatty’. 
Curriculum Accelerated SelfSupervised Learning (CASSL) 
Recent selfsupervised learning approaches focus on using a few thousand data points to learn policies for highlevel, lowdimensional action spaces. However, scaling this framework for highdimensional control require either scaling up the data collection efforts or using a clever sampling strategy for training. We present a novel approach – Curriculum Accelerated SelfSupervised Learning (CASSL) – to train policies that map visual information to highlevel, higher dimensional action spaces. CASSL orders the sampling of training data based on control dimensions: the learning and sampling are focused on few control parameters before other parameters. The right curriculum for learning is suggested by variancebased global sensitivity analysis of the control space. We apply our CASSL framework to learning how to grasp using an adaptive, underactuated multifingered gripper, a challenging system to control. Our experimental results indicate that CASSL provides significant improvement and generalization compared to baseline methods such as staged curriculum learning (8% increase) and complete endtoend learning with random exploration (14% improvement) tested on a set of novel objects. 
Curse of Dimensionality  There are multiple phenomena referred to by this name in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data however all objects appear to be sparse and dissimilar in many ways which prevents common data organization strategies from being efficient. 
CUSBoost  Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the reallife datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater interest than the majority class instances in reallife applications. Recently, several techniques based on sampling methods (undersampling of the majority class and oversampling the minority class), costsensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clusteringbased undersampling approach with boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to RUSBoost (random undersampling with AdaBoost) and SMOTEBoost (synthetic minority oversampling with AdaBoost) algorithms. We evaluated the performance of CUSBoost algorithm with the stateoftheart methods based on ensemble learning like AdaBoost, RUSBoost, SMOTEBoost on 13 imbalance binary and multiclass datasets with various imbalance ratios. The experimental results show that the CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. 
Custom Grouper User Language (CGUL) 
Custom Grouper User Language (CGUL) is a sentencebased language that enables you to perform pattern matching using character or tokenbased regular expressions combined with linguistic attributes to define custom entity types. Working with CGUL can be very challenging. http://…/hana_options_adp (SAP HANA Text Analysis Extraction Customization Guide) 
Customer Acquisition Cost (CAC) 
Customer Acquisition Cost is the cost associated in convincing a customer to buy a product/service. This cost is incurred by the organization to convince a potential customer. This cost is inclusive of the product cost as well as the cost involved in research, marketing, and accessibility costs. This is an important business metric. It plays a major role in calculating the value of the customer to the company and the resulting return on investment (ROI) of acquisition. The calculation of customer valuation helps a company decide how much of its resources can be profitably spent on a particular customer. In general terms, it helps to decide the worth of the customer to the company. Customer Acquisition Cost (abbreviated to CAC) refers to the resources that a business must allocate (financial or otherwise) in order to acquire an additional customer. Numerically, customer acquisition cost is typically expressed as a ratio – dividing the sum total of CAC by the number of additional patrons acquired by the business as a result of the customer acquisition strategy. 
Customer Experience Analytics (CEA) 
➘ “Customer Experience Management” 
Customer Experience Management (CEM) 
Customer experience management (CEM or CXM) is the process that companies use to oversee and track all interactions with a customer during the duration of their relationship. This involves the strategy of building around the needs of individual customers. According to Jeananne Rae, companies are realizing that ‘building great consumer experiences is a complex enterprise, involving strategy, integration of technology, orchestrating business models, brand management and CEO commitment.’ 
Customer LifeTime Value (CLTV,CLV) 
In marketing, customer lifetime value (CLV) (or often CLTV), lifetime customer value (LCV), or user lifetime value (LTV) is a prediction of the net profit attributed to the entire future relationship with a customer. The prediction model can have varying levels of sophistication and accuracy, ranging from a crude heuristic to the use of complex predictive analytics techniques. Customer lifetime value (CLV) can also be defined as the dollar value of a customer relationship, based on the present value of the projected future cash flows from the customer relationship. Customer lifetime value is an important concept in that it encourages firms to shift their focus from quarterly profits to the longterm health of their customer relationships. Customer lifetime value is an important number because it represents an upper limit on spending to acquire new customers. For this reason it is an important element in calculating payback of advertising spent in marketing mix modeling. One of the first accounts of the term Customer Lifetime Value is in the 1988 book Database Marketing, which includes detailed worked examples. Early adopters of Customer Lifetime Value models in the 1990s include Edge Consulting and BrandScience. 
Customer Segmentation  The act of separating a group of clients into sets of similar individuals that are related from a marketing or demographicperspective. For example, a business that practices customer segmentation might group its current or potential customers according to their gender, buying tendencies, age group, and special interests. 
Cuttlefish  Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query optimizer rules and cost models to choose physical operators for all of these novel logical operators is impractical. To address this challenge, we develop Cuttlefish, a new primitive for adaptively processing online query plans that explores candidate physical operator instances during query execution and exploits the fastest ones using multiarmed bandit reinforcement learning techniques. We prototype Cuttlefish in Apache Spark and adaptively choose operators for image convolution, regular expression matching, and relational joins. Our experiments show Cuttlefishbased adaptive convolution and regular expression operators can reach 7299% of the throughput of an allknowing oracle that always selects the optimal algorithm, even when individual physical operators are up to 105x slower than the optimal. Additionally, Cuttlefish achieves join throughput improvements of up to 7.5x compared with Spark SQL’s query optimizer. 
CYCLOSA  By regularly querying Web search engines, users (unconsciously) disclose large amounts of their personal data as part of their search queries, among which some might reveal sensitive information (e.g. health issues, sexual, political or religious preferences). Several solutions exist to allow users querying search engines while improving privacy protection. However, these solutions suffer from a number of limitations: some are subject to user reidentification attacks, while others lack scalability or are unable to provide accurate results. This paper presents CYCLOSA, a secure, scalable and accurate private Web search solution. CYCLOSA improves security by relying on trusted execution environments (TEEs) as provided by Intel SGX. Further, CYCLOSA proposes a novel adaptive privacy protection solution that reduces the risk of user re identification. CYCLOSA sends fake queries to the search engine and dynamically adapts their count according to the sensitivity of the user query. In addition, CYCLOSA meets scalability as it is fully decentralized, spreading the load for distributing fake queries among other nodes. Finally, CYCLOSA achieves accuracy of Web search as it handles the real query and the fake queries separately, in contrast to other existing solutions that mix fake and real query results. 
Cypher Query Language (CQL) 
Cypher is a declarative graph query language for the (open source) graph database Neo4j that allows for expressive and efficient querying and updating of the graph store. Cypher is a relatively simple but still very powerful language. Very complicated database queries can easily be expressed through Cypher. This allows you to focus on your domain instead of getting lost in database access. 
Cython  The Cython programming language is a superset of Python with a foreign function interface for invoking C/C++ routines and the ability to declare the static type of subroutine parameters and results, local variables, and class attributes. It actually is a Python to C source code translator that integrates with the CPython interpreter on a lowlevel. http://cython.org http://…/9781491901557 runcython 
CytonR  This paper presents an opensource enforcement learning toolkit named CytonRL (https://…/cytonRL ). The toolkit implements four recent advanced deep Qlearning algorithms from scratch using C++ and NVIDIA’s GPUaccelerated libraries. The code is simple and elegant, owing to an opensource generalpurpose neural network library named CytonLib. Benchmark shows that the toolkit achieves competitive performances on the popular Atari game of Breakout. 
Advertisements