C Math Library(CML) ➘ “C Numerical Library” C Numerical Library(CNL) The IMSL C Numerical Library provides advanced mathematical and statistical functionality for programmers to embed in their existing or new applications. Written in standard C, the IMSL C Library can be embedded into any C or C++ application as well as any existing application that can reference a C library. C++ Based Probabilistic Programming Library(CPProb) We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in ‘inference compilation’, which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large code-base used in particle physics. Here we describe the technical innovations realized and planned for this library. C2AE Models trained for classification often assume that all testing classes are known while training. As a result, when presented with an unknown class during testing, such closed-set assumption forces the model to classify it as one of the known classes. However, in a real world scenario, classification models are likely to encounter such examples. Hence, identifying those examples as unknown becomes critical to model performance. A potential solution to overcome this problem lies in a class of learning problems known as open-set recognition. It refers to the problem of identifying the unknown classes during testing, while maintaining performance on the known classes. In this paper, we propose an open-set recognition algorithm using class conditioned auto-encoders with novel training and testing methodology. In contrast to previous methods, training procedure is divided in two sub-tasks, 1. closed-set classification and, 2. open-set identification (i.e. identifying a class as known or unknown). Encoder learns the first task following the closed-set classification training pipeline, whereas decoder learns the second task by reconstructing conditioned on class identity. Furthermore, we model reconstruction errors using the Extreme Value Theory of statistical modeling to find the threshold for identifying known/unknown class samples. Experiments performed on multiple image classification datasets show proposed method performs significantly better than state of the art. C4.5 C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. Cabinet Tree Treemaps are well-known for visualizing hierarchical data. Most related approaches have been focused on layout algorithms and paid little attention to other display properties and interactions. Furthermore, the structural information in conventional Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a space-optimized layout for leaves and maximizes the space utilization, iii) uses coloring and labeling strategies to clearly reveal patterns and contrast different attributes intuitively. We also apply the continuous node selection and detail window techniques to support user interaction with different levels of the hierarchies. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and big datasets. CACE Principle(CACE) Machine learning systems mix signals together, entangling them and making isolation of improvements impossible. For instance, consider a system that uses features x1, …xn in a model. If we change the input distribution of values in x1, the importance, weights, or use of the remaining n – 1 features may all change. This is true whether the model is retrained fully in a batch style or allowed to adapt in an online fashion. Adding a new feature xn+1 can cause similar changes, as can removing any feature xj . No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything. CACE applies not only to input signals, but also to hyper-parameters, learning settings, sampling methods, convergence thresholds, data selection, and essentially every other possible tweak. Cache Telepathy Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN’s architecture (i.e., its hyper-parameters) broadly determines the DNN’s accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN’s architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN’s architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16. CacheDiff We present a sampling method called, CacheDiff, that has both time and space complexity of O(k) to randomly select k items from a pool of N items, in which N is known. CactusNet Deep neural networks trained over large datasets learn features that are both generic to the whole dataset, and specific to individual classes in the dataset. Learned features tend towards generic in the lower layers and specific in the higher layers of a network. Methods like fine-tuning are made possible because of the ability for one filter to apply to multiple target classes. Much like the human brain this behavior, can also be used to cluster and separate classes. However, to the best of our knowledge there is no metric for how applicable learned features are to specific classes. In this paper we propose a definition and metric for measuring the applicability of learned features to individual classes, and use this applicability metric to estimate input applicability and produce a new method of unsupervised learning we call the CactusNet. CADDeLaG Random walk based distance measures for graphs such as commute-time distance are useful in a variety of graph algorithms, such as clustering, anomaly detection, and creating low dimensional embeddings. Since such measures hinge on the spectral decomposition of the graph, the computation becomes a bottleneck for large graphs and do not scale easily to graphs that cannot be loaded in memory. Most existing graph mining libraries for large graphs either resort to sampling or exploit the sparsity structure of such graphs for spectral analysis. However, such methods do not work for dense graphs constructed for studying pairwise relationships among entities in a data set. Examples of such studies include analyzing pairwise locations in gridded climate data for discovering long distance climate phenomena. These graphs representations are fully connected by construction and cannot be sparsified without loss of meaningful information. In this paper we describe CADDeLaG, a framework for scalable computation of commute-time distance based anomaly detection in large dense graphs without the need to load the entire graph in memory. The framework relies on Apache Spark’s memory-centric cluster-computing infrastructure and consists of two building blocks: a decomposable algorithm for commute time distance computation and a distributed linear system solver. We illustrate the scalability of CADDeLaG and its dependency on various factors using both synthetic and real world data sets. We demonstrate the usefulness of CADDeLaG in identifying anomalies in a climate graph sequence, that have been historically missed due to ad hoc graph sparsification and on an election donation data set. cAdv Machine learning, especially deep learning, is widely applied to a range of applications including computer vision, robotics and natural language processing. However, it has been shown that machine learning models are vulnerable to adversarial examples, carefully crafted samples that deceive learning models. In-depth studies on adversarial examples can help better understand potential vulnerabilities and therefore improve model robustness. Recent works have introduced various methods which generate adversarial examples. However, all require the perturbation to be of small magnitude ($\mathcal{L}_p$ norm) for them to be imperceptible to humans, which is hard to deploy in practice. In this paper we propose two novel methods, tAdv and cAdv, which leverage texture transfer and colorization to generate natural perturbation with a large $\mathcal{L}_p$ norm. We conduct extensive experiments to show that the proposed methods are general enough to attack both image classification and image captioning tasks on ImageNet and MSCOCO dataset. In addition, we conduct comprehensive user studies under various conditions to show that our generated adversarial examples are imperceptible to humans even when the perturbations are large. We also evaluate the transferability and robustness of the proposed attacks against several state-of-the-art defenses. Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license. http://…/neural-networks-with-caffe-on-the-gpu Github Cakewalk Sampling Combinatorial optimization is a common theme in computer science which underlies a considerable variety of problems. In contrast to the continuous setting, combinatorial problems require special solution strategies, and it’s hard to come by generic schemes like gradient methods for continuous domains. We follow a standard construction of a parametric sampling distribution that transforms the problem to the continuous domain, allowing us to optimize the expectation of a given objective using estimates of the gradient. In spite of the apparent generality, such constructions are known to suffer from highly variable gradient estimates, and thus require careful tuning that is done in a problem specific manner. We show that a simple trick of converting the objective values to their cumulative probabilities fixes the distribution of the objective, allowing us to derive an online optimization algorithm that can be applied in a generic fashion. As an experimental benchmark we use the task of finding cliques in undirected graphs, and we show that our method, even when blindly applied, consistently outperforms related methods. Notably, on the DIMACS clique benchmark, our method approaches the performance of the best clique finding algorithms without access to the graph structure, and only through objective function evaluations, thus providing significant evidence to the generality and effectivity of our method. Calamari Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and L\’udeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and pretraining have shown to be very efficient (Reul et al., 2018a, Reul et al., 2018b). Calamari is a new open source OCR line recognition software that both uses state-of-the art Deep Neural Networks (DNNs) implemented in Tensorflow and giving native support for techniques such as pretraining and voting. The customizable network architectures constructed of Convolutional Neural Networks (CNNS) and Long-ShortTerm-Memory (LSTM) layers are trained by the so-called Connectionist Temporal Classification (CTC) algorithm of Graves et al. (2006). Optional usage of a GPU drastically reduces the computation times for both training and prediction. We use two different datasets to compare the performance of Calamari to OCRopy, OCRopus3, and Tesseract 4. Calamari reaches a Character Error Rate (CER) of 0.11% on the UW3 dataset written in modern English and 0.18% on the DTA19 dataset written in German Fraktur, which considerably outperforms the results of the existing softwares. Calibrated Boosting-Forest Excellent ranking power along with well calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or classification model, Calibrated Boosting-Forest is able to preserve well calibrated posterior probabilities. Along with these benefits, we provide an alternative to the tedious step of tuning gradient boosting machines. We demonstrate that tuning Calibrated Boosting-Forests can be reduced to a simple hyper-parameter selection. We further establish that increasing this hyper-parameter improves the ranking performance under a diminishing return. We examine the effectiveness of Calibrated Boosting-Forest on ligand-based virtual screening where both continuous and binary labels are available and compare the performance of Calibrated Boosting-Forest with logistic regression, gradient boosting machine and deep learning. Calibrated Boosting-Forest achieved an approximately 4% improvement compared to a state-of-art deep learning model and has the potential to achieve an 8% improvement after tuning the single hyper-parameter. Moreover, it achieved around 98% improvement on probability quality measurement compared to the best individual gradient boosting machine. Calibrated Boosting-Forest offers a benchmark demonstration that in the field of ligand-based virtual screening, deep learning is not the universally dominant machine learning model and good calibrated probabilities can better facilitate virtual screening process. CaliCo In this article, we present a recently released R package for Bayesian calibration. Many industrial fields are facing unfeasible or costly field experiments. These experiments are replaced with numerical/computer experiments which are realized by running a numerical code. Bayesian calibration intends to estimate, through a posterior distribution, input parameters of the code in order to make the code outputs close to the available experimental data. The code can be time consuming while the Bayesian calibration implies a lot of code calls which makes studies too burdensome. A discrepancy might also appear between the numerical code and the physical system when facing incompatibility between experimental data and numerical code outputs. The package CaliCo deals with these issues through four statistical models which deal with a time consuming code or not and with discrepancy or not. A guideline for users is provided in order to illustrate the main functions and their arguments. Eventually, a toy example is detailed using CaliCo. This example (based on a real physical system) is in five dimensions and uses simulated data. Cambridge Loudness Model The present paper introduces a deep neural network (DNN) for predicting the instantaneous loudness of a sound from its time waveform. The DNN was trained using the output of a more complex model, called the Cambridge loudness model. While a modern PC can perform a few hundred loudness computations per second using the Cambridge loudness model, it can perform more than 100,000 per second using the DNN, allowing real-time calculation of loudness. The root-mean-square deviation between the predictions of instantaneous loudness level using the two models was less than 0.5 phon for unseen types of sound. We think that the general approach of simulating a complex perceptual model by a much faster DNN can be applied to other perceptual models to make them run in real time. Camelot Camelot is a Python library that makes it easy for anyone to extract tables from PDF files! Canberra Distance The Canberra distance is a numerical measure of the distance between pairs of points in a vector space, introduced in 1966 and refined in 1967 by G. N. Lance and W. T. Williams. It is a weighted version of L1 (Manhattan) distance. The Canberra distance has been used as a metric for comparing ranked lists and for intrusion detection in computer security. Cannistrai-Alanis-Ravai Index(CAR) Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed CAR (Cannistrai-Alanis-Ravai) index shows the power of local link/triangle information in improving link-prediction accuracy. Canonical Correlated AutoEncoder(C2AE) Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural networks (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets with different scales confirm the effectiveness and robustness of our proposed method, which is shown to perform favorably against state-of-the-art methods for multi-label classification. Canonical Correlation Analysis(CCA,CANCOR) In statistics, canonical-correlation analysis (CCA, also called ‘Canonical Variates Analysis’) is a way of inferring information from cross-covariance matrices. If we have two vectors X = (X1, …, Xn) and Y = (Y1, …, Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other. T. R. Knapp notes that ‘virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables.’ The method was first introduced by Harold Hotelling in 1936, although in the context of angles between flats the mathematical concept was published by Jordan in 1875. Kernel canonical correlation analysis approximates operators for the detection of coherent structures in dynamical data QRFCCA Canonical Correspondence Analysis(CCA) In applied statistics, canonical correspondence analysis (CCA) is a multivariate constrained ordination technique that extracts major gradients among combinations of explanatory variables in a dataset. The requirements of a CCA are that the samples are random and independent and that the independent variables are consistent within the sample site and error-free. Canonical Divergence Analysis(CDA) We aim to analyze the relation between two random vectors that may potentially have both different number of attributes as well as realizations, and which may even not have a joint distribution. This problem arises in many practical domains, including biology and architecture. Existing techniques assume the vectors to have the same domain or to be jointly distributed, and hence are not applicable. To address this, we propose Canonical Divergence Analysis (CDA). Canonical Duality Theory(CDT) A novel canonical duality theory (CDT) is presented for solving general bilevel mixed integer nonlinear optimization governed by linear and quadratic knapsack problems. It shows that the challenging knapsack problems can be solved analytically in term of their canonical dual solutions. The existence and uniqueness of these analytical solutions are proved. NP-Hardness of the knapsack problems is discussed. A powerful CDT algorithm combined with an alternative iteration and a volume reduction method is proposed for solving the NP-hard bilevel knapsack problems. Application is illustrated by a benchmark problem in optimal topology design. The performance and novelty of the proposed method are compared with the popular commercial codes. Canonical Space In this paper, we present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems in learning large-scale neural networks (NN). After introducing a mathematical tool called canonical space, we have proved that the objective functions in learning NNs are convex in the canonical model space. We further elucidate that the gradients between the original NN model space and the canonical space are related by a pointwise linear transformation, which is represented by the so-called disparity matrix. Furthermore, we have proved that gradient descent methods surely converge to a global minimum of zero loss provided that the disparity matrices maintain full rank. If this full-rank condition holds, the learning of NNs behaves in the same way as normal convex optimization. At last, we have shown that the chance to have singular disparity matrices is extremely slim in large NNs. In particular, when over-parameterized NNs are randomly initialized, the gradient decent algorithms converge to a global minimum of zero loss in probability. Canonical Tensor Decomposition(CP) Canonical Variate Regression(CVR) CVR Canonicalization Using Embeddings and Side Information(CESI) Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manuallydefined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) – a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI’s effectiveness. Canopy Clustering Algorithm The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as preprocessing step for the K-means algorithm or the Hierarchical clustering algorithm. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set. The algorithm proceeds as follows, using two thresholds T_1 (the loose distance) and T_2 (the tight distance), where T_1 > T_2 . 1. Begin with the set of data points to be clustered. 2. Remove a point from the set, beginning a new ‘canopy’. 3. For each point left in the set, assign it to the new canopy if the distance less than the loose distance T_1. 4. If the distance of the point is additionally less than the tight distance T_2, remove it from the original set. 5. Repeat from step 2 until there are no more data points in the set to cluster. 6. These relatively cheaply clustered canopies can be sub-clustered using a more expensive but accurate algorithm. An important note is that individual data points may be part of several canopies. As an additional speed-up, an approximate and fast distance metric can be used for 3, where a more accurate and slow distance metric can be used for step 4. Since the algorithm uses distance functions and requires the specification of distance thresholds, its applicability for high-dimensional data is limited by the curse of dimensionality. Only when a cheap and approximative – low-dimensional – distance function is available, the produced canopies will preserve the clusters produced by K-means. CaosDB Here we present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments during the complete research data lifecycle. An RDMS for this domain faces particular challenges: Research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around workflows of the scientists and practices and thus support changes in workflow and data structure. Nevertheless it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of the CaosDB Server, its data model and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it. Capacity Allocation Analysis Designing neural network architectures is a task that lies somewhere between science and art. For a given task, some architectures are eventually preferred over others, based on a mix of intuition, experience, experimentation and luck. For many tasks, the final word is attributed to the loss function, while for some others a further perceptual evaluation is necessary to assess and compare performance across models. In this paper, we introduce the concept of capacity allocation analysis, with the aim of shedding some light on what network architectures focus their modelling capacity on, when used on a given task. We focus more particularly on spatial capacity allocation, which analyzes a posteriori the effective number of parameters that a given model has allocated for modelling dependencies on a given point or region in the input space, in linear settings. We use this framework to perform a quantitative comparison between some classical architectures on various synthetic tasks. Finally, we consider how capacity allocation might translate in non-linear settings. CapsE In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples \textit{(subject, relation, object)}. Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. Capsule Network(CapsNet) A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discrimininatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routing-by-agreement mechanism: A lower-level capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule. Text classification using capsules What is a CapsNet or Capsule Network? Capsule Projection Network(CapProNet) In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. Only a small negligible computing overhead is incurred to train the network in low-dimensional capsule subspaces or through an alternative hyper-power iteration to estimate the normalization matrix. Experiment results on image datasets show the presented model can greatly improve the performance of state-of-the-art ResNet backbones by $10-20\%$ at the same level of computing and memory costs. Capsules of Capsule Networks The capsules of Capsule Networks are collections of neurons that represent an object or part of an object in a parse tree. The output vector of a capsule encodes the so called instantiation parameters of this object (e.g. position, size, or orientation). CAP-Theorem(Brewer’s theorem) In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: · Consistency (all nodes see the same data at the same time) · Availability (a guarantee that every request receives a response about whether it was successful or failed) · Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) Capture-Mark-Recapture Analysis Mark and recapture is a method commonly used in ecology to estimate an animal population’s size. A portion of the population is captured, marked, and released. Later, another portion is captured and the number of marked individuals within the sample is counted. Since the number of marked individuals within the second sample should be proportional to the number of marked individuals in the whole population, an estimate of the total population size can be obtained by dividing the number of marked individuals by the proportion of marked individuals in the second sample. The method is most useful when it is not practical to count all the individuals in the population. Other names for this method, or closely related methods, include capture-recapture, capture-mark-recapture, mark-recapture, sight-resight, mark-release-recapture, multiple systems estimation, band recovery, the Petersen method and the Lincoln method. Another major application for these methods is in epidemiology, where they are used to estimate the completeness of ascertainment of disease registers. Typical applications include estimating the number of people needing particular services (i.e. services for children with learning disabilities, services for medically frail elderly living in the community), or with particular conditions(i.e. illegal drug addicts, people infected with HIV, etc.). Capuchin Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson’s paradox. Proposals for causality-based definitions of fairness can correctly model some of these situations, but they require specification of the underlying causal models. In this paper, we formalize the situation as a database repair problem, proving sufficient conditions for fair classifiers in terms of admissible variables as opposed to a complete causal model. We show that these conditions correctly capture subtle fairness violations. We then use these conditions as the basis for database repair algorithms that provide provable fairness guarantees about classifiers trained on their training labels. We evaluate our algorithms on real data, demonstrating improvement over the state of the art on multiple fairness metrics proposed in the literature while retaining high utility. Cartesian Neural Network Constitutive Model(CaNNCM) Elasticity images map biomechanical properties of soft tissues to aid in the detection and diagnosis of pathological states. In particular, quasi-static ultrasonic (US) elastography techniques use force-displacement measurements acquired during an US scan to parameterize the spatio-temporal stress-strain behavior. Current methods use a model-based inverse approach to estimate the parameters associated with a chosen constitutive model. However, model-based methods rely on simplifying assumptions of tissue biomechanical properties, often limiting elastography to imaging one or two linear-elastic parameters. We previously described a data-driven method for building neural network constitutive models (NNCMs) that learn stress-strain relationships from force-displacement data. Using measurements acquired on gelatin phantoms, we demonstrated the ability of NNCMs to characterize linear-elastic mechanical properties without an initial model assumption and thus circumvent the mathematical constraints typically encountered in classic model-based approaches to the inverse problem. While successful, we were required to use a priori knowledge of the internal object shape to define the spatial distribution of regions exhibiting different material properties. Here, we introduce Cartesian neural network constitutive models (CaNNCMs) that are capable of using data to model both linear-elastic mechanical properties and their distribution in space. We demonstrate the ability of CaNNCMs to capture arbitrary material property distributions using stress-strain data from simulated phantoms. Furthermore, we show that a trained CaNNCM can be used to reconstruct a Young’s modulus image. CaNNCMs are an important step toward data-driven modeling and imaging the complex mechanical properties of soft tissues. Cartogram A cartogram is a map in which some thematic mapping variable – such as travel time, population, or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable. There are two main types of cartograms: area and distance cartograms. Cartograms have a fairly long history, with examples from the mid-1800s. Cascade Attribute Learning Network(CALNet) We propose the cascade attribute learning network (CALNet), which can learn attributes in a control task separately and assemble them together. Our contribution is twofold: first we propose attribute learning in reinforcement learning (RL). Attributes used to be modeled using constraint functions or terms in the objective function, making it hard to transfer. Attribute learning, on the other hand, models these task properties as modules in the policy network. We also propose using novel cascading compensative networks in the CALNet to learn and assemble attributes. Using the CALNet, one can zero shoot an unseen task by separately learning all its attributes, and assembling the attribute modules. We have validated the capacity of our model on a wide variety of control problems with attributes in time, position, velocity and acceleration phases. Cascade Clustering and Reference Point Incremental Learning Based Interactive Algorithm(CLIA) Researches have shown difficulties in obtaining proximity while maintaining diversity for solving many-objective optimization problems (MaOPs). The complexities of the true Pareto Front (PF) also pose serious challenges for the pervasive algorithms for their insufficient ability to adapt to the characteristics of the true PF with no priori. This paper proposes a cascade Clustering and reference point incremental Learning based Interactive Algorithm (CLIA) for many-objective optimization. In the cascade clustering process, using reference lines provided by the learning process, individuals are clustered and intraclassly sorted in a bi-level cascade style for better proximity and diversity. In the reference point incremental learning process, using the feedbacks from the clustering process, the proper generation of reference points is gradually obtained by incremental learning and the reference lines are accordingly repositioned. The advantages of the proposed interactive algorithm CLIA lie not only in the proximity obtainment and diversity maintenance but also in the versatility for the diverse PFs which uses only the interactions between the two processes without incurring extra evaluations. The experimental studies on the CEC’2018 MaOP benchmark functions have shown that the proposed algorithm CLIA has satisfactory covering of the true PFs, and is competitive, stable and efficient compared with the state-of-the-art algorithms. Cascade GAN We deconstruct the performance of GANs into three components: 1. Formulation: we propose a perturbation view of the population target of GANs. Building on this interpretation, we show that GANs can be viewed as a generalization of the robust statistics framework, and propose a novel GAN architecture, termed as Cascade GANs, to provably recover meaningful low-dimensional generator approximations when the real distribution is high-dimensional and corrupted by outliers. 2. Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, to design GANs to meet the population requirement using finite samples. We implement our principle in three cases to achieve polynomial and sometimes near-optimal sample complexities: (1) learning an arbitrary generator under an arbitrary pseudonorm; (2) learning a Gaussian location family under total variation distance, where we utilize our principle provide a new proof for the optimality of Tukey median viewed as GANs; (3) learning a low-dimensional Gaussian approximation of a high-dimensional arbitrary distribution under Wasserstein distance. We demonstrate a fundamental trade-off in the approximation error and statistical error in GANs, and show how to apply our principle with empirical samples to predict how many samples are sufficient for GANs in order not to suffer from the discriminator winning problem. 3. Optimization: we demonstrate alternating gradient descent is provably not even locally stable in optimizating the GAN formulation of PCA. We diagnose the problem as the minimax duality gap being non-zero, and propose a new GAN architecture whose duality gap is zero, where the value of the game is equal to the previous minimax value (not the maximin value). We prove the new GAN architecture is globally stable in optimization under alternating gradient descent. Cascade R-CNN In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade with increasing the IoU thresholds. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inference-time mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, to be sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next higher quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset. Experiments also show that the Cascade R-CNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code will be made available at https://…/cascade-rcnn. Cascade Residual Learning Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composing of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first-stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin. CascadeCNN This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision unit and forward them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need of retraining the model or accessing the training data. Cascaded Multi-Scale Cross Network The deep convolutional neural networks have achieved significant improvements in accuracy and speed for single image super-resolution. However, as the depth of network grows, the information flow is weakened and the training becomes harder and harder. On the other hand, most of the models adopt a single-stream structure with which integrating complementary contextual information under different receptive fields is difficult. To improve information flow and to capture sufficient knowledge for reconstructing the high-frequency details, we propose a cascaded multi-scale cross network (CMSC) in which a sequence of subnetworks is cascaded to infer high resolution features in a coarse-to-fine manner. In each cascaded subnetwork, we stack multiple multi-scale cross (MSC) modules to fuse complementary multi-scale information in an efficient way as well as to improve information flow across the layers. Meanwhile, by introducing residual-features learning in each stage, the relative information between high-resolution and low-resolution features is fully utilized to further boost reconstruction performance. We train the proposed network with cascaded-supervision and then assemble the intermediate predictions of the cascade to achieve high quality image reconstruction. Extensive quantitative and qualitative evaluations on benchmark datasets illustrate the superiority of our proposed method over state-of-the-art super-resolution methods. Cascaded Projection(CaP) We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. Current network compression methods either find a low-rank factorization of the features that requires more memory, or select only a subset of features by pruning entire filter channels. We propose the Cascaded Projection (CaP) compression method that projects the output and input filter channels of successive layers to a unified low dimensional space based on a low-rank projection. We optimize the projection to minimize classification loss and the difference between the next layer’s features in the compressed and uncompressed networks. To solve this non-convex optimization problem we propose a new optimization method of a proxy matrix using backpropagation and Stochastic Gradient Descent (SGD) with geometric constraints. Our cascaded projection approach leads to improvements in all critical areas of network compression: high accuracy, low memory consumption, low parameter count and high processing speed. The proposed CaP method demonstrates state-of-the-art results compressing VGG16 and ResNet networks with over 4x reduction in the number of computations and excellent performance in top-5 accuracy on the ImageNet dataset before and after fine-tuning. CascadeML Multi-label classification is an approach which allows a datapoint to be labelled with more than one class at the same time. A common but trivial approach is to train individual binary classifiers per label, but the performance can be improved by considering associations within the labels. Like with any machine learning algorithm, hyperparameter tuning is important to train a good multi-label classifier model. The task of selecting the best hyperparameter settings for an algorithm is an optimisation problem. Very limited work has been done on automatic hyperparameter tuning and AutoML in the multi-label domain. This paper attempts to fill this gap by proposing a neural network algorithm, CascadeML, to train multi-label neural network based on cascade neural networks. This method requires minimal or no hyperparameter tuning and also considers pairwise label associations. The cascade algorithm grows the network architecture incrementally in a two phase process as it learns the weights using adaptive first order gradient algorithm, therefore omitting the requirement of preselecting the number of hidden layers, nodes and the learning rate. The method was tested on 10 multi-label datasets and compared with other multi-label classification algorithms. Results show that CascadeML performs very well without hyperparameter tuning. Cascade-Net In this paper, we consider using deep neural network for OFDM symbol detection and demonstrate its performance advantages in combating large Doppler Shift. In particular, a new architecture named Cascade-Net is proposed for detection, where deep neural network is cascading with a zero-forcing preprocessor to prevent the network stucking in a saddle point or a local minimum point. In addition, we propose a sliding detection approach in order to detect OFDM symbols with large number of subcarriers. We evaluate this new architecture, as well as the sliding algorithm, using the Rayleigh channel with large Doppler spread, which could degrade detection performance in an OFDM system and is especially severe for high frequency band and mmWave communications. The numerical results of OFDM detection in SISO scenario show that cascade-net can achieve better performance than zero-forcing method while providing robustness against ill conditioned channels. We also show the better performance of the sliding cascade network (SCN) compared to sliding zero-forcing detector through numerical simulation. Case Based Reasoning(CBR) Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry), is treating nature as a database of solutions to problems. Case-based reasoning is a prominent type of analogy solution making. It has been argued that case-based reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; or, more radically, that all reasoning is based on past cases personally experienced. This view is related to prototype theory, which is most deeply explored in cognitive science. Case-Control Study A case-control study is a type of study design used widely, originally developed in epidemiology, although its use has also been advocated for the social sciences. It is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have that condition/disease (the “cases”) with patients who do not have the condition/disease but are otherwise similar (the “controls”). They require fewer resources but provide less evidence for causal inference than a randomized controlled trial. CASED We introduce CASED, a novel curriculum sampling algorithm that facilitates the optimization of deep learning segmentation or detection models on data sets with extreme class imbalance. We evaluate the CASED learning framework on the task of lung nodule detection in chest CT. In contrast to two-stage solutions, wherein nodule candidates are first proposed by a segmentation model and refined by a second detection stage, CASED improves the training of deep nodule segmentation models (e.g. UNet) to the point where state of the art results are achieved using only a trivial detection stage. CASED improves the optimization of deep segmentation models by allowing them to first learn how to distinguish nodules from their immediate surroundings, while continuously adding a greater proportion of difficult-to-classify global context, until uniformly sampling from the empirical data distribution. Using CASED during training yields a minimalist proposal to the lung nodule detection problem that tops the LUNA16 nodule detection benchmark with an average sensitivity score of 88.35%. Furthermore, we find that models trained using CASED are robust to nodule annotation quality by showing that comparable results can be achieved when only a point and radius for each ground truth nodule are provided during training. Finally, the CASED learning framework makes no assumptions with regard to imaging modality or segmentation target and should generalize to other medical imaging problems where class imbalance is a persistent problem. Castor We demonstrate Castor, a cloud-based system for contextual IoT time series data and model management at scale. Castor is designed to assist Data Scientists in (a) exploring and retrieving all relevant time series and contextual information that is required for their predictive modelling tasks; (b) seamlessly storing and deploying their predictive models in a cloud production environment; (c) monitoring the performance of all predictive models in productions and (semi-)automatically retraining them in case of performance deterioration. The main features of Castor are: (1) an efficient pipeline for ingesting IoT time series data in real time; (2) a scalable, hybrid data management service for both time series and contextual data; (3) a versatile semantic model for contextual information which can be easily adopted to different application domains; (4) an abstract framework for developing and storing predictive models in R or Python; (5) deployment services which automatically train and/or score predictive models upon user-defined conditions. We demonstrate Castor for a real-world Smart Grid use case and discuss how it can be adopted to other application domains such as Smart Buildings, Telecommunication, Retail or Manufacturing. Cat2Vec This paper presents a method of learning distributed representation for multi-field categorical data, which is a common data format with various applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models, e.g., factorisation machines, boosted trees, has proved the potential of exploring the interactions among inter-field categories. Inspired by Word2Vec, the distributed representation for natural language, we propose Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is automatically learned for each category in each field. The interactions among inter-field categories are further explored by different neural gates and the most informative ones are selected by pooling layers. In our experiments, with the exploration of the interactions between pairwise categories over layers, the model attains great improvement over state-of-the-art models in a supervised learning task, e.g., click prediction, while capturing the most significant interactions from the data. Deep embedding’s for categorical variables (Cat2Vec) Catalan Number In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively-defined objects. They are named after the Belgian mathematician Eugène Charles Catalan (1814-1894). Modular Catalan Numbers Catalyst.RL Despite the recent progress in deep reinforcement learning field (RL), and, arguably because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open source framework for RL research with a focus on reproducibility and flexibility. Main features of our library include large-scale asynchronous distributed training, easy-to-use configuration files with the complete list of hyperparameters for the particular experiments, efficient implementations of various RL algorithms and auxiliary tricks, such as frame stacking, n-step returns, value distributions, etc. To vindicate the usefulness of our framework, we evaluate it on a range of benchmarks in a continuous control, as well as on the task of developing a controller to enable a physiologically-based human model with a prosthetic leg to walk and run. The latter task was introduced at NeurIPS 2018 AI for Prosthetics Challenge, where our team took the 3rd place, capitalizing on the ability of catalyst.RL to train high-quality and sample-efficient RL agents. Cat-and-Mouse Markov Chain The so-called Cat-and-Mouse Markov chain, studied earlier by Litvak and Robert (2012), is a 2-dimensional Markov chain on the lattice $\mathbb{Z}^2$, where the first component is a simple random walk and the second component changes when the components meet. Catastrophe Modeling Catastrophe modeling (also known as cat modeling) is the process of using computer-assisted calculations to estimate the losses that could be sustained due to a catastrophic event such as a hurricane or earthquake. Cat modeling is especially applicable to analyzing risks in the insurance industry and is at the confluence of actuarial science, engineering, meteorology, and seismology. Catastrophic Forgetting ➘ “Catastrophic Interference” Catastrophic Interference Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural network to completely and abruptly forget previously learned information upon learning new information. Neural networks are an important part of the network approach and connectionist approach to cognitive science. These networks use computer simulations to try to model human behaviours, such as memory and learning. Catastrophic interference is an important issue to consider when creating connectionist models of memory. It was originally brought to the attention of the scientific community by research from McCloskey and Cohen (1989), and Ractcliff (1990). It is a radical manifestation of the ‘sensitivity-stability’ dilemma or the ‘stability-plasticity’ dilemma. Specifically, these problems refer to the issue of being able to make an artificial neural network that is sensitive to, but not disrupted by, new information. Lookup tables and connectionist networks lie on the opposite sides of the stability plasticity spectrum. The former remains completely stable in the presence of new information but lacks the ability to generalize, i.e. infer general principles, from new inputs. On the other hand, connectionist networks like the standard backpropagation network are very sensitive to new information and can generalize on new inputs. Backpropagation models can be considered good models of human memory insofar as they mirror the human ability to generalize but these networks often exhibit less stability than human memory. Notably, these backpropagation networks are susceptible to catastrophic interference. This is considered an issue when attempting to model human memory because, unlike these networks, humans typically do not show catastrophic forgetting. Thus, the issue of catastrophic interference must be eradicated from these backpropagation models in order to enhance the plausibility as models of human memory. CatBoost In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of learning algorithm and a CPU implementation of scoring algorithm, which are significantly faster than other gradient boosting libraries on ensembles of similar sizes. Categorical Cross Entropy ➘ “Cross Entropy” Categorical Distributional Reinforcement Learning(CDRL) Categorical Distributional Reinforcement Learning (CDRL) [Bellemare et al., 2017]. Categorical Query Language Whenever information from different sources needs to be combined, the data structures supporting that information must first be related. This task, called data integration, is the biggest and most expensive challenge in IT today, accounting for over 40% of enterprise IT budgets. Our technology performs data-integration tasks – such as querying, combining, and evolving databases – using category theory, a branch of mathematics that has already revolutionized several areas of computer science. Category theory gives us the theoretical guidance missing from current-generation data models (Relational, RDF/OWL, Graph, Key-Value, LINQ) and we have used it to build software for integrating data more quickly and more accurately than existing tools. Our product consists of two parts: • CQL, an open-source categorical query language and integrated development environment (IDE). • A proprietary UI and execution engine for increased productivity with CQL – please contact us for licensing information. Categorical Response Model Causal Additive Model(CAM) We develop estimation for potentially high-dimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding the causal structure. We show that the former can be done with nonregularized (restricted) maximum likelihood estimation while the latter can be efficiently addressed using sparse regression techniques. Thus, we substantially simplify the problem of structure search and estimation for an important class of causal models. We establish consistency of the (restricted) maximum likelihood estimator for low- and high-dimensional scenarios, and we also allow for misspecification of the error distribution. Furthermore, we develop an efficient computational algorithm which can deal with many variables, and the new method’s accuracy and performance is illustrated on simulated and real data. Causal Complementation Algorithm The theory of linear Diophantine equations in two unknowns over polynomial rings is used to construct causal lifting factorizations for causal two-channel FIR perfect reconstruction multirate filter banks and wavelet transforms. The Diophantine approach generates causal lifting factorizations satisfying certain polynomial degree-reducing inequalities, enabling a new lifting factorization strategy called the \emph{Causal Complementation Algorithm}. This provides an alternative to the noncausal lifting scheme based on the Extended Euclidean Algorithm for Laurent polynomials that was developed by Daubechies and Sweldens. The new approach, which can be regarded as Gaussian elimination in polynomial matrices, utilizes a generalization of polynomial division that ensures existence and uniqueness of quotients whose remainders satisfy user-specified divisibility constraints. The Causal Complementation Algorithm is shown to be more general than the Extended Euclidean Algorithm approach by generating causal lifting factorizations not obtainable using the polynomial Euclidean Algorithm. Causal Convolutional Recurrent Neural Network Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks Causal Falling Rule List(CFRL) A causal falling rule list (CFRL) is a sequence of if-then rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list. A given CFRL parameterizes a hierarchical bayesian regression model in which the treatment effects are incorporated as parameters, and assumed constant within model-specific subgroups. Causal Feature Extractor(CFE) Text normalization is a ubiquitous process that appears as the first step of many Natural Language Processing problems. However, previous Deep Learning approaches have suffered from so-called silly errors, which are undetectable on unsupervised frameworks, making those models unsuitable for deployment. In this work, we make use of an attention-based encoder-decoder architecture that overcomes these undetectable errors by using a fine-grained character-level approach rather than a word-level one. Furthermore, our new general-purpose encoder based on causal convolutions, called Causal Feature Extractor (CFE), is introduced and compared to other common encoders. The experimental results show the feasibility of this encoder, which leverages the attention mechanisms the most and obtains better results in terms of accuracy, number of parameters and convergence time. While our method results in a slightly worse initial accuracy (92.74%), errors can be automatically detected and, thus, more readily solved, obtaining a more robust model for deployment. Furthermore, there is still plenty of room for future improvements that will push even further these advantages. Causal Generative Neural Network(CGNN) We introduce CGNN, a framework to learn functional causal models as generative neural networks. These networks are trained using backpropagation to minimize the maximum mean discrepancy to the observed data. Unlike previous approaches, CGNN leverages both conditional independences and distributional asymmetries to seamlessly discover bivariate and multivariate causal structures, with or without hidden variables. CGNN does not only estimate the causal structure, but a full and differentiable generative model of the data. Throughout an extensive variety of experiments, we illustrate the competitive results of CGNN w.r.t state-of-the-art alternatives in observational causal discovery on both simulated and real data, in the tasks of cause-effect inference, v-structure identification, and multivariate causal discovery. Causal Implicit Generative Model(CiGM) We introduce causal implicit generative models (CiGMs): models that allow sampling from not only the true observational but also the true interventional distributions. We show that adversarial training can be used to learn a CiGM, if the generator architecture is structured based on a given causal graph. We consider the application of conditional and interventional sampling of face images with binary feature labels, such as mustache, young. We preserve the dependency structure between the labels with a given causal graph. We devise a two-stage procedure for learning a CiGM over the labels and the image. First we train a CiGM over the binary labels using a Wasserstein GAN where the generator neural network is consistent with the causal graph between the labels. Later, we combine this with a conditional GAN to generate images conditioned on the binary labels. We propose two new conditional GAN architectures: CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained CiGM for the labels is then a CiGM over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset. Causal Inference Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. http://…mp;uid=2&uid=4&sid=21104618644387 Causal Inference Benchmarking Framework Causality-Benchmark is a library developed by IBM Research Haifa for benchmarking algorithms that estimate the causal effect of a treatment on some outcome. The framework includes unlabeled data, labeled data, code for scoring algorithm predictions based on both novel and established metrics. It can benchmark predictions of both population effect size and individual effect size. Causal InfoGAN ➘ “Causal Implicit Generative Model” Causal Information Theory Causal Information Theory – Formal Introduction of Key Concepts Causal Loglinear Model ➘ “Log-Linear Model” Causal Markov Condition The causal Markov condition states that all dependences among the variables that are observed are due to the causal structure. This rules out time series that become dependent because of a common deterministic trend. Causal Model A causal model is an abstract model that describes the causal mechanisms of a system. The model must express more than correlation because correlation does not imply causation. Judea Pearl defines a causal model as an ordered triple , where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V. Causal Modeling Framework of Modular Structural Causal Models(mSCM) We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce {\sigma}-connection graphs ({\sigma}-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of {\sigma}-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to {\sigma}-CGs. We prove the closedness of {\sigma}-separation under marginalisation and conditioning and exploit this to implement a test of {\sigma}-separation on a {\sigma}-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models. Causal Network A causal network is a Bayesian network with an explicit requirement that the relationships be causal. The additional semantics of the causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X=x)), then the probability density function changes to the one of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x. Using these semantics, one can predict the impact of external interventions from data obtained prior to intervention. ➚ “Bayesian Network” Causal network reconstruction from time series: From theoretical assumptions to practical estimation Causal Neural Network We introduce causal neural networks, a generalization of the usual feedforward neural networks which allows input features and target outputs to be represented as input or output units. For inferring the values of target outputs which are represented as input units, we developed a forward-backward propagation algorithm which uses gradient descent to minimize the error of the predicted output features. To deal with the large number of possible structures and feature selection, we use a genetic algorithm. Experiments on a regression problem and 5 classification problems show that the causal neural networks can outperform the usual feedforward architectures for particular problems. Wavenilm: A causal neural network for power disaggregation from the complex power signal Causal Prediction Causal Reasoning Causal reasoning is the process of identifying causality: the relationship between a cause and its effect. The study of causality extends from ancient philosophy to contemporary neuropsychology; assumptions about the nature of causality may be shown to be functions of a previous event preceding a later one. The first known protoscientific study of cause and effect occurred in Aristotle’s Physics. Causal inference is an example of causal reasoning. Causal Rule Sets(CRS) We introduce a novel generative model for interpretable subgroup analysis for causal inference applications, Causal Rule Sets (CRS). A CRS model uses a small set of short rules to capture a subgroup where the average treatment effect is elevated compared to the entire population. We present a Bayesian framework for learning a causal rule set. The Bayesian framework consists of a prior that favors simpler models and a Bayesian logistic regression that characterizes the relation between outcomes, attributes and subgroup membership. We find maximum a posteriori models using discrete Monte Carlo steps in the joint solution space of rules sets and parameters. We provide theoretically grounded heuristics and bounding strategies to improve search efficiency. Experiments show that the search algorithm can efficiently recover a true underlying subgroup and CRS shows consistently competitive performance compared to other state-of-the-art baseline methods. Causal System Jump to Search In Control Theory, a Causal System (also Known as a Physical or Nonanticipative System) is a System Where the Output Depends on Past and Current Inputs but not Future Inputs – I.e., the Output y ( t 0 ) {\displaystyle Y(t_{0})} Y(t_{{0}}) Depends on Only the Input x ( t ) {\displaystyle X(t)} X(t) for Values of t = t 0 {\displaystyle T\leq T_{0}} T\leq T_{{0}}. The Idea That the Output of a Function at any Time Depends Only on Past and Present Values of Input is Defined by the Property Commonly Referred to as Causality. A System That has Some Dependence on Input Values From the Future (in Addition to Possible Dependence on Past or Current Input Values) is Termed a non-Causal or Acausal System, and a System That Depends Solely on Future Input Values is an Anticausal System. Note That Some Authors Have Defined an Anticausal System as one That Depends Solely on Future and Present Input Values or, More Simply, as a System That Does not Depend on Past Input Values. Classically, Nature or Physical Reality has Been Considered to be a Causal System. Physics Involving Special Relativity or General Relativity Require More Careful Definitions of Causality, as Described Elaborately in Causality (physics). The Causality of Systems Also Plays an Important Role in Digital Signal Processing, Where Filters are Constructed so That They are Causal, Sometimes by Altering a non-Causal Formulation to Remove the Lack of Causality so That it is Realizable. For More Information, see Causal Filter. For a Causal System, the Impulse Response of the System Must use Only the Present and Past Values of the Input to Determine the Output. This Requirement is a Necessary and Sufficient Condition for a System to be Causal, Regardless of Linearity. Note That Similar Rules Apply to Either Discrete or Continuous Cases. By This Definition of Requiring no Future Input Values, Systems Must be Causal to Process Signals in Real Time.[1] Causal Transfer Learning An important goal in both transfer learning and causal inference is to make accurate predictions when the distribution of the test set and the training set(s) differ. Such a distribution shift may happen as a result of an external intervention on the data generating process, causing certain aspects of the distribution to change, and others to remain invariant. We consider a class of causal transfer learning problems, where multiple training sets are given that correspond to different external interventions, and the task is to predict the distribution of a target variable given measurements of other variables for a new (yet unseen) intervention on the system. We propose a method for solving these problems that exploits causal reasoning but does neither rely on prior knowledge of the causal graph, nor on the the type of interventions and their targets. We evaluate the method on simulated and real world data and find that it outperforms a standard prediction method that ignores the distribution shift. Causal Wasserstein Distance Continuity of the martingale optimal transport problem on the real line CausalBEGAN ➚ “Causal Implicit Generative Model” CausalGAN We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset. Causality Causality (also referred to as causation, or cause and effect) is what connects one process (the cause) with another process or state (the effect), where the first is partly responsible for the second, and the second is partly dependent on the first. In general, a process has many causes, which are said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Causality is metaphysically prior to notions of time and space. Causality is an abstraction that indicates how the world progresses, so basic a concept that it is more apt as an explanation of other concepts of progression than as something to be explained by others more basic. The concept is like those of agency and efficacy. For this reason, a leap of intuition may be needed to grasp it. Accordingly, causality is implicit in the logic and structure of ordinary language. Aristotelian philosophy uses the word ’cause’ to mean ‘explanation’ or ‘answer to a why question’, including Aristotle’s material, formal, efficient, and final ’causes’; then the ’cause’ is the explanans for the explanandum. In this case, failure to recognize that different kinds of ’cause’ are being considered can lead to futile debate. Of Aristotle’s four explanatory modes, the one nearest to the concerns of the present article is the ‘efficient’ one. The topic of causality remains a staple in contemporary philosophy. CausalSpartan Causal consistency is an intermediate consistency model that can be achieved together with high availability and high-performance requirements even in presence of network partitions. In the context of partitioned data stores, it has been shown that implicit dependency tracking using clocks is more efficient than explicit dependency tracking by sending dependency check messages. Existing clock-based solutions depend on monotonic psychical clocks that are closely synchronized. These requirements make current protocols vulnerable to clock anomalies. In this paper, we propose a new clock-based algorithm, CausalSpartan, that instead of physical clocks, utilizes Hybrid Logical Clocks (HLCs). We show that using HLCs, without any overhead, we make the system robust on physical clock anomalies. This improvement is more significant in the context of query amplification, where a single query results in multiple GET/PUT operations.We also show that CausalSpartan decreases the visibility latency for a given data item comparing to existing clock-based approaches. In turn, this reduces the completion time of collaborative applications where two clients accessing two different replicas edit same items of the data store. Like previous protocols, CausalSpartan assumes that a given client does not access more than one replica. We show that in presence of network partitions, this assumption (made in several other works) is essential if one were to provide causal consistency as well as immediate availability to local updates. CausalSpartanX Causal consistency is an intermediate consistency model that can be achieved together with high availability and performance requirements even in presence of network partitions. In the context of partitioned data stores, it has been shown that implicit dependency tracking using timestamps is more efficient than explicit dependency tracking. Existing time-based solutions depend on monotonic psychical clocks that are closely synchronized. These requirements make current protocols vulnerable to clock anomalies. In this paper, we propose a new time-based algorithm, CausalSpartanX, that instead of physical clocks, utilizes Hybrid Logical Clocks (HLCs). We show that using HLCs, without any overhead, we make the system robust on physical clock anomalies. This improvement is more significant in the context of query amplification, where a single query results in multiple GET/PUT operations. We also show that CausalSpartanX decreases the visibility latency for a given data item compared with existing time-based approaches. In turn, this reduces the completion time of collaborative applications where two clients accessing two different replicas edit same items of the data store. CausalSpartanX also provides causally consistent distributed read-only transactions. CausalSpartanX read-only transactions are non-blocking and require only one round of communication between the client and the servers. Also, the slowdowns of partitions that are unrelated to a transaction do not affect the performance of the transaction. Like previous protocols, CausalSpartanX assumes that a given client does not access more than one replica. We show that in presence of network partitions, this assumption (made in several other works) is essential if one were to provide causal consistency as well as immediate availability to local updates. Causaltoolbox Estimating heterogeneous treatment effects has become extremely important in many fields and often life changing decisions for individuals are based on these estimates, for example choosing a medical treatment for a patient. In the recent years, a variety of techniques for estimating heterogeneous treatment effects, each making subtly different assumptions, have been suggested. Unfortunately, there are no compelling approaches that allow identification of the procedure that has assumptions that hew closest to the process generating the data set under study and researchers often select just one estimator. This approach risks making inferences based on incorrect assumptions and gives the experimenter too much scope for p-hacking. A single estimator will also tend to overlook patterns other estimators would have picked up. We believe that the conclusion of many published papers might change had a different estimator been chosen and we suggest that practitioners should evaluate many estimators and assess their similarity when investigating heterogeneous treatment effects. We demonstrate this by applying 32 different estimation procedures to an emulated observational data set; this analysis shows that different estimation procedures may give starkly different estimates. We also provide an extensible \texttt{R} package which makes it straightforward for practitioners to apply our analysis to their data. Causative Attack Attacks that target the training process. Cause-Emotion-Action Corpus Many Natural Language Processing works on emotion analysis only focus on simple emotion classification without exploring the potentials of putting emotion into ‘event context’, and ignore the analysis of emotion-related events. One main reason is the lack of this kind of corpus. Here we present Cause-Emotion-Action Corpus, which manually annotates not only emotion, but also cause events and action events. We propose two new tasks based on the data-set: emotion causality and emotion inference. The first task is to extract a triple (cause, emotion, action). The second task is to infer the probable emotion. We are currently releasing the data-set with 10,603 samples and 15,892 events, basic statistic analysis and baseline on both emotion causality and emotion inference tasks. Baseline performance demonstrates that there is much room for both tasks to be improved. Cautious Deep Learning Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(y|x)$ where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a hubristic bias: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets [vovk2005algorithmic] which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1-\alpha$. Our construction is based on $p(x|y)$ rather than $p(y|x)$ which results in a classifier that is very cautious: it outputs the null set – meaning I don’t know’ — when the object does not resemble the training examples. An important property of our approach is that classes can be added or removed without having to retrain the classifier. We demonstrate the performance on the ImageNet ILSVRC dataset using high dimensional features obtained from state of the art convolutional neural networks. Cavs Recent deep learning (DL) models have moved beyond static network architectures to dynamic ones, handling data where the network structure changes every example, such as sequences of variable lengths, trees, and graphs. Existing dataflow-based programming models for DL—both static and dynamic declaration—either cannot readily express these dynamic models, or are inefficient due to repeated dataflow graph construction and processing, and difficulties in batched execution. We present Cavs, a vertex-centric programming interface and optimized system implementation for dynamic DL models. Cavs represents dynamic network structure as a static vertex function $\mathcal{F}$ and a dynamic instance-specific graph $\mathcal{G}$, and performs backpropagation by scheduling the execution of $\mathcal{F}$ following the dependencies in $\mathcal{G}$. Cavs bypasses expensive graph construction and preprocessing overhead, allows for the use of static graph optimization techniques on pre-defined operations in $\mathcal{F}$, and naturally exposes batched execution opportunities over different graphs. Experiments comparing Cavs to two state-of-the-art frameworks for dynamic NNs (TensorFlow Fold and DyNet) demonstrate the efficacy of this approach: Cavs achieves a near one order of magnitude speedup on training of various dynamic NN architectures, and ablations demonstrate the contribution of our proposed batching and memory management strategies. CAvSAT An inconsistent database is a database that violates one or more integrity constraints, such as functional dependencies. Consistent Query Answering is a rigorous and principled approach to the semantics of queries posed against inconsistent databases. The consistent answers to a query on an inconsistent database is the intersection of the answers to the query on every repair, i.e., on every consistent database that differs from the given inconsistent one in a minimal way. Computing the consistent answers of a fixed conjunctive query on a given inconsistent database can be a coNP-hard problem, even though every fixed conjunctive query is efficiently computable on a given consistent database. We designed, implemented, and evaluated CAvSAT, a SAT-based system for consistent query answering. CAvSAT leverages a set of natural reductions from the complement of consistent query answering to SAT and to Weighted MaxSAT. The system is capable of handling unions of conjunctive queries and arbitrary denial constraints, which include functional dependencies as a special case. We report results from experiments evaluating CAvSAT on both synthetic and real-world databases. These results provide evidence that a SAT-based approach can give rise to a comprehensive and scalable system for consistent query answering. CDF2PDF CDF2PDF is a method of PDF estimation by approximating CDF. The original idea of it was previously proposed in [1] called SIC. However, SIC requires additional hyper-parameter tunning, and no algorithms for computing higher order derivative from a trained NN are provided in [1]. CDF2PDF improves SIC by avoiding the time-consuming hyper-parameter tuning part and enabling higher order derivative computation to be done in polynomial time. Experiments of this method for one-dimensional data shows promising results. Cell Suppression Problem(CSP) Cell suppression is one of the most frequently used techniques to prevent the disclosure of sensitive data in statistical tables. Finding the minimum cost set of nonsensitive entries to suppress, along with the sensitive ones, in order to make a table safe for publication, is a NP-hard problem, denoted the cell suppression problem (CSP). Cell-aware Stacked LSTM(CAS-LSTM) We propose a method of stacking multiple long short-term memory (LSTM) layers for modeling sentences. In contrast to the conventional stacked LSTMs where only hidden states are fed as input to the next layer, our architecture accepts both hidden and memory cell states of the preceding layer and fuses information from the left and the lower context using the soft gating mechanism of LSTMs. Thus the proposed stacked LSTM architecture modulates the amount of information to be delivered not only in horizontal recurrence but also in vertical connections, from which useful features extracted from lower layers are effectively conveyed to upper layers. We dub this architecture Cell-aware Stacked LSTM (CAS-LSTM) and show from experiments that our models achieve state-of-the-art results on benchmark datasets for natural language inference, paraphrase detection, and sentiment classification. CeNN Quantization Cyber-Physical Systems (CPSs) have been pervasive including smart grid, autonomous automobile systems, medical monitoring, process control systems, robotics systems, and automatic pilot avionics. As usually implemented on embedded devices, CPS is typically constrained by computation capacity and energy consumption. In some CPS applications such as telemedicine and advanced driving assistance system (ADAS), data processing on the embedded devices is preferred due to security/safety and real-time requirement. Therefore, high efficiency is highly desirable for such CPS applications. In this paper we present CeNN quantization for high-efficient processing for CPS applications, particularly telemedicine and ADAS applications. We systematically put forward powers-of-two based incremental quantization of CeNNs for efficient hardware implementation. The incremental quantization contains iterative procedures including parameter partition, parameter quantization, and re-training. We propose five different strategies including random strategy, pruning inspired strategy, weighted pruning inspired strategy, nearest neighbor strategy, and weighted nearest neighbor strategy. Experimental results show that our approach can achieve a speedup up to 7.8x with no performance loss compared with the state-of-the-art FPGA solutions for CeNNs. Censored Quantile Regression Forests Random forests are powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on new estimating equations that adapt to censoring and lead to quantile score whenever the data do not exhibit censoring. The proposed procedure named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure. Censored Time Series Analysis Imputation method in the presence of censored data. The main message of the imputation method is that we should account for the variability of the censored part of the data by mimicking the complete data. That is, we impute the incomplete part with a conditional random sample rather than the conditional expectation or certain constants. Simulation results suggest that the imputation method reduces the possible biases and has similar standard errors than those from complete data. Censoring In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known. For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual’s age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75. Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 300 pounds (140 kg). If a 350 lb (160 kg) individual is weighed using the scale, the observer would only know that the individual’s weight is at least 300 pounds (140 kg). The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown. Censoring should not be confused with the related idea truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding. Centered Autologistic Model The traditional autologistic model was proposed by Besag (1972). The model is a Markov random field (MRF) model (Kindermann and Snell, 1980) Centered Initial Attack(CIA) During the last years, a remarkable breakthrough has been made in AI domain thanks to artificial deep neural networks that achieved a great success in many machine learning tasks in computer vision, natural language processing, speech recognition, malware detection and so on. However, they are highly vulnerable to easily crafted adversarial examples. Many investigations have pointed out this fact and different approaches have been proposed to generate attacks while adding a limited perturbation to the original data. The most robust known method so far is the so called C&W attack [1]. Nonetheless, a countermeasure known as feature squeezing coupled with ensemble defense showed that most of these attacks can be destroyed [6]. In this paper, we present a new method we call Centered Initial Attack (CIA) whose advantage is twofold : first, it insures by construction the maximum perturbation to be smaller than a threshold fixed beforehand, without the clipping process that degrades the quality of attacks. Second, it is robust against recently introduced defenses such as feature squeezing, JPEG encoding and even against a voting ensemble of defenses. While its application is not limited to images, we illustrate this using five of the current best classifiers on ImageNet dataset among which two are adversarialy retrained on purpose to be robust against attacks. With a fixed maximum perturbation of only 1.5% on any pixel, around 80% of attacks (targeted) fool the voting ensemble defense and nearly 100% when the perturbation is only 6%. While this shows how it is difficult to defend against CIA attacks, the last section of the paper gives some guidelines to limit their impact. CenterNet Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point — the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time. Central Network(CentralNet) This paper proposes a novel multimodal fusion approach, aiming to produce best possible decisions by integrating information coming from multiple media. While most of the past multimodal approaches either work by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separated deep convolutional network, allowing to take decisions independently from each modality, we introduce a central network linking the modality specific networks. This central network not only provides a common feature embedding but also regularizes the modality specific networks through the use of multi-task learning. The proposed approach is validated on 4 different computer vision tasks on which it consistently improves the accuracy of existing multimodal fusion approaches. Centrality In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super-spreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.[1] They should not be confused with node influence metrics, which seek to quantify the influence of every node in the network. Centralized Coordinate Learning(CCL) Owe to the rapid development of deep neural network (DNN) techniques and the emergence of large scale face databases, face recognition has achieved a great success in recent years. During the training process of DNN, the face features and classification vectors to be learned will interact with each other, while the distribution of face features will largely affect the convergence status of network and the face similarity computing in test stage. In this work, we formulate jointly the learning of face features and classification vectors, and propose a simple yet effective centralized coordinate learning (CCL) method, which enforces the features to be dispersedly spanned in the coordinate space while ensuring the classification vectors to lie on a hypersphere. An adaptive angular margin is further proposed to enhance the discrimination capability of face features. Extensive experiments are conducted on six face benchmarks, including those have large age gap and hard negative samples. Trained only on the small-scale CASIA Webface dataset with 460K face images from about 10K subjects, our CCL model demonstrates high effectiveness and generality, showing consistently competitive performance across all the six benchmark databases. Centralized Kalman-Filtering(CKF) We consider the Kalman-filtering problem with multiple sensors which are connected through a communication network. If all measurements are delivered to one place called fusion center and processed together, we call the process centralized Kalman-filtering (CKF). When there is no fusion center, each sensor can also solve the problem by using local measurements and exchanging information with its neighboring sensors, which is called distributed Kalman-filtering (DKF). Noting that CKF problem is a maximum likelihood estimation problem, which is a quadratic optimization problem, we reformulate DKF problem as a consensus optimization problem, resulting in that DKF problem can be solved by many existing distributed optimization algorithms. A new DKF algorithm employing the distributed dual ascent method is provided and its performance is evaluated through numerical experiments. CentralNet In the context of deep learning, this article presents an original deep network, namely CentralNet, for the fusion of information coming from different sensors. This approach is designed to efficiently and automatically balance the trade-off between early and late fusion (i.e. between the fusion of low-level vs high-level information). More specifically, at each level of abstraction-the different levels of deep networks-uni-modal representations of the data are fed to a central neural network which combines them into a common embedding. In addition, a multi-objective regularization is also introduced, helping to both optimize the central network and the unimodal networks. Experiments on four multimodal datasets not only show state-of-the-art performance, but also demonstrate that CentralNet can actually choose the best possible fusion strategy for a given problem. Centroid Network Traditional clustering algorithms such as K-means rely heavily on the nature of the chosen metric or data representation. To get meaningful clusters, these representations need to be tailored to the downstream task (e.g. cluster photos by object category, cluster faces by identity). Therefore, we frame clustering as a meta-learning task, few-shot clustering, which allows us to specify how to cluster the data at the meta-training level, despite the clustering algorithm itself being unsupervised. We propose Centroid Networks, a simple and efficient few-shot clustering method based on learning representations which are tailored both to the task to solve and to its internal clustering module. We also introduce unsupervised few-shot classification, which is conceptually similar to few-shot clustering, but is strictly harder than supervised* few-shot classification and therefore allows direct comparison with existing supervised few-shot classification methods. On Omniglot and miniImageNet, our method achieves accuracy competitive with popular supervised few-shot classification algorithms, despite using *no labels* from the support set. We also show performance competitive with state-of-the-art learning-to-cluster methods. Cerioli Outlier Detection “Cerioli Outlier Dectection” is an iterated RMCD method of Cerioli (2010) for multivariate outlier detection via robust Mahalanobis distances. Certified Program Model Production distributed systems are challenging to formally verify, in particular when they are based on distributed protocols that are not rigorously described or fully understood. In this paper, we derive models and properties for two core distributed protocols used in eventually consistent production key-value stores such as Riak and Cassandra. We propose a novel modeling called certified program models, where complete distributed systems are captured as programs written in traditional systems languages such as concurrent C. Specifically, we model the read-repair and hinted-handoff recovery protocols as concurrent C programs, test them for conformance with real systems, and then verify that they guarantee eventual consistency, modeling precisely the specification as well as the failure assumptions under which the results hold. cf2vec Algorithm selection using Metalearning aims to find mappings between problem characteristics (i.e. metafeatures) with relative algorithm performance to predict the best algorithm(s) for new datasets. Therefore, it is of the utmost importance that the metafeatures used are informative. In Collaborative Filtering, recent research has created an extensive collection of such metafeatures. However, since these are created based on the practitioner’s understanding of the problem, they may not capture the most relevant aspects necessary to properly characterize the problem. We propose to overcome this problem by taking advantage of Representation Learning, which is able to create an alternative problem characterizations by having the data guide the design of the representation instead of the practitioner’s opinion. Our hypothesis states that such alternative representations can be used to replace standard metafeatures, hence hence leading to a more robust approach to Metalearning. We propose a novel procedure specially designed for Collaborative Filtering algorithm selection. The procedure models Collaborative Filtering as graphs and extracts distributed representations using graph2vec. Experimental results show that the proposed procedure creates representations that are competitive with state-of-the-art metafeatures, while requiring significantly less data and without virtually any human input. CF4CF Automatic solutions which enable the selection of the best algorithms for a new problem are commonly found in the literature. One research area which has recently received considerable efforts is Collaborative Filtering. Existing work includes several approaches using Metalearning, which relate the characteristics of datasets with the performance of the algorithms. This work explores an alternative approach to tackle this problem. Since, in essence, both are recommendation problems, this work uses Collaborative Filtering algorithms to select Collaborative Filtering algorithms. Our approach integrates subsampling landmarkers, which are a data characterization approach commonly used in Metalearning, with a standard Collaborative Filtering method. The experimental results show that CF4CF competes with standard Metalearning strategies in the problem of Collaborative Filtering algorithm selection. Chain Event Graph(CEG) ceg ChainerCV Despite significant progress of deep learning in the field of computer vision, there has not been a software library that covers these methods in a unifying manner. We introduce ChainerCV, a software library that is intended to fill this gap. ChainerCV supports numerous neural network models as well as software components needed to conduct research in computer vision. These implementations emphasize simplicity, flexibility and good software engineering practices. The library is designed to perform on par with the results reported in published papers and its tools can be used as a baseline for future research in computer vision. Our implementation includes sophisticated models like Faster R-CNN and SSD, and covers tasks such as object detection and semantic segmentation. ChainGAN We propose a new architecture and training methodology for generative adversarial networks. Current approaches attempt to learn the transformation from a noise sample to a generated data sample in one shot. Our proposed generator architecture, called $\textit{ChainGAN}$, uses a two-step process. It first attempts to transform a noise vector into a crude sample, similar to a traditional generator. Next, a chain of networks, called $\textit{editors}$, attempt to sequentially enhance this sample. We train each of these units independently, instead of with end-to-end backpropagation on the entire chain. Our model is robust, efficient, and flexible as we can apply it to various network architectures. We provide rationale for our choices and experimentally evaluate our model, achieving competitive results on several datasets. Challenge.AI Patient subtyping based on temporal observations can lead to significantly nuanced subtyping that acknowledges the dynamic characteristics of diseases. Existing methods for subtyping trajectories treat the evolution of clinical observations as a homogeneous process or employ data available at regular intervals. In reality, diseases may have transient underlying states and a state-dependent observation pattern. In our paper, we present an approach to subtype irregular patient data while acknowledging the underlying progression of disease states. Our approach consists of two components: a probabilistic model to determine the likelihood of a patient’s observation trajectory and a mixture model to measure similarity between asynchronous patient trajectories. We demonstrate our model by discovering subtypes of progression to hemodynamic instability (requiring cardiovascular intervention) in a patient cohort from a multi-institution ICU dataset. We find three primary patterns: two of which show classic signs of decompensation (rising heart rate with dropping blood pressure), with one of these showing a faster course of decompensation than the other. The third pattern has transient period of low heart rate and blood pressure. We also show that our model results in a 13% reduction in average cross-entropy error compared to a model with no state progression when forecasting vital signs. Chameleon This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU). Chan-Darwiche Distance We propose a distance measure between two probability distributions, which allows one to bound the amount of belief change that occurs when moving from one distribution to another. We contrast the proposed measure with some well known measures, including KL-divergence, showing some theoretical properties on its ability to bound belief changes. We then present two practical applications of the proposed distance measure: sensitivity analysis in belief networks and probabilistic belief revision. We show how the distance measure can be easily computed in these applications, and then use it to bound global belief changes that result from either the perturbation of local conditional beliefs or the accommodation of soft evidence. Finally, we show that two well known techniques in sensitivity analysis and belief revision correspond to the minimization of our proposed distance measure and, hence, can be shown to be optimal from that viewpoint. Change Point Analysis(CPA) Change-point analysis is a powerful new tool for determining whether a change has taken place. It is capable of detecting subtle changes missed by control charts. Further, it better characterizes the changes detected by providing confidence levels and confidence intervals. When collecting online data, a change-point analysis is not a replacement for control charting. But, because a change-point analysis can provide further information, the two methods can be used in a complementary fashion. When analyzing historical data, especially when dealing with large data sets, change-point analysis is preferable to control charting. A change-point analysis is more powerful, better characterizes the changes, controls the overall error rate, is robust to outliers, is more flexible and is simpler to use. CPA aims at detecting any change in the mean of a process in historical data. Example questions to be answered by performing CPA: · Did a change occur? · Did more than one change occur? · When did the changes occur? · How confident are we that they are real changes? http://…/changepoint.html Change Point Detection In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes. Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally change detection also includes the detection of anomalous behavior: anomaly detection. Change Surfaces Identifying changes in model parameters is fundamental in machine learning and statistics. However, standard changepoint models are limited in expressiveness, often addressing unidimensional problems and assuming instantaneous changes. We introduce change surfaces as a multidimensional and highly expressive generalization of changepoints. We provide a model-agnostic formalization of change surfaces, illustrating how they can provide variable, heterogeneous, and non-monotonic rates of change across multiple dimensions. Additionally, we show how change surfaces can be used for counterfactual prediction. As a concrete instantiation of the change surface framework, we develop Gaussian Process Change Surfaces (GPCS). We demonstrate counterfactual prediction with Bayesian posterior mean and credible sets, as well as massive scalability by introducing novel methods for additive non-separable kernels. Using two large spatio-temporal datasets we employ GPCS to discover and characterize complex changes that can provide scientific and policy relevant insights. Specifically, we analyze twentieth century measles incidence across the United States and discover previously unknown heterogeneous changes after the introduction of the measles vaccine. Additionally, we apply the model to requests for lead testing kits in New York City, discovering distinct spatial and demographic patterns. Change-Point Detection Procedure via VIF Regression(VIFCP) ➘ “Variance Inflation Factor” Channel Gating Neural Network Employing deep neural networks to obtain state-of-the-art performance on computer vision tasks can consume billions of floating point operations and several Joules of energy per evaluation. Network pruning, which statically removes unnecessary features and weights, has emerged as a promising way to reduce this computation cost. In this paper, we propose channel gating, a dynamic, fine-grained, training-based computation-cost-reduction scheme. Channel gating works by identifying the regions in the features which contribute less to the classification result and turning off a subset of the channels for computing the pixels within these uninteresting regions. Unlike static network pruning, the channel gating optimizes computations exploiting characteristics specific to each input at run-time. We show experimentally that applying channel gating in state-of-the-art networks can achieve 66% and 60% reduction in FLOPs with 0.22% and 0.29% accuracy loss on the CIFAR-10 and CIFAR-100 datasets, respectively. Channel Matching(CM) A group of transition probability functions form a Shannon’s channel whereas a group of truth functions form a semantic channel. Label learning is to let semantic channels match Shannon’s channels and label selection is to let Shannon’s channels match semantic channels. The Channel Matching (CM) algorithm is provided for multi-label classification. This algorithm adheres to maximum semantic information criterion which is compatible with maximum likelihood criterion and regularized least squares criterion. If samples are very large, we can directly convert Shannon’s channels into semantic channels by the third kind of Bayes’ theorem; otherwise, we can train truth functions with parameters by sampling distributions. A label may be a Boolean function of some atomic labels. For simplifying learning, we may only obtain the truth functions of some atomic label. For a given label, instances are divided into three kinds (positive, negative, and unclear) instead of two kinds as in popular studies so that the problem with binary relevance is avoided. For each instance, the classifier selects a compound label with most semantic information or richest connotation. As a predictive model, the semantic channel does not change with the prior probability distribution (source) of instances. It still works when the source is changed. The classifier changes with the source, and hence can overcome class-imbalance problem. It is shown that the old population’s increasing will change the classifier for label ‘Old’ and has been impelling the semantic evolution of ‘Old’. The CM iteration algorithm for unseen instance classification is introduced. ChannelNet Convolutional neural networks (CNNs) have shown great capability of solving various artificial intelligence tasks. However, the increasing model size has raised challenges in employing them in resource-limited applications. In this work, we propose to compress deep models by using channel-wise convolutions, which re- place dense connections among feature maps with sparse ones in CNNs. Based on this novel operation, we build light-weight CNNs known as ChannelNets. Channel- Nets use three instances of channel-wise convolutions; namely group channel-wise convolutions, depth-wise separable channel-wise convolutions, and the convolu- tional classification layer. Compared to prior CNNs designed for mobile devices, ChannelNets achieve a significant reduction in terms of the number of parameters and computational cost without loss in accuracy. Notably, our work represents the first attempt to compress the fully-connected classification layer, which usually accounts for about 25% of total parameters in compact CNNs. Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods. Channel-Recurrent Variational Autoencoders(CR-VAE) Variational Autoencoder (VAE) is an efficient framework in modeling natural images with probabilistic latent spaces. However, when the input spaces become complex, VAE becomes less effective, potentially due to the oversimplification of its latent space construction. In this paper, we propose to integrate recurrent connections across channels to both inference and generation steps of VAE. Sequentially building up the complexity of high-level features in this way allows us to capture global-to-local and coarse-to-fine structures of the input data spaces. We show that our channel-recurrent VAE improves existing approaches in multiple aspects: (1) it attains lower negative log-likelihood than standard VAE on MNIST; when trained adversarially, (2) it generates face and bird images with substantially higher visual quality than the state-of-the-art VAE-GAN and (3) channel-recurrency allows learning more interpretable representations; finally (4) it achieves competitive classification results on STL-10 in a semi-supervised setup. Chaos Monkey Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond. Characterized Social Regularization(CSR) Social recommendation, which utilizes social relations to enhance recommender systems, has been gaining increasing attention recently with the rapid development of online social network. Existing social recommendation methods are based on the fact that users preference or decision is influenced by their social friends’ behaviors. However, they assume that the influences of social relation are always the same, which violates the fact that users are likely to share preference on diverse products with different friends. In this paper, we present a novel CSR (short for Characterized Social Regularization) model by designing a universal regularization term for modeling variable social influence. Our proposed model can be applied to both explicit and implicit iteration. Extensive experiments on a real-world dataset demonstrate that CSR significantly outperforms state-of-the-art social recommendation methods. CharBot Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date. Charged String Tensor Networks Tensor network methods provide an intuitive graphical language to describe quantum states, channels, open quantum systems and a class of numerical approximation methods that efficiently simulate certain many-body states in one spatial dimension. There are two fundamental types of tensor networks in wide use today. The most common is similar to quantum circuits. The second is the braided class of tensor networks, used in topological quantum computing. Recently a third class of tensor networks was discovered by Jaffe, Liu and Wozniakowski—the JLW-model—notably, the wires carry charge excitations. The rules in which network components can be moved, merged and manipulated in a graphical form of reasoning take an elegant form. For instance the relative charge locations on wires carries precise meaning and changing the ordering modifies a connected network specifically by a complex number. The type of isotopy discovered in the topological JLW-model provides an alternative means to reason about quantum information, computation and protocols. Here we recall the tensor-network building blocks used in a controlled-NOT gate. Some open problems related to the JLW-model are given. Chargrid We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images. Charikar’s Algorithm To detect near-duplicates this software uses the Charikar’s fingerprinting technique, this means characterizing each document with a unique 64-bit vector, like a fingerprint. To determine whether two documents are Near-duplicates, we have to compare their fingerprints. To do this we use two algorithms, the algorithm developed by Moses Charikar and the Hamming distance algorithm, which allows us to measure the similarity between two vectors of n bits. What is Charikar’s algorithm? · Characterization of the document · Apply hash functions to the characteristics · Obtain fingerprint · Apply vector comparison function: Are (Doc1, doc2) near-duplicate? Hamming-distance (fingerprint (doc1), fingerprint (doc2)) = k GitXiv Charlotte We present Charlotte, a framework for composable, authenticated distributed data structures. Charlotte data is stored in blocks that reference each other by hash. Together, all Charlotte blocks form a directed acyclic graph, the blockweb; all observers and applications use subgraphs of the blockweb for their own data structures. Unlike prior systems, Charlotte data structures are composable: applications and data structures can operate fully independently when possible, and share blocks when desired. To support this composability, we define a language-independent format for Charlotte blocks and a network API for Charlotte servers. An authenticated distributed data structure guarantees that data is immutable and self-authenticating: data referenced will be unchanged when it is retrieved. Charlotte extends these guarantees by allowing applications to plug in their own mechanisms for ensuring availability and integrity of data structures. Unlike most traditional distributed systems, including distributed databases, blockchains, and distributed hash tables, Charlotte supports heterogeneous trust: different observers may have their own beliefs about who might fail, and how. Despite heterogeneity of trust, Charlotte presents each observer with a consistent, available view of data. We demonstrate the flexibility of Charlotte by implementing a variety of integrity mechanisms, including consensus and proof of work. We study the power of disentangling availability and integrity mechanisms by building a variety of applications. The results from these examples suggest that developers can use Charlotte to build flexible, fast, composable applications with strong guarantees. Chart-Text Images greatly help in understanding, interpreting and visualizing data. Adding textual description to images is the first and foremost principle of web accessibility. Visually impaired users using screen readers will use these textual descriptions to get better understanding of images present in digital contents. In this paper, we propose Chart-Text a novel fully automated system that creates textual description of chart images. Given a PNG image of a chart, our Chart-Text system creates a complete textual description of it. First, the system classifies the type of chart and then it detects and classifies the labels and texts in the charts. Finally, it uses specific image processing algorithms to extract relevant information from the chart images. Our proposed system achieves an accuracy of 99.72% in classifying the charts and an accuracy of 78.9% in extracting the data and creating the corresponding textual description. Chat-Crowd In this paper we introduce Chat-crowd, an interactive environment for visual layout composition via conversational interactions. Chat-crowd supports multiple agents with two conversational roles: agents who play the role of a \emph{designer} are in charge of placing objects in an editable canvas according to instructions or commands issued by agents with a \emph{director} role. The system can be integrated with crowdsourcing platforms for both synchronous and asynchronous data collection and is equipped with comprehensive quality controls on the performance of both types of agents. We expect that this system will be useful to build multimodal goal-oriented dialog tasks that require spatial and geometric reasoning. Chebyshev Distance In mathematics, Chebyshev distance (or Tchebychev distance), maximum metric, or L8 metric is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension. It is named after Pafnuty Chebyshev. It is also known as chessboard distance, since in the game of chess the minimum number of moves needed by a king to go from one square on a chessboard to another equals the Chebyshev distance between the centers of the squares, if the squares have side length one, as represented in 2-D spatial coordinates with axes aligned to the edges of the board. For example, the Chebyshev distance between f6 and e2 equals 4. Checkpoint-Restart for Unified Memory(CRUM) Unified Virtual Memory (UVM) was recently introduced on recent NVIDIA GPUs. Through software and hardware support, UVM provides a coherent shared memory across the entire heterogeneous node, migrating data as appropriate. The older CUDA programming style is akin to older large-memory UNIX applications which used to directly load and unload memory segments. Newer CUDA programs have started taking advantage of UVM for the same reasons of superior programmability that UNIX applications long ago switched to assuming the presence of virtual memory. Therefore, checkpointing of UVM will become increasingly important, especially as NVIDIA CUDA continues to gain wider popularity: 87 of the top 500 supercomputers in the latest listings are GPU-accelerated, with a current trend of ten additional GPU-based supercomputers each year. A new scalable checkpointing mechanism, CRUM (Checkpoint-Restart for Unified Memory), is demonstrated for hybrid CUDA/MPI computations across multiple computer nodes. CRUM supports a fast, forked checkpointing, which mostly overlaps the CUDA computation with storage of the checkpoint image in stable storage. The runtime overhead of using CRUM is 6% on average, and the time for forked checkpointing is seen to be a factor of up to 40 times less than traditional, synchronous checkpointing. Chernoff Faces Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose represent values of the variables by their shape, size, placement and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty. Chernoff faces handle each variable differently. Because the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen (e.g. eye size and eyebrow-slant have been found to carry significant weight). Chernoff Information Chernoff information upper bounds the probability of error of the optimal Bayesian decision rule for 2 -class classification problems. However, it turns out that in practice the Chernoff bound is hard to calculate or even approximate. In statistics, many usual distributions, such as Gaussians, Poissons or frequency histograms called multinomials, can be handled in the unified framework of exponential families. In this note, we prove that the Chernoff information for members of the same exponential family can be either derived analytically in closed form, or efficiently approximated using a simple geodesic bisection optimization technique based on an exact geometric characterization of the ‘Chernoff point’ on the underlying statistical manifold. Chi Network Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the $\chi$ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A $\chi$ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, $\hat \chi$, exceeds a prescribed threshold. We develop an initial $\chi$ network estimator and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in $\chi$. In addition to the $\chi$ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the $\chi$ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature. Chiller Distributed transactions on high-overhead TCP/IP-based networks were conventionally considered to be prohibitively expensive and thus were avoided at all costs. To that end, the primary goal of almost any existing partitioning scheme is to minimize the number of cross-partition transactions. However, with the next generation of fast RDMA-enabled networks, this assumption is no longer valid. In fact, recent work has shown that distributed databases can scale even when the majority of transactions are cross-partition. In this paper, we first make the case that the new bottleneck which hinders truly scalable transaction processing in modern RDMA-enabled databases is data contention, and that optimizing for data contention leads to different partitioning layouts than optimizing for the number of distributed transactions. We then present Chiller, a new approach to data partitioning and transaction execution, which minimizes data contention for both local and distributed transactions. Finally, we evaluate Chiller using TPC-C and a real-world workload, and show that our partitioning and execution strategy outperforms traditional partitioning techniques which try to avoid distributed transactions, by up to a factor of 2 under the same conditions. ChIMP Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not observed the same revolution. Specifically, most neural fusion approaches are ad hoc, are not understood, are distributed versus localized, and/or explainability is low (if present at all). Herein, we prove that the fuzzy Choquet integral (ChI), a powerful nonlinear aggregation function, can be represented as a multi-layer network, referred to hereafter as ChIMP. We also put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient descent-based optimization in light of the exponential number of ChI inequality constraints. An additional benefit of ChIMP/iChIMP is that it enables eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP is applied to the fusion of a set of heterogeneous architecture deep models in remote sensing. We show an improvement in model accuracy and our previously established XAI indices shed light on the quality of our data, model, and its decisions. Chinese Restaurant Process In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 is seated at an unoccupied table with probability 1. At time n + 1, a new customer chooses uniformly at random to sit at one of the following n + 1 places: directly to the left of one of the n customers already sitting at an occupied table, or at a new, unoccupied table. David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book. At time n, the value of the process is a partition of the set of n customers, where the tables are the blocks of the partition. Mathematicians are interested in the probability distribution of this random partition. Chi-Square Test A chi-squared test, also referred to as test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough. The chi-square (I) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Do the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference? Chi-Square Test Neural Network We introduce the chi-square test neural network: a single hidden layer backpropagation neural network using chi-square test theorem to redefine the cost function and the error function. The weights and thresholds are modified using standard backpropagation algorithm. The proposed approach has the advantage of making consistent data distribution over training and testing sets. It can be used for binary classification. The experimental results on real world data sets indicate that the proposed algorithm can significantly improve the classification accuracy comparing to related approaches. CHi-squared Automatic Interaction Detection(CHAID) CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis, this version of CHAID being originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detection, based upon a formal extension of the US AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 70s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s. In practice, CHAID is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the field of medical and psychiatric research. Like other decision trees, CHAID’s advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively, since with small sample sizes the respondent groups can quickly become too small for reliable analysis. One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric. CHAID and R — When you need explanation Choice Modeling Choice modelling attempts to model the decision process of an individual or segment in a particular context. Choice modelling may be used to estimate non-market environmental benefits and costs. Many alternative models exist in econometrics, marketing, sociometrics and other fields, including utility maximization, optimization applied to consumer theory, and a plethora of other identification strategies which may be more or less accurate depending on the data, sample, hypothesis and the particular decision being modelled. In addition Choice Modelling is regarded as the most suitable method for estimating consumers’ willingness to pay for quality improvements in multiple dimensions. Neuroscience Suggests Choice Model Misspecification ChoiceNet We introduce a new architecture called ChoiceNet where each layer of the network is highly connected with skip connections and channelwise concatenations. This enables the network to alleviate the problem of vanishing gradients, reduces the number of parameters without sacrificing performance, and encourages feature reuse. We evaluate our proposed architecture on three benchmark datasetsforobjectrecognitiontasks(CIFAR-10,CIFAR100, SVHN) and on a semantic segmentation dataset (CamVid). Cholesky Decomposition In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, useful for efficient numerical solutions and Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations. Chopthin Resampler Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. A simulation study shows that the chopthin algorithm consistently outperforms standard resampling methods. The algorithms chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented very efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN) and for Matlab are available. chopthin Choquet Fuzzy Integral Vertical Bagging Classifier Mobile app development in recent years has resulted in new products and features to improve human life. Mobile telematics is one such development that encompasses multidisciplinary fields for transportation safety. The application of mobile telematics has been explored in many areas, such as insurance and road safety. However, to the best of our knowledge, its application in gender detection has not been explored. This paper proposes a Choquet fuzzy integral vertical bagging classifier that detects gender through mobile telematics. In this model, different random forest classifiers are trained by randomly generated features with rough set theory, and the top three classifiers are fused using the Choquet fuzzy integral. The model is implemented and evaluated on a real dataset. The empirical results indicate that the Choquet fuzzy integral vertical bagging classifier outperforms other classifiers. Choquet Integral A Choquet integral is a subadditive or superadditive integral created by the French mathematician Gustave Choquet in 1953. It was initially used in statistical mechanics and potential theory, but found its way into decision theory in the 1980s, where it is used as a way of measuring the expected utility of an uncertain event. It is applied specifically to membership functions and capacities. In imprecise probability theory, the Choquet integral is also used to calculate the lower expectation induced by a 2-monotone lower probability, or the upper expectation induced by a 2-alternating upper probability. Using the Choquet integral to denote the expected utility of belief functions measured with capacities is a way to reconcile the Ellsberg paradox and the Allais paradox. http://…/Ayub_Khan_2009.pdf Choropleth Map A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region. A special type of choropleth map is a prism map, a three-dimensional map in which a given region’s height on the map is proportional to the statistical variable’s value for that region. Chow-Liu Tree In probability theory and statistics Chow-Liu tree is an efficient method for constructing a second-order product approximation of a joint probability distribution, first described in a paper by Chow & Liu (1968). The goals of such a decomposition, as with such Bayesian networks in general, may be either data compression or inference. Structure Learning in Bayesian Networks Christoffel Function Chronohorogram Chumbley Score A statistical analysis and computational algorithm for comparing pairs of tool marks via profilometry data is described. Empirical validation of the method is established through experiments based on tool marks made at selected fixed angles from 50 sequentially manufactured screwdriver tips. Results obtained from three different comparison scenarios are presented and are in agreement with experiential knowledge possessed by practicing examiners. Further comparisons between scores produced by the algorithm and visual assessments of the same tool mark pairs by professional tool mark examiners in a blind study in general show good agreement between the algorithm and human experts. In specific instances where the algorithm had difficulty in assessing a particular comparison pair, results obtained during the collaborative study with professional examiners suggest ways in which algorithm performance may be improved. It is concluded that the addition of contextual information when inputting data into the algorithm should result in better performance. toolmaRk Churn Rate Churn rate (sometimes called attrition rate), in its broadest sense, is a measure of the number of individuals or items moving out of a collective group over a specific period. It is one of two primary factors that determine the steady-state level of customers a business will support. CINEX Information extraction traditionally focuses on extracting relations between identifiable entities, such as . Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, ‘California is divided into 58 counties’. Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations. CIoTA Due to their rapid growth and deployment, Internet of things (IoT) devices have become a central aspect of our daily lives. However, they tend to have many vulnerabilities which can be exploited by an attacker. Unsupervised techniques, such as anomaly detection, can help us secure the IoT devices. However, an anomaly detection model must be trained for a long time in order to capture all benign behaviors. This approach is vulnerable to adversarial attacks since all observations are assumed to be benign while training the anomaly detection model. In this paper, we propose CIoTA, a lightweight framework that utilizes the blockchain concept to perform distributed and collaborative anomaly detection for devices with limited resources. CIoTA uses blockchain to incrementally update a trusted anomaly detection model via self-attestation and consensus among IoT devices. We evaluate CIoTA on our own distributed IoT simulation platform, which consists of 48 Raspberry Pis, to demonstrate CIoTA’s ability to enhance the security of each device and the security of the network as a whole. Circulant Convolutional Layer(CircConv) Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as the powerful technique in various machine learning applications. However, the large model sizes of DNNs yield high demands on computation resource and weight storage, thereby limiting the practical deployment of DNNs. To overcome these limitations, this paper proposes to impose the circulant structure to the construction of convolutional layers, and hence leads to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, thereby making it very flexible for different training environments. Through extensive experiments, such strong structure-imposing approach is proved to be able to substantially reduce the number of parameters of convolutional layers and enable significant saving of computational cost by using fast multiplication of the circulant tensor. Circular Block Permutation With a Random Starting Point In a sequence of multivariate observations or non-Euclidean data objects, such as networks, local dependence is common and could lead to false change-point discoveries. We propose a new way of permutation — circular block permutation with a random starting point — to address this problem. This permutation scheme is studied on a non-parametric change-point detection framework based on a similarity graph constructed on the observations, leading to a general framework for change-point detection for data with local dependency. Simulation studies show that this new framework retains the same level of power when there is no local dependency, while it controls type I error correctly for sequences with and without local dependency. We also derive an analytic p-value approximation under this new framework. The approximation works well for sequences with length in hundreds and above, making this approach fast-applicable for long data sequences. Circular Plot / Circos Circos is a software package for visualizing data and information. It visualizes data in a circular layout – this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive. Circular Statistics ➘ “Directional Statistics” Circumcentered-Reflection Method The elementary Euclidean concept of circumcenter has recently been employed to improve two aspects of the classical Douglas–Rachford method for projecting onto the intersection of affine subspaces. The so-called circumcentered-reflection method is able to both accelerate the average reflection scheme by the Douglas–Rachford method and cope with the intersection of more than two affine subspaces. We now introduce the technique of circumcentering in blocks, which, more than just an option over the basic algorithm of circumcenters, turns out to be an elegant manner of generalizing the method of alternating projections. Linear convergence for this novel block-wise circumcenter framework is derived and illustrated numerically. Furthermore, we prove that the original circumcentered-reflection method essentially finds the best approximation solution in one single step if the given affine subspaces are hyperplanes. Citizen Data Scientist(CDS) A citizen data scientist is a role that analyzes data and creates data and business models for their companies with the help of big data tools and technologies. Citizen data scientists do not necessarily need to be data science or business intelligence experts. This role is given to employees in an organization who can use the big data tools and technology to create data models. Back in 2016, Gartner coined the term, meaning a person ‘who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.’ Citizen Data Scientists – Are we there yet? ClariNet In this work, we propose an alternative solution for parallel wave generation by WaveNet. In contrast to parallel WaveNet (Oord et al., 2018), we distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet by minimizing a novel regularized KL divergence between their highly-peaked output distributions. Our method computes the KL divergence in closed-form, which simplifies the training algorithm and provides very efficient distillation. In addition, we propose the first text-to-wave neural architecture for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. It significantly outperforms the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet (Ping et al., 2017). We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model. Class Label Autoencoder Existing zero-shot learning (ZSL) methods usually learn a projection function between a feature space and a semantic embedding space(text or attribute space) in the training seen classes or testing unseen classes. However, the projection function cannot be used between the feature space and multi-semantic embedding spaces, which have the diversity characteristic for describing the different semantic information of the same class. To deal with this issue, we present a novel method to ZSL based on learning class label autoencoder (CLA). CLA can not only build a uniform framework for adapting to multi-semantic embedding spaces, but also construct the encoder-decoder mechanism for constraining the bidirectional projection between the feature space and the class label space. Moreover, CLA can jointly consider the relationship of feature classes and the relevance of the semantic classes for improving zero-shot classification. The CLA solution can provide both unseen class labels and the relation of the different classes representation(feature or semantic information) that can encode the intrinsic structure of classes. Extensive experiments demonstrate the CLA outperforms state-of-art methods on four benchmark datasets, which are AwA, CUB, Dogs and ImNet-2. Class-Agnostic Segmentation Network(CANet) Recent progress in semantic segmentation is driven by deep Convolutional Neural Networks and large-scale labeled image datasets. However, data labeling for pixel-wise segmentation is tedious and costly. Moreover, a trained model can only make predictions within a set of pre-defined classes. In this paper, we present CANet, a class-agnostic segmentation network that performs few-shot segmentation on new classes with only a few annotated images available. Our network consists of a two-branch dense comparison module which performs multi-level feature comparison between the support image and the query image, and an iterative optimization module which iteratively refines the predicted results. Furthermore, we introduce an attention mechanism to effectively fuse information from multiple support examples under the setting of k-shot learning. Experiments on PASCAL VOC 2012 show that our method achieves a mean Intersection-over-Union score of 55.4% for 1-shot segmentation and 57.1% for 5-shot segmentation, outperforming state-of-the-art methods by a large margin of 14.6% and 13.2%, respectively. Classical Test Theory(CTT) Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests. Classical test theory may be regarded as roughly synonymous with true score theory. The term ‘classical’ refers not only to the chronology of these models but also contrasts with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation ‘modern’ as in ‘modern latent trait theory’. Classical test theory as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002). The description of classical test theory below follows these seminal publications. Classification Accuracy(CA) In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words precision and accuracy can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method. A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. The result would be a consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic error improves accuracy but does not change precision. A measurement system is considered valid if it is both accurate and precise. Related terms include bias (non-random or directed effects caused by a factor or factors unrelated to the independent variable) and error (random variability). The terminology is also applied to indirect measurements – that is, values obtained by a computational procedure from observed data. In addition to accuracy and precision, measurements may also have a measurement resolution, which is the smallest change in the underlying physical quantity that produces a response in the measurement. In numerical analysis, accuracy is also the nearness of a calculation to the true value; while precision is the resolution of the representation, typically defined by the number of decimal or binary digits. http://…/accuracy.htm Classification Based on Associations(CBA) Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier. Association rule mining finds all the rules existing in the database that satisfy some minimum support and minimum confidence constraints. For association rule mining, the target of discovery is not pre-determined, while for classification rule mining there is one and only one predetermined target. In this paper, we propose to integrate these two mining techniques. The integration is done by focusing on mining a special subset of association rules, called class association rules (CARs). An efficient algorithm is also given for building a classifier based on the set of discovered CARs. Experimental results show that the classifier built this way is, in general, more accurate than that produced by the state-of-the-art classification system C4.5. In addition, this integration helps to solve a number of problems that exist in the current classification systems. rCBA Classification Based Preselection(CPS) In evolutionary algorithms, a preselection operator aims to select the promising offspring solutions from a candidate offspring set. It is usually based on the estimated or real objective values of the candidate offspring solutions. In a sense, the preselection can be treated as a classification procedure, which classifies the candidate offspring solutions into promising ones and unpromising ones. Following this idea, we propose a classification based preselection (CPS) strategy for evolutionary multiobjective optimization. When applying classification based preselection, an evolutionary algorithm maintains two external populations (training data set) that consist of some selected good and bad solutions found so far; then it trains a classifier based on the training data set in each generation. Finally it uses the classifier to filter the unpromising candidate offspring solutions and choose a promising one from the generated candidate offspring set for each parent solution. In such cases, it is not necessary to estimate or evaluate the objective values of the candidate offspring solutions. The classification based preselection is applied to three state-of-the-art multiobjective evolutionary algorithms (MOEAs) and is empirically studied on two sets of test instances. The experimental results suggest that classification based preselection can successfully improve the performance of these MOEAs. Classification Rule Given a population whose members can be potentially separated into a number of different sets or classes, a classification rule is a procedure in which the elements of the population set are each assigned to one of the classes. A perfect test is such that every element in the population is assigned to the class it really belongs. An imperfect test is such that some errors appear, and then statistical analysis must be applied to analyse the classification. Classification Using Link Prediction(CULP) Link prediction in a graph is the problem of detecting the missing links that would be formed in the near future. Using a graph representation of the data, we can convert the problem of classification to the problem of link prediction which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, despite the fact that numerous algorithms use the graph representation of the data for classification, none are using link prediction as the heart of their classifying procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction) which uses a new structure namely Label Embedded Graph or LEG and a link predictor to find the class of the unlabeled data. Different link predictors along with Compatibility Score – a new link predictor we proposed that is designed specifically for our settings – has been used and showed promising results for classifying different datasets. This paper further improved CULP by designing an extension called CULM which uses a majority vote (hence the M in the acronym) procedure with weights proportional to the predictions’ confidences to use the predictive power of multiple link predictors and also exploits the low level features of the data. Extensive experimental evaluations shows that both CULP and CULM are highly accurate and competitive with the cutting edge graph classifiers and general classifiers. Classification Without Labels Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available. Clean and Noisy Label Neural Network In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario, it is less effective due to unreliable annotations created using self-labeling of unlabeled data. We propose to combine self-training with noise handling on the self-labeled data. Directly estimating noise on the combined clean training set and self-labeled data can lead to corruption of the clean data and hence, performs worse. Thus, we propose the Clean and Noisy Label Neural Network which trains on clean and noisy self-labeled data simultaneously by explicitly modelling clean and noisy labels separately. In our experiments on Chunking and NER, this approach performs more robustly than the baselines. Complementary to this explicit approach, noise can also be handled implicitly with the help of an auxiliary learning task. To such a complementary approach, our method is more beneficial than other baseline methods and together provides the best performance overall. CleanML It is widely recognized that the data quality affects machine learning (ML) model performances, and data scientists spend considerable amount of time on data cleaning before model training. However, to date, there does not exist a rigorous study on how exactly does cleaning affect ML — ML community usually focuses on the effects of specific types of noises of certain distributions (e.g., mislabels) on certain ML models, while database (DB) community has been mostly studying the problem of data cleaning alone without considering how data is consumed by downstream analytics. We propose the CleanML benchmark that systematically investigates the impact of data cleaning on downstream ML models. The CleanML benchmark currently includes 13 real-world datasets with real errors, five common error types, and seven different ML models. To ensure that our findings are statistically significant, CleanML carefully controls the randomness in ML experiments using statistical hypothesis testing, and also uses the Benjamini-Yekutieli (BY) procedure to control potential false discoveries due to many hypotheses in the benchmark. We obtain many interesting and non-trivial insights, and identify multiple open research directions. We also release the benchmark and hope to invite future studies on the important problems of joint data cleaning and ML. Cleverhans cleverhans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models’ performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. CLEVR-Dialog Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively-expensive complete annotation of the ‘state’ of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling to 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark performance of standard visual dialog models; in particular, on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models that was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our dataset and code will be made public. Clickstream Analytics A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing or using another software application. As the user clicks anywhere in the webpage or application, the action is logged on a client or inside the web server, as well as possibly the web browser, router, proxy server or ad server. Clickstream analysis is useful for web activity analysis, software testing, market research, and for analyzing employee productivity. Click-Through Rate(CTR) Click-through rate (CTR) is a way of measuring the success of an online advertising campaign for a particular website as well as the effectiveness of an email campaign by the number of users that clicked on a specific link. client2vec The workflow of data scientists normally involves potentially inefficient processes such as data mining, feature engineering and model selection. Recent research has focused on automating this workflow, partly or in its entirety, to improve productivity. We choose the former approach and in this paper share our experience in designing the client2vec: an internal library to rapidly build baselines for banking applications. Client2vec uses marginalized stacked denoising autoencoders on current account transactions data to create vector embeddings which represent the behaviors of our clients. These representations can then be used in, and optimized against, a variety of tasks such as client segmentation, profiling and targeting. Here we detail how we selected the algorithmic machinery of client2vec and the data it works on and present experimental results on several business cases. CLINIcal Question Answering system(CLINIQA) The recent developments in the field of biomedicine have made large volumes of biomedical literature available to the medical practitioners. Due to the large size and lack of efficient searching strategies, medical practitioners struggle to obtain necessary information available in the biomedical literature. Moreover, the most sophisticated search engines of age are not intelligent enough to interpret the clinicians’ questions. These facts reflect the urgent need of an information retrieval system that accepts the queries from medical practitioners’ in natural language and returns the answers quickly and efficiently. In this paper, we present an implementation of a machine intelligence based CLINIcal Question Answering system (CLINIQA) to answer medical practitioner’s questions. The system was rigorously evaluated on different text mining algorithms and the best components for the system were selected. The system makes use of Unified Medical Language System for semantic analysis of both questions and medical documents. In addition, the system employs supervised machine learning algorithms for classification of the documents, identifying the focus of the question and answer selection. Effective domain-specific heuristics are designed for answer ranking. The performance evaluation on hundred clinical questions shows the effectiveness of our approach. Clipper Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, the first general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate comparable prediction throughput and latency on a range of models while enabling new functionality, improved accuracy, and robustness. Closest Pair Problem The closest pair of points problem or closest pair problem is a problem of computational geometry: given n points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms. A naive algorithm of finding distances between all pairs of points in a space of dimension d and selecting the minimum requires O(n2) time. It turns out that the problem may be solved in O(n log n) time in a Euclidean space or Lp space of fixed dimension d. In the algebraic decision tree model of computation, the O(n log n) algorithm is optimal, by a reduction from the element uniqueness problem. In the computational model that assumes that the floor function is computable in constant time the problem can be solved in O(n log log n) time. If we allow randomization to be used together with the floor function, the problem can be solved in O(n) time. A New Algorithm for Finding Closest Pair of Vectors Closest Vector Problem(CVP) ➘ “Maximum Distance Sub-Lattice Problem” Cloud AutoML(AutoML) Train high-quality custom machine learning models with minimum effort and machine learning expertise. • Train custom machine learning models: Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs, by leveraging Google´s state-of-the-art transfer learning, and Neural Architecture Search technology. • State-of-the-art performance: Use Cloud AutoML to leverage Google´s proprietary technology, which offers fast performance and accurate predictions. AutoML puts more than 10 years of Google Research technology in the hands of our users. • Get up and running fast: Cloud AutoML provides a simple graphical user interface (GUI) for you to train, evaluate, improve, and deploy models based on your own data. You´re only a few minutes away from your own custom machine learning model. • Generate high-quality training data: You can use Google´s human labeling service to have real-life people annotate or clean your labels to make sure your models are being trained on high-quality data. Cloud Data The Difference Between Big Data and Cloud Data: New technologies are required for the emergence and standardization of cloud data to take hold. Big data was meant as a holding cell for large amounts of data that could be sorted effectively only by specialized data scientists (this is becoming easier with OLAP on Hadoop type tools). The protocols for big data rely upon simple, standard protocols and can’t be adjusted easily to meet the demands of complex operations. Big data takes time to sort through and analyze, whereas cloud data is immediate and happens in the background using the tremendous resources of cloud servers. Cloud data requires a significantly higher number of resources since it must connect to databases in several geographically distributed services. Since cloud data must flexibly interact with several unique interfaces and security models, the mechanisms used for big data won’t work for cloud data. Cloud SELENE(cSELENE) While working in collaborative team elsewhere sometimes the federated (huge) data are from heterogeneous cloud vendors. It is not only about the data privacy concern but also about how can those federated data can be querying from cloud directly in fast and securely way. Previous solution offered hybrid cloud between public and trusted private cloud. Another previous solution used encryption on MapReduce framework. But the challenge is we are working on heterogeneous clouds. In this paper, we present a novel technique for querying with privacy concern. Since we take execution time into account, our basic idea is to use the data mining model by partitioning the federated databases in order to reduce the search and query time. By using model of the database it means we use only the summary or the very characteristic patterns of the database. Modeling is the Preserving Privacy Stage I, since by modeling the data is being symbolized. We implement encryption on the database as preserving privacy Stage II. Our system, called ‘cSELENE’ (stands for ‘cloud SELENE’), is designed to handle federated data on heterogeneous clouds: AWS, Microsoft Azure, and Google Cloud Platform with MapReduce technique. In this paper we discuss preserving-privacy system and threat model, the format of federated data, the parallel programming (GPU programming and shared/memory systems), the parallel and secure algorithm for data mining model in distributed cloud, the cloud infrastructure/architecture, and the UIX design of the cSELENE system. Other issues such as incremental method and the secure design of cloud architecture system (Virtual Machines across platform design) are still open to discuss. Our experiments should demonstrate the validity and practicality of the proposed high performance computing scheme. Cloud4IoT We introduce Cloud4IoT, a platform offering automatic deployment, orchestration and dynamic configuration of IoT support software components and data-intensive applications for data processing and analytics, thus enabling plug-and-play integration of new sensor objects and dynamic workload scalability. Cloud4IoT enables the concept of Infrastructure as Code in the IoT context: it empowers IoT operations with the flexibility and elasticity of Cloud services. Furthermore it shifts traditionally centralized Cloud architectures towards a more distributed and decentralized computation paradigm, as required by IoT technologies, bridging the gap between Cloud Computing and IoT ecosystems. Thus, Cloud4IoT is playing a role similar to the one covered by solutions like Fog Computing, Cloudlets or Mobile Edge Cloud. The hierarchical architecture of Cloud4IoThosts a central Cloud platform and multiple remote edge Cloud modules supporting dedicated devices, namely the IoT Gateways, through which new sensor objects are made accessible to the platform. Overall, the platform is designed in order to support systems where IoT-based and data intensive applications may pose specific requirements for low latency, restricted available bandwidth, or data locality. Cloud4IoT is built on several Open Source technologies for containerisation and implementations of standards, protocols and services for the IoT. We present the implementation of the platform and demonstrate it in two different use cases. Cloud-Based HyperOpt(CHOPT) Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance. Here we propose a novel cloud-based HyperOpt (CHOPT) framework which can efficiently utilize shared computing resources while supporting various HyperOpt algorithms. We incorporate convenient web-based user interfaces, visualization, and analysis tools, enabling users to easily control optimization procedures and build up valuable insights with an iterative analysis procedure. Furthermore, our framework can be incorporated with any cloud platform, thus complementarily increasing the efficiency of conventional deep learning frameworks. We demonstrate applications of CHOPT with tasks such as image recognition and question-answering, showing that our framework can find hyperparameter configurations competitive with previous work. We also show CHOPT is capable of providing interesting observations through its analysing tools C-LSTM Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes CNN to extract a sequence of higher-level phrase representations, and are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks. Clubmark There is a great diversity of clustering and community detection algorithms, which are key components of many data analysis and exploration systems. To the best of our knowledge, however, there does not exist yet any uniform benchmarking framework, which is publicly available and suitable for the parallel benchmarking of diverse clustering algorithms on a wide range of synthetic and real-world datasets. In this paper, we introduce Clubmark, a new extensible framework that aims to fill this gap by providing a parallel isolation benchmarking platform for clustering algorithms and their evaluation on NUMA servers. Clubmark allows for fine-grained control over various execution variables (timeouts, memory consumption, CPU affinity and cache policy) and supports the evaluation of a wide range of clustering algorithms including multi-level, hierarchical and overlapping clustering techniques on both weighted and unweighted input networks with built-in evaluation of several extrinsic and intrinsic measures. Our framework is open-source and provides a consistent and systematic way to execute, evaluate and profile clustering techniques considering a number of aspects that are often missing in state-of-the-art frameworks and benchmarking systems. Clued Recurrent Attention Model(CRAM) To overcome the poor scalability of convolutional neural network, recurrent attention model(RAM) selectively choose what and where to look on the image. By directing recurrent attention model how to look the image, RAM can be even more successful in that the given clue narrow down the scope of the possible focus zone. In this perspective, this work proposes clued recurrent attention model (CRAM) which add clue or constraint on the RAM better problem solving. CRAM follows encoder-decoder framework, encoder utilizes recurrent attention model with spatial transformer network and decoder which varies depending on the task. To ensure the performance, CRAM tackles two computer vision task. One is the image classification task, with clue given as the binary image saliency which indicates the approximate location of object. The other is the inpainting task, with clue given as binary mask which indicates the occluded part. In both tasks, CRAM shows better performance than existing methods showing the successful extension of RAM. ClusART Topic detection becomes more important due to the increase of information electronically available and the necessity to process and filter it. In this context our master’s thesis work was carried out, where we proposed to present a new approach to the detection of topics called ClusART. Thus, we proposed a three-phase approach, namely : a first phase during which lexical preprocessing was conducted. A second phase during which the construction and generation of vectors representing the documents was carried out. A third phase which is itself composed of two steps. In the first step we used the FuzzyART algorithm for the training phase. In the second step we used a classifier using Paragraph Vector for the test phase. The comparative study of our approach on the 20 Newsgroups dataset showed that our approach is able to detect almost relevant topics. Cluster Alignment with a Teacher(CAT) Deep learning methods have shown promise in unsupervised domain adaptation, which aims to leverage a labeled source domain to learn a classifier for the unlabeled target domain with a different distribution. However, such methods typically learn a domain-invariant representation space to match the marginal distributions of the source and target domains, while ignoring their fine-level structures. In this paper, we propose Cluster Alignment with a Teacher (CAT) for unsupervised domain adaptation, which can effectively incorporate the discriminative clustering structures in both domains for better adaptation. Technically, CAT leverages an implicit ensembling teacher model to reliably discover the class-conditional structure in the feature space for the unlabeled target domain. Then CAT forces the features of both the source and the target domains to form discriminative class-conditional clusters and aligns the corresponding clusters across domains. Empirical results demonstrate that CAT achieves state-of-the-art results in several unsupervised domain adaptation scenarios. Cluster Grap In graph theory, a branch of mathematics, a cluster graph is a graph formed from the disjoint union of complete graphs. Equivalently, a graph is a cluster graph if and only if it has no three-vertex induced path; for this reason, the cluster graphs are also called P3-free graphs. They are the complement graphs of the complete multipartite graphs and the 2-leaf powers. A cluster graph is a graph whose every connected component is a complete graph. Cluster Validation There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; ‘relative cluster validation’ is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method ran with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various different characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. Cluster-Based Active Learning In this work, we introduce Cluster-Based Active Learning, a novel framework that employs clustering to boost active learning by reducing the number of human interactions required to train deep neural networks. Instead of annotating single samples individually, humans can also label clusters, producing a higher number of annotated samples with the cost of a small label error. Our experiments show that the proposed framework requires 82% and 87% less human interactions for CIFAR-10 and EuroSAT datasets respectively when compared with the fully-supervised training while maintaining similar performance on the test set. Clustered Latent Dirichlet Allocation(CLDA) The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and meta-information simultaneously. Here, we are proposing an efficient, scalable feature extraction algorithm, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable and based on well studied non-parametric hypothesis tests. We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics. Clustered Monotone Transforms for Rating Factorization(CMTRF) Exploiting low-rank structure of the user-item rating matrix has been the crux of many recommendation engines. However, existing recommendation engines force raters with heterogeneous behavior profiles to map their intrinsic rating scales to a common rating scale (e.g. 1-5). This non-linear transformation of the rating scale shatters the low-rank structure of the rating matrix, therefore resulting in a poor fit and consequentially, poor recommendations. In this paper, we propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. Essentially, for recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low dimensional structure. The rating scale transformations can be generated for each user, for a cluster of users, or for all the users at once, forming the basis of three simple and efficient algorithms proposed in this paper, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Experimental results on two synthetic and seven real-world datasets show that CMTRF outperforms other state-of-the-art baselines. Clustered Sparrow Algorithm The clustered Sparrow algorithm ClusterFuzz ClusterFuzz is a scalable fuzzing infrastructure which finds security and stability issues in software. It is used by Google for fuzzing the Chrome Browser, and serves as the fuzzing backend for OSS-Fuzz. Cluster-GCN Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy—using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Clustering / Cluster Analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties. Clustering Using REpresentatives(CURE) CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. Clustering Validation Indices The purpose of clustering is to determine the intrinsic grouping in a set of unlabeled data, where the objects in each group are indistinguishable under some criterion of similarity. Clustering is an unsupervised classification process fundamental to data mining (one of the most important tasks in data analysis). It has applications in several fields like bioinformatics, web data analysis, text mining and scientific data exploration. Clustering refers to unsupervised learning and, for that reason it has no a priori data set information. However, to get good results, the clustering algorithm depends on input parameters. For instance, k-means and CURE algorithms require a number of clusters (k) to be created. In this sense, the question is: What is the optimal number of clusters? Currently, cluster validity indexes research has drawn attention as a means to give a solution. Many different cluster validity methods have been proposed without any a priori class information. Clustering validation is a technique to find a set of clusters that best fits natural partitions (number of clusters) without any class information. Generally speaking, there are two types of clustering techniques, which are based on external criteria and internal criteria. · External validation: Based on previous knowledge about data. · Internal validation: Based on the information intrinsic to the data alone. If we consider these two types of cluster validation to determine the correct number of groups from a dataset, one option is to use external validation indexes for which a priori knowledge of dataset information is required, but it is hard to say if they can be used in real problems (usually, real problems do not have prior information of the dataset in question). Another option is to use internal validity indexes which do not require a priori information from dataset. Clustering With Robust Autocuts and Depth(CRAD) We develop a new density-based clustering algorithm named CRAD which is based on a new neighbor searching function with a robust data depth as the dissimilarity measure. Our experiments prove that the new CRAD is highly competitive at detecting clusters with varying densities, compared with the existing algorithms such as DBSCAN, OPTICS and DBCA. Furthermore, a new effective parameter selection procedure is developed to select the optimal underlying parameter in the real-world clustering, when the ground truth is unknown. Lastly, we suggest a new clustering framework that extends CRAD from spatial data clustering to time series clustering without a-priori knowledge of the true number of clusters. The performance of CRAD is evaluated through extensive experimental studies. Clustering-Correcting Code A new family of codes, called clustering-correcting codes, is presented in this paper. This family of codes is motivated by the special structure of data that is stored in DNA-based storage systems. The data stored in these systems has the form of unordered sequences, also called strands, and every strand is synthesized thousands to millions of times, where some of these copies are read back during sequencing. Due to the unordered structure of the strands, an important task in the decoding process is to place them in their correct order. This is usually accomplished by allocating a part of the strand for an index. However, in the presence of errors in the index field, important information on the order of the strands may be lost. Clustering-correcting codes ensure that if the distance between the index fields of two strands is small, then there will be a large distance between their data fields. It is shown how this property enables to place the strands together in their correct clusters even in the presence of errors. We present lower and upper bounds on the size of clustering-correcting codes and an explicit construction of these codes which uses only a single bit of redundancy. Clustering-Difference Graph Clustering is one of the fundamental tasks in data analytics and machine learning. In many situations, different clusterings of the same data set become relevant. For example, different algorithms for the same clustering task may return dramatically different solutions. We are interested in applications in which one clustering has to be transformed into another; e.g., when a gradual transition from an old solution to a new one is required. In this paper, we devise methods for constructing such a transition based on linear programming and network theory. We use a so-called clustering-difference graph to model the desired transformation and provide methods for decomposing the graph into a sequence of elementary moves that accomplishes the transformation. These moves are equivalent to the edge directions, or circuits, of the underlying partition polytopes. Therefore, in addition to a conceptually new metric for measuring the distance between clusterings, we provide new bounds on the circuit diameter of these partition polytopes. ClusterNet Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or their dependence on large set of labeled data samples. In this paper, we propose ClusterNet that uses pairwise semantic constraints from very few labeled data samples (< 5% of total data) and exploits the abundant unlabeled data to drive the clustering approach. We define a new loss function that uses pairwise semantic similarity between objects combined with constrained k-means clus- tering to efficiently utilize both labeled and unlabeled data in the same framework. The proposed network uses convolution autoencoder to learn a latent representation that groups data into k specified clusters, while also learning the cluster centers simultaneously. We evaluate and com- pare the performance of ClusterNet on several datasets and state of the art deep clustering approaches. Cluster-Weighted Latent Class Modeling Usually in Latent Class Analysis (LCA), external predictors are taken to be cluster conditional probability predictors (LC models with covariates), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class specific distribution is of interest in the distal outcome model, when the distribution of the external variable(s) is assumed to dependent on LC membership. In this paper, we consider a more general formulation, typical in cluster-weighted models, which embeds both the latent class regression and the distal outcome models. This allows us to test simultaneously both whether the distribution of the covariate(s) differs across classes, and whether there are significant direct effects of the covariate(s) on the indicators, by including most of the information about the covariate(s) – latent variable relationship. We show the advantages of the proposed modeling approach through a set of population studies and an empirical application on assets ownership of Italian households. Clusterwise Linear Regression(CLR) ➚ “Cluster-Wise Linear Regression” Cluster-Wise Linear Regression(CLR) Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. clust-LDA Social media corpora pose unique challenges and opportunities, including typically short document lengths and rich meta-data such as author characteristics and relationships. This creates great potential for systematic analysis of the enormous body of the users and thus provides implications for industrial strategies such as targeted marketing. Here we propose a novel and statistically principled method, clust-LDA, which incorporates authorship structure into the topical modeling, thus accomplishing the task of the topical inferences across documents on the basis of authorship and, simultaneously, the identification of groupings between authors. We develop an inference procedure for clust-LDA and demonstrate its performance on simulated data, showing that clust-LDA out-performs the ‘vanilla’ LDA on the topic identification task where authors exhibit distinctive topical preference. We also showcase the empirical performance of clust-LDA based on a real-world social media dataset from Reddit. Clustrophile 2 Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The space of possible clusterings for a typical dataset is vast, and navigating in this vast space is also challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfiable clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large space of clusterings so as to improve the effectiveness of exploratory clustering analysis. We introduce \textit{Clustrophile 2}, a new interactive tool for guided clustering analysis. \textit{Clustrophile 2} guides users in clustering-based exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps quickly reason about differences between clusterings. To this end, \textit{Clustrophile 2} contributes a novel feature, the clustering tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate \textit{Clustrophile 2} through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson’s disease patients. Results suggest that \textit{Clustrophile 2} improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts. CM3 We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher performance policies than existing cooperative multi-agent methods. CMIR-NET We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech based label annotations. Notice that these multi-modal retrieval scenarios are more challenging than the traditional uni-modal retrieval approaches given the inherent differences in distributions between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of enough semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture which is considered to learn a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN – multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. Together with the Merced dataset, we generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state-of-the-art is observed in all the cases. CN2 Induction Algorithm The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3. CNN In Convolution(CNNIC) Currently, increasingly deeper neural networks have been applied to improve their accuracy. In contrast, We propose a novel wider Convolutional Neural Networks (CNN) architecture, motivated by the Multi-column Deep Neural Networks and the Network In Network(NIN), aiming for higher accuracy without input data transmutation. In our architecture, namely ‘CNN In Convolution'(CNNIC), a small CNN, instead of the original generalized liner model(GLM) based filters, is convoluted as kernel on the original image, serving as feature extracting layer of this networks. And further classifications are then carried out by a global average pooling layer and a softmax layer. Dropout and orthonormal initialization are applied to overcome training difficulties including slow convergence and over-fitting. Persuasive classification performance is demonstrated on MNIST. CNN-biLSTM-CRF Vision algorithms capable of interpreting scenes from a real-time video stream are necessary for computer-assisted surgery systems to achieve context-aware behavior. In laparoscopic procedures one particular algorithm needed for such systems is the identification of surgical phases, for which the current state of the art is a model based on a CNN-LSTM. A number of previous works using models of this kind have trained them in a fully supervised manner, requiring a fully annotated dataset. Instead, our work confronts the problem of learning surgical phase recognition in scenarios presenting scarce amounts of annotated data (under 25% of all available video recordings). We propose a teacher/student type of approach, where a strong predictor called the teacher, trained beforehand on a small dataset of ground truth-annotated videos, generates synthetic annotations for a larger dataset, which another model – the student – learns from. In our case, the teacher features a novel CNN-biLSTM-CRF architecture, designed for offline inference only. The student, on the other hand, is a CNN-LSTM capable of making real-time predictions. Results for various amounts of manually annotated videos demonstrate the superiority of the new CNN-biLSTM-CRF predictor as well as improved performance from the CNN-LSTM trained using synthetic labels generated for unannotated videos. For both offline and online surgical phase recognition with very few annotated recordings available, this new teacher/student strategy provides a valuable performance improvement by efficiently leveraging the unannotated data. CNN-Cert Verifying robustness of neural network classifiers has attracted great interests and attention due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound of minimum distortion as a provable robustness guarantee is possible. However, most previous works only focused on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general — we can handle various architectures including convolutional layers, max-pooling layers, batch normalization layer, residual blocks, as well as general activation functions; our approach is efficient — by exploiting the special structure of convolutional layers, we achieve up to 17 and 11 times of speed-up compared to the state-of-the-art certification algorithms (e.g. Fast-Lin, CROWN) and 366 times of speed-up compared to the dual-LP approach while our algorithm obtains similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms e.g. Fast-Lin and CROWN. We demonstrate by extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed. CoachAI Poor lifestyle represents a health risk factor and is the leading cause of morbidity and chronic conditions. The impact of poor lifestyle can be significantly altered by individual behavior change. Although the current shift in healthcare towards a long lasting modifiable behavior, however, with increasing caregiver workload and individuals’ continuous needs of care, there is a need to ease caregiver’s work while ensuring continuous interaction with users. This paper describes the design and validation of CoachAI, a conversational agent assisted health coaching system to support health intervention delivery to individuals and groups. CoachAI instantiates a text based healthcare chatbot system that bridges the remote human coach and the users. This research provides three main contributions to the preventive healthcare and healthy lifestyle promotion: (1) it presents the conversational agent to aid the caregiver; (2) it aims to decrease caregiver’s workload and enhance care given to users, by handling (automating) repetitive caregiver tasks; and (3) it presents a domain independent mobile health conversational agent for health intervention delivery. We will discuss our approach and analyze the results of a one month validation study on physical activity, healthy diet and stress management. Co-Arg This paper presents Co-Arg, a new type of cognitive assistant to an intelligence analyst that enables the synergistic integration of analyst imagination and expertise, computer knowledge and critical reasoning, and crowd wisdom, to draw defensible and persuasive conclusions from masses of evidence of all types, in a world that is changing all the time. Co-Arg’s goal is to improve the quality of the analytic results and enhance their understandability for both experts and novices. The performed analysis is based on a sound and transparent argumentation that links evidence to conclusions in a way that shows very clearly how the conclusions have been reached, what evidence was used and how, what is not known, and what assumptions have been made. The analytic results are presented in a report describes the analytic conclusion and its probability, the main favoring and disfavoring arguments, the justification of the key judgments and assumptions, and the missing information that might increase the accuracy of the solution. Coarse-to-Fine Context Memory(CFCM) Recent neural-network-based architectures for image segmentation make extensive usage of feature forwarding mechanisms to integrate information from multiple scales. Although yielding good results, even deeper architectures and alternative methods for feature fusion at different resolutions have been scarcely investigated for medical applications. In this work we propose to implement segmentation via an encoder-decoder architecture which differs from any other previously published method since (i) it employs a very deep architecture based on residual learning and (ii) combines features via a convolutional Long Short Term Memory (LSTM), instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarse-to-fine strategy; hence the name Coarse-to-Fine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery county lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation. Coarse-to-Fine Network(C2F Net) Deep neural networks have seen tremendous success for different modalities of data including images, videos, and speech. This success has led to their deployment in mobile and embedded systems for real-time applications. However, making repeated inferences using deep networks on embedded systems poses significant challenges due to constrained resources (e.g., energy and computing power). To address these challenges, we develop a principled co-design approach. Building on prior work, we develop a formalism referred to as Coarse-to-Fine Networks (C2F Nets) that allow us to employ classifiers of varying complexity to make predictions. We propose a principled optimization algorithm to automatically configure C2F Nets for a specified trade-off between accuracy and energy consumption for inference. The key idea is to select a classifier on-the-fly whose complexity is proportional to the hardness of the input example: simple classifiers for easy inputs and complex classifiers for hard inputs. We perform comprehensive experimental evaluation using four different C2F Net architectures on multiple real-world image classification tasks. Our results show that optimized C2F Net can reduce the Energy Delay Product (EDP) by 27 to 60 percent with no loss in accuracy when compared to the baseline solution, where all predictions are made using the most complex classifier in C2F Net. COBRA Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance. COBRAS Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a super-instance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such super-instances in a top-down manner to produce high-quality results early on in the clustering process, and keeps refining these super-instances as more pairwise queries are given to get more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings at fast run times, making it an excellent candidate for the iterative clustering scenario outlined above. COBRAS-TS Clustering is ubiquitous in data analysis, including analysis of time series. It is inherently subjective: different users may prefer different clusterings for a particular dataset. Semi-supervised clustering addresses this by allowing the user to provide examples of instances that should (not) be in the same cluster. This paper studies semi-supervised clustering in the context of time series. We show that COBRAS, a state-of-the-art semi-supervised clustering method, can be adapted to this setting. We refer to this approach as COBRAS-TS. An extensive experimental evaluation supports the following claims: (1) COBRAS-TS far outperforms the current state of the art in semi-supervised clustering for time series, and thus presents a new baseline for the field; (2) COBRAS-TS can identify clusters with separated components; (3) COBRAS-TS can identify clusters that are characterized by small local patterns; (4) a small amount of semi-supervision can greatly improve clustering quality for time series; (5) the choice of the clustering algorithm matters (contrary to earlier claims in the literature). Cochran-Armitage Trend Test The Cochran-Armitage test for trend, named for William Cochran and Peter Armitage, is used in categorical data analysis when the aim is to assess for the presence of an association between a variable with two categories and an ordinal variable with k categories. It modifies the Pearson chi-squared test to incorporate a suspected ordering in the effects of the k categories of the second variable. For example, doses of a treatment can be ordered as ‘low’, ‘medium’, and ‘high’, and we may suspect that the treatment benefit cannot become smaller as the dose increases. The trend test is often used as a genotype-based test for case-control genetic association studies. CATTexact Cochran-Mantel-Haenszel Statistics In statistics, the Cochran-Mantel-Haenszel statistics are a collection of test statistics used in the analysis of stratified categorical data. They are named after William G. Cochran, Nathan Mantel and William Haenszel. One of these test statistics is the Cochran-Mantel-Haenszel (CMH) test, which allows the comparison of two groups on a dichotomous/categorical response. It is used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled. It is often used in observational studies where random assignment of subjects to different treatments cannot be controlled, but influencing covariates can. In the CMH test, the data are arranged in a series of associated 2 × 2 contingency tables, the null hypothesis is that the observed response is independent of the treatment used in any 2 × 2 contingency table. The CMH test’s use of associated 2 × 2 contingency tables increases the ability of the test to detect associations (the power of the test is increased). sensitivity2x2xk Cochran-Mantel-Haenszel Test ➘ “Cochran-Mantel-Haenszel Statistics” samplesizeCMH Cocktail Algorithm fastcox CoCoA The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets. Co-Creative System Machine learning has been applied to a number of creative, design-oriented tasks. However, it remains unclear how to best empower human users with these machine learning approaches, particularly those users without technical expertise. In this paper we propose a general framework for turn-based interaction between human users and AI agents designed to support human creativity, called {co-creative systems}. The framework can be used to better understand the space of possible designs of co-creative systems and reveal future research directions. We demonstrate how to apply this framework in conjunction with a pair of recent human subject studies, comparing between the four human-AI systems employed in these studies and generating hypotheses towards future studies. code2seq The ability to generate natural language sequences from source code snippets can be used for code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present ${\rm {\scriptsize CODE2SEQ}}$: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as general state-of-the-art NMT models. CODED A powerful approach to detecting erroneous data is to check which potentially dirty data records are incompatible with a user’s domain knowledge. Previous approaches allow the user to specify domain knowledge in the form of logical constraints (e.g., functional dependency and denial constraints). We extend the constraint-based approach by introducing a novel class of statistical constraints (SCs). An SC treats each column as a random variable, and enforces an independence or dependence relationship between two (or a few) random variables. Statistical constraints are expressive, allowing the user to specify a wide range of domain knowledge, beyond traditional integrity constraints. Furthermore, they work harmoniously with downstream statistical modeling. We develop CODED, an SC-Oriented Data Error Detection system that supports three key tasks: (1) Checking whether an SC is violated or not on a given dataset, (2) Identify the top-k records that contribute the most to the violation of an SC, and (3) Checking whether a set of input SCs have conflicts or not. We present effective solutions for each task. Experiments on synthetic and real-world data illustrate how SCs apply to error detection, and provide evidence that CODED performs better than state-of-the-art approaches. Coded Distributed Computing A New Combinatorial Design of Coded Distributed Computing Coded Elastic Computing Cloud providers have recently introduced low-priority machines to reduce the cost of computations. Exploiting such opportunity for machine learning tasks is challenging inasmuch as low-priority machines can elastically leave (through preemption) and join the computation at any time. In this paper, we design a new technique called coded elastic computing enabling distributed machine learning computations over elastic resources. The proposed technique allows machines to transparently leave the computation without sacrificing the algorithm-level performance, and, at the same time, flexibly reduce the workload at existing machines when new machines join the computation. Thanks to the redundancy provided by encoding, our approach is able to achieve similar computational cost as the original (uncoded) method when all machines are present; the cost gracefully increases when machines are preempted and reduces when machines join. We test the performance of the proposed technique on two mini-benchmark experiments, namely elastic matrix multiplications and linear regression. Our preliminary experimental results show improvements over several existing techniques. Coded Fast Fourier Transform(Coded FFT) We consider the problem of computing the Fourier transform of high-dimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs. We show that by exploiting the algebraic structure of the Fourier transform operation and leveraging concepts from coding theory, one can efficiently deal with the straggler effects. In particular, we propose a computation strategy, named as coded FFT, which achieves the optimal recovery threshold, defined as the minimum number of workers that the master node needs to wait for in order to compute the output. This is the first code that achieves the optimum robustness in terms of tolerating stragglers or failures for computing Fourier transforms. Furthermore, the reconstruction process for coded FFT can be mapped to MDS decoding, which can be solved efficiently. Moreover, we extend coded FFT to settings including computing general $n$-dimensional Fourier transforms, and provide the optimal computing strategy for those settings. Coded Partial Gradient Computation(CPGC) Coded computation techniques provide robustness against straggling servers in distributed computing, with the following limitations: First, they increase decoding complexity. Second, they ignore computations carried out by straggling servers; and they are typically designed to recover the full gradient, and thus, cannot provide a balance between the accuracy of the gradient and per-iteration completion time. Here we introduce a hybrid approach, called coded partial gradient computation (CPGC), that benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and decoding complexity. Coded TeraSort We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of CodedTeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with TeraSort, for typical settings of interest. CodedPrivateML How to train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML’s privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via experiments over Amazon EC2, we demonstrate that CodedPrivateML can provide an order of magnitude speedup (up to $\sim 34\times$) over the state-of-the-art cryptographic approaches. CodedReduce(CR) We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest to develop efficient and robust gradient aggregation strategies that overcome two key bottlenecks: communication bandwidth and stragglers’ delays. In particular, Ring-AllReduce (RAR) design has been proposed to avoid bandwidth bottleneck at any particular node by allowing each worker to only communicate with its neighbors that are arranged in a logical ring. On the other hand, Gradient Coding (GC) has been recently proposed to mitigate stragglers in a master-worker topology by allowing carefully designed redundant allocation of the data set to the workers. We propose a joint communication topology design and data set allocation strategy, named CodedReduce (CR), that combines the best of both RAR and GC. That is, it parallelizes the communications over a tree topology leading to efficient bandwidth utilization, and carefully designs a redundant data set allocation and coding strategy at the nodes to make the proposed gradient aggregation scheme robust to stragglers. In particular, we quantify the communication parallelization gain and resiliency of the proposed CR scheme, and prove its optimality when the communication topology is a regular tree. Furthermore, we empirically evaluate the performance of our proposed CR design over Amazon EC2 and demonstrate that it achieves speedups of up to 18.9x and 7.9x, respectively over the benchmarks GC and RAR. CodedSketch In this paper, we propose CodedSketch, as a distributed straggler-resistant scheme to compute an approximation of the multiplication of two massive matrices. The objective is to reduce recovery threshold, defined as the total number of worker nodes that we need to wait for to be able to recover the final result. To exploit the fact that only an approximated result is required, in reducing the recovery threshold, some sorts of pre-compression is required. However, compression inherently involves some randomness that would lose the structure of the matrices. On the other hand, considering the structure of the matrices is crucial to reduce the recovery threshold. In CodedSketch, we use count–sketch, as a hash-based compression scheme, on the rows of the first and columns of the second matrix, and a structured polynomial code on the columns of the first and rows of the second matrix. This arrangement allows us to exploits the gain of both in reducing the recovery threshold. To increase the accuracy of computation, multiple independent count–sketches are needed. This independency allows us to theoretically characterize the accuracy of the result and establish the recovery threshold achieved by the proposed scheme. To guarantee the independency of resulting count–sketches in the output, while keeping its cost on the recovery threshold minimum, we use another layer of structured codes. CoDeepNEAT The success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparameters, this method achieves results comparable to best human designs in standard benchmarks in object recognition and language modeling. It also supports building a real-world application of automated image captioning on a magazine website. Given the anticipated increases in available computing power, evolution of deep networks is promising approach to constructing deep learning applications in the future. CodeGRU Recently many NLP-based deep learning models have been applied to model source code for source code suggestion and recommendation tasks. A major limitation of these approaches is that they take source code as simple tokens of text and ignore its contextual, syntaxtual and structural dependencies. In this work, we present CodeGRU, a Gated Recurrent Unit based source code language model that is capable of capturing contextual, syntaxtual and structural dependencies for modeling the source code. The CodeGRU introduces the following several new components. The Code Sampler is first proposed for selecting noise-free code samples and transforms obfuscate code to its proper syntax, which helps to capture syntaxtual and structural dependencies. The Code Regularize is next introduced to encode source code which helps capture the contextual dependencies of the source code. Finally, we propose a novel method which can learn variable size context for modeling source code. We evaluated CodeGRU with real-world dataset and it shows that CodeGRU can effectively capture contextual, syntaxtual and structural dependencies which previous works fails. We also discuss and visualize two use cases of CodeGRU for source code modeling tasks (1) source code suggestion, and (2) source code generation. Coefficient of Variation In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation to the mean. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation (RSD), which is expressed as a percentage. Coevo We present Coevo, an online platform that allows both humans and artificial agents to design shapes that solve different tasks. Our goal is to explore common shared design tools that can be used by humans and artificial agents in a context of creation. This approach can provide a better knowledge transfer and interaction with artificial agents since a common language of design is defined. In this paper, we outline the main components of this platform and discuss the definition of a human-centered language to enhance human-AI collaboration in co-creation scenarios. Coevolutionary Neural Population Model We present a method for using neural networks to model evolutionary population dynamics, and draw parallels to recent deep learning advancements in which adversarially-trained neural networks engage in coevolutionary interactions. We conduct experiments which demonstrate that models from evolutionary game theory are capable of describing the behavior of these neural population systems. CoffeeScript CoffeeScript is a little language that compiles into JavaScript. Underneath that awkward Java-esque patina, JavaScript has always had a gorgeous heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way. The golden rule of CoffeeScript is: “It’s just JavaScript”. The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly from CoffeeScript (and vice-versa). The compiled output is readable and pretty-printed, will work in every JavaScript runtime, and tends to run as fast or faster than the equivalent handwritten JavaScript. Cogniculture Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI based systems will pose a threat to humanity. People in AI community have diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both human and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents’ life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey. Cognitive Analytics Cognitive Analytics: A hybrid of several disparate disciplines, methods, and practical technologies. Cognitive Architecture A cognitive architecture can refer to a theory about the structure of the human mind. One of the main goals of a cognitive architecture is to summarize the various results of cognitive psychology in a comprehensive computer model. However, the results need to be in a formalized form so far that they can be the basis of a computer program. The formalized models can be used to further refine a comprehensive theory of cognition, and more immediately, as a commercially usable model. Successful cognitive architectures include ACT-R (Adaptive Control of Thought, ACT), SOAR and OpenCog. Cognitive Bias Cognitive biases are tendencies to think in certain ways. Cognitive biases can lead to systematic deviations from a standard of rationality or good judgment, and are often studied in psychology and behavioral economics. Cognitive Computing Cognitive computing refers to the development of computer systems modeled after the human brain. Originally referred to as artificial intelligence, researchers began to use the modern term instead in the 1990s, to indicate that the science was designed to teach computers to think like a human mind, rather than developing an artificial system. This type of computing integrates technology and biology in an attempt to re-engineer the brain, one of the most efficient and effective computers on Earth. Cognitive computing is a way of processing data that is neither linear nor deterministic. It uses the ideas behind neuroscience and psychology to augment human reasoning with better pattern matching while determining the optimal information a person needs to make decisions. Cognitive computing is different than other forms of software. Instead of shepherding data through pre-determined pathways, it finds the previously unknown paths and patterns through the data. This is ultimately a more scalable model than relying on experts to synthesize data since there are too few experts of any sort available at any one time. Cognitive computing doesn’t try to fit data into an existing model; it looks at the data and figures out what the model is first. Cognitive Computing Cognitive Computing: Solving the Big Data Problem? Cognitive Computing Defined What is Cognitive Computing? How are Enterprises benefitting from Cognitive Technology? Cognitive Database We propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model captures the hidden inter-/intra-column relationships between database tokens of different types. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQL-based analytics queries called cognitive intelligence (CI) queries. CI queries use the model vectors to enable complex queries such as semantic matching, inductive reasoning queries such as analogies, predictive queries using entities not present in a database, and, more generally, using knowledge from external sources. We demonstrate unique capabilities of Cognitive Databases using an Apache Spark based prototype to execute inductive reasoning CI queries over a multi-modal database containing text and images. We believe our first-of-a-kind system exemplifies using AI functionality to endow relational databases with capabilities that were previously very hard to realize in practice. Cognitive Representation Learner(CogRL) A cognitive model of human learning provides information about skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is two-fold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to highly correlate with estimates using student performance data on an Article Selection dataset. CogSciK Computational models of decisionmaking must contend with the variance of context and any number of possible decisions that a defined strategic actor can make at a given time. Relying on cognitive science theory, the authors have created an algorithm that captures the orientation of the actor towards an object and arrays the possible decisions available to that actor based on their given intersubjective orientation. This algorithm, like a traditional K-means clustering algorithm, relies on a core-periphery structure that gives the likelihood of moves as those closest to the cluster’s centroid. The result is an algorithm that enables unsupervised classification of an array of decision points belonging to an actor’s present state and deeply rooted in cognitive science theory. Cohort Analysis Cohort analysis is a subset of behavioral analytics that takes the data from a given eCommerce platform, web application, or online game and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined timespan. Cohort analysis allows a company to ‘see patterns clearly across the lifecycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes.’ By seeing these patterns of time, a company can adapt and tailor its service to those specific cohorts. While cohort analysis is sometimes associated with a cohort study, they are different and should not be viewed as one in the same. Cohort analysis has come to describe specifically the analysis of cohorts in regards to big data and business analytics, while a cohort study is a more general umbrella term that describes a type of study in which data is broken down into similar groups. Coin.AI One decade ago, Bitcoin was introduced, becoming the first cryptocurrency and establishing the concept of ‘blockchain’ as a distributed ledger. As of today, there are many different implementations of cryptocurrencies working over a blockchain, with different approaches and philosofies. However, many of them share one common feature: they require proof-of-work to support the generation of blocks (mining) and, eventually, the generation of money. This proof-of-work scheme often consists on the resolution of a cryptography problem, most commonly breaking a hash value, which can only be achieved through brute-force. The main drawback of proof-of-work is that it requires ridiculously large amounts of energy which do not have any useful outcome beyond supporting the currency. In this paper, we present a theoretical proposal that introduces a proof-of-useful-work scheme to support a cryptocurrency running over a blockchain, which we named Coin.AI. In this system, the mining scheme requires training deep learning models, and a block is only mined when the performance of such model exceeds a threshold. The distributed system allows for nodes to verify the models delivered by miners in an easy way (certainly much more efficiently than the mining process itself), determining when a block is to be generated. Additionally, this paper presents a proof-of-storage scheme for rewarding users that provide storage for the deep learning models, as well as a theoretical dissertation on how the mechanics of the system could be articulated with the ultimate goal of democratizing the access to artificial intelligence. Coincidence Analysis(CNA) CNA, a Boolean method of causal analysis presented in Baumgartner (2009a). CNA is a configurationl comparative method for the identification of complex causal dependencies-in particular, causal chains and common cause structures-in configurational data. CNA is related to QCA (Ragin 2008), but contrary to the latter does not minimize sufficient and necessary conditions by means of Quine- McCluskey optimization, but based on its own custom-built optimization algorithm. The latter greatly facilitates the analysis of data featuring chainlike causal dependencies among the conditions of an ultimate outcome. http://…/infer_c.pdf http://…/baumgartner-thiem.pdf cna CoinRun In this paper, we investigate the problem of overfitting in deep reinforcement learning. Among the most common benchmarks in RL, it is customary to use the same environments for both training and testing. This practice offers relatively little insight into an agent’s ability to generalize. We address this issue by using procedurally generated environments to construct distinct training and test sets. Most notably, we introduce a new environment called CoinRun, designed as a benchmark for generalization in RL. Using CoinRun, we find that agents overfit to surprisingly large training sets. We then show that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization. Cointegration The term cointegration was defined by Granger (1983) as a formulation of the phenomenon that nonstationary processes can have linear combinations that are stationary. It was his investigations of the relation between cointegration and error correction that brought modelling of vector autoregressions with unit roots and cointegration to the center of attention in applied and theoretical econometrics; see Engle and Granger (1987). Cointegration is a statistical property of time series variables. Cointegration has become an important property in contemporary time series analysis. Time series often have trends – either deterministic or stochastic. In a seminal paper, Charles Nelson and Charles Plosser (1982) showed that most time series have stochastic trends – these are also called unit root processes, or processes integrated of order 1-I(1). http://…/Cointegration Co-Kriging Quantile is a popular performance measure for a stochastic system to evaluate its variability and risk. To reduce the risk, selecting the actions that minimize the tail quantiles of some loss distributions is typically of interest for decision makers. When the loss distribution is observed via simulations, evaluating and optimizing its quantile functions can be challenging, especially when the simulations are expensive, as it may cost a large number of simulation runs to obtain accurate quantile estimators. In this work, we propose a multi-level metamodel (co-kriging) based algorithm to optimize quantile functions more efficiently. Utilizing non-decreasing properties of quantile functions, we first search on cheaper and informative lower quantiles which are more accurate and easier to optimize. The quantile level iteratively increases to the objective level while the search has a focus on the possible promising regions identified by the previous levels. This enables us to leverage the accurate information from the lower quantiles to find the optimums faster and improve algorithm efficiency. COLA Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. We consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run on-device, on an arbitrary communication network, without a central coordinator. We propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance. Our framework overcomes many limitations of existing methods, and achieves communication efficiency, scalability, elasticity as well as resilience to changes in data and participating devices. Colaboratory Colaboratory is a research tool for machine learning education and research. It’s a Jupyter notebook environment that requires no setup to use. coLaboratory Project coLaboratory Project, a new tool for data science and analysis, designed to make collaborating on data easier. coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad-hoc workflows involving emailing documents back and forth. Cold-Start Aware Attention(CSAA) ➘ “Hybrid Contextualized Sentiment Classifier” CollaboNet Background: Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and showed promising results. However, as deep learning approaches need an abundant amount of training data, a lack of data can hinder performance. BioNER datasets are scarce resources and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition. Results: To address the lack of data and the entity type misclassification problem, we propose CollaboNet which utilizes a combination of multiple NER models. In CollaboNet, models trained on a different dataset are connected to each other so that a target model obtains information from other collaborator models to reduce false positives. Every model is an expert on their target entity type and takes turns serving as a target and a collaborator model during training time. The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score. Conclusions: We demonstrated the benefits of combining multiple models for BioNER. Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction. Collaborative Black-box and RUle Set Hybrid(CoBRUSH) Interpretable machine learning models have received increasing interest in recent years, especially in domains where humans are involved in the decision-making process. However, the possible loss of the task performance for gaining interpretability is often inevitable. This performance downgrade puts practitioners in a dilemma of choosing between a top-performing black-box model with no explanations and an interpretable model with unsatisfying task performance. In this work, we propose a novel framework for building a Hybrid Decision Model that integrates an interpretable model with any black-box model to introduce explanations in the decision making process while preserving or possibly improving the predictive accuracy. We propose a novel metric, explainability, to measure the percentage of data that are sent to the interpretable model for decision. We also design a principled objective function that considers predictive accuracy, model interpretability, and data explainability. Under this framework, we develop Collaborative Black-box and RUle Set Hybrid (CoBRUSH) model that combines logic rules and any black-box model into a joint decision model. An input instance is first sent to the rules for decision. If a rule is satisfied, a decision will be directly generated. Otherwise, the black-box model is activated to decide on the instance. To train a hybrid model, we design an efficient search algorithm that exploits theoretically grounded strategies to reduce computation. Experiments show that CoBRUSH models are able to achieve same or better accuracy than their black-box collaborator working alone while gaining explainability. They also have smaller model complexity than interpretable baselines. Collaborative Compressive Sensing We propose a collaborative compressive sensing (CCS) framework consisting of a bank of $K$ compressive sensing (CS) systems that share the same sensing matrix but have different sparsifying dictionaries. This CCS system is guaranteed to yield better performance than each individual CS system in a statistical sense, while with the parallel computing strategy, it requires the same time as that needed for each individual CS system to conduct compression and signal recovery. We then provide an approach to designing optimal CCS systems by utilizing a measure that involves both the sensing matrix and dictionaries and hence allows us to simultaneously optimize the sensing matrix and all the $K$ dictionaries under the same scheme. An alternating minimization-based algorithm is derived for solving the corresponding optimal design problem. We provide a rigorous convergence analysis to show that the proposed algorithm is convergent. Experiments with real images are carried out and show that the proposed CCS system significantly improves on existing CS systems in terms of the signal recovery accuracy. Collaborative Convolutional Network(CoCoNet) We present an end-to-end CNN architecture for fine-grained visual recognition called Collaborative Convolutional Network (CoCoNet). The network uses a collaborative filter after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from training samples as a whole rather than one at a time. This gives CoCoNet more power to encode the fine-grained nature of the data with limited samples in an end-to-end fashion. We perform a detailed study of the performance with 1-stage and 2-stage transfer learning and different configurations with benchmark architectures like AlexNet and VggNet. The ablation study shows that the proposed method outperforms its constituent parts considerably and consistently. CoCoNet also outperforms the baseline popular deep learning based fine-grained recognition method, namely Bilinear-CNN (BCNN) with statistical significance. Experiments have been performed on the fine-grained species recognition problem, but the method is general enough to be applied to other similar tasks. Lastly, we also introduce a new public dataset for fine-grained species recognition, that of Indian endemic birds and have reported initial results on it. The training metadata and new dataset are available through the corresponding author. Collaborative Cross Network(CoNet) The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively. Collaborative Deep Learning(CDL) Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendation. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art. GitXiv Collaborative Deep Reinforcement Learning(CDRL) Besides independent learning, human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significant challenges remain, especially when it comes to reinforcement learning, heterogeneous model structures, and different learning tasks. Motivated by human collaborative learning, in this paper we propose a collaborative deep reinforcement learning (CDRL) framework that performs adaptive knowledge transfer among heterogeneous learning agents. Specifically, the proposed CDRL conducts a novel deep knowledge distillation method to address the heterogeneity among different learning tasks with a deep alignment network. Furthermore, we present an efficient collaborative Asynchronous Advantage Actor-Critic (cA3C) algorithm to incorporate deep knowledge distillation into the online training of agents, and demonstrate the effectiveness of the CDRL framework using extensive empirical evaluation on OpenAI gym. Collaborative Distribution Alignment(CDA) In conventional domain adaptation, a critical assumption is that there exists a fully labeled domain (source) that contains the same label space as another unlabeled or scarcely labeled domain (target). However, in the real world, there often exist application scenarios in which both domains are partially labeled and not all classes are shared between these two domains. Thus, it is meaningful to let partially labeled domains learn from each other to classify all the unlabeled samples in each domain under an open-set setting. We consider this problem as weakly supervised open-set domain adaptation. To address this practical setting, we propose the Collaborative Distribution Alignment (CDA) method, which performs knowledge transfer bilaterally and works collaboratively to classify unlabeled data and identify outlier samples. Extensive experiments on the Office benchmark and an application on person reidentification show that our method achieves state-of-the-art performance. Collaborative Evolutionary Reinforcement Learning(CERL) Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore – thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners – typically proven algorithms like TD3 – optimize over varying time-horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient – notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation. Collaborative Filtering(CF) Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). (also called “people-to-people correlation”) Collaborative Filtering – Neural Autoregressive Distribution Estimator(CF-NADE) This paper proposes CF-NADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM) based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CF-NADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CF-NADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CF-NADE, which shows superior performance. Finally, CF-NADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CF-NADE with a single hidden layer beats all previous state-of-the-art methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance. Collaborative Filtering with User-Item Co-Autoregressive Models(CF-UIcA) Besides the success on object recognition, machine translation and system control in games, (deep) neural networks have achieved state-of-the-art results in collaborative filtering (CF) recently. Previous neural approaches for CF are either user-based or item-based, which cannot leverage all relevant information explicitly. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploit the structural autoregressiveness in the domains of both users and items. Furthermore, we separate the inherent dependence in this structure under a natural assumption and develop an efficient stochastic learning algorithm to handle large scale datasets. We evaluate CF-UIcA on two popular benchmarks: MovieLens 1M and Netflix, and achieve state-of-the-art predictive performance, which demonstrates the effectiveness of CF-UIcA. Collaborative GAN Unlike a conventional background inpainting approach that infers a missing area from image patches similar to the background, face completion requires semantic knowledge about the target object for realistic outputs. Current image inpainting approaches utilize generative adversarial networks (GANs) to achieve such semantic understanding. However, in adversarial learning, the semantic knowledge is learned implicitly and hence good semantic understanding is not always guaranteed. In this work, we propose a collaborative adversarial learning approach to face completion to explicitly induce the training process. Our method is formulated under a novel generative framework called collaborative GAN (collaGAN), which allows better semantic understanding of a target object through collaborative learning of multiple tasks including face completion, landmark detection, and semantic segmentation. Together with the collaGAN, we also introduce an inpainting concentrated scheme such that the model emphasizes more on inpainting instead of autoencoding. Extensive experiments show that the proposed designs are indeed effective and collaborative adversarial learning provides better feature representations of the faces. In comparison with other generative image inpainting models and single task learning methods, our solution produces superior performances on all tasks. Collaborative GAN Sampling Generative adversarial networks (GANs) have shown great promise in generating complex data such as images. A standard practice in GANs is to discard the discriminator after training and use only the generator for sampling. However, this loses valuable information of real data distribution learned by the discriminator. In this work, we propose a collaborative sampling scheme between the generator and discriminator for improved data generation. Guided by the discriminator, our approach refines generated samples through gradient-based optimization, shifting the generator distribution closer to the real data distribution. Additionally, we present a practical discriminator shaping method that can further improve the sample refinement process. Orthogonal to existing GAN variants, our proposed method offers a new degree of freedom in GAN sampling. We demonstrate its efficacy through experiments on synthetic data and image generation tasks. Collaborative Generative Adversarial Network(CollaGAN) In many applications requiring multiple inputs to obtain a desired output, if any of the input data is missing, it often introduces large amounts of bias. Although many techniques have been developed for imputing missing data, the image imputation is still difficult due to complicated nature of natural images. To address this problem, here we proposed a novel framework for missing image data imputation, called Collaborative Generative Adversarial Network (CollaGAN). CollaGAN converts an image imputation problem to a multi-domain images-to-image translation task so that a single generator and discriminator network can successfully estimate the missing data using the remaining clean data set. We demonstrate that CollaGAN produces the images with a higher visual quality compared to the existing competing approaches in various image imputation tasks. Collaborative Human-AI(CHAI) Automated dermoscopic image analysis has witnessed rapid growth in diagnostic performance. Yet adoption faces resistance, in part, because no evidence is provided to support decisions. In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors, as well as localized image regions most relevant to measuring distance between query and neighbors. To ensure that results are relevant in terms of both label accuracy and human visual similarity for any skill level, a novel hierarchical triplet logic is implemented to jointly learn an embedding according to disease labels and non-expert similarity. Results are improved over baselines trained on disease labels alone, as well as standard multiclass loss. Quantitative relevance of results, according to non-expert similarity, as well as localized image regions, are also significantly improved. Collaborative Learning We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise. Collaborative Memory Network(CMN) Recommendation systems play a vital role to keep users engaged with personalized content in modern online platforms. Deep learning has revolutionized many research fields and there is a recent surge of interest in applying it to collaborative filtering (CF). However, existing methods compose deep learning architectures with the latent factor model ignoring a major class of CF models, neighborhood or memory-based approaches. We propose Collaborative Memory Networks (CMN), a deep architecture to unify the two classes of CF models capitalizing on the strengths of the global structure of latent factor model and local neighborhood-based structure in a nonlinear fashion. Motivated by the success of Memory Networks, we fuse a memory component and neural attention mechanism as the neighborhood component. The associative addressing scheme with the user and item memories in the memory module encodes complex user-item relations coupled with the neural attention mechanism to learn a user-item specific neighborhood. Finally, the output module jointly exploits the neighborhood with the user and item memories to produce the ranking score. Stacking multiple memory modules together yield deeper architectures capturing increasingly complex user-item relations. Furthermore, we show strong connections between CMN components, memory networks and the three classes of CF models. Comprehensive experimental results demonstrate the effectiveness of CMN on three public datasets outperforming competitive baselines. Qualitative visualization of the attention weights provide insight into the model’s recommendation process and suggest the presence of higher order interactions. Collaborative Neural Network Group(CNNG) For the present engineering of neural systems, the preparing of extensive scale learning undertakings generally not just requires a huge neural system with a mind boggling preparing process yet additionally troublesome discover a clarification for genuine applications. In this paper, we might want to present the Collaborative Neural Network Group (CNNG). CNNG is a progression of neural systems that work cooperatively to deal with various errands independently in a similar learning framework. It is advanced from a solitary neural system by reflection. Along these lines, in light of various circumstances removed by the calculation, the CNNG can perform diverse techniques when handling the information. The examples of chose methodology can be seen by human to make profound adapting more reasonable. In our execution, the CNNG is joined by a few moderately little neural systems. We give a progression of examinations to assess the execution of CNNG contrasted with other learning strategies. The CNNG is able to get a higher accuracy with a much lower training cost. We can reduce the error rate by 74.5% and reached the accuracy of 99.45% in MNIST with three feedforward networks (4 layers) in one training epoch. Collage Inference MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becoming increasingly popular these days. Pre-trained machine learning models are deployed on the cloud to support prediction based applications and services. For achieving higher throughput, incoming requests are served by running multiple replicas of the model on different machines concurrently. Incidence of straggler nodes in distributed inference is a significant concern since it can increase inference latency, violate SLOs of the service. In this paper, we propose a novel coded inference model to deal with stragglers in distributed image classification. We propose modified single shot object detection models, Collage-CNN models, to provide necessary resilience efficiently. A Collage-CNN model takes collage images formed by combining multiple images as its input and performs multi-image classification in one shot. We generate custom training collages using images from standard image classification datasets and train the model to achieve high classification accuracy. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th percentile latency can be reduced by 1.45X to 2.46X compared to replication based approaches and without compromising prediction accuracy. Collage-CNN MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becoming increasingly popular these days. Pre-trained machine learning models are deployed on the cloud to support prediction based applications and services. For achieving higher throughput, incoming requests are served by running multiple replicas of the model on different machines concurrently. Incidence of straggler nodes in distributed inference is a significant concern since it can increase inference latency, violate SLOs of the service. In this paper, we propose a novel coded inference model to deal with stragglers in distributed image classification. We propose modified single shot object detection models, Collage-CNN models, to provide necessary resilience efficiently. A Collage-CNN model takes collage images formed by combining multiple images as its input and performs multi-image classification in one shot. We generate custom training collages using images from standard image classification datasets and train the model to achieve high classification accuracy. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th percentile latency can be reduced by 1.45X to 2.46X compared to replication based approaches and without compromising prediction accuracy. Collective Adaptive Resource-sharing Markovian Agents(CARMA) In this paper we present CARMA, a language recently defined to support specification and analysis of collective adaptive systems. CARMA is a stochastic process algebra equipped with linguistic constructs specifically developed for modelling and programming systems that can operate in open-ended and unpredictable environments. This class of systems is typically composed of a huge number of interacting agents that dynamically adjust and combine their behaviour to achieve specific goals. A CARMA model, termed a collective, consists of a set of components, each of which exhibits a set of attributes. To model dynamic aggregations, which are sometimes referred to as ensembles, CARMA provides communication primitives that are based on predicates over the exhibited attributes. These predicates are used to select the participants in a communication. Two communication mechanisms are provided in the CARMA language: multicast-based and unicast-based. Collective And Point Anomalies(CAPA) The challenge of efficiently identifying anomalies in data sequences is an important statistical problem that now arises in many applications. Whilst there has been substantial work aimed at making statistical analyses robust to outliers, or point anomalies, there has been much less work on detecting anomalous segments, or collective anomalies. By bringing together ideas from changepoint detection and robust statistics, we introduce Collective And Point Anomalies (CAPA), a computationally efficient approach that is suitable when collective anomalies are characterised by either a change in mean, variance, or both, and distinguishes them from point anomalies. Theoretical results establish the consistency of CAPA at detecting collective anomalies and empirical results show that CAPA has close to linear computational cost as well as being more accurate at detecting and locating collective anomalies than other approaches. We demonstrate the utility of CAPA through its ability to detect exoplanets from light curve data from the Kepler telescope. Collective Intelligence(COIN) Collective Intelligence is shared or group intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making. The term appears in sociobiology, political science and in context of mass peer review and crowdsourcing applications. It may involve consensus, social capital and formalisms such as voting systems, social media and other means of quantifying mass activity. Collective IQ is a measure of collective intelligence, although it is often used interchangeably with the term collective intelligence. (‘Building new conclusions from independent contributors is really what collective intelligence is all about.’) ➘ “Probability Collectives” http://…/0251.pdf Collocation In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance. In phraseology, collocation is a sub-type of phraseme. An example of a phraseological collocation, as propounded by Michael Halliday, is the expression strong tea. While the same meaning could be conveyed by the roughly equivalent *powerful tea, this expression is considered incorrect by English speakers. Conversely, the corresponding expression for computer, powerful computers is preferred over *strong computers. Phraseological collocations should not be confused with idioms, where meaning is derived, whereas collocations are mostly compositional. There are about six main types of collocations: adjective+noun, noun+noun (such as collective nouns), verb+noun, adverb+adjective, verbs+prepositional phrase (phrasal verbs), and verb+adverb. Collocation extraction is a task that extracts collocations automatically from a corpus, using computational linguistics. Colors of Noise In audio engineering, electronics, physics, and many other fields, the color of noise refers to the power spectrum of a noise signal (a signal produced by a stochastic process). Different colors of noise have significantly different properties: for example, as audio signals they will sound different to human ears, and as images they will have a visibly different texture. Therefore, each application typically requires noise of a specific color. This sense of ‘color’ for noise signals is similar to the concept of timbre in music (which is also called ‘tone color’); however the latter is almost always used for sound, and may consider very detailed features of the spectrum. The practice of naming kinds of noise after colors started with white noise, a signal whose spectrum has equal power within any equal interval of frequencies. That name was given by analogy with white light, which was (incorrectly) assumed to have such a flat power spectrum over the visible range. Other color names, like pink, red, and blue were then given to noise with other spectral profiles, often (but not always) in reference to the color of light with similar spectra. Some of those names have standard definitions in certain disciplines, while others are very informal and poorly defined. Many of these definitions assume a signal with components at all frequencies, with a power spectral density per unit of bandwidth proportional to 1/f ß and hence they are examples of power-law noise. For instance, the spectral density of white noise is flat (ß = 0), while flicker or pink noise has ß = 1, and Brownian noise has ß = 2. Column Subset Selection Problem(CSSP) Dimensionality reduction is a first step of many machine learning pipelines. Two popular approaches are principal component analysis, which projects onto a small number of well chosen but non-interpretable directions, and feature selection, which selects a small number of the original features. Feature selection can be abstracted as a numerical linear algebra problem called the column subset selection problem (CSSP). CSSP corresponds to selecting the best subset of columns of a matrix $X \in \mathbb{R}^{N \times d}$, where \emph{best} is often meant in the sense of minimizing the approximation error, i.e., the norm of the residual after projection of $X$ onto the space spanned by the selected columns. Such an optimization over subsets of ${1,\dots,d}$ is usually impractical. One workaround that has been vastly explored is to resort to polynomial-cost, random subset selection algorithms that favor small values of this approximation error. We propose such a randomized algorithm, based on sampling from a projection determinantal point process (DPP), a repulsive distribution over a fixed number $k$ of indices ${1,\dots,d}$ that favors diversity among the selected columns. We give bounds on the ratio of the expected approximation error for this DPP over the optimal error of PCA. These bounds improve over the state-of-the-art bounds of \emph{volume sampling} when some realistic structural assumptions are satisfied for $X$. Numerical experiments suggest that our bounds are tight, and that our algorithms have comparable performance with the \emph{double phase} algorithm, often considered to be the practical state-of-the-art. Column subset selection with DPPs thus inherits the best of both worlds: good empirical performance and tight error bounds. Column2Vec We present Column2Vec, a distributed representation of database columns based on column metadata. Our distributed representation has several applications. Using known names for groups of columns (i.e., a table name), we train a model to generate an appropriate name for columns in an unnamed table. We demonstrate the viability of our approach using schema information collected from open source applications on GitHub. Column-oriented DBMS A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. This column-oriented DBMS has advantages for data warehouses, customer relationship management (CRM) systems, and library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of column-oriented and row-oriented organization with any DBMSs. Denoting one as column-oriented refers to both the ease of expression of a column-oriented structure and the focus on optimizations for column-oriented workloads. This approach is in contrast to row-oriented or row store databases and with correlation databases, which use a value-based storage structure. Combination of Peaks of Energy(COPE) Sound analysis research has mainly been focused on speech and music processing. The deployed methodologies are not suitable for analysis of sounds with varying background noise, in many cases with very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns. Then we take their responses to build feature vectors that we use in combination with a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund) demonstrate the effectiveness of the proposed method and are higher than the ones obtained by other existing approaches. The COPE feature extractors have high robustness to variations of SNR. Real-time performance is achieved even when the value of a large number of features is computed. Combinations of Mutually Exclusive Alterations(CoMEt) Cancer is a heterogeneous disease with different combinations of genetic and epigenetic alterations driving the development of cancer in different individuals. While these alterations are believed to converge on genes in key cellular signaling and regulatory pathways, our knowledge of these pathways remains incomplete, making it difficult to identify driver alterations by their recurrence across genes or known pathways. We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify combinations of alterations de novo, without any prior biological knowledge (e.g. pathways or protein interactions). CoMEt searches for combinations of mutations that exhibit mutual exclusivity, a pattern expected for mutations in pathways. CoMEt has several important feature that distinguish it from existing approaches to analyze mutual exclusivity among alterations. These include: an exact statistical test for mutual exclusivity that is more sensitive in detecting combinations containing rare alterations; simultaneous identification of collections of one or more combinations of mutually exclusive alterations; simultaneous analysis of subtype-specific mutations; and summarization over an ensemble of collections of mutually exclusive alterations. These features enable CoMEt to robustly identify alterations affecting multiple pathways, or hallmarks of cancer. Combinatorial Generalization Combinatorial generalization – the ability to understand and produce novel combinations of already familiar elements – is considered to be a core capacity of the human mind and a major challenge to neural network models. A significant body of research suggests that conventional neural networks can’t solve this problem unless they are endowed with mechanisms specifically engineered for the purpose of representing symbols. Combinatorial Optimization In applied mathematics and theoretical computer science, combinatorial optimization is a topic that consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not feasible. It operates on the domain of those optimization problems, in which the set of feasible solutions is discrete or can be reduced to discrete, and in which the goal is to find the best solution. Some common problems involving combinatorial optimization are the traveling salesman problem (“TSP”) and the minimum spanning tree problem (“MST”). Combined Series Approach(CSA) STMotif Comet.ml Comet allows you to track, compare and collaborate on Machine Learning experiments. Use Comet.ml if you need a tool that: · Allows for hyper parameters, metrics, code, stdout tracking · Supports Keras, Tensorflow, PyTorch, scikit-learn out of the box and other libraries with the manual API. · Runs seamlessly on every machine including your laptop, AWS, Azure or company owned machines Common Cause Principle(CCP) It seems that a correlation between events A and B indicates either that A causes B, or that B causes A, or that A and B have a common cause. It also seems that causes always occur before their effects and, thus, that common causes always occur before the correlated events. Reichenbach was the first to formalize this idea rather precisely. Common Subspace Learning(CoSpace) With a large amount of open satellite multispectral imagery (e.g., Sentinel-2 and Landsat-8), considerable attention has been paid to global multispectral land cover classification. However, its limited spectral information hinders further improving the classification performance. Hyperspectral imaging enables discrimination between spectrally similar classes but its swath width from space is narrow compared to multispectral ones. To achieve accurate land cover classification over a large coverage, we propose a cross-modality feature learning framework, called common subspace learning (CoSpace), by jointly considering subspace learning and supervised classification. By locally aligning the manifold structure of the two modalities, CoSpace linearly learns a shared latent subspace from hyperspectral-multispectral(HS-MS) correspondences. The multispectral out-of-samples can be then projected into the subspace, which are expected to take advantages of rich spectral information of the corresponding hyperspectral data used for learning, and thus leads to a better classification. Extensive experiments on two simulated HSMS datasets (University of Houston and Chikusei), where HS-MS data sets have trade-offs between coverage and spectral resolution, are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods. Communication In Focus(CIF) A NLU technology that is based on a novel approach that does not require the pre-definition of these terms. Rather, the design uses Context Discriminants to digest these new subjects based on prior understanding of the based language. Context Discriminant reduces complex documents into snippets of words of semantic neighbors consisting of context and points of views on subjects. Higher order derivatives are achieved by applying CD to the result produced by the prior CD. This approach enables us to refine the contexts on related subjects over distant semantic neighbors and to discover higher order dependent subjects that depict entity relationships between subjects. Community Detection Communities are often defined in terms of the partition of the set of vertices, that is each node is put into one and only one community. This is a useful simplification and most community detection methods find this type of community structure. However in some cases a better representation could be one where vertices are in more than one community. This might happen in a social network where each vertex represents a person, and the communities represent the different groups of friends: one community for family, another community for co-workers, one for friends in the same sports club, and so on. The use of cliques for community detection discussed below is just one example of how such overlapping community structure can be found. ➘ “Complex Network” Community detection algorithms: a comparative analysis A Comparison of Community Detection Algorithms on Artificial Networks Community Question Answering Summarization Corpora(CQASUMM) Community Question Answering forums such as Quora, Stackoverflow are rich knowledge resources, often catering to information on topics overlooked by major search engines. Answers submitted to these forums are often elaborated, contain spam, are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion. As a result summarization becomes a prioritized task for CQA forums. While a number of efforts have been made to summarize factoid CQA, little work exists in summarizing non-factoid CQA. We believe this is due to the lack of a considerably large, annotated dataset for CQA summarization. We create CQASUMM, the first huge annotated CQA summarization dataset by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double up as a reference summary and build hundred word summaries from them. We treat other answers as candidates documents for summarization. We provide a script to generate the dataset and introduce the new task of Community Question Answering Summarization. Multi document summarization has been widely studied with news article datasets, especially in the DUC and TAC challenges using news corpora. However documents in CQA have higher variance, contradicting opinion and lesser amount of overlap. We compare the popular multi document summarization techniques and evaluate their performance on our CQA corpora. We look into the state-of-the-art and understand the cases where existing multi document summarizers (MDS) fail. We find that most MDS workflows are built for the entirely factual news corpora, whereas our corpus has a fair share of opinion based instances too. We therefore introduce OpinioSumm, a new MDS which outperforms the best baseline by 4.6% w.r.t ROUGE-1 score. Community Trees We introduce the concept of community trees that summarizes topological structures within a network. A community tree is a tree structure representing clique communities from the clique percolation method (CPM). The community tree also generates a persistent diagram. Community trees and persistent diagrams reveal topological structures of the underlying networks and can be used as visualization tools. We study the stability of community trees and derive a quantity called the total star number (TSN) that presents an upper bound on the change of community trees. Our findings provide a topological interpretation for the stability of communities generated by the CPM. CommunityGAN Community detection refers to the task of discovering groups of vertices sharing similar properties or functions so as to understand the network data. With the recent development of deep learning, graph representation learning techniques are also utilized for community detection. However, the communities can only be inferred by applying clustering algorithms based on learned vertex embeddings. These general cluster algorithms like K-means and Gaussian Mixture Model cannot output much overlapped communities, which have been proved to be very common in many real-world networks. In this paper, we propose CommunityGAN, a novel community detection framework that jointly solves overlapping community detection and graph representation learning. First, unlike the embedding of conventional graph representation learning algorithms where the vector entry values have no specific meanings, the embedding of CommunityGAN indicates the membership strength of vertices to communities. Second, a specifically designed Generative Adversarial Net (GAN) is adopted to optimize such embedding. Through the minimax competition between the motif-level generator and discriminator, both of them can alternatively and iteratively boost their performance and finally output a better community structure. Extensive experiments on synthetic data and real-world tasks demonstrate that CommunityGAN achieves substantial community detection performance gains over the state-of-the-art methods. Compact Description In critical applications of anomaly detection including computer security and fraud prevention, the anomaly detector must be configurable by the analyst to minimize the effort on false positives. One important way to configure the anomaly detector is by providing true labels for a few instances. We study the problem of label-efficient active learning to automatically tune anomaly detection ensembles and make four main contributions. First, we present an important insight into how anomaly detector ensembles are naturally suited for active learning. This insight allows us to relate the greedy querying strategy to uncertainty sampling, with implications for label-efficiency. Second, we present a novel formalism called compact description to describe the discovered anomalies and show that it can also be employed to improve the diversity of the instances presented to the analyst without loss in the anomaly discovery rate. Third, we present a novel data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the detector in a principled manner. Fourth, we present extensive experiments to evaluate our insights and algorithms in both batch and streaming settings. Our results show that in addition to discovering significantly more anomalies than state-of-the-art unsupervised baselines, our active learning algorithms under the streaming-data setup are competitive with the batch setup. COMpact Image Captioning(COMIC) Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary size, making them difficult to be deployed on embedded system with limited hardware resources. This is because the size of word and output embedding matrices grow proportionally with the size of vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning. That is, we tackle the problem of compactness of image captioning models which is hitherto unexplored. We showed that, our proposed model, named COMIC for COMpact Image Captioning, achieves comparable results in five common evaluation metrics with state-of-the-art approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an embedding vocabulary size that is 39x – 99x smaller. Compact Partial Order Index(CPOI) Large amounts of RDF/S data are produced and published lately, and several modern applications require the provision of versioning and archiving services over such datasets. In this paper we propose a novel storage index for archiving versions of such datasets, called CPOI (compact partial order index), that exploits the fact that an RDF Knowledge Base (KB), is a graph (or equivalently a set of triples), and thus it has not a unique serialization (as it happens with text). If we want to keep stored several versions we actually want to store multiple sets of triples. CPOI is a data structure for storing such sets aiming at reducing the storage space since this is important not only for reducing storage costs, but also for reducing the various communication costs and enabling hosting in main memory (and thus processing efficiently) large quantities of data. CPOI is based on a partial order structure over sets of triple identifiers, where the triple identifiers are represented in a gapped form using variable length encoding schemes. For this index we evaluate analytically and experimentally various identifier assignment techniques and their space savings. The results show significant storage savings, specifically, the storage space of the compressed sets in large and realistic synthetic datasets is about the 8% of the size of the uncompressed sets. Compact Trip Representation(CTR) Representing the movements of objects (trips) over a network in a compact way while retaining the capability of exploiting such data effectively is an important challenge of real applications. We present a new Compact Trip Representation (CTR) that handles the spatio-temporal data associated with users’ trips over transportation networks. Depending on the network and types of queries, nodes in the network can represent intersections, stops, or even street segments. CTR represents separately sequences of nodes and the time instants when users traverse these nodes. The spatial component is handled with a data structure based on the well-known Compressed Suffix Array (CSA), which provides both a compact representation and interesting indexing capabilities. The temporal component is self-indexed with either a Hu-Tucker-shaped Wavelet-tree or a Wavelet Matrix that solve range-interval queries efficiently. We show how CTR can solve relevant counting-based spatial, temporal, and spatio-temporal queries over large sets of trips. Experimental results show the space requirements (around 50-70% of the space needed by a compact non-indexed baseline) and query efficiency (most queries are solved in the range of 1-1000 microseconds) of CTR. Comparative Argument Mining(CAM) We present the first work on domain-independent comparative argument mining (CAM), which is the automatic extraction of direct comparisons from text. After motivating the need and identifying the widespread use of this so far under-researched topic, we present the first publicly available open-domain dataset for CAM. The dataset was collected using crowdsourcing and contains 7199 unique sentences for 217 distinct comparison target pairs selected over several domains, of which 27% contain a directed (better vs. worse) comparison. In classification experiments, we examine the impact of representations, features, and classifiers, and reach an F1-score of 88% with a gradient boosting model based on pre-trained sentence embeddings, especially reliably identifying non-comparative sentences. This paves the way for domain-independent comparison extraction from web-scale corpora for the use in result ranking and presentation for comparative queries. Comparative Opinion Mining Opinion mining refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyse the public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining that deals with identifying and extracting information that is expressed in a comparative form (e.g.~’paper X is better than the Y’). Comparative opinion mining plays a very important role when ones tries to evaluate something, as it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from perspective of techniques and the other from perspective of comparative opinion elements. It also incorporates preprocessing tools as well as dataset that were used by the past researchers that can be useful to the future researchers in the field of comparative opinion mining. compare-mt In this paper, we describe compare-mt, a tool for holistic analysis and comparison of the results of systems for language generation tasks such as machine translation. The main goal of the tool is to give the user a high-level and coherent view of the salient differences between systems that can then be used to guide further analysis or system improvement. It implements a number of tools to do so, such as analysis of accuracy of generation of particular types of words, bucketed histograms of sentence accuracies or counts based on salient characteristics, and extraction of characteristic $n$-grams for each system. It also has a number of advanced features such as use of linguistic labels, source side data, or comparison of log likelihoods for probabilistic models, and also aims to be easily extensible by users to new types of analysis. The code is available at https://…/compare-mt Comparison-Based Algorithm(CBA) Stochastic optimization finds a wide range of applications in operations research and management science. However, existing stochastic optimization techniques often require the information of random samples (e.g., demands in newsvendor problem) or the objective values at the sampled points (e.g., the lost sales cost), which might not be available in practice. In this paper, we consider a new setup for stochastic optimization, in which the decision maker can only access to comparison information between a random sample and two chosen points in each iteration. We propose a comparison-based algorithm (CBA) to solve such problems in single dimension with convex objective functions. Particularly, the CBA properly chooses the two points in each iteration and constructs an unbiased gradient estimate for the original problem. We show that the CBA achieves the same convergence rate as the optimal stochastic gradient methods (with the samples observed). We also consider extensions of our approach to high dimensional quadratic problems as well as problems with non-convex objective functions. Numerical experiments show that the CBA performs well in test problems. Comparison-Based Random Forest Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data. Compatible Policy Search(COPOS) Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks. CompEngine Modern biomedical applications often involve time-series data, from high-throughput phenotyping of model organisms, through to individual disease diagnosis and treatment using biomedical data streams. Data and tools for time-series analysis are developed and applied across the sciences and in industry, but meaningful cross-disciplinary interactions are limited by the challenge of identifying fruitful connections. Here we introduce the web platform, CompEngine, a self-organizing, living library of time-series data that lowers the barrier to forming meaningful interdisciplinary connections between time series. Using a canonical feature-based representation, CompEngine places all time series in a common space, regardless of their origin, allowing users to upload their data and immediately explore interdisciplinary connections to other data with similar properties, and be alerted when similar data is uploaded in the future. In contrast to conventional databases, which are organized by assigned metadata, CompEngine incentivizes data sharing by automatically connecting experimental and theoretical scientists across disciplines based on the empirical structure of their data. CompEngine’s growing library of interdisciplinary time-series data also facilitates comprehensively characterization of algorithm performance across diverse types of data, and can be used to empirically motivate the development of new time-series analysis algorithms. Compensated Coupling Motivated by the success of using black-box predictive algorithms as subroutines for online decision-making, we develop a new framework for designing online policies given access to an oracle providing statistical information about an offline benchmark. Having access to such prediction oracles enables simple and natural Bayesian selection policies, and raises the question as to how these policies perform in different settings. Our work makes two important contributions towards tackling this question: First, we develop a general technique we call *compensated coupling* which can be used to derive bounds on the expected regret (i.e., additive loss with respect to a benchmark) for any online policy and offline benchmark; Second, using this technique, we show that the Bayes Selector has constant expected regret (i.e., independent of the number of arrivals and resource levels) in any online packing and matching problem with a finite type-space. Our results generalize and simplify many existing results for online packing and matching problems, and suggest a promising pathway for obtaining oracle-driven policies for other online decision-making settings. Competing Prediction Algorithm Prediction is a well-studied machine learning task, and prediction algorithms are core ingredients in online products and services. Despite their centrality in the competition between online companies who offer prediction-based products, the strategic use of prediction algorithms remains unexplored. The goal of this paper is to examine strategic use of prediction algorithms. We introduce a novel game-theoretic setting that is based on the PAC learning framework, where each player (aka a prediction algorithm at competition) seeks to maximize the sum of points for which it produces an accurate prediction and the others do not. We show that algorithms aiming at generalization may wittingly miss-predict some points to perform better than others on expectation. We analyze the empirical game, i.e. the game induced on a given sample, prove that it always possesses a pure Nash equilibrium, and show that every better-response learning process converges. Moreover, our learning-theoretic analysis suggests that players can, with high probability, learn an approximate pure Nash equilibrium for the whole population using a small number of samples. Competing Risks This form of analysis is known by its use of death certificates. In traditional overall survival analysis the cause of death is irrelevant to the analysis. In a competing risks survival analyses each death certificate is reviewed. If the disease of interest is cancer, and the person/patient dies of a car accident, the patient is labelled as censored at death, instead of being labelled as having died. Issues with this method arise as each hospital and or registry may code for causes of death differently. For example, there exists variability in the way a patient who has cancer and commits suicide is coded/labelled. In addition, if a patient has an eye removed due to an ocular cancer and dies getting hit while crossing the road because he didn’t see the car would often be considered to be censored rather than having died due to the cancer or its subsequent effects. ➘ “Survival Analysis” Competitive Analysis Competitive analysis is a method invented for analyzing online algorithms, in which the performance of an online algorithm (which must satisfy an unpredictable sequence of requests, completing each request without being able to see the future) is compared to the performance of an optimal offline algorithm that can view the sequence of requests in advance. An algorithm is competitive if its competitive ratio – the ratio between its performance and the offline algorithm’s performance – is bounded. Unlike traditional worst-case analysis, where the performance of an algorithm is measured only for ‘hard’ inputs, competitive analysis requires that an algorithm perform well both on hard and easy inputs, where ‘hard’ and ‘easy’ are defined by the performance of the optimal offline algorithm. For many algorithms, performance is dependent not only on the size of the inputs, but also on their values. One such example is the quicksort algorithm, which sorts an array of elements. Such data-dependent algorithms are analysed for average-case and worst-case data. Competitive analysis is a way of doing worst case analysis for on-line and randomized algorithms, which are typically data dependent. In competitive analysis, one imagines an ‘adversary’ that deliberately chooses difficult data, to maximize the ratio of the cost of the algorithm being studied and some optimal algorithm. Adversaries range in power from the oblivious adversary, which has no knowledge of the random choices made by the algorithm pitted against it, to the adaptive adversary that has full knowledge of how an algorithm works and its internal state at any point during its execution. Note that this distinction is only meaningful for randomized algorithms. For a deterministic algorithm, either adversary can simply compute what state that algorithm must have at any time in the future, and choose difficult data accordingly. For example, the quicksort algorithm chooses one element, called the ‘pivot’, that is, on average, not too far from the center value of the data being sorted. Quicksort then separates the data into two piles, one of which contains all elements with value less than the value of the pivot, and the other containing the rest of the elements. If quicksort chooses the pivot in some deterministic fashion (for instance, always choosing the first element in the list), then it is easy for an adversary to arrange the data beforehand so that quicksort will perform in worst-case time. If, however, quicksort chooses some random element to be the pivot, then an adversary without knowledge of what random numbers are coming up cannot arrange the data to guarantee worst-case execution time for quicksort. The classic on-line problem first analysed with competitive analysis (Sleator & Tarjan 1985) is the list update problem: Given a list of items and a sequence of requests for the various items, minimize the cost of accessing the list where the elements closer to the front of the list cost less to access. (Typically, the cost of accessing an item is equal to its position in the list.) After an access, the list may be rearranged. Most rearrangements have a cost. The Move-To-Front algorithm simply moves the requested item to the front after the access, at no cost. The Transpose algorithm swaps the accessed item with the item immediately before it, also at no cost. Classical methods of analysis showed that Transpose is optimal in certain contexts. In practice, Move-To-Front performed much better. Competitive analysis was used to show that an adversary can make Transpose perform arbitrarily badly compared to an optimal algorithm, whereas Move-To-Front can never be made to incur more than twice the cost of an optimal algorithm. In the case of online requests from a server, competitive algorithms are used to overcome uncertainties about the future. That is, the algorithm does not ‘know’ the future, while the imaginary adversary (the ‘competitor’) ‘knows’. Similarly, competitive algorithms were developed for distributed systems, where the algorithm has to react to a request arriving at one location, without ‘knowing’ what has just happened in a remote location. This setting was presented in (Awerbuch, Kutten & Peleg 1992). Competitive Dense Fully Convolutional Network(CDFNet) Increased information sharing through short and long-range skip connections between layers in fully convolutional networks have demonstrated significant improvement in performance for semantic segmentation. In this paper, we propose Competitive Dense Fully Convolutional Networks (CDFNet) by introducing competitive maxout activations in place of naive feature concatenation for inducing competition amongst layers. Within CDFNet, we propose two architectural contributions, namely competitive dense block (CDB) and competitive unpooling block (CUB) to induce competition at local and global scales for short and long-range skip connections respectively. This extension is demonstrated to boost learning of specialized sub-networks targeted at segmenting specific anatomies, which in turn eases the training of complex tasks. We present the proof-of-concept on the challenging task of whole body segmentation in the publicly available VISCERAL benchmark and demonstrate improved performance over multiple learning and registration based state-of-the-art methods. Competitive Intelligence(CI) Competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Competitive intelligence essentially means understanding and learning what’s happening in the world outside your business so one can be as competitive as possible. It means learning as much as possible-as soon as possible-about one’s industry in general, one’s competitors, or even one’s county’s particular zoning rules. In short, it empowers you to anticipate and face challenges head on. A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious. Experts also call this process the early signal analysis. This definition focuses attention on the difference between dissemination of widely available factual information (such as market statistics, financial reports, newspaper clippings) performed by functions such as libraries and information centers, and competitive intelligence which is a perspective on developments and events aimed at yielding a competitive edge. Competitive Intelligence and 6 Tips for Its Effective Use Competitive Learning Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps). https://…/handbookch7.html Competitive Pathway Network(CoPaNet) In the design of deep neural architectures, recent studies have demonstrated the benefits of grouping subnetworks into a larger network. For examples, the Inception architecture integrates multi-scale subnetworks and the residual network can be regarded that a residual unit combines a residual subnetwork with an identity shortcut. In this work, we embrace this observation and propose the Competitive Pathway Network (CoPaNet). The CoPaNet comprises a stack of competitive pathway units and each unit contains multiple parallel residual-type subnetworks followed by a max operation for feature competition. This mechanism enhances the model capability by learning a variety of features in subnetworks. The proposed strategy explicitly shows that the features propagate through pathways in various routing patterns, which is referred to as pathway encoding of category information. Moreover, the cross-block shortcut can be added to the CoPaNet to encourage feature reuse. We evaluated the proposed CoPaNet on four object recognition benchmarks: CIFAR-10, CIFAR-100, SVHN, and ImageNet. CoPaNet obtained the state-of-the-art or comparable results using similar amounts of parameters. The code of CoPaNet is available at: https://…/CoPaNet. Complementary Conditional GAN(CCGAN) Majority of state-of-the-art deep learning methods for vision applications are discriminative approaches, which model the conditional distribution. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high quality instances in spite of weak supervision. In addition to the extensive empirical studies, we also theoretically show that our model can retrieve the true conditional distribution from the complementarily-labeled data. Complementary Recommendations Using Adversarial Feature Transformer(CRAFT) Traditional approaches for complementary product recommendations rely on behavioral and non-visual data such as customer co-views or co-buys. However, certain domains such as fashion are primarily visual. We propose a framework that harnesses visual cues in an unsupervised manner to learn the distribution of co-occurring complementary items in real world images. Our model learns a non-linear transformation between the two manifolds of source and target complementary item categories (e.g., tops and bottoms in outfits). Given a large dataset of images containing instances of co-occurring object categories, we train a generative transformer network directly on the feature representation space by casting it as an adversarial optimization problem. Such a conditional generative model can produce multiple novel samples of complementary items (in the feature space) for a given query item. The final recommendations are selected from the closest real world examples to the synthesized complementary features. We apply our framework to the task of recommending complementary tops for a given bottom clothing item. The recommendations made by our system are diverse, and are favored by human experts over the baseline approaches. Complementary Temporal Action Proposal(CTAP) Temporal action proposal generation is an important task, akin to object proposals, temporal action proposals are intended to capture ‘clips’ or temporal intervals in videos that are likely to contain an action. Previous methods can be divided to two groups: sliding window ranking and actionness score grouping. Sliding windows uniformly cover all segments in videos, but the temporal boundaries are imprecise; grouping based method may have more precise boundaries but it may omit some proposals when the quality of actionness score is low. Based on the complementary characteristics of these two methods, we propose a novel Complementary Temporal Action Proposal (CTAP) generator. Specifically, we apply a Proposal-level Actionness Trustworthiness Estimator (PATE) on the sliding windows proposals to generate the probabilities indicating whether the actions can be correctly detected by actionness scores, the windows with high scores are collected. The collected sliding windows and actionness proposals are then processed by a temporal convolutional neural network for proposal ranking and boundary adjustment. CTAP outperforms state-of-the-art methods on average recall (AR) by a large margin on THUMOS-14 and ActivityNet 1.3 datasets. We further apply CTAP as a proposal generation method in an existing action detector, and show consistent significant improvements. Complementary Temporal Difference Learning(CTDL) Complementary Learning Systems (CLS) theory suggests that the brain uses a ‘neocortical’ and a ‘hippocampal’ learning system to achieve complex behavior. These two systems are complementary in that the ‘neocortical’ system relies on slow learning of distributed representations while the ‘hippocampal’ system relies on fast learning of pattern-separated representations. Both of these systems project to the striatum, which is a key neural structure in the brain’s implementation of Reinforcement Learning (RL). Current deep RL approaches share similarities with a ‘neocortical’ system because they slowly learn distributed representations through backpropagation in Deep Neural Networks (DNNs). An ongoing criticism of such approaches is that they are data inefficient and lack flexibility. CLS theory suggests that the addition of a ‘hippocampal’ system could address these criticisms. In the present study we propose a novel algorithm known as Complementary Temporal Difference Learning (CTDL), which combines a DNN with a Self-Organising Map (SOM) to obtain the benefits of both a ‘neocortical’ and a ‘hippocampal’ system. Key features of CTDL include the use of Temporal Difference (TD) error to update a SOM and the combination of a SOM and DNN to calculate action values. We evaluate CTDL on grid worlds and the Cart-Pole environment, and show several benefits over the classic Deep Q-Network (DQN) approach. These results demonstrate (1) the utility of complementary learning systems for the evaluation of actions, (2) that the TD error signal is a useful form of communication between the two systems and (3) the biological plausibility of the proposed approach. Complete Representations GAN(CR-GAN) Generating multi-view images from a single-view input is an essential yet challenging problem. It has broad applications in vision, graphics, and robotics. Our study indicates that the widely-used generative adversarial network (GAN) may learn ‘incomplete’ representations due to the single-pathway framework: an encoder-decoder network followed by a discriminator network. We propose CR-GAN to address this problem. In addition to the single reconstruction path, we introduce a generation sideway to maintain the completeness of the learned embedding space. The two learning pathways collaborate and compete in a parameter-sharing manner, yielding considerably improved generalization ability to ‘unseen’ dataset. More importantly, the two-pathway framework makes it possible to combine both labeled and unlabeled data for self-supervised learning, which further enriches the embedding space for realistic generations. The experimental results prove that CR-GAN significantly outperforms state-of-the-art methods, especially when generating from ‘unseen’ inputs in wild conditions. Complete Spatial Randomness(CSR) Complete spatial randomness (CSR) describes a point process whereby point events occur within a given study area in a completely random fashion. It is synonymous with a homogeneous spatial Poisson process. Such a process is modeled using only one parameter \rho, i.e. the density of points within the defined area. The term complete spatial randomness is commonly used in Applied Statistics in the context of examining certain point patterns, whereas in most other statistical contexts it is referred to the concept of a spatial Poisson process. Completed Partially Directed Acyclic Graph(CPDAG) ➘ “Directed Acyclic Graph” Complete-Linkage Clustering Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up being in the same cluster. At each step, the two clusters separated by the shortest distance are combined. The definition of ‘shortest distance’ is what differentiates between the different agglomerative clustering methods. In complete-linkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. The method is also known as farthest neighbour clustering. The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place. Complex Adaptive System(CAS) Complexity theory is a relatively new field that began in the mid-1980s at the Santa Fe Institute in New Mexico. Work at the Santa Fe Institute is usually presented as the study of Complex Adaptive Systems (CAS). The CAS movement is predominantly American, as opposed to the European “natural science” tradition in the area of cybernetics and systems. Like in cybernetics and systems theory, CAS shares the subject of general properties of complex systems across traditional disciplinary boundaries. However, CAS is distinguished by the extensive use of computer simulations as a research tool, and an emphasis on systems, such as markets or ecologies, which are less integrated or “organized” than the ones studied by the older tradition (e.g., organisms, machines and companies). Complex Correntropy Recent studies have demonstrated that correntropy is an efficient tool for analyzing higher-order statistical moments in nonGaussian noise environments. Although correntropy has been used with complex data, no theoretical study was pursued to elucidate its properties, nor how to best use it for optimization. This paper presents a probabilistic interpretation for correntropy using complex-valued data called complex correntropy. A recursive solution for the maximum complex correntropy criterion (MCCC) is introduced based on a fixed point solution. This technique is applied to a simple system identification case study, and the results demonstrate prominent advantages when compared to the complex recursive least squares (RLS) algorithm. By using such probabilistic interpretation, correntropy can be applied to solve several problems involving complex data in a more straightforward way. Keywords: complex-valued data correntropy, maximum complex correntropy criterion, fixed-point algorithm. Complex Event Processing(CEP) Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible. Complex Network In the context of network theory, a complex network is a graph (network) with non-trivial topological features – features that do not occur in simple networks such as lattices or random graphs but often occur in graphs modelling real systems. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of real-world networks such as computer networks and social networks. Complex Network Classifier(CNC) Classifying large scale networks into several categories and distinguishing them according to their fine structures is of great importance with several applications in real life. However, most studies of complex networks focus on properties of a single network but seldom on classification, clustering, and comparison between different networks, in which the network is treated as a whole. Due to the non-Euclidean properties of the data, conventional methods can hardly be applied on networks directly. In this paper, we propose a novel framework of complex network classifier (CNC) by integrating network embedding and convolutional neural network to tackle the problem of network classification. By training the classifiers on synthetic complex network data and real international trade network data, we show CNC can not only classify networks in a high accuracy and robustness, it can also extract the features of the networks automatically. Complex Systems Complex systems present problems both in mathematical modelling and philosophical foundations. The study of complex systems represents a new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment. The equations from which models of complex systems are developed generally derive from statistical physics, information theory and non-linear dynamics and represent organized but unpredictable behaviors of natural systems that are considered fundamentally complex. The physical manifestations of such systems are difficult to define, so a common choice is to identify ‘the system’ with the mathematical information model rather than referring to the undefined physical subject the model represents. Such systems are used to model processes in computer science, biology, economics, physics, chemistry, and many other fields. It is also called complex systems theory, complexity science, study of complex systems, sciences of complexity, non-equilibrium physics, and historical physics. A variety of abstract theoretical complex systems is studied as a field of mathematics. The key problems of complex systems are difficulties with their formal modelling and simulation. From such a perspective, in different research contexts complex systems are defined on the basis of their different attributes. Since all complex systems have many interconnected components, the science of networks and network theory are important aspects of the study of complex systems. A consensus regarding a single universal definition of complex system does not yet exist. For systems that are less usefully represented with equations various other kinds of narratives and methods for identifying, exploring, designing and interacting with complex systems are used. Complex-Valued Network for Matching(CNM) This paper seeks to model human language by the mathematical framework of quantum physics. With the well-designed mathematical formulations in quantum physics, this framework unifies different linguistic units in a single complex-valued vector space, e.g. words as particles in quantum states and sentences as mixed systems. A complex-valued network is built to implement this framework for semantic matching. With well-constrained complex-valued components, the network admits interpretations to explicit physical meanings. The proposed complex-valued network for matching (CNM) achieves comparable performances to strong CNN and RNN baselines on two benchmarking question answering (QA) datasets. Complex-Valued Neural Network(CVNN) The complex-valued Neural Network is an extension of a (usual) real-valued neural network, whose input and output signals and parameters such as weights and thresholds are all complex numbers (the activation function is inevitably a complex-valued function). Neural Networks have been applied to various fields such as communication systems, image processing and speech recognition, in which complex numbers are often used through the Fourier Transformation. This indicates that complex-valued neural networks are useful. In addition, in the human brain, an action potential may have different pulse patterns, and the distance between pulses may be different. This suggests that introducing complex numbers representing phase and amplitude into neural networks is appropriate. In these years the complex-valued neural networks expand the application fields in image processing, computer vision, optoelectronic imaging, and communication and so on. The potentially wide applicability yields new aspects of theories required for novel or more effective functions and mechanisms. Complier Average Causal Effects(CACE) Typically, studies analyze data based on treatment assignment rather than treatment received. This focus on assignment is called an intention-to-treat (ITT) analysis. In a policy environment, the ITT may make a lot of sense; we are answering this specific question: ‘What is the overall effect in the real world where the intervention is made available yet some people take advantage of it while others do not?’ Alternatively, researchers may be interested in different question: ‘What is the causal effect of actually receiving the treatment?’ Now, to answer the second question, there are numerous subtle issues that you need to wrestle with (again, go take the course). But, long story short, we need to (1) identify the folks in the intervention group who actually do what they have been encouraged to do (receive the intervention) but only because they were encouraged, and not because they would have received the intervention anyways had they not been randomized, and compare their outcomes with (2) the folks in the control group who did not seek out the intervention on their own initiative but would have received the intervention had they been encouraged. These two groups are considered to be compliers – they would always do what they are told in the context of the study. And the effect of the intervention that is based on outcomes from this type of patient is called the complier average causal effect (CACE). Component Lasso Method We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then applies the lasso to each subproblem separately, obtaining a coefficient vector for each one. Finally, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery. The modular structure also lends itself naturally to parallel computation. Composable Preprocessing Operators(CPO) Toolset that enriches ‘mlr’ with a diverse set of preprocessing operators. Composable Preprocessing Operators (‘CPO’s) are first-class R objects that can be applied to data.frames and ‘mlr’ ‘Task’s to modify data, can be attached to ‘mlr’ ‘Learner’s to add preprocessing to machine learning algorithms, and can be composed to form preprocessing pipelines. mlrCPO Composite Gaussian Process Models(CGP) A new type of nonstationary Gaussian process model is devel- oped for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models lo- cal details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying volatility. Compared to the commonly used stationary Gaus- sian process model, the new predictor is numerically more stable and can more accurately approximate complex surfaces when the experi- mental design is sparse. In addition, the new model can also improve the prediction intervals by quantifying the change of local variability associated with the response. Composite Indicator(COIN) A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multi-dimensional concept that is being measured. A composite indicator measures multi-dimensional concepts (e.g. competitiveness, e-trade or environmental quality) which cannot be captured by a single indicator. Ideally, a composite indicator should be based on a theoretical framework / definition, which allows individual indicators / variables to be selected, combined and weighted in a manner which reflects the dimensions or structure of the phenomena being measured. Composite Quantile Regression(CQR) Composition Assistance for Photo Taking(CAPTAIN) Many people are interested in taking astonishing photos and sharing with others. Emerging hightech hardware and software facilitate ubiquitousness and functionality of digital photography. Because composition matters in photography, researchers have leveraged some common composition techniques to assess the aesthetic quality of photos computationally. However, composition techniques developed by professionals are far more diverse than well-documented techniques can cover. We leverage the vast underexplored innovations in photography for computational composition assistance. We propose a comprehensive framework, named CAPTAIN (Composition Assistance for Photo Taking), containing integrated deep-learned semantic detectors, sub-genre categorization, artistic pose clustering, personalized aesthetics-based image retrieval, and style set matching. The framework is backed by a large dataset crawled from a photo-sharing Website with mostly photography enthusiasts and professionals. The work proposes a sequence of steps that have not been explored in the past by researchers. The work addresses personal preferences for composition through presenting a ranked-list of photographs to the user based on user-specified weights in the similarity measure. The matching algorithm recognizes the best shot among a sequence of shots with respect to the user’s preferred style set. We have conducted a number of experiments on the newly proposed components and reported findings. A user study demonstrates that the work is useful to those taking photos. Compositional Coding Text classification is a challenging problem which aims to identify the category of text. Recently, Capsule Networks (CapsNets) are proposed for image classification, it has been shown that CapsNets have several advantages over Convolutional Neural Networks (CNNs), while, their validity in the domain of text has less been explored. An effective method named deep compositional code learning has been proposed lately. This method can save many parameters about word embeddings without any significant sacrifices in performance. In this paper, we introduce the Compositional Coding (CC) mechanism between capsules, and we propose a new routing algorithm, which is based on k-means clustering theory. Experiments conducted on eight challenging text classification datasets show the proposed method achieves similar accuracy compared to the state-of-the-art approach with significantly fewer parameters. Compositional Coding for Collaborative Filtering Efficiency is crucial to the online recommender systems. Representing users and items as binary vectors for Collaborative Filtering (CF) can achieve fast user-item affinity computation in the Hamming space, in recent years, we have witnessed an emerging research effort in exploiting binary hashing techniques for CF methods. However, CF with binary codes naturally suffers from low accuracy due to limited representation capability in each bit, which impedes it from modeling complex structure of the data. In this work, we attempt to improve the efficiency without hurting the model performance by utilizing both the accuracy of real-valued vectors and the efficiency of binary codes to represent users/items. In particular, we propose the Compositional Coding for Collaborative Filtering (CCCF) framework, which not only gains better recommendation efficiency than the state-of-the-art binarized CF approaches but also achieves even higher accuracy than the real-valued CF method. Specifically, CCCF innovatively represents each user/item with a set of binary vectors, which are associated with a sparse real-value weight vector. Each value of the weight vector encodes the importance of the corresponding binary vector to the user/item. The continuous weight vectors greatly enhances the representation capability of binary codes, and its sparsity guarantees the processing speed. Furthermore, an integer weight approximation scheme is proposed to further accelerate the speed. Based on the CCCF framework, we design an efficient discrete optimization algorithm to learn its parameters. Extensive experiments on three real-world datasets show that our method outperforms the state-of-the-art binarized CF methods (even achieves better performance than the real-valued CF method) by a large margin in terms of both recommendation accuracy and efficiency. Compositional Data In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information. This definition, given by John Aitchison (1986) has several consequences: · A compositional data point, or composition for short, can be represented by a positive real vector with as many parts as considered. Sometimes, if the total amount is fixed and known, one component of the vector can be omitted. · As compositions only carry relative information, the only information is given by the ratios between components. Consequently, a composition multiplied by any positive constant contains the same information as the former. Therefore, proportional positive vectors are equivalent when considered as compositions. · As usual in mathematics, equivalent classes are represented by some element of the class, called a representative. Thus, equivalent compositions can be represented by positive vectors whose components add to a given constant kappa. The vector operation assigning the constant sum representative is called closure, where D is the number of parts (components) and denotes a row vector. · Compositional data can be represented by constant sum real vectors with positive components, and this vectors span a simplex. Compositional Data Analysis(CoDa) Compositional data analysis deals with situations where the relevant information is contained only in the ratios between the measured variables, and not in the reported values. Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. A Concise Guide to Compositional Data Analysis Compositional,compositions Compositional GAN Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism, but are generally modeled to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose to model object composition in a GAN framework as a self-consistent composition-decomposition network. Our model is conditioned on the object images from their marginal distributions to generate a realistic image from their joint distribution by explicitly learning the possible interactions. We evaluate our model through qualitative experiments and user evaluations in both the scenarios when either paired or unpaired examples for the individual object images and the joint scenes are given during training. Our results reveal that the learned model captures potential interactions between the two object domains given as input to output new instances of composed scene at test time in a reasonable fashion. Compositional Imitation Learning and Execution(CompILE) We introduce a framework for Compositional Imitation Learning and Execution (CompILE) of hierarchically-structured behavior. CompILE learns reusable, variable-length segments of behavior from demonstration data using a novel unsupervised, fully-differentiable sequence segmentation module. These learned behaviors can then be re-composed and executed to perform new tasks. At training time, CompILE auto-encodes observed behavior into a sequence of latent codes, each corresponding to a variable-length segment in the input sequence. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate our model in a challenging 2D multi-task environment and show that CompILE can find correct task boundaries and event encodings in an unsupervised manner without requiring annotated demonstration data. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our agent could learn given only sparse rewards, where agents without task-specific policies struggle. Compositional Network Embedding Network embedding has proved extremely useful in a variety of network analysis tasks such as node classification, link prediction, and network visualization. Almost all the existing network embedding methods learn to map the node IDs to their corresponding node embeddings. This design principle, however, hinders the existing methods from being applied in real cases. Node ID is not generalizable and, thus, the existing methods have to pay great effort in cold-start problem. The heterogeneous network usually requires extra work to encode node types, as node type is not able to be identified by node ID. Node ID carries rare information, resulting in the criticism that the existing methods are not robust to noise. To address this issue, we introduce Compositional Network Embedding, a general inductive network representation learning framework that generates node embeddings by combining node features based on the principle of compositionally. Instead of directly optimizing an embedding lookup based on arbitrary node IDs, we learn a composition function that infers node embeddings by combining the corresponding node attribute embeddings through a graph-based loss. For evaluation, we conduct the experiments on link prediction under four different settings. The results verified the effectiveness and generalization ability of compositional network embeddings, especially on unseen nodes. Compositional Online Learning with Kernels(COLK) In this work, we address optimization problems where the objective function is a nonlinear function of an expected value, i.e., compositional stochastic {strongly convex programs}. We consider the case where the decision variable is not vector-valued but instead belongs to a reproducing Kernel Hilbert Space (RKHS), motivated by risk-aware formulations of supervised learning and Markov Decision Processes defined over continuous spaces. We develop the first memory-efficient stochastic algorithm for this setting, which we call Compositional Online Learning with Kernels (COLK). COLK, at its core a two-time-scale stochastic approximation method, addresses the fact that (i) compositions of expected value problems cannot be addressed by classical stochastic gradient due to the presence of the inner expectation; and (ii) the RKHS-induced parameterization has complexity which is proportional to the iteration index which is mitigated through greedily constructed subspace projections. We establish almost sure convergence of COLK with attenuating step-sizes, and linear convergence in mean to a neighborhood with constant step-sizes, as well as the fact that its complexity is at-worst finite. The experiments with robust formulations of supervised learning demonstrate that COLK reliably converges, attains consistent performance across training runs, and thus overcomes overfitting. Compositional Pattern Producing Network(DPPN) Compositional pattern-producing networks (CPPNs) are a variation of artificial neural networks (ANNs) that differ in their set of activation functions and how they are applied. While ANNs often contain only sigmoid functions and sometimes Gaussian functions, CPPNs can include both types of functions and many others. The choice of functions for the canonical set can be biased toward specific types of patterns and regularities. For example, periodic functions such as sine produce segmented patterns with repetitions, while symmetric functions such as Gaussian produce symmetric patterns. Linear functions can be employed to produce linear or fractal-like patterns. Thus, the architect of a CPPN-based genetic art system can bias the types of patterns it generates by deciding the set of canonical functions to include. Compositional Stochastic Average Gradient Descent(C-SAG) Many machine learning, statistical inference, and portfolio optimization problems require minimization of a composition of expected value functions (CEVF). Of particular interest is the finite-sum versions of such compositional optimization problems (FS-CEVF). Compositional stochastic variance reduced gradient (C-SVRG) methods that combine stochastic compositional gradient descent (SCGD) and stochastic variance reduced gradient descent (SVRG) methods are the state-of-the-art methods for FS-CEVF problems. We introduce compositional stochastic average gradient descent (C-SAG) a novel extension of the stochastic average gradient method (SAG) to minimize composition of finite-sum functions. C-SAG, like SAG, estimates gradient by incorporating memory of previous gradient information. We present theoretical analyses of C-SAG which show that C-SAG, like SAG, and C-SVRG, achieves a linear convergence rate when the objective function is strongly convex; However, C-CAG achieves lower oracle query complexity per iteration than C-SVRG. Finally, we present results of experiments showing that C-SAG converges substantially faster than full gradient (FG), as well as C-SVRG. Comprehensive EVent Ontology(CEVO) While the general analysis of named entities has received substantial research attention, the analysis of relations over named entities has not. In fact, a review of the literature on unstructured as well as structured data revealed a deficiency in research on the abstract conceptualization required to organize relations. We believe that such an abstract conceptualization can benefit various communities and applications such as natural language processing, information extraction, machine learning and ontology engineering. In this paper, we present CEVO (i.e., a Comprehensive EVent Ontology) built on Levin’s conceptual hierarchy of English verbs that categorizes verbs with the shared meaning and syntactic behavior. We present the fundamental concepts and requirements for this ontology. Furthermore, we present three use cases for demonstrating the benefits of this ontology on annotation tasks: 1) annotating relations in plain text, 2) annotating ontological properties and 3) linking textual relations to ontological properties. COmprehensive INstructional video analysis(COIN) There are substantial instructional videos on the Internet, which enables us to acquire knowledge for completing various tasks. However, most existing datasets for instructional video analysis have the limitations in diversity and scale,which makes them far from many real-world applications where more diverse activities occur. Moreover, it still remains a great challenge to organize and harness such data. To address these problems, we introduce a large-scale dataset called ‘COIN’ for COmprehensive INstructional video analysis. Organized with a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to our daily life. With a new developed toolbox, all the videos are annotated effectively with a series of step descriptions and the corresponding temporal boundaries. Furthermore, we propose a simple yet effective method to capture the dependencies among different steps, which can be easily plugged into conventional proposal-based action detection methods for localizing important steps in instructional videos. In order to provide a benchmark for instructional video analysis, we evaluate plenty of approaches on the COIN dataset under different evaluation criteria. We expect the introduction of the COIN dataset will promote the future in-depth research on instructional video analysis for the community. Compressed Learning(CL) In this paper, we provide theoretical results to show that compressed learning, learning directly in the compressed domain, is possible. In Particular, we provide tight bounds demonstrating that the linear kernel SVM’s classifier in the measurement domain, with high probability, has true accuracy close to the accuracy of the best linear threshold classifier in the data domain. We show that this is beneficial both from the compressed sensing and the machine learning points of view. Furthermore, we indicate that for a family of well-known compressed sensing matrices, compressed learning is universal, in the sense that learning and classification in the measurement domain works provided that the data are sparse in some, even unknown, basis. Moreover, we show that our results are also applicable to a family of smooth manifold-learning tasks. Finally, we support our claims with experimental results. Compressed Learning: A Deep Neural Network Approach Compressed Randomized UTV(CoR-UTV) Low-rank matrix approximations play a fundamental role in numerical linear algebra and signal processing applications. This paper introduces a novel rank-revealing matrix decomposition algorithm termed Compressed Randomized UTV (CoR-UTV) decomposition along with a CoR-UTV variant aided by the power method technique. CoR-UTV is primarily developed to compute an approximation to a low-rank input matrix by making use of random sampling schemes. Given a large and dense matrix of size $m\times n$ with numerical rank $k$, where $k \ll \text{min}$, CoR-UTV requires a few passes over the data, and runs in $O(mnk)$ floating-point operations. Furthermore, CoR-UTV can exploit modern computational platforms and, consequently, can be optimized for maximum efficiency. CoR-UTV is simple and accurate, and outperforms reported alternative methods in terms of efficiency and accuracy. Simulations with synthetic data as well as real data in image reconstruction and robust principal component analysis applications support our claims. Compressed, Complementary, Computationally-Efficient Adaptive Gradient Online Learning(CompAdaGrad) The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method’s full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is computationally prohibitive and so the simpler diagonal version often is used in practice. We introduce a new method, CompAdaGrad, that navigates the space between these two schemes and show that this method can yield results much better than diagonal AdaGrad while avoiding the (effectively intractable) $O(n^3)$ computational complexity of full-matrix AdaGrad for dimension $n$. CompAdaGrad essentially performs full-matrix regularization in a low-dimensional subspace while performing diagonal regularization in the complementary subspace. We derive CompAdaGrad’s updates for composite mirror descent in case of the squared $\ell_2$ norm and the $\ell_1$ norm, demonstrate that its complexity per iteration is linear in the dimension, and establish guarantees for the method independent of the choice of composite regularizer. Finally, we show preliminary results on several datasets. Compressive Affine Phase Retrieval via Lifting(CAPRL) In this paper, we consider compressive/sparse affine phase retrieval proposed in [B. Gao B, Q. Sun, Y. Wang and Z. Xu, Adv. in Appl. Math., 93(2018), 121-141]. By the lift technique, and heuristic nuclear norm for convex relaxation of rank and $\ell$ one norm convex relaxation of sparsity, we establish convex models , which are called compressive affine phase retrieval via lifting (CAPRL). In order to compute these models, we develop inertial proximal ADMM for multiple separated operators and also give out its convergence analysis. Our numerical experiments via proposed algorithm show that sparse signal can be exactly and stably recovered via CAPRL. We also list some other applications of our proposed algorithm. Compressive Autoencoder With Pruning Based on ADMM(CAE-P) Since compressive autoencoder (CAE) was proposed, autoencoder, as a simple and efficient neural network model, has achieved better performance than traditional codecs such as JPEG[3], JPEG 2000[4] etc. in lossy image compression. However, it faces the problem that the bitrate, characterizing the compression ratio, cannot be optimized by general methods due to its discreteness. Current research additionally trains a entropy estimator to indirectly optimize the bitrate. In this paper, we proposed the compressive autoencoder with pruning based on ADMM (CAE-P) which replaces the traditionally used entropy estimating technique with ADMM pruning method inspired by the field of neural network architecture search and avoided the extra effort needed for training an entropy estimator. We tested our models on natural image dataset Kodak PhotoCD and achieved better results than the original CAE model which relies on entropy coding along with traditional codecs. We further explored the effectiveness of the ADMM-based pruning method in CAE-P by looking into the detail of latent codes learned by the model. Compressive K-means(CKM) The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM), that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of cen-troids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of Lloyd-Max, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification. ➘ “Lloyd-Max” COmpresSIve NE(COSINE) There is recently a surge in approaches that learn low-dimensional embeddings of nodes in networks. As there are many large-scale real-world networks, it’s inefficient for existing approaches to store amounts of parameters in memory and update them edge after edge. With the knowledge that nodes having similar neighborhood will be close to each other in embedding space, we propose COSINE (COmpresSIve NE) algorithm which reduces the memory footprint and accelerates the training process by parameters sharing among similar nodes. COSINE applies graph partitioning algorithms to networks and builds parameter sharing dependency of nodes based on the result of partitioning. With parameters sharing among similar nodes, COSINE injects prior knowledge about higher structural information into training process which makes network embedding more efficient and effective. COSINE can be applied to any embedding lookup method and learn high-quality embeddings with limited memory and shorter training time. We conduct experiments of multi-label classification and link prediction, where baselines and our model have the same memory usage. Experimental results show that COSINE gives baselines up to 23% increase on classification and up to 25% increase on link prediction. Moreover, time of all representation learning methods using COSINE decreases from 30% to 70%. Compressive Sampling(CS) Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Shannon-Nyquist sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity which requires the signal to be sparse in some domain. The second one is incoherence which is applied through the isometric property which is sufficient for sparse signals. MRI is a prominent application. A Mathematical Introduction to Compressive Sensing An Introduction To Compressive Sampling Compressive Sensing Compressive Sensing(CS) ➘ “Compressive Sampling” Computation Control Protocol(CCP) Cooperative computation is a promising approach for localized data processing for Internet of Things (IoT), where computationally intensive tasks in a device could be divided into sub-tasks, and offloaded to other devices or servers in close proximity. However, exploiting the potential of cooperative computation is challenging mainly due to the heterogeneous nature of IoT devices. Indeed, IoT devices may have different and time-varying computing power and energy resources, and could be mobile. Coded computation, which advocates mixing data in sub-tasks by employing erasure codes and offloading these sub-tasks to other devices for computation, is recently gaining interest, thanks to its higher reliability, smaller delay, and lower communication costs. In this paper, we develop a coded cooperative computation framework, which we name Computation Control Protocol (CCP), by taking into account heterogeneous computing power and energy resources of IoT devices. CCP dynamically allocates sub-tasks to helpers and is adaptive to time-varying resources. We show that (i) CCP improves task completion delay significantly as compared to baselines, (ii) task completion delay of CCP is very close to its theoretical characterization, and (iii) the efficiency of CCP in terms of resource utilization is higher than 99%, which is significant. Computation Graph Structural equation modeling (SEM) is evolving as available data is becoming more complex, reaching the limits of what traditional estimation approaches can achieve. As SEM expands to ever larger, more complex applications, the estimation challenge grows and currently available methods will be insufficient. To overcome this challenge in SEM, we see an opportunity to use existing solutions from the field of deep learning, which has been pioneering methods for estimation of complex models for decades. To this end, this paper introduces computation graphs, a flexible method of specifying objective functions. When combined with state-of-the-art optimizers, we argue that our computation graph approach is capable not only of estimating SEM models, but also of rapidly extending them — without the need of bespoke software development for each new extension. We show that several SEM improvements follow naturally from our approach; not only existing extensions such as least absolute deviation estimation and penalized regression models, but also novel extensions such as spike-and-slab penalties for sparse factor analysis. By applying computation graphs to SEM, we hope to greatly accelerate the process of developing SEM techniques, paving the way for novel applications. The accompanying R package tensorsem is under active development. Computation Tree Logic Computation tree logic (CTL) is a branching-time logic, meaning that its model of time is a tree-like structure in which the future is not determined; there are different paths in the future, any one of which might be an actual path that is realized. It is used in formal verification of software or hardware artifacts, typically by software applications known as model checkers which determine if a given artifact possesses safety or liveness properties. For example, CTL can specify that when some initial condition is satisfied (e.g., all program variables are positive or no cars on a highway straddle two lanes), then all possible executions of a program avoid some undesirable condition (e.g., dividing a number by zero or two cars colliding on a highway). In this example, the safety property could be verified by a model checker that explores all possible transitions out of program states satisfying the initial condition and ensures that all such executions satisfy the property. Computation tree logic is in a class of temporal logics that includes linear temporal logic (LTL). Although there are properties expressible only in CTL and properties expressible only in LTL, all properties expressible in either logic can also be expressed in CTL*. Computational Intelligence(CI) Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches to address complex real-world problems to which traditional approaches, i.e., first principles modeling or explicit statistical modeling, are ineffective or infeasible. Many such real-life problems are not considered to be well-posed problems mathematically, but nature provides many counterexamples of biological systems exhibiting the required function, practically. For instance, the human body has about 200 joints (degrees of freedom), but humans have little problem in executing a target movement of the hand, specified in just three Cartesian dimensions. Even if the torso were mechanically fixed, there is an excess of 7:3 parameters to be controlled for natural arm movement. Traditional models also often fail to handle uncertainty, noise and the presence of an ever-changing context. Computational Intelligence provides solutions for such and other complicated problems and inverse problems. It primarily includes artificial neural networks, evolutionary computation and fuzzy logic. In addition, CI also embraces biologically inspired algorithms such as swarm intelligence and artificial immune systems, which can be seen as a part of evolutionary computation, and includes broader fields such as image processing, data mining, and natural language processing. Furthermore other formalisms: Dempster-Shafer theory, chaos theory and many-valued logic are used in the construction of computational models. The characteristic of “intelligence” is usually attributed to humans. More recently, many products and items also claim to be “intelligent”. Intelligence is directly linked to the reasoning and decision making. Fuzzy logic was introduced in 1965 as a tool to formalise and represent the reasoning process and fuzzy logic systems which are based on fuzzy logic possess many characteristics attributed to intelligence. Fuzzy logic deals effectively with uncertainty that is common for human reasoning, perception and inference and, contrary to some misconceptions, has a very formal and strict mathematical backbone (‘is quite deterministic in itself yet allowing uncertainties to be effectively represented and manipulated by it’, so to speak). Neural networks, introduced in 1940s (further developed in 1980s) mimic the human brain and represent a computational mechanism based on a simplified mathematical model of the perceptrons (neurons) and signals that they process. Evolutionary computation, introduced in the 1970s and more popular since the 1990s mimics the population-based sexual evolution through reproduction of generations. It also mimics genetics in so called genetic algorithms. Computational Linguistics Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective. Traditionally, computational linguistics was usually performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others. Computational linguistics has theoretical and applied components, where theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use. Computational Network Toolkit(CNTK) CNTK (http://www.cntk.ai ), the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows to easily realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code. Computational Productive Laziness(CPL) In artificial intelligence (AI) mediated workforce management systems (e.g., crowdsourcing), long-term success depends on workers accomplishing tasks productively and resting well. This dual objective can be summarized by the concept of productive laziness. Existing scheduling approaches mostly focus on efficiency but overlook worker wellbeing through proper rest. In order to enable workforce management systems to follow the IEEE Ethically Aligned Design guidelines to prioritize worker wellbeing, we propose a distributed Computational Productive Laziness (CPL) approach in this paper. It intelligently recommends personalized work-rest schedules based on local data concerning a worker’s capabilities and situational factors to incorporate opportunistic resting and achieve superlinear collective productivity without the need for explicit coordination messages. Extensive experiments based on a real-world dataset of over 5,000 workers demonstrate that CPL enables workers to spend 70% of the effort to complete 90% of the tasks on average, providing more ethically aligned scheduling than existing approaches. Computational Theory of Mind In philosophy, a computational theory of mind names a view that the human mind or the human brain (or both) is an information processing system and that thinking is a form of computing. The theory was proposed in its modern form by Hilary Putnam in 1961, and developed by the MIT philosopher and cognitive scientist (and Putnam’s PhD student) Jerry Fodor in the 1960s, 1970s and 1980s. Despite being vigorously disputed in analytic philosophy in the 1990s (due to work by Putnam himself, John Searle, and others), the view is common in modern cognitive psychology and is presumed by many theorists of evolutionary psychology; in the 2000s and 2010s the view has resurfaced in analytic philosophy (Scheutz 2003, Edelman 2008). The computational theory of mind holds that the mind is a computation that arises from the brain acting as a computing machine. The theory can be elaborated in many ways, the most popular of which is that the brain is a computer and the mind is the result of the program that the brain runs. A program is the finite description of an algorithm or effective procedure, which prescribes a deterministic sequence of discrete actions that produces outputs based only on inputs and the internal states (memory) of the computing machine. For any admissible input, algorithms terminate in a finite number of steps. So the computational theory of mind is the claim that the mind is a computation of a machine (the brain) that derives output representations of the world from input representations and internal memory in a deterministic (non-random) way that is consistent with the theory of computation. Computational theories of mind are often said to require mental representation because ‘input’ into a computation comes in the form of symbols or representations of other objects. A computer cannot compute an actual object, but must interpret and represent the object in some form and then compute the representation. The computational theory of mind is related to the representational theory of mind in that they both require that mental states are representations. However the two theories differ in that the representational theory claims that all mental states are representations while the computational theory leaves open that certain mental states, such as pain or depression, may not be representational and therefore may not be suitable for a computational treatment. These non-representational mental states are known as qualia. In Fodor’s original views, the computational theory of mind is also related to the language of thought. The language of thought theory allows the mind to process more complex representations with the help of semantics. Computer Aided Diagnosis In radiology, computer-aided detection (CADe), also called computer-aided diagnosis (CADx), are procedures in medicine that assist doctors in the interpretation of medical images. Imaging techniques in X-ray, MRI, and Ultrasound diagnostics yield a great deal of information, which the radiologist has to analyze and evaluate comprehensively in a short time. CAD systems help scan digital images, e.g. from computed tomography, for typical appearances and to highlight conspicuous sections, such as possible diseases. Computer Assisted/Aided Qualitative Data Analysis Software(CAQDAS) Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS) offers tools that assist with qualitative research such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, grounded theory methodology, etc. Computer Science Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded as bits in a computer memory or transcribed in genes and protein structures in a biological cell. An alternate, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems. Its subfields can be divided into a variety of theoretical and practical disciplines. Some fields, such as computational complexity theory (which explores the fundamental properties of computational and intractable problems), are highly abstract, while fields such as computer graphics emphasize real-world visual applications. Still other fields focus on the challenges in implementing computation. For example, programming language theory considers various approaches to the description of computation, while the study of computer programming itself investigates various aspects of the use of programming language and complex systems. Human-computer interaction considers the challenges in making computers and computations useful, usable, and universally accessible to humans. Computer Vision(CV) Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. Computer vision has also been described as the enterprise of automating and integrating a wide range of processes and representations for vision perception. Computer vision is the automatic analysis of images and videos by computers in order to gain some understanding of the world. Computer vision is inspired by the capabilities of the human vision system and, when initially addressed in the 1960s and 1970s, it was thought to be a relatively straightforward problem to solve. However, the reason we think/thought that vision is easy is that we have our own visual system which makes the task seem intuitive to our conscious minds. In fact, the human visual system is very complex and even the estimates of how much of the brain is involved with visual processing vary from 25% up to more than 50%. Computer-Assisted Fraud Detection The automatic detection of frauds in banking transactions has been recently studied as a way to help the analysts finding fraudulent operations. Due to the availability of a human feedback, this task has been studied in the framework of active learning: the fraud predictor is allowed to sequentially call on an oracle. This human intervention is used to label new examples and improve the classification accuracy of the latter. Such a setting is not adapted in the case of fraud detection with financial data in European countries. Actually, as a human verification is mandatory to consider a fraud as really detected, it is not necessary to focus on improving the classifier. We introduce the setting of ‘Computer-assisted fraud detection’ where the goal is to minimize the number of non fraudulent operations submitted to an oracle. The existing methods are applied to this task and we show that a simple meta-algorithm provides competitive results in this scenario on benchmark datasets. Concentrated Differential Privacy(CDP) Deep learning techniques based on neural networks have shown significant success in a wide range of AI tasks. Large-scale training datasets are one of the critical factors for their success. However, when the training datasets are crowdsourced from individuals and contain sensitive information, the model parameters may encode private information and bear the risks of privacy leakage. The recent growing trend of the sharing and publishing of pre-trained models further aggravates such privacy risks. To tackle this problem, we propose a differentially private approach for training neural networks. Our approach includes several new techniques for optimizing both privacy loss and model accuracy. We employ a generalization of differential privacy called concentrated differential privacy (CDP), with both a formal and refined privacy loss analysis on two different data batching methods. We implement a dynamic privacy budget allocator over the course of training to improve model accuracy. Extensive experiments demonstrate that our approach effectively improves privacy loss accounting, training efficiency and model quality under a given privacy budget. Concentration Free Outlier Factor We present a novel notion of outlier, called the Concentration Free Outlier Factor, or CFOF. As a main contribution, we formalize the notion of concentration of outlier scores and theoretically prove that CFOF does not concentrate in the Euclidean space for any arbitrary large dimensionality. To the best of our knowledge, there are no other proposals of data analysis measures related to the Euclidean distance for which it has been provided theoretical evidence that they are immune to the concentration effect. We determine the closed form of the distribution of CFOF scores in arbitrarily large dimensionalities and show that the CFOF score of a point depends on its squared norm standard score and on the kurtosis of the data distribution, thus providing a clear and statistically founded characterization of this notion. Moreover, we leverage this closed form to provide evidence that the definition does not suffer of the hubness problem affecting other measures. We prove that the number of CFOF outliers coming from each cluster is proportional to cluster size and kurtosis, a property that we call semi-locality. We determine that semi-locality characterizes existing reverse nearest neighbor-based outlier definitions, thus clarifying the exact nature of their observed local behavior. We also formally prove that classical distance-based and density-based outliers concentrate both for bounded and unbounded sample sizes and for fixed and variable values of the neighborhood parameter. We introduce the fast-CFOF algorithm for detecting outliers in large high-dimensional dataset. The algorithm has linear cost, supports multi-resolution analysis, and is embarrassingly parallel. Experiments highlight that the technique is able to efficiently process huge datasets and to deal even with large values of the neighborhood parameter, to avoid concentration, and to obtain excellent accuracy. Concept Drift In predictive analytics and machine learning, the concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. The term concept refers to the quantity to be predicted. More generally, it can also refer to other phenomena of interest besides the target concept, such as an input, but, in the context of concept drift, the term commonly refers to the target variable. Concept Interaction Graph Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering or query document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AI-based news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching, especially targeting long document matching, such as identifying whether two news articles report the same event in the real world, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation for a document, with vertices representing different concepts, each being one or a group of coherent keywords in the document, and with edges representing the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features though Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach based on two labeled news article datasets created at Tencent for its intelligent news products show that the proposed graph approach to long document matching significantly outperforms a wide range of state-of-the-art methods. Concept Learning Concept learning, also known as category learning, concept attainment, and concept formation, is defined by Bruner, Goodnow, & Austin (1967) as ‘the search for and listing of attributes that can be used to distinguish exemplars from non exemplars of various categories’. More simply put, concepts are the mental categories that help us classify objects, events, or ideas, building on the understanding that each object, event, or idea has a set of common relevant features. Thus, concept learning is a strategy which requires a learner to compare and contrast groups or categories that contain concept-relevant features with groups or categories that do not contain concept-relevant features. Concept learning also refers to a learning task in which a human or machine learner is trained to classify objects by being shown a set of example objects along with their class labels. The learner simplifies what has been observed by condensing it in the form of an example. This simplified version of what has been learned is then applied to future examples. Concept learning may be simple or complex because learning takes place over many areas. When a concept is difficult, it is less likely that the learner will be able to simplify, and therefore will be less likely to learn. Colloquially, the task is known as learning from examples. Most theories of concept learning are based on the storage of exemplars and avoid summarization or overt abstraction of any kind. Concept Mask Existing works on semantic segmentation typically consider a small number of labels, ranging from tens to a few hundreds. With a large number of labels, training and evaluation of such task become extremely challenging due to correlation between labels and lack of datasets with complete annotations. We formulate semantic segmentation as a problem of image segmentation given a semantic concept, and propose a novel system which can potentially handle an unlimited number of concepts, including objects, parts, stuff, and attributes. We achieve this using a weakly and semi-supervised framework leveraging multiple datasets with different levels of supervision. We first train a deep neural network on a 6M stock image dataset with only image-level labels to learn visual-semantic embedding on 18K concepts. Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts. Finally, we train an attention-driven class agnostic segmentation network using an 80-category fully annotated dataset. We perform extensive experiments to validate that the proposed system performs competitively to the state of the art on fully supervised concepts, and is capable of producing accurate segmentations for weakly learned and unseen concepts. Concept Mining Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. Concept2vec Although there is an emerging trend towards generating embeddings for primarily unstructured data, and recently for structured data, there is not yet any systematic suite for measuring the quality of embeddings. This deficiency is further sensed with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of encoded structure as well as semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, w.r.t. this framework multiple experimental studies were run to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts. Concept-Cognitive Learning(CCL) Concept-cognitive learning (CCL) is a hot topic in recent years, and it has attracted much attention from the communities of formal concept analysis, granular computing and cognitive computing. However, the relationship among cognitive computing (CC), conceptcognitive computing (CCC), and CCL is not clearly described. To this end, we explain the relationship of CC, CCC, and CCL. Then, we propose a generalized CCL from the point of view of machine learning. Finally, experiments on seven data sets are conducted to evaluate concept formation and concept-cognitive processes of the proposed generalized CCL. Concept-Oriented Deep Learning(CODL) Concepts are the foundation of human deep learning, understanding, and knowledge integration and transfer. We propose concept-oriented deep learning (CODL) which extends (machine) deep learning with concept representations and conceptual understanding capability. CODL addresses some of the major limitations of deep learning: interpretability, transferability, contextual adaptation, and requirement for lots of labeled training data. We discuss the major aspects of CODL including concept graph, concept representations, concept exemplars, and concept representation learning systems supporting incremental and continual learning. Conceptual Clustering Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning. http://…/eswc2008-PAM.pdf Conceptual Expansion Problems with few examples of a new class of objects prove challenging to most classifiers. One solution to is to reuse existing data through transfer methods such as one-shot learning or domain adaption. However these approaches require an explicit hand-authored or learned definition of how reuse can occur. We present an approach called conceptual expansion that learns how to reuse existing machine-learned knowledge when classifying new cases. We evaluate our approach by adding new classes of objects to the CIFAR-10 dataset and varying the number of available examples of these new classes. Conceptual Interoperability Constraint(COIN) Building meaningful interoperation with external software units requires performing the conceptual interoperability analysis that starts with identifying the conceptual interoperability constraints of each software unit, then it compares the systems’ constraints to detect their conceptual mismatch. We call the conceptual interoperability constraints (the COINs) that can be of different types including structure, dynamic, and quality. Missing such constraints may lead to unexpected mismatches, expensive resolution, and running-late projects. However, it is a challenging task for software architects and analysts to manually analyze the unstructured text in API documents to identify the COINs. Not only it is a tedious and time-consuming task, but also it needs knowledge about the constraint types. In this article, we present and evaluate our idea of utilizing machine learning techniques in automating the COIN identification, which is the first step of conceptual interoperability analysis, from human text in API documents. Our empirical research started with a multiple-case study to build the ground truth dataset, on which we contributed our machine learning COIN-Classification Model. We show the model’s robustness through experiments using different machine learning text-classification algorithms. The experiments’ results revealed that our model can achieve up to 87% accuracy in automatically identifying the COINs in text. Thus, we implemented a tool that embeds our model to demonstrate its practical value in industrial context. Then, we evaluated the practitioners’ acceptance for the tool and found that they significantly agreed on its usefulness and ease of use. Conceptual Knowledge Markup Language(CKML) Conceptual Knowledge Markup Language (CKML) is an application of XML. Earlier versions of CKML followed rather exclusively the philosophy of Conceptual Knowledge Processing (CKP), a principled approach to knowledge representation and data analysis that ‘advocates methods and instruments of conceptual knowledge processing which support people in their rational thinking, judgment and acting and promote critical discussion.’ The new version of CKML continues to follow this approach, but also incorporates various principles, insights and techniques from Information Flow (IF), the logical design of distributed systems. Among other things, this allows diverse communities of discourse to compare their own information structures, as coded in logical theories, with that of other communities that share a common generic ontology. CKML incorporates the CKP ideas of concept lattice and formal context, along with the IF ideas of classification (= formal context), infomorphism, theory, interpretation and local logic. Ontology Markup Language (OML), a subset of CKML that is a self-sufficient markup language in its own right, follows the principles and ideas of Conceptual Graphs (CG). OML is used for structuring the specifications and axiomatics of metadata into ontologies. OML incorporates the CG ideas of concept, conceptual relation, conceptual graph, conceptual context, participants and ontology. The link from OML to CKML is the process of conceptual scaling, which is the interpretive transformation of ontologically structured knowledge to conceptual structured knowledge. CONCODE Source code is rarely written in isolation. It depends significantly on the programmatic context, such as the class that the code would reside in. To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class. This task is challenging because the desired code can vary greatly depending on the functionality the class provides (e.g., a sort function may or may not be available when we are asked to ‘return the smallest element’ in a particular member variable list). We introduce CONCODE, a new large dataset with over 100,000 examples consisting of Java classes from online code repositories, and develop a new encoder-decoder architecture that models the interaction between the method documentation and the class environment. We also present a detailed error analysis suggesting that there is significant room for future work on this task. Concolic Testing Concolic testing alternates between CONCrete program execution and symbOLIC analysis to explore the execution paths of a software program and to increase code coverage. In this paper, we develop the first concolic testing approach for Deep Neural Networks (DNNs). More specifically, we utilise quantified linear arithmetic over rationals to express test requirements that have been studied in the literature, and then develop a coherent method to perform concolic testing with the aim of better coverage. Our experimental results show the effectiveness of the concolic testing approach in both achieving high coverage and finding adversarial examples. Concordance Correlation Coefficient In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability. Concrete Autoencoder We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features. Our method is unsupervised, and is based on using a concrete selector layer as the encoder and using a standard neural network as the decoder. During the training phase, the temperature of the concrete selector layer is gradually decreased, which encourages a user-specified number of discrete features to be learned. During test time, the selected features can be used with the decoder network to reconstruct the remaining input features. We evaluate concrete autoencoders on a variety of datasets, where they significantly outperform state-of-the-art methods for feature selection and data reconstruction. In particular, on a large-scale gene expression dataset, the concrete autoencoder selects a small subset of genes whose expression levels can be use to impute the expression levels of the remaining genes. In doing so, it improves on the current widely-used expert-curated L1000 landmark genes, potentially reducing measurement costs by 20%. The concrete autoencoder can be implemented by adding just a few lines of code to a standard autoencoder. Concurrent Meta-Reinforcement Learning(CMRL) State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more and more challenging, and thus requiring more interaction episodes for the meta-learner, it needs the agent to reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name ‘Concurrent Meta-Reinforcement Learning’ (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents are executed in the same environment and each of these ‘rollout’ agents are given the means to communicate with each other. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned or assign itself a particular section of the current task’s state space. This framework is in contrast to standard RL methods that assume that each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods in a variety of challenging tasks. CoNDA Domain classification is the task of mapping spoken language utterances to one of the natural language understanding domains in intelligent personal digital assistants (IPDAs). This is a major component in mainstream IPDAs in industry. Apart from official domains, thousands of third-party domains are also created by external developers to enhance the capability of IPDAs. As more domains are developed rapidly, the question of how to continuously accommodate the new domains still remains challenging. Moreover, existing continual learning approaches do not address the problem of incorporating personalized information dynamically for better domain classification. In this paper, we propose CoNDA, a neural network based approach for domain classification that supports incremental learning of new classes. Empirical evaluation shows that CoNDA achieves high accuracy and outperforms baselines by a large margin on both incrementally added new domains and existing domains. Condensed Representation Correlated pattern mining has increasingly become an important task in data mining since these patterns allow conveying knowledge about meaningful and surprising relations among data. Frequent correlated patterns were thoroughly studied in the literature. In this thesis, we propose to benefit from both frequent correlated as well as rare correlated patterns according to the bond correlation measure. We propose to extract a subset without information loss of the sets of frequent correlated and of rare correlated patterns, this subset is called “Condensed Representation“. In this regard, we are based on the notions derived from the Formal Concept Analysis FCA, specifically the equivalence classes associated to a closure operator fbond dedicated to the bond measure, to introduce new concise representations of both frequent correlated and rare correlated patterns. CondenseNet Deep neural networks are increasingly used on mobile devices, where computational resources are limited. In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity between layers with a mechanism to remove unused connections. The dense connectivity facilitates feature re-use in the network, whereas learned group convolutions remove connections between layers for which this feature re-use is superfluous. At test time, our model can be implemented using standard grouped convolutions – allowing for efficient computation in practice. Our experiments demonstrate that CondenseNets are much more efficient than stateof-the-art compact convolutional networks such as MobileNets and ShuffleNets. Condition Monitoring(CM) Condition monitoring (or, colloquially, CM) is the process of monitoring a parameter of condition in machinery (vibration, temperature etc.), in order to identify a significant change which is indicative of a developing fault. It is a major component of ➘ “Predictive Maintenance”. The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken to prevent failure and avoid its consequences. Condition monitoring has a unique benefit in that conditions that would shorten normal lifespan can be addressed before they develop into a major failure. Condition monitoring techniques are normally used on rotating equipment and other machinery (pumps, electric motors, internal combustion engines, presses), while periodic inspection using non-destructive testing techniques and fit for service (FFS) evaluation are used for stationary plant equipment such as steam boilers, piping and heat exchangers. http://…/9781466584051 Conditional Autoregressive Model(CAR) The essential idea here is that the probability of values estimated at any given location are conditional on the level of neighboring values. mclcar Conditional BEKK matrix-F We propose a new Conditional BEKK matrix-F (CBF) model for the time-varying realized covariance (RCOV) matrices. This CBF model is capable of capturing heavy-tailed RCOV, which is an important stylized fact but could not be handled adequately by the Wishart-based models. To further mimic the long memory feature of the RCOV, a special CBF model with the conditional heterogeneous autoregressive (HAR) structure is introduced. Moreover, we give a systematical study on the probabilistic properties and statistical inferences of the CBF model, including exploring its stationarity, establishing the asymptotics of its maximum likelihood estimator, and giving some new inner-product-based tests for its model checking. In order to handle a large dimensional RCOV matrix, we construct two reduced CBF models — the variance-target CBF model (for moderate but fixed dimensional RCOV matrix) and the factor CBF model (for high dimensional RCOV matrix). For both reduced models, the asymptotic theory of the estimated parameters is derived. The importance of our entire methodology is illustrated by simulation results and two real examples. COnditional COordinate GAN(COCO-GAN) Humans can only interact with part of the surrounding environment due to biological restrictions. Therefore, we learn to reason the spatial relationships across a series of observations to piece together the surrounding environment. Inspired by such behavior and the fact that machines also have computational constraints, we propose \underline{CO}nditional \underline{CO}ordinate GAN (COCO-GAN) of which the generator generates images by parts based on their spatial coordinates as the condition. On the other hand, the discriminator learns to justify realism across multiple assembled patches by global coherence, local appearance, and edge-crossing continuity. Despite the full images are never generated during training, we show that COCO-GAN can produce \textbf{state-of-the-art-quality} full images during inference. We further demonstrate a variety of novel applications enabled by teaching the network to be aware of coordinates. First, we perform extrapolation to the learned coordinate manifold and generate off-the-boundary patches. Combining with the originally generated full image, COCO-GAN can produce images that are larger than training samples, which we called ‘beyond-boundary generation’. We then showcase panorama generation within a cylindrical coordinate system that inherently preserves horizontally cyclic topology. On the computation side, COCO-GAN has a built-in divide-and-conquer paradigm that reduces memory requisition during training and inference, provides high-parallelism, and can generate parts of images on-demand. Conditional Extreme Value Models Extreme value theory (EVT) is often used to model environmental, financial and internet traffic data. Multivariate EVT assumes a multivariate domain of attraction condition for the distribution of a random vector necessitating that each component satisfy a marginal domain of attraction condition. Heffernan and Tawn [2004] and Heffernan and Resnick [2007] developed an approximation to the joint distribution of the random vector by conditioning on one of the components being in an extreme value domain. The usual method of analysis using multivariate extreme value theory often is not helpful either because of asymptotic independence or due to one component of the observation vector not being in a domain of attraction. These defects can be addressed by using the conditional extreme value model. Conditional Fiducial Model The fiducial is not unique in general, but we prove that in a restricted class of models it is uniquely determined by the sampling distribution of the data. It depends in particular not on the choice of a data generating model. The arguments lead to a generalization of the classical formula found by Fisher (1930). The restricted class includes cases with discrete distributions, the case of the shape parameter in the Gamma distribution, and also the case of the correlation coefficient in a bivariate Gaussian model. One of the examples can also be used in a pedagogical context to demonstrate possible difficulties with likelihood-, Bayesian-, and bootstrap-inference. Examples that demonstrate non-uniqueness are also presented. It is explained that they can be seen as cases with restrictions on the parameter space. Motivated by this the concept of a conditional fiducial model is introduced. This class of models includes the common case of iid samples from a one-parameter model investigated by Hannig (2013), the structural group models investigated by Fraser (1968), and also certain models discussed by Fisher (1973) in his final writing on the subject. Conditional Gradient Augmented Lagrangian(CGAL) We propose a new conditional gradient (CG) type method that processes the augmented Lagrangian (AL) function for smooth minimization problems with linear inclusion constraints over a compact domain. The proposed method (CGAL: conditional gradient augmented Lagrangian) maintains favorable properties of the classical conditional gradient method, including cheap linear minimization oracle calls as well as low-dimensional updates, while being still applicable to key semidefinite programming problems where the existing conditional gradient methods do not apply or are too slow. Under the strong duality assumption, we prove that the new method converges with $\tilde{\mathcal{O}}(1/\sqrt{k})$ rate both in the primal objective residual and feasibility gap. We then provide step-size rules that significantly enhance the empirical convergence while retaining the rates. Our numerical results with semidefinite programming examples indicate the algorithm exhibits ${\mathcal{O}}(1/{k})$ and even linear rates in special cases in practice. Conditional Graph Neural Process(CGNP) We introduce a novel encoder-decoder architecture to embed functional processes into latent vector spaces. This embedding can then be decoded to sample the encoded functions over any arbitrary domain. This autoencoder generalizes the recently introduced Conditional Neural Process (CNP) model of random processes. Our architecture employs the latest advances in graph neural networks to process irregularly sampled functions. Thus, we refer to our model as Conditional Graph Neural Process (CGNP). Graph neural networks can effectively exploit local’ structures of the metric spaces over which the functions/processes are defined. The contributions of this paper are twofold: (i) a novel graph-based encoder-decoder architecture for functional and process embeddings, and (ii) a demonstration of the importance of using the structure of metric spaces for this type of representations. Conditional Information Gain Network Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these largely parametrized deep models in limited resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow the feed forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and the decision mechanisms inserted in the architecture. These decision mechanisms are trained using cost functions based on differentiable Information Gain, inspired by the training procedures of decision trees. These information gain based decision mechanisms are differentiable and can be trained end-to-end using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on MNIST and recently introduced Fashion MNIST datasets and show that our information gain based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines. Conditional Linear Regression Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit. Conditional Maximum-Entropy Method(C-MaxEnt) Bayesian approach to extreme-value statistics based on conditional maximum-entropy method Conditional Neural Process(CNP) Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion. Conditional Power(CP) Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the current data, or under the null hypothesis. In many clinical trials, a CP computation at a pre-specified point in the study, such as mid-way, is used as the basis for early termination for futility when there is little evidence of a beneficial effect. Conditional Predictive Impact(CPI) We propose a general test of conditional independence. The conditional predictive impact (CPI) is a provably consistent and unbiased estimator of one or several features’ association with a given outcome, conditional on a (potentially empty) reduced feature set. The measure can be calculated using any supervised learning algorithm and loss function. It relies on no parametric assumptions and applies equally well to continuous and categorical predictors and outcomes. The CPI can be efficiently computed for low- or high-dimensional data without any sparsity constraints. We illustrate PAC-Bayesian convergence rates for the CPI and develop statistical inference procedures for evaluating its magnitude, significance, and precision. These tests aid in feature and model selection, extending traditional frequentist and Bayesian techniques to general supervised learning tasks. The CPI may also be used in conjunction with causal discovery algorithms to identify underlying graph structures for multivariate systems. We test our method in conjunction with various algorithms, including linear regression, neural networks, random forests, and support vector machines. Empirical results show that the CPI compares favorably to alternative variable importance measures and other nonparametric tests of conditional independence on a diverse array of real and simulated datasets. Simulations confirm that our inference procedures successfully control Type I error and achieve nominal coverage probability. Our method has been implemented in an R package, cpi, which can be downloaded from https://…/cpi. Conditional Preference Network(CP-net) Interactive Learning of Acyclic Conditional Preference Networks Conditional Progressive Growing of GAN(CPGGAN) Accurate computer-assisted diagnosis can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle small/fragmented medical images from various scanners; those images are realistic but completely different from the original ones, filling the data lack in the real image distribution. However, we cannot easily use them to locate the position of disease areas, considering expert physicians’ annotation as time-expensive tasks. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating bounding box conditions into PGGANs to place brain metastases at desired position/size on 256 x 256 Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the robustness during training. The results show that CPGGAN-based DA can boost 10% sensitivity in diagnosis with an acceptable amount of additional False Positives—even with physicians’ highly-rough and inconsistent bounding box annotation. Surprisingly, further realistic tumor appearance, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, while even three expert physicians cannot accurately distinguish them from the real ones in Visual Turing Test. Conditional Random Field(CRF) Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to ‘neighboring’ samples, a CRF can take context into account; e.g., the linear chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples. CRFs are a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations. It is often used for labeling or parsing of sequential data, such as natural language text or biological sequences and in computer vision. Specifically, CRFs find applications in shallow parsing, named entity recognition and gene finding, among other tasks, being an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation. Complete tutorial on Text Classification using Conditional Random Fields Model Conditional Random Fields as Recurrent Neural Networks(CRF-RNN) Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. GitXiv Conditional Subspace VAE(CSVAE) Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data. Such representations are often difficult to interpret or control. We consider the problem of unsupervised learning of features correlated to specific labels in a dataset. We propose a VAE-based generative model which we show is capable of extracting features correlated to binary labels in the data and structuring it in a latent subspace which is easy to interpret. Our model, the Conditional Subspace VAE (CSVAE), uses mutual information minimization to learn a low-dimensional latent subspace associated with each label that can easily be inspected and independently manipulated. We demonstrate the utility of the learned representations for attribute manipulation tasks on both the Toronto Face and CelebA datasets. Conditional Teacher-Student Learning The teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of the T/S learning is that a teacher model, not always perfect, sporadically produces wrong guidance in form of posterior probabilities that misleads the student model towards a suboptimal performance. To overcome this problem, we propose a conditional T/S learning scheme, in which a ‘smart’ student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth. Unlike a naive linear combination of the two knowledge sources, the conditional learning is exclusively engaged with the teacher model when the teacher model’s prediction is correct, and otherwise backs off to the ground truth. Thus, the student model is able to learn effectively from the teacher and even potentially surpass the teacher. We examine the proposed learning scheme on two tasks: domain adaptation on CHiME-3 dataset and speaker adaptation on Microsoft short message dictation dataset. The proposed method achieves 9.8% and 12.8% relative word error rate reductions, respectively, over T/S learning for environment adaptation and speaker-independent model for speaker adaptation. Conditional t-SNE(ct-SNE) Dimensionality reduction and manifold learning methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are routinely used to map high-dimensional data into a 2-dimensional space to visualize and explore the data. However, two dimensions are typically insufficient to capture all structure in the data, the salient structure is often already known, and it is not obvious how to extract the remaining information in a similarly effective manner. To fill this gap, we introduce \emph{conditional t-SNE} (ct-SNE), a generalization of t-SNE that discounts prior information from the embedding in the form of labels. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has one extra parameter over t-SNE; we investigate its effects and show how to efficiently optimize the objective. Factoring out prior knowledge allows complementary structure to be captured in the embedding, providing new insights. Qualitative and quantitative empirical results on synthetic and (large) real data show ct-SNE is effective and achieves its goal. Conditional WaveGAN(cWaveGAN) Generative models are successfully used for image synthesis in the recent years. But when it comes to other modalities like audio, text etc little progress has been made. Recent works focus on generating audio from a generative model in an unsupervised setting. We explore the possibility of using generative models conditioned on class labels. Concatenation based conditioning and conditional scaling were explored in this work with various hyper-parameter tuning methods. In this paper we introduce Conditional WaveGANs (cWaveGAN). Find our implementation at https://…/cwavegan Conditionally Autoregressive Hidden Markov Model(CarHMM) One of the central interests of animal movement ecology is relating movement characteristics to behavioural characteristics. The traditional discrete-time statistical tool for inferring unobserved behaviours from movement data is the hidden Markov model (HMM). While the HMM is an important and powerful tool, sometimes it is not flexible enough to appropriately fit the data. Data for marine animals often exhibit conditional autocorrelation, self-dependence of the step length process which cannot be explained solely by the behavioural state, which violates one of the main assumptions of the HMM. Using a grey seal track as an example, along with multiple simulation scenarios, we motivate and develop the conditionally autoregressive hidden Markov model (CarHMM), which is a generalization of the HMM designed specifically to handle conditional autocorrelation. In addition to introducing and examining the new CarHMM, we provide guidelines for all stages of an analysis using either an HMM or CarHMM. These include guidelines for pre-processing location data to obtain deflection angles and step lengths, model selection, and model checking. In addition to these practical guidelines, we link estimated model parameters to biologically meaningful quantities such as activity budget and residency time. We also provide interpretations of traditional ‘foraging’ and ‘transiting’ behaviours in the context of the new CarHMM parameters. Condition-Based Maintenance(CBM) Condition-based maintenance (CBM), shortly described, is maintenance when need arises. This maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating. This concept is applicable to mission critical systems that incorporate active redundancy and fault reporting. It is also applicable to non-mission critical systems that lack redundancy and fault reporting. Condition-based maintenance was introduced to try to maintain the correct equipment at the right time. CBM is based on using real-time data to prioritize and optimize maintenance resources. Observing the state of the system is known as condition monitoring. Such a system will determine the equipment’s health, and act only when maintenance is actually necessary. Developments in recent years have allowed extensive instrumentation of equipment, and together with better tools for analyzing condition data, the maintenance personnel of today are more than ever able to decide what is the right time to perform maintenance on some piece of equipment. Ideally condition-based maintenance will allow the maintenance personnel to do only the right things, minimizing spare parts cost, system downtime and time spent on maintenance. http://…/3313ijmnct03.pdf CONESTA(CONESTA) High-dimensional prediction models are increasingly used to analyze biological data such as neuroimaging of genetic data sets. However, classical penalized algorithms yield to dense solutions that are difficult to interpret without arbitrary thresholding. Alternatives based on sparsity-inducing penalties suffer from coefficient instability. Complex structured sparsity-inducing penalties are a promising approach to force the solution to adhere to some domain-specific constraints and thus offering new perspectives in biomarker identification. We propose a generic optimization framework that can combine any smooth convex loss function with: (i) penalties whose proximal operator is known and (ii) with a large range of complex, non-smooth convex structured penalties such as total variation, or overlapping group lasso. Although many papers have addressed a similar goal, few have tackled it in such a generic way and in the context of high-dimensional data. The proposed continuation algorithm, called \textit{CONESTA}, dynamically smooths the complex penalties to avoid the computation of proximal operators, that are either not known or expensive to compute. The decreasing sequence of smoothing parameters is dynamically adapted, using the duality gap, in order to maintain the optimal convergence speed towards any globally desired precision with duality gap guarantee. First, we demonstrate, on both simulated data and on experimental MRI data, that CONESTA outperforms the excessive gap method, ADMM, proximal gradient smoothing (without continuation) and inexact FISTA in terms of convergence speed and/or precision of the solution. Second, on the experimental MRI data set, we establish the superiority of structured sparsity-inducing penalties ($\ell_1$ and total variation) over non-structured methods in terms of the recovery of meaningful and stable groups of predictive variables. Confidence Confidence is defined as the probability of seeing the rule’s consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules X→Y and Y→X. Association rules have to satisfy a minimum confidence constraint, conf(X→Y)≥γ. Confidence is not down-ward closed and was developed together with support by Agrawal et al. (the so-called support-confidence framework). Support is first used to find frequent (significant) itemsets exploiting its down-ward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a min. confidence threshold. A problem with confidence is that it is sensitive to the frequency of the consequent Y in the database. Caused by the way confidence is calculated, consequents with higher support will automatically produce higher confidence values even if there exists no association between the items. Confidence Bound Minimization Bayesian optimization has demonstrated impressive success in finding the optimum location $x^{*}$ and value $f^{*}=f(x^{*})=\max_{x\in\mathcal{X}}f(x)$ of the black-box function $f$. In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location. Existing work in Bayesian optimization (BO) has not effectively exploited the knowledge of $f^{*}$ for optimization. In this paper, we consider a new setting in BO in which the knowledge of the optimum value is available. Our goal is to exploit the knowledge about $f^{*}$ to search for the location $x^{*}$ efficiently. To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum value. Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization, which exploit the knowledge about the optimum value to identify the optimum location efficiently. We show that our approaches work both intuitively and quantitatively achieve better performance against standard BO methods. We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on Skin Segmentation dataset in which the optimum values are publicly available. Confidence Interval In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. Confidence Trigger Detection(CTD) With deep learning based image analysis getting popular in recent years, a lot of multiple objects tracking applications are in demand. Some of these applications (e.g., surveillance camera, intelligent robotics, and autonomous driving) require the system runs in real-time. Though recent proposed methods reach fairly high accuracy, the speed is still slower than real-time application requirement. In order to increase tracking-by-detection system’s speed for real-time tracking, we proposed confidence trigger detection (CTD) approach which uses confidence of tracker to decide when to trigger object detection. Using this approach, system can safely skip detection of images frames that objects barely move. We had studied the influence of different confidences in three popular detectors separately. Though we found trade-off between speed and accuracy, our approach reaches higher accuracy at given speed. Confidence Weighting(CW) Confidence weighting (CW) is concerned with measuring two variables: (1) what a respondent believes is a correct answer to a question and (2) what degree of certainty the respondent has toward the correctness of this belief. Confidence weighting when applied to a specific answer selection for a particular test or exam question is referred to in the literature from cognitive psychology as item-specific confidence, a term typically used by researchers who investigate metamemory or metacognition, comprehension monitoring, or feeling-of-knowing. Item-specific confidence is defined as calibrating the relationship between an objective performance of accuracy (e.g., a test answer selection) with the subjective measure of confidence, (e.g., a numeric value assigned to the selection). Studies on self-confidence and metacognition during test taking have used item-specific confidence as a way to assess the accuracy and confidence underlying knowledge judgments. Researchers outside of the field of cognitive psychology have used confidence weighting as applied to item-specific judgments in assessing alternative conceptions of difficult concepts in high school biology and physics, developing and evaluating computerized adaptive testing, testing computerized assessments of learning and understanding, and developing and testing formative and summative classroom assessments. Confidence weighting is one of three components of the Risk Inclination Model. Confidence-based Graph Convolutional Network(ConfGCN) Predicting properties of nodes in a graph is an important problem with applications in a variety of domains. Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph. Recently, Graph Convolutional Networks (GCNs) have achieved impressive performance on the graph-based SSL task. In addition to label scores, it is also desirable to have confidence scores associated with them. Unfortunately, confidence estimation in the context of GCN has not been previously explored. We fill this important gap in this paper and propose ConfGCN, which estimates labels scores along with their confidences jointly in GCN-based setting. ConfGCN uses these estimated confidences to determine the influence of one node on another during neighborhood aggregation, thereby acquiring anisotropic capabilities. Through extensive analysis and experiments on standard benchmarks, we find that ConfGCN is able to outperform state-of-the-art baselines. We have made ConfGCN’s source code available to encourage reproducible research. Confidence-Based Recommender(CoBaR) Neighborhood-based collaborative filtering algorithms usually adopt a fixed neighborhood size for every user or item, although groups of users or items may have different lengths depending on users’ preferences. In this paper, we propose an extension to a non-personalized recommender based on confidence intervals and hierarchical clustering to generate groups of users with optimal sizes. The evaluation shows that the proposed technique outperformed the traditional recommender algorithms in four publicly available datasets. Confidence-Weighted Linear Classification We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks show that our algorithm improves over other state of the art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training. Confident K-SRC and Dictionary Learning(CKSC) In recent years, kernel-based sparse coding (K-SRC) has received particular attention due to its efficient representation of nonlinear data structures in the feature space. Nevertheless, the existing K-SRC methods suffer from the lack of consistency between their training and test optimization frameworks. In this work, we propose a novel confident K-SRC and dictionary learning algorithm (CKSC) which focuses on the discriminative reconstruction of the data based on its representation in the kernel space. CKSC focuses on reconstructing each data sample via weighted contributions which are confident in its corresponding class of data. We employ novel discriminative terms to apply this scheme to both training and test frameworks in our algorithm. This specific design increases the consistency of these optimization frameworks and improves the discriminative performance in the recall phase. In addition, CKSC directly employs the supervised information in its dictionary learning framework to enhance the discriminative structure of the dictionary. For empirical evaluations, we implement our CKSC algorithm on multivariate time-series benchmarks such as DynTex++ and UTKinect. Our claims regarding the superior performance of the proposed algorithm are justified throughout comparing its classification results to the state-of-the-art K-SRC algorithms. Confident Multiple Choice Learning(CMCL) Ensemble methods are arguably the most trustworthy techniques for boosting the performance of machine learning models. Popular independent ensembles (IE) relying on naive averaging/voting scheme have been of typical choice for most applications involving deep neural networks, but they do not consider advanced collaboration among ensemble models. In this paper, we propose new ensemble methods specialized for deep neural networks, called confident multiple choice learning (CMCL): it is a variant of multiple choice learning (MCL) via addressing its overconfidence issue.In particular, the proposed major components of CMCL beyond the original MCL scheme are (i) new loss, i.e., confident oracle loss, (ii) new architecture, i.e., feature sharing and (iii) new training method, i.e., stochastic labeling. We demonstrate the effect of CMCL via experiments on the image classification on CIFAR and SVHN, and the foreground-background segmentation on the iCoseg. In particular, CMCL using 5 residual networks provides 14.05% and 6.60% relative reductions in the top-1 error rates from the corresponding IE scheme for the classification task on CIFAR and SVHN, respectively. Configural Frequency Analysis(CFA) Configural frequency analysis (CFA) is a method of exploratory data analysis, introduced by Gustav A. Lienert in 1969. The goal of a configural frequency analysis is to detect patterns in the data that occur significantly more (such patterns are called Types) or significantly less often (such patterns are called Antitypes) than expected by chance. Thus, the idea of a CFA is to provide by the identified types and antitypes some insight into the structure of the data. Types are interpreted as concepts which are constituted by a pattern of variable values. Antitypes are interpreted as patterns of variable values that do in general not occur together. cfa Configurational Comparative Methods(CCM) Configurational comparative methods (CCMs) subsume techniques for the identification of complex causal dependencies in configurational data using the theoretical framework of Boolean algebra and its various extensions (Rihoux and Ragin, 2009). For example, Qualitative Comparative Analysis (QCA; Ragin, 1987, 2000, 2008)-hitherto the most prominent representative of CCMs-has been applied in areas as diverse as business administration (e.g., Chung, 2001), environmental science (van Vliet et al., 2013), evaluation (Cragun et al., 2014), political science (Thiem, 2011), public health (Longest and Thoits, 2012) and sociology (Crowley, 2013). Besides three stand-alone programs based on graphical user interfaces, three R packages for QCA are currently available, each with a different scope of functionality: QCA (Du¸sa and Thiem, 2014; Thiem and Du¸sa, 2013a,c), QCA3 (Huang, 2014) and SetMethods (Quaranta, 2013) (an add-on package to Schneider and Wagemann, 2012). Confirmatory Analysis 1) Inferential Statistics – Deductive Approach: · Heavy reliance on probability models · Must accept untestable assumptions · Look for definite answers to specific questions · Emphasis on numerical calculations · Hypotheses determined at outset · Hypothesis tests and formal confidence interval estimation. 2) Advantages: · Provide precise information in the right circumstances · Well-established theory and methods. 3) Disadvantages: · Misleading impression of precision in less than ideal circumstances · Analysis driven by preconceived ideas · Difficult to notice unexpected results. Confirmatory Factor Analysis(CFA) Confirmatory Factor Analysis (CFA) is a particular form of factor analysis, most commonly used in social research. In confirmatory factor analysis, the researcher first develops a hypothesis about what factors they believe are underlying the used measures and may impose constraints on the model based on these a priori hypotheses. For example, if two factors are accounting for the covariance in the measures, and these factors are unrelated to one another, we can create a model where the correlation between factor X and factor Y is set to zero. Measures could then be obtained to assess how well the fitted model captured the covariance between all the items or measures in the model. Thus, if the results of statistical tests of the model fit indicate a poor fit, the model will be rejected. If the fit is weak, it may be due to a variety of reasons. We propose to introduce state of the art techniques to do CFA in R language. Then, we propose to do some examples of CFA with R and some datasets, revealing several scenarios where CFA is relevant. relabeLoadings Confirmatory Research Confirmatory research sets out to test a specific hypothesis to the exclusion of other considerations; whereas discovery research seeks to find out what might be important in understanding a research context, presenting findings as conjectural (e.g., suggestive, indicative) rather than definite’ (Taber, 2013: 45) Exploratory vs Confirmatory Research Conflict-Driven Clause Learning(CDCL) In computer science, Conflict-Driven Clause Learning (CDCL) is an algorithm for solving the Boolean satisfiability problem (SAT). Given a Boolean formula, the SAT problem asks for an assignment of variables so that the entire formula evaluates to true. The internal workings of CDCL SAT solvers were inspired by DPLL solvers. Conflict-free Asynchronous Machine Learning(CYCLADES) We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. CYCLADES is asynchronous during shared model updates, and requires no memory locking mechanisms, similar to HOGWILD!-type algorithms. Unlike HOGWILD!, CYCLADES introduces no conflicts during the parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent conflict-free nature and cache locality, our multi-core implementation of CYCLADES consistently outperforms HOGWILD!-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to the HOGWILD! implementation of SGD, and up to 5x gains over asynchronous implementations of variance reduction algorithms. Conflict-free Replicated Data Collection(CRDC) Internet-scale distributed systems often replicate data within and across data centers to provide low latency and high availability despite node and network failures. Replicas are required to accept updates without coordination with each other, and the updates are then propagated asynchronously. This brings the issue of conflict resolution among concurrent updates, which is often challenging and error-prone. The Conflict-free Replicated Data Type (CRDT) framework provides a principled approach to address this challenge. This work focuses on a special type of CRDT, namely the Conflict-free Replicated Data Collection (CRDC), e.g. list and queue. The CRDC can have complex and compound data items, which are organized in structures of rich semantics. Complex CRDCs can greatly ease the development of upper-layer applications, but also makes the conflict resolution notoriously difficult. This explains why existing CRDC designs are tricky, and hard to be generalized to other data types. A design framework is in great need to guide the systematic design of new CRDCs. To address the challenges above, we propose the Remove-Win Design Framework. The remove-win strategy for conflict resolution is simple but powerful. The remove operation just wipes out the data item, no matter how complex the value is. The user of the CRDC only needs to specify conflict resolution for non-remove operations. This resolution is destructed to three basic cases and are left as open terms in the CRDC design skeleton. Stubs containing user-specified conflict resolution logics are plugged into the skeleton to obtain concrete CRDC designs. We demonstrate the effectiveness of our design framework via a case study of designing a conflict-free replicated priority queue. Performance measurements also show the efficiency of the design derived from our design framework. Conflict-Free Replicated Data Types(CRDT) Internet-scale distributed systems often replicate data at multiple geographic locations to provide low latency and high availability, despite node and network failures. Geo-replicated systems that adopt a weak consistency model allow replicas to temporarily diverge, requiring a mechanism for merging concurrent updates into a common state. Conflict-free Replicated Data Types (CRDT) provide a principled approach to address this problem. This document presents an overview of Conflict-free Replicated Data Types research and practice, organizing the presentation in the aspects relevant for the application developer, the system developer and the CRDT developer. Confluo Confluo is a system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad-hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. Conformable Fractional Accumulation(CFA) The fractional order grey models (FGM) have appealed considerable interest of research in recent years due to its higher effectiveness and flexibility than the conventional grey models and other prediction models. However, the definitions of the fractional order accumulation (FOA) and difference (FOD) is computationally complex, which leads to difficulties for the theoretical analysis and applications. In this paper, the new definition of the FOA are proposed based on the definitions of Conformable Fractional Derivative, which is called the Conformable Fractional Accumulation (CFA), along with its inverse operation, the Conformable Fractional Difference (CFD). Then the new Conformable Fractional Grey Model (CFGM) based on CFA and CFD is introduced with detailed modelling procedures. The feasibility and simplicity and the CFGM are shown in the numerical example. And the at last the comprehensive real-world case studies of natural gas production forecasting in 11 countries are presented, and results show that the CFGM is much more effective than the existing FGM model in the 165 subcases. Conformable Fractional Grey Model(CFGM) The fractional order grey models (FGM) have appealed considerable interest of research in recent years due to its higher effectiveness and flexibility than the conventional grey models and other prediction models. However, the definitions of the fractional order accumulation (FOA) and difference (FOD) is computationally complex, which leads to difficulties for the theoretical analysis and applications. In this paper, the new definition of the FOA are proposed based on the definitions of Conformable Fractional Derivative, which is called the Conformable Fractional Accumulation (CFA), along with its inverse operation, the Conformable Fractional Difference (CFD). Then the new Conformable Fractional Grey Model (CFGM) based on CFA and CFD is introduced with detailed modelling procedures. The feasibility and simplicity and the CFGM are shown in the numerical example. And the at last the comprehensive real-world case studies of natural gas production forecasting in 11 countries are presented, and results show that the CFGM is much more effective than the existing FGM model in the 165 subcases. Conformal Prediction Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability e, together with a method that makes a prediction y of a label y, it produces a set of labels, typically containing y, that also contains y with probability 1-e. Conformal prediction can be applied to any method for producing y: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1-e of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. Conformal Prediction Under Covariate Shift Conformalized Quantile Regression Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to heteroscedasticity. It combines conformal prediction with classical quantile regression, inheriting the advantages of both. We establish a theoretical guarantee of valid coverage, supplemented by extensive experiments on popular regression datasets. We compare the efficiency of conformalized quantile regression to other conformal methods, showing that our method tends to produce shorter intervals. Confounding ➘ “Confounding Variable” http://…/confounding.html Confounding Variable In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable. A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship, and the presence of misestimation for this reason is termed omitted-variable bias. While specific definitions may vary, in essence a confounding variable fits the following four criteria, here given in a hypothetical situation with variable of interest ‘V’, confounding variable ‘C’ and outcome of interest ‘O’: 1. C is associated (inversely or directly) with O 2. C is associated with O, independent of V 3. C is associated (inversely or directly) with V 4. C is not in the causal pathway of V to O (C is not a direct consequence of V, not a way by which V produces O) The above correlation-based definition, however, is metaphorical at best – a growing number of analysts agree that confounding is a causal concept, and as such, cannot be described in terms of correlations nor associations. Confusion Matrix In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix , is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another). Confusion2Vec Word vector representations are a crucial part of Natural Language Processing (NLP) and Human Computer Interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated from the human speech production and perception that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information and we focus on a model that incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by both humans and machines through contextual cues. A range of representational ambiguities can emerge in various domains further to acoustic perception, such as morphological transformations, paraphrasing for NLP tasks like machine translation etc. In this work, we present a case study in application to Automatic Speech Recognition (ASR), where the word confusions are related to acoustic similarity. We present several techniques to train an acoustic perceptual similarity representation ambiguity. We term this Confusion2Vec and learn on unsupervised-generated data from ASR confusion networks or lattice-like structures. Appropriate evaluations for the Confusion2Vec are formulated for gauging acoustic similarity in addition to semantic-syntactic and word similarity evaluations. The Confusion2Vec is able to model word confusions efficiently, without compromising on the semantic-syntactic word relations, thus effectively enriching the word vector space with extra task relevant ambiguity information. We provide an intuitive exploration of the 2-dimensional Confusion2Vec space using Principal Component Analysis of the embedding and relate to semantic, syntactic and acoustic relationships. The potential of Confusion2Vec in the utilization of uncertainty present in lattices is demonstrated through small examples relating to ASR error correction. Congested Scene Recognition Network(CSRNet) We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven and deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger reception fields and to replace pooling operations. CSRNet is an easy-trained model because of its pure convolutional structure. To our best acknowledge, CSRNet is the first implementation using dilated CNNs for crowd counting tasks. We demonstrate CSRNet on four datasets (ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO’10 dataset, and the UCSD dataset) and we deliver the state-of-the-art performance on all the datasets. In the ShanghaiTech Part_B dataset, we significantly achieve the MAE which is 47.3% lower than the previous state-of-the-art method. We extend the applications for counting other objects, such as the vehicle in TRANCOS dataset. Results show that CSRNet significantly improves the output quality with 15.4% lower MAE than the previous state-of-the-art approach. Congruence Class Model(CCM) CCMnet Congruence Distance A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where the data items are real numbers or integers. However, for many application scenarios, the data items of a time series are not simple, but high-dimensional data points. Motivated by an application scenario dealing with motion gesture recognition, we develop a distance measure (which we call congruence distance) that serves as a model for the approximate congruency of two multi-dimensional time series. This distance measure generalizes the classical notion of congruence from point sets to multi-dimensional time series. We show that, given two input time series $S$ and $T$, computing the congruence distance of $S$ and $T$ is NP-hard. Afterwards, we present two algorithms that compute an approximation of the congruence distance. We provide theoretical bounds that relate these approximations with the exact congruence distance. Conjoint Analysis Conjoint analysis’ is a survey based statistical technique used in market research that helps determine how people value different attributes (feature, function, benefits) that make up an individual product or service. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most influential on respondent choice or decision making. A controlled set of potential products or services is shown to survey respondents and by analyzing how they make preferences between these products, the implicit valuation of the individual elements making up the product or service can be determined. These implicit valuations (utilities or part-worths) can be used to create market models that estimate market share, revenue and even profitability of new designs. Conjoint originated in mathematical psychology and was developed by marketing professor Paul E. Green at the Wharton School of the University of Pennsylvania and Data Chan. Other prominent conjoint analysis pioneers include professor V. ‘Seenu’ Srinivasan of Stanford University who developed a linear programming (LINMAP) procedure for rank ordered data as well as a self-explicated approach, Richard Johnson who developed the Adaptive Conjoint Analysis technique in the 1980s and Jordan Louviere (University of Iowa) who invented and developed choice-based approaches to conjoint analysis and related techniques such as best-worst scaling. Today it is used in many of the social sciences and applied sciences including marketing, product management, and operations research. It is used frequently in testing customer acceptance of new product designs, in assessing the appeal of advertisements and in service design. It has been used in product positioning, but there are some who raise problems with this application of conjoint analysis. Conjoint analysis techniques may also be referred to as multiattribute compositional modelling, discrete choice modelling, or stated preference research, and is part of a broader set of trade-off analysis tools used for systematic analysis of decisions. These tools include Brand-Price Trade-Off, Simalto, and mathematical approaches such as AHP, evolutionary algorithms or rule-developing experimentation. What Is Conjoint Analysis? Conjugate Gradient Method(CG) In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems. The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel. Conjugate Prior In Bayesian probability theory, if the posterior distributions p(theta|x) are in the same family as the prior probability distribution p(theta), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian. This means that the Gaussian distribution is a conjugate prior for the likelihood which is also Gaussian. Connected Scatterplot The connected scatterplot visualizes two related time series in a scatterplot and connects the points with a line in temporal sequence. News media are increasingly using this technique to present data under the intuition that it is understandable and engaging. To explore these intuitions, we (1) describe how paired time series relationships appear in a connected scatterplot, (2) qualitatively evaluate how well people understand trends depicted in this format, (3) quantitatively measure the types and frequency of misinterpretations, and (4) empirically evaluate whether viewers will preferentially view graphs in this format over the more traditional format. The results suggest that low-complexity connected scatterplots can be understood with little explanation, and that viewers are biased towards inspecting connected scatterplots over the more traditional format. We also describe misinterpretations of connected scatterplots and propose further research into mitigating these mistakes for viewers unfamiliar with the technique. Connection Analytics Connection Analytics – an emerging discipline that provides answers to persistent business questions such as identification and influence of thought leaders, impact of external events or players on financial risk, or analysis of network performance based on causal relationships between nodes. It provides a new way of looking at people, products, physical phenomena, or events. Enterprises are using Big Data analytics to complement traditional SQL queries in answering very familiar questions, such as customer retention, marketing attribution, risk mitigation, and operational efficiency which, until now, required enormous compute power, time-consuming data management and the need for learning highly specialized programming and query languages. Connection Scan Algorithm(CSA) We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output consist is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. We study several problem variations such as the earliest arrival and profile problems. We present algorithm variants that only optimize the arrival time or additionally optimize the number of transfers in the Pareto sense. An advantage of CSA is that is can easily adjust to changes in the timetable, allowing the easy incorporation of known vehicle delays. We additionally introduce the Minimum Expected Arrival Time (MEAT) problem to handle possible, uncertain, future vehicle delays. We present a solution to the MEAT problem that is based upon CSA. Finally, we extend CSA using the multilevel overlay paradigm to answer complex queries on nation-wide integrated timetables with trains and buses. Connectionist Temporal Classification(CTC) Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in end-to-end speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the above problem can be mitigated by jointly training with maximum likelihood and policy gradient. In particular, with policy learning we are able to directly optimize on the (otherwise non-differentiable) performance metric. We show that joint training improves relative performance by 4% to 13% for our end-to-end model as compared to the same model learned through maximum likelihood. The model achieves 5.53% WER on Wall Street Journal dataset, and 5.42% and 14.70% on Librispeech test-clean and test-other set, respectively. Connectionist Temporal Classification Model(CTCModel) We report an extension of a Keras Model, called CTCModel, to perform the Connectionist Temporal Classification (CTC) in a transparent way. Combined with Recurrent Neural Networks, the Connectionist Temporal Classification is the reference method for dealing with unsegmented input sequences, i.e. with data that are a couple of observation and label sequences where each label is related to a subset of observation frames. CTCModel makes use of the CTC implementation in the Tensorflow backend for training and predicting sequences of labels using Keras. It consists of three branches made of Keras models: one for training, computing the CTC loss function; one for predicting, providing sequences of labels; and one for evaluating that returns standard metrics for analyzing sequences of predictions. Conover-Iman Test ConRelMiner The General Data Protection Regulation (GDPR) poses enormous challenges on companies and organizations with respect to understanding, implementing, and maintaining the contained constraints. We report on how the ConRelMiner method can be used for untangling the GDPR. For this, the GDPR is filtered and grouped along the roles mentioned by the GDPR and the reduction of sentences to be read by analysts is shown. Moreover, the output of the ConRelMiner – a cluster graph with relations between the sentences – is displayed and interpreted. Overall the goal is to illustrate how the effort for implementing the GDPR can be reduced and a structured and meaningful representation of the relevant GDPR sentences can be found. Conserved Differential Information(CDI) We introduce Minimal Achievable Sufficient Statistic (MASS) Learning, a training method for machine learning models that attempts to produce minimal sufficient statistics with respect to a class of functions (e.g. deep networks) being optimized over. In deriving MASS Learning, we also introduce Conserved Differential Information (CDI), an information-theoretic quantity that – unlike standard mutual information – can be usefully applied to deterministically-dependent continuous random variables like the input and output of a deep network. In a series of experiments, we show that deep networks trained with MASS Learning achieve competitive performance on supervised learning, regularization, and uncertainty quantification benchmarks. Consilience We describe an apparently new measure of multivariate goodness-of-fit between sets of quantitative results from a model (simulation, analytical, or multiple regression), paired with those observed under corresponding conditions from the system being modeled. Our approach returns a single, integrative measure, even though it can accommodate complex systems that produce responses of M types. For each response-type, the goodness-of-fit measure, which we label ‘Consilience’ (C), is maximally 1, for perfect fit; near 0 for the large-sample case (number of pairs, N, more than about 25) in which the modeled series is a random sample from a quasi-normal distribution with the same mean and variance as that of the observed series (null model); and, less than 0, toward minus-infinity, for progressively worse fit. In addition, lack-of-fit for each response-type can be apportioned between systematic and non-systematic (unexplained) components of error. Finally, for statistical assessment of models relative to the equivalent null model, we offer provisional estimates of critical C vs. N, and of critical joint-C vs. N and M, at various levels of Pr(type-I error). Application of our proposed methodology requires only MS Excel (2003 or later); we provide Excel XLS and XLSX templates that afford semi-automatic computation for systems involving up to M = 5 response types, each represented by up to N = 1000 observed-and-modeled result pairs. N need not be equal, nor response pairs in complete overlap, over M. Consistency as Logical Monotonicity(CALM) A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present an informal introduction to the CALM Theorem, which answers this question precisely by moving up from traditional storage consistency to consider properties of programs. CALM is an acronym for ‘consistency as logical monotonicity’. The CALM Theorem shows that the programs that have consistent, coordination-free distributed implementations are exactly the programs that can be expressed in monotonic logic. This theoretical result has practical implications for developers of distributed applications. We show how CALM provides a constructive application-level counterpart to conventional ‘systems’ wisdom, such as the apparently negative results of the CAP Theorem. We also discuss ways that monotonic thinking can influence distributed systems design, and how new programming language designs and tools can help developers write consistent, coordination-free code. Consistent Generative Query Network Stochastic video prediction is usually framed as an extrapolation problem where the goal is to sample a sequence of consecutive future image frames conditioned on a sequence of observed past frames. For the most part, algorithms for this task generate future video frames sequentially in an autoregressive fashion, which is slow and requires the input and output to be consecutive. We introduce a model that overcomes these drawbacks — it learns to generate a global latent representation from an arbitrary set of frames within a video. This representation can then be used to simultaneously and efficiently sample any number of temporally consistent frames at arbitrary time-points in the video. We apply our model to synthetic video prediction tasks and achieve results that are comparable to state-of-the-art video prediction models. In addition, we demonstrate the flexibility of our model by applying it to 3D scene reconstruction where we condition on location instead of time. To the best of our knowledge, our model is the first to provide flexible and coherent prediction on stochastic video datasets, as well as consistent 3D scene samples. Please check the project website https://bit.ly/2jX7Vyu to view scene reconstructions and videos produced by our model. Consistent Lifting, Embedding, and Alignment Rectification(CLEAR) A fundamental challenge in many robotics applications is to correctly synchronize and fuse observations across a team of sensors or agents. Instead of solely relying on pairwise matches among observations, multi-way matching methods leverage the notion of cycle consistency to (i) provide a natural correction mechanism for removing noise and outliers from pairwise matches; (ii) construct an efficient and low-rank representation of the data via merging the redundant observations. To solve this computationally challenging problem, state-of-the-art techniques resort to relaxation and rounding techniques that can potentially result in a solution that violates the cycle consistency principle. Hence, losing the aforementioned benefits. In this work, we present the CLEAR algorithm to address this issue by generating solutions that are, by construction, cycle consistent. Through a novel spectral graph clustering approach, CLEAR fuses the techniques in the multi-way matching and the spectral clustering literature and provides consistent solutions, even in challenging high-noise regimes. Our resulting general framework can provide significant improvement in the accuracy and efficiency of existing distributed multi-agent learning, collaborative SLAM, and multiobject tracking pipelines, which traditionally use pairwise (but potentially inconsistent) correspondences. Consistent Sampling We describe a very simple method for consistent sampling’ that allows for sampling with replacement. The method extends previous approaches to consistent sampling, which assign a pseudorandom real number to each element, and sample those with the smallest associated numbers. When sampling with replacement, our extension gives the item sampled a new, larger, associated pseudorandom number, and returns it to the pool of items being sampled. Constrained Attention Network(CAN) Aspect level sentiment classification is a fine-grained sentiment analysis task, compared to the sentence level classification. A sentence usually contains one or more aspects. To detect the sentiment towards a particular aspect in a sentence, previous studies have developed various methods for generating aspect-specific sentence representations. However, these studies handle each aspect of a sentence separately. In this paper, we argue that multiple aspects of a sentence are usually orthogonal based on the observation that different aspects concentrate on different parts of the sentence. To force the orthogonality among aspects, we propose constrained attention networks (CAN) for multi-aspect sentiment analysis, which handles multiple aspects of a sentence simultaneously. Experimental results on two public datasets demonstrate the effectiveness of our approach. We also extend our approach to multi-task settings, outperforming the state-of-the-arts significantly. Constrained CLR In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression. Constrained Optimisation With Latent Distribution(COLD) Optimising discrete data for a desired characteristic using gradient-based methods involves projecting the data into a continuous latent space and carrying out optimisation in this space. Carrying out global optimisation is difficult as optimisers are likely to follow gradients into regions of the latent space that the model has not been exposed to during training; samples generated from these regions are likely to be too dissimilar to the training data to be useful. We propose Constrained Optimisation with Latent Distributions (COLD), a constrained global optimisation procedure to find samples with high values of a desired property that are similar to yet distinct from the training data. We find that on MNIST, our procedure yields optima for each of three different objectives, and that enforcing tighter constraints improves the quality and increases the diversity of the generated images. On the ChEMBL molecular dataset, our method generates a diverse set of new molecules with drug-likeness scores similar to those of the highest-scoring molecules in the training data. We also demonstrate a computationally efficient way to approximate the constraint when evaluating it exactly is computationally expensive. Constrained Optimization In mathematical optimization, constrained optimization (in some contexts called constraint optimization) is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function, which is to be minimized, or a reward function or utility function, which is to be maximized. Constraints can be either hard constraints, which set conditions for the variables that are required to be satisfied, or soft constraints, which have some variable values that are penalized in the objective function if, and based on the extent that, the conditions on the variables are not satisfied. Constrained Optimization By RAdial Basis Function Interpolation(COBRA) COnstrained PARAFAC2(COPA) PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is jointly modeling treatments across a set of patients with varying number of medical encounters, where the alignment of events in time bears no clinical meaning, and it may also be impossible to align them due to their varying length. Despite recent improvements on scaling up unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, are needed to be imposed for interpretable temporal modeling and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and alternating direction method of multiplier (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36x faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Constrained Policy Optimization(CPO) For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015, Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety. Constrained Quantile Regression Averaging(CQRA) Probabilistic load forecasts provide comprehensive information about future load uncertainties. In recent years, many methodologies and techniques have been proposed for probabilistic load forecasting. Forecast combination, a widely recognized best practice in point forecasting literature, has never been formally adopted to combine probabilistic load forecasts. This paper proposes a constrained quantile regression averaging (CQRA) method to create an improved ensemble from several individual probabilistic forecasts. We formulate the CQRA parameter estimation problem as a linear program with the objective of minimizing the pinball loss, with the constraints that the parameters are nonnegative and summing up to one. We demonstrate the effectiveness of the proposed method using two publicly available datasets, the ISO New England data and Irish smart meter data. Comparing with the best individual probabilistic forecast, the ensemble can reduce the pinball score by 4.39% on average. The proposed ensemble also demonstrates superior performance over nine other benchmark ensembles. Constrained Recurrent Sparse Auto-Encoder(CRsAE) Convolutional dictionary learning (CDL) has become a popular method for learning sparse representations from data. State-of-the-art algorithms perform dictionary learning (DL) through an optimization-based alternating-minimization procedure that comprises a sparse coding and a dictionary update step respectively. Here, we draw connections between CDL and neural networks by proposing an architecture for CDL termed the constrained recurrent sparse auto-encoder (CRsAE). We leverage the interpretation of the alternating-minimization algorithm for DL as an Expectation-Maximization algorithm to develop auto-encoders (AEs) that, for the first time, enable the simultaneous training of the dictionary and regularization parameter. The forward pass of the encoder, which performs sparse coding, solves the E-step using an encoding matrix and a soft-thresholding non-linearity imposed by the FISTA algorithm. The encoder in this regard is a variant of residual and recurrent neural networks. The M-step is implemented via a two-stage back-propagation. In the first stage, we perform back-propagation through the AE formed by the encoder and a linear decoder whose parameters are tied to the encoder. This stage parallels the dictionary update step in DL. In the second stage, we update the regularization parameter by performing back-propagation through the encoder using a loss function that includes a prior on the parameter motivated by Bayesian statistics. We leverage GPUs to achieve significant computational gains relative to state-of-the-art optimization-based approaches to CDL. We apply CRsAE to spike sorting, the problem of identifying the time of occurrence of neural action potentials in recordings of electrical activity from the brain. We demonstrate on recordings lasting hours that CRsAE speeds up spike sorting by 900x compared to notoriously slow classical algorithms based on convex optimization. Constrained State Estimation Constrained-Action Partially Observable Markov Decision Process(CA-POMDP) This paper addresses a fundamental question of multi-agent knowledge distribution: what information should be sent to whom and when, with the limited resources available to each agent? Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this paper introduces two concepts for partially observable Markov decision processes (POMDPs): 1) action-based constraints which yield constrained-action partially observable Markov decision processes (CA-POMDPs); and 2) soft probabilistic constraint satisfaction for the resulting infinite-horizon controllers. To enable constraint analysis over an infinite horizon, an unconstrained policy is first represented as a Finite State Controller (FSC) and optimized with policy iteration. The FSC representation then allows for a combination of Markov chain Monte Carlo and discrete optimization to improve the probabilistic constraint satisfaction of the controller while minimizing the impact to the value function. Within the CA-POMDP framework we then propose Intelligent Knowledge Distribution (IKD) which yields per-agent policies for distributing knowledge between agents subject to interaction constraints. Finally, the CA-POMDP and IKD concepts are validated using an asset tracking problem where multiple unmanned aerial vehicles (UAVs) with heterogeneous sensors collaborate to localize a ground asset to assist in avoiding unseen obstacles in a disaster area. The IKD model was able to maintain asset tracking through multi-agent communications while only violating soft power and bandwidth constraints 3% of the time, while greedy and naive approaches violated constraints more than 60% of the time. Constraint Consistent Learning(CCL) A Library for Constraint Consistent Learning Constraint-based causal Discovery from NOnstationary/heterogeneous Data(CD-NOD) It is commonplace to encounter nonstationary or heterogeneous data. Such a distribution shift feature presents both challenges and opportunities for causal discovery, of which the underlying generating process changes over time or across domains. In this paper, we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from NOnstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the driving force’ of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional and interpretable representation of changes. The proposed methods are totally nonparametric, with no restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that nonstationarity benefits causal structure identification with particular types of confounders. Finally, we show the tight connection between nonstationarity/heterogeneity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock data) are presented to demonstrate the efficacy of the proposed methods. Consumer and Producer Based Recommendation(CPRec) User-Generated Content (UGC) is at the core of web applications where users can both produce and consume content. This differs from traditional e-Commerce domains where content producers and consumers are usually from two separate groups. In this work, we propose a method CPRec (consumer and producer based recommendation), for recommending content on UGC-based platforms. Specifically, we learn a core embedding for each user and two transformation matrices to project the user’s core embedding into two ‘role’ embeddings (i.e., a producer and consumer role). We model each interaction by the ternary relation between the consumer, the consumed item, and its producer. Empirical studies on two large-scale UGC applications show that our method outperforms standard collaborative filtering methods as well as recent methods that model producer information via item features. Content Grouping Content Grouping lets you group content into a logical structure that reflects how you think about your site or app, and then view and compare aggregated metrics by group name in addition to being able to drill down to the individual URL, page title, or screen name. For example, you can see the aggregated number of pageviews for all pages in a group like Men/Shirts, and then drill in to see each URL or page title. You start by creating a Content Group, a collection of content. For example, on an ecommerce site that sells clothing, you might create groups for Men, Women, and Children. Then, within each group, you might create content like Shirts, Pants, Outerwear. This would let you compare aggregated statistics for each type of clothing within a group (e.g., Men’s Shirts vs Men’s Pants vs. Men’s Outerwear). It would also let you drill in to each group to see how individual Shirts pages compare to one another, for example, Men/Shirts/T-shirts/index.html vs Men/Shirts/DressShirts/index.html. Content-Adaptive Scale Interaction Network(CaseNet) Objects in an image exhibit diverse scales. Adaptive receptive fields are expected to catch suitable range of context for accurate pixel level semantic prediction for handling objects of diverse sizes. Recently, atrous convolution with different dilation rates has been used to generate features of multi-scales through several branches and these features are fused for prediction. However, there is a lack of explicit interaction among the branches to adaptively make full use of the contexts. In this paper, we propose a Content-Adaptive Scale Interaction Network (CaseNet) to exploit the multi-scale features for scene parsing. We build the CaseNet based on the classic Atrous Spatial Pyramid Pooling (ASPP) module, followed by the proposed contextual scale interaction (CSI) module, and the scale adaptation (SA) module. Specifically, first, for each spatial position, we enable context interaction among different scales through scale-aware non-local operations across the scales, \ie, CSI module, which facilitates the generation of flexible mixed receptive fields, instead of a traditional flat one. Second, the scale adaptation module (SA) explicitly and softly selects the suitable scale for each spatial position and each channel. Ablation studies demonstrate the effectiveness of the proposed modules. We achieve state-of-the-art performance on three scene parsing benchmarks Cityscapes, ADE20K and LIP. Content-Aware ReAssembly of FEatures(CARAFE) Feature upsampling is a key operation in a number of modern convolutional network architectures, e.g. feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE has several appealing properties: (1) Large field of view. Unlike previous works (e.g. bilinear interpolation) that only exploit sub-pixel neighborhood, CARAFE can aggregate contextual information within a large receptive field. (2) Content-aware handling. Instead of using a fixed kernel for all samples (e.g. deconvolution), CARAFE enables instance specific content-aware handling, which generates adaptive kernels on-the-fly. (3) Lightweight and fast to compute. CARAFE introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting. CARAFE shows consistent and substantial gains across all the tasks (1.2%, 1.3%, 1.8%, 1.1db respectively) with negligible computational overhead. It has great potential to serve as a strong building block for future research. Content-Aware Representation Learning Model(CARL) Heterogeneous networks not only present a challenge of heterogeneity in the types of nodes and relations, but also the attributes and content associated with the nodes. While recent works have looked at representation learning on homogeneous and heterogeneous networks, there is no work that has collectively addressed the following challenges: (a) the heterogeneous structural information of the network consisting of multiple types of nodes and relations; (b) the unstructured semantic content (e.g., text) associated with nodes; and (c) online updates due to incoming new nodes in growing network. We address these challenges by developing a Content-Aware Representation Learning model (CARL). CARL performs joint optimization of heterogeneous SkipGram and deep semantic encoding for capturing both heterogeneous structural closeness and unstructured semantic relations among all nodes, as function of node content, that exist in the network. Furthermore, an additional online update module is proposed for efficiently learning representations of incoming nodes. Extensive experiments demonstrate that CARL outperforms state-of-the-art baselines in various heterogeneous network mining tasks, such as link prediction, document retrieval, node recommendation and relevance search. We also demonstrate the effectiveness of the CARL’s online update module through a category visualization study. Context- Aware Bandits(CAB) In this paper, we present the CAB (Context- Aware Bandits). With CAB we attempt to craft a bandit algorithm that can exploit collaborative effects and that can be deployed in a practical recommendation system setting, where the multi-armed bandits have been shown to perform well in particular with respect to the cold start problem. CAB exploits, a context-aware clustering technique augmenting exploration-exploitation strategies in a contextual multi-armed bandit settings. CAB dynamically clusters the users based on the content universe under consideration. We demonstrate the efficacy of our approach on extensive real-world datasets, showing the scalability, and more importantly, the significant increased prediction performance compared to related state-of-the-art methods. Context Awareness Context awareness is a property of mobile devices that is defined complementarily to location awareness. Whereas location may determine how certain processes in a device operate, context may be applied more flexibly with mobile users, especially with users of smart phones. Context awareness originated as a term from ubiquitous computing or as so-called pervasive computing which sought to deal with linking changes in the environment with computer systems, which are otherwise static. The term has also been applied to business theory in relation to Contextual application design and business process management issues. Context Dependent Random Utility Model(CDM) Many applications in preference learning assume that decisions come from the maximization of a stable utility function. Yet a large experimental literature shows that individual choices and judgements can be affected by ‘irrelevant’ aspects of the context in which they are made. An important class of such contexts is the composition of the choice set. In this work, our goal is to discover such choice set effects from raw choice data. We introduce an extension of the Multinomial Logit (MNL) model, called the context dependent random utility model (CDM), which allows for a particular class of choice set effects. We show that the CDM can be thought of as a second-order approximation to a general choice system, can be inferred optimally using maximum likelihood and, importantly, is easily interpretable. We apply the CDM to both real and simulated choice data to perform principled exploratory analyses for the presence of choice set effects. Context Guided Network(CGNet) The demand of applying semantic segmentation model on mobile devices has been increasing rapidly. Current state-of-the-art networks have enormous amount of parameters hence unsuitable for mobile devices, while other small memory footprint models ignore the inherent characteristic of semantic segmentation. To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight network for semantic segmentation on mobile devices. We first propose the Context Guided (CG) block, which learns the joint feature of both local feature and surrounding context, and further improves the joint feature with the global context. Based on the CG block, we develop Context Guided Network (CGNet), which captures contextual information in all stages of the network and is specially tailored for increasing segmentation accuracy. CGNet is also elaborately designed to reduce the number of parameters and save memory footprint. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Extensive experiments on Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing, CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters, and has a frame-rate of 50 fps on one NVIDIA Tesla K80 card for 2048 $\times$ 1024 high-resolution images. The source code for the complete system are publicly available. Context Tree There has been growing interests in recent years from both practical and research perspectives for session-based recommendation tasks as long-term user profiles do not often exist in many real-life recommendation applications. In this case, recommendations for user’s immediate next actions need to be generated based on patterns in anonymous short sessions. An often overlooked aspect is that new items with limited observations arrive continuously in many domains (e.g. news and discussion forums). Therefore, recommendations need to be adaptive to such frequent changes. In this paper, we benchmark a new nonparametric method called context tree (CT) against various state-of-the-art methods on extensive datasets for session-based recommendation task. Apart from the standard static evaluation protocol adopted by previous literatures, we include an adaptive configuration to mimic the situation when new items with limited observations arrives continuously. Our results show that CT outperforms two best-performing approaches (recurrent neural network; heuristic-based nearest neighbor) in majority of the tested configurations and datasets. We analyze reasons for this and demonstrate that it is because of the better adaptation to changes in the domain, as well as the remarkable capability to learn static sequential patterns. Moreover, our running time analysis illustrates the efficiency of using CT as other nonparametric methods. Context-Aware Detection Network(CAD-Net) Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications such as urban planning, traffic control, searching and rescuing, etc. However, state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to the object appearance differences in remote sensing images in term of sparse texture, low contrast, arbitrary orientations, large scale variations, etc. This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene-level) and the local neighboring objects or features (at object-level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection datasets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation codes will be made publicly available for facilitating future researches. Context-Aware Dynamic Block(CDB) Although deeper and larger neural networks have achieved better performance nowadays, the complex network structure and increasing computational cost cannot meet the demands of many resource-constrained applications. An effective way to address this problem is to make use of dynamic inference mechanism. Existing methods usually choose to execute or skip an entire specific layer through a switch structure, which can only alter the depth of the network. In this paper, we propose a dynamic inference method called Context-aware Dynamic Block (CDB), which provides more path selection choices in terms of network width and depth during inference. The execution of CDB is determined by a context-aware group controller, which can take into account both historical and object category information. The proposed method can be easily incorporated into most modern network architectures. Experimental results on ImageNet and CIFAR-100 demonstrate the superiority of our method on both efficiency and overall classification quality. To be specific, we integrate CDB block into ResNet-101 and find that our method significantly outperforms their counterparts and saves 45.1% FLOPs. Context-aware Path Ranking(C-PR) Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performances and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (trains on an exponentially large number of features) problems. This paper proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems by introducing a selective path exploration strategy. C-PR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by C-PR not only improve predictive performance but also are more interpretable than existing baselines. Context-Aware Personalized POI Sequence Recommender System(CAPS) The revolution of World Wide Web (WWW) and smart-phone technologies have been the key-factor behind remarkable success of social networks. With the ease of availability of check-in data, the location-based social networks (LBSN) (e.g., Facebook1, etc.) have been heavily explored in the past decade for Point-of-Interest (POI) recommendation. Though many POI recommenders have been defined, most of them have focused on recommending a single location or an arbitrary list that is not contextually coherent. It has been cumbersome to rely on such systems when one needs a contextually coherent list of locations, that can be used for various day-to-day activities, for e.g., itinerary planning. This paper proposes a model termed as CAPS (Context-Aware Personalized POI Sequence Recommender System) that generates contextually coherent POI sequences relevant to user preferences. To the best of our knowledge, CAPS is the first attempt to formulate the contextual POI sequence modeling by extending Recurrent Neural Network (RNN) and its variants. CAPS extends RNN by incorporating multiple contexts to the hidden layer and by incorporating global context (sequence features) to the hidden layers and the output layer. It extends the variants of RNN (e.g., Long-short term memory (LSTM)) by incorporating multiple contexts and global features in the gate update relations. The major contributions of this paper are: (i) it models the contextual POI sequence problem by incorporating personalized user preferences through multiple constraints (e.g., categorical, social, temporal, etc.), (ii) it extends RNN to incorporate the contexts of individual item and that of the whole sequence. It also extends the gated functionality of variants of RNN to incorporate the multiple contexts, and (iii) it evaluates the proposed models against two real-world data sets. Context-Aware Policy reuSe(CAPS) Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks. Existing works of policy reuse either focus on only selecting a single best source policy for transfer without considering contexts, or cannot guarantee to learn an optimal policy for a target task. To improve transfer efficiency and guarantee optimality, we develop a novel policy reuse method, called {\em Context-Aware Policy reuSe} (CAPS), that enables multi-policy transfer. Our method learns when and which source policy is best for reuse, as well as when to terminate its reuse. CAPS provides theoretical guarantees in convergence and optimality for both source policy selection and target task learning. Empirical results on a grid-based navigation domain and the Pygame Learning Environment demonstrate that CAPS significantly outperforms other state-of-the-art policy reuse methods. Context-aware Pyramid Feature Extraction(CPFE) ➘ “Pyramid Feature Selective Network” Context-Aware Recommender Systems(CARS) Interpreting Contextual Effects By Contextual Modeling In Recommender Systems Context-Aware Segmentation and Clustering(CASC) Complex systems, such as airplanes, cars, or financial markets, produce multivariate time series data consisting of system observations over a period of time. Such data can be interpreted as a sequence of segments, where each segment is associated with a certain state of the system. An important problem in this domain is to identify repeated sequences of states, known as motifs. Such motifs correspond to complex behaviors that capture common sequences of state transitions. For example, a motif of ‘making a turn’ might manifest in sensor data as a sequence of states: slowing down, turning the wheel, and then speeding back up. However, discovering these motifs is challenging, because the individual states are unknown and need to be learned from the noisy time series. Simultaneously, the time series also needs to be precisely segmented and each segment needs to be associated with a state. Here we develop context-aware segmentation and clustering (CASC), a method for discovering common motifs in time series data. We formulate the problem of motif discovery as a large optimization problem, which we then solve using a greedy alternating minimization-based approach. CASC performs well in the presence of noise in the input data and is scalable to very large datasets. Furthermore, CASC leverages common motifs to more robustly segment the time series and assign segments to states. Experiments on synthetic data show that CASC outperforms state-of-the-art baselines by up to 38.2%, and two case studies demonstrate how our approach discovers insightful motifs in real-world time series data. Context-Aware Self-Attention Network Self-attention model have shown its flexibility in parallel computation and the effectiveness on modeling both long- and short-term dependencies. However, it calculates the dependencies between representations without considering the contextual information, which have proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks through capturing the richness of context. To maintain the simplicity and flexibility of the self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculates the relevance between elements. Specifically, we leverage the internal representations that embed both global and deep contexts, thus avoid relying on external resources. Experimental results on WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conducted extensive analyses to quantity how the context vectors participate in the self-attention model. Context-aware Sentiment Word Identification(sentiword2vec) Traditional sentiment analysis often uses sentiment dictionary to extract sentiment information in text and classify documents. However, emerging informal words and phrases in user generated content call for analysis aware to the context. Usually, they have special meanings in a particular context. Because of its great performance in representing inter-word relation, we use sentiment word vectors to identify the special words. Based on the distributed language model word2vec, in this paper we represent a novel method about sentiment representation of word under particular context, to be detailed, to identify the words with abnormal sentiment polarity in long answers. Result shows the improved model shows better performance in representing the words with special meaning, while keep doing well in representing special idiomatic pattern. Finally, we will discuss the meaning of vectors representing in the field of sentiment, which may be different from general object-based conditions. Context-Dependent Diffusion Network(CDDN) Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Different from pure object recognition tasks, the relation triplets of subject-predicate-object lie on an extreme diversity space, such as \textit{person-behind-person} and \textit{car-behind-building}, while suffering from the problem of combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework to deal with visual relationship detection. To capture the interactions of different object instances, two types of graphs, word semantic graph and visual scene graph, are constructed to encode global context interdependency. The semantic graph is built through language priors to model semantic correlations across objects, whilst the visual scene graph defines the connections of scene objects so as to utilize the surrounding scene information. For the graph-structured data, we design a diffusion network to adaptively aggregate information from contexts, which can effectively learn latent representations of visual relationships and well cater to visual relationship detection in view of its isomorphic invariance to graphs. Experiments on two widely-used datasets demonstrate that our proposed method is more effective and achieves the state-of-the-art performance. ContextNet Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representations to produce competitive semantic segmentation in real-time with low memory requirements. ContextNet combines a deep branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. We analyze our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.2 frames per second at full (1024×2048) resolution. Contextual / Common Query Language(CQL) Contextual Query Language (CQL), previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information. Based on the semantics of Z39.50, its design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex query languages. Contextual Bandit The problem of matching ads to interests is a natural machine learning problem in some ways since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised – in particular a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: 1. A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). 2. The world reveals the reward ra of the chosen arm (i.e. whether the ad is clicked on). http://…/Multi-armed_bandit#Contextual_Bandit Contextual Bandits via RAndom Projection(CBRAP) Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual data. Due to the curse of dimensionality, there are two challenges in most of the current bandit algorithms: the first is high time-complexity; and the second is extreme large upper regret bounds with high-dimensional data. In this paper, in order to attack the above two challenges effectively, we develop an algorithm of Contextual Bandits via RAndom Projection (\texttt{CBRAP}) in the setting of linear payoffs, which works especially for high-dimensional contextual data. The proposed \texttt{CBRAP} algorithm is time-efficient and flexible, because it enables players to choose an arm in a low-dimensional space, and relaxes the sparsity assumption of constant number of non-zero components in previous work. Besides, we provide a linear upper regret bound for the proposed algorithm, which is associated with reduced dimensions. Contextual Bilateral Loss(CoBi) This paper shows that when applying machine learning to digital zoom for photography, it is beneficial to use real, RAW sensor data for training. Existing learning-based super-resolution methods do not use real sensor data, instead operating on RGB images. In practice, these approaches result in loss of detail and accuracy in their digitally zoomed output when zooming in on distant image regions. We also show that synthesizing sensor data by resampling high-resolution RGB images is an oversimplified approximation of real sensor data and noise, resulting in worse image quality. The key barrier to using real sensor data for training is that ground truth high-resolution imagery is missing. We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom. We use SR-RAW to train a deep network with a novel contextual bilateral loss (CoBi) that delivers critical robustness to mild misalignment in input-output image pairs. The trained network achieves state-of-the-art performance in 4X and 8X computational zoom. Contextual Explanation Networks(CEN) We introduce contextual explanation networks (CENs)—a class of models that learn to predict by generating and leveraging intermediate explanations. CENs combine deep networks with context-specific probabilistic models and construct explanations in the form of locally-correct hypotheses. Contrary to the existing post-hoc model-explanation tools, CENs learn to predict and to explain jointly. Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings. We prove that local approximations to the decision boundary of our networks are consistent with the generated explanations. Our results on image and text classification and survival analysis tasks demonstrate that CENs can easily match or outperform the state-of-the-art while offering additional insights behind each prediction, valuable for decision support. Contextual Graph Markov Model We introduce the Contextual Graph Markov Model, an approach combining ideas from generative models and neural networks for the processing of graph data. It founds on a constructive methodology to build a deep architecture comprising layers of probabilistic models that learn to encode the structured information in an incremental fashion. Context is diffused in an efficient and scalable way across the graph vertexes and edges. The resulting graph encoding is used in combination with discriminative models to address structure classification benchmarks. Contextual Listen, Attend and Spell(CLAS) In automatic speech recognition (ASR) what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR sys- tem that utilizes such context. Our approach, which we re- fer to as Contextual Listen, Attend and Spell (CLAS) jointly- optimizes the ASR components along with embeddings of the context n-grams. During inference, the CLAS system can be presented with context phrases which might contain out-of- vocabulary (OOV) terms not seen during training. We compare our proposed system to a more traditional contextualization approach, which performs shallow-fusion between independently trained LAS and contextual n-gram models during beam search. Across a number of tasks, we find that the proposed CLAS system outperforms the baseline method by as much as 68% relative WER, indicating the advantage of joint optimization over individually trained components. Index Terms: speech recognition, sequence-to-sequence models, listen attend and spell, LAS, attention, embedded speech recognition. Contextual Memory Tree We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. We demonstrate the efficacy of CMT by augmenting existing multi-class and multi-label classification algorithms with CMT and observe statistical improvement. We also test CMT learning on several image-captioning tasks to demonstrate that it performs computationally better than a simple nearest neighbors memory system while benefitting from reward learning. Contextual Multi-Armed Bandits Multi-Armed Bandits with side information. Contextual Outlier INterpretation(COIN) Outlier detection plays an essential role in many data-driven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers does not receive much attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models through providing intrinsic reasons why the certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of contrastive contexts where outliers locate, as well as the relation between outliers and contexts, are usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches. Contextual Policy Optimisation(CPO) Policy gradient methods have been successfully applied to a variety of reinforcement learning tasks. However, while learning in a simulator, these methods do not utilise the opportunity to improve learning by adjusting certain environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but that are controllable in a simulator. This can lead to slow learning, or convergence to highly suboptimal policies. In this paper, we present contextual policy optimisation (CPO). The central idea is to use Bayesian optimisation to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this Bayesian optimisation practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. We apply CPO to a number of continuous control tasks of varying difficulty and show that CPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observable under random sampling but are key to learning good policies. Contextual Position-Based Model(CPBM) Accurate estimates of examination bias are crucial for unbiased learning-to-rank from implicit feedback in search engines and recommender systems, since they enable the use of Inverse Propensity Score (IPS) weighting techniques to address selection biases and missing data \citep{Joachims/etal/17a}. Unfortunately, existing examination-bias estimators \citep{Agarwal/etal/18c, wang2018position} are limited to the Position-Based Model (PBM) \citep{chuklin2015click}, where the examination bias may only depend on the rank of the document. To overcome this limitation, we propose a Contextual Position-Based Model (CPBM) where the examination bias may also depend on a context vector describing the query and the user. Furthermore, we propose an effective estimator for the CPBM based on intervention harvesting \citep{Agarwal/etal/18c}. A key feature of the estimator is that it does not require disruptive interventions but merely exploits natural variation resulting from the use of multiple historic ranking functions. Semi-synthetic experiments on the Yahoo Learning-To-Rank dataset demonstrate the superior effectiveness of the new approach. Contextual Recurrent Neural Networks(Contextual RNN) There is an implicit assumption that by unfolding recurrent neural networks (RNN) in finite time, the misspecification of choosing a zero value for the initial hidden state is mitigated by later time steps. This assumption has been shown to work in practice and alternative initialization may be suggested but often overlooked. In this paper, we propose a method of parameterizing the initial hidden state of an RNN. The resulting architecture, referred to as a Contextual RNN, can be trained end-to-end. The performance on an associative retrieval task is found to improve by conditioning the RNN initial hidden state on contextual information from the input sequence. Furthermore, we propose a novel method of conditionally generating sequences using the hidden state parameterization of Contextual RNN. Contextual Regression Machine learning algorithms such as linear regression, SVM and neural network have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of neural network embedding and dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, the application of our method not only outperformed the state of the art method in terms of accuracy, but also unveiled two previously unfound open chromatin related histone marks. Our method can fill the blank of accurate and interpretable nonlinear modeling in scientific data mining tasks. Contextual Stochastic Block Model We provide the first information theoretic tight analysis for inference of latent community structure given a sparse graph along with high dimensional node covariates, correlated with the same latent communities. Our work bridges recent theoretical breakthroughs in the detection of latent community structure without nodes covariates and a large body of empirical work using diverse heuristics for combining node covariates with graphs for inference. The tightness of our analysis implies in particular, the information theoretical necessity of combining the different sources of information. Our analysis holds for networks of large degrees as well as for a Gaussian version of the model. Contextualized non-Local Neural Network(CN3) Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN$^{\textbf{3}}$), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighborhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labeling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users. Contingency Training When applied to high-dimensional datasets, feature selection algorithms might still leave dozens of irrelevant variables in the dataset. Therefore, even after feature selection has been applied, classifiers must be prepared to the presence of irrelevant variables. This paper investigates a new training method called Contingency Training which increases the accuracy as well as the robustness against irrelevant attributes. Contingency training is classifier independent. By subsampling and removing information from each sample, it creates a set of constraints. These constraints aid the method to automatically find proper importance weights of the dataset’s features. Experiments are conducted with the contingency training applied to neural networks over traditional datasets as well as datasets with additional irrelevant variables. For all of the tests, contingency training surpassed the unmodified training on datasets with irrelevant variables and even outperformed slightly when only a few or no irrelevant variables were present. Continual Learning via Neural Pruning(CLNP) We introduce Continual Learning via Neural Pruning (CLNP), a new method aimed at lifelong learning in fixed capacity models based on neuronal model sparsification. In this method, subsequent tasks are trained using the inactive neurons and filters of the sparsified network and cause zero deterioration to the performance of previous tasks. In order to deal with the possible compromise between model sparsity and performance, we formalize and incorporate the concept of graceful forgetting: the idea that it is preferable to suffer a small amount of forgetting in a controlled manner if it helps regain network capacity and prevents uncontrolled loss of performance during the training of future tasks. CLNP also provides simple continual learning diagnostic tools in terms of the number of free neurons left for the training of future tasks as well as the number of neurons that are being reused. In particular, we see in experiments that CLNP verifies and automatically takes advantage of the fact that the features of earlier layers are more transferable. We show empirically that CLNP leads to significantly improved results over current weight elasticity based methods. Continuation Multiple Instance Learning(C-MIL) Weakly supervised object detection (WSOD) is a challenging task when provided with image category supervision but required to simultaneously learn object locations and object detectors. Many WSOD approaches adopt multiple instance learning (MIL) and have non-convex loss functions which are prone to get stuck into local minima (falsely localize object parts) while missing full object extent during training. In this paper, we introduce a continuation optimization method into MIL and thereby creating continuation multiple instance learning (C-MIL), with the intention of alleviating the non-convexity problem in a systematic way. We partition instances into spatially related and class related subsets, and approximate the original loss function with a series of smoothed loss functions defined within the subsets. Optimizing smoothed loss functions prevents the training procedure falling prematurely into local minima and facilitates the discovery of Stable Semantic Extremal Regions (SSERs) which indicate full object extent. On the PASCAL VOC 2007 and 2012 datasets, C-MIL improves the state-of-the-art of weakly supervised object detection and weakly supervised object localization with large margins. Continued Logarithm(CL) Analysis of the Continued Logarithm Algorithm Continuous Adaptive Blending(CAB) The ability to perform offline A/B-testing and off-policy learning using logged contextual bandit feedback is highly desirable in a broad range of applications, including recommender systems, search engines, ad placement, and personalized health care. Both offline A/B-testing and off-policy learning require a counterfactual estimator that evaluates how some new policy would have performed, if it had been used instead of the logging policy. This paper proposes a new counterfactual estimator – called Continuous Adaptive Blending (CAB) – for this policy evaluation problem that combines regression and weighting approaches for an effective bias/variance trade-off. It can be substantially less biased than clipped Inverse Propensity Score weighting and the Direct Method, and it can have less variance compared with Doubly Robust and IPS estimators. Experimental results show that CAB provides excellent and reliable estimation accuracy compared to other blended estimators, and – unlike the SWITCH estimator – is sub-differentiable such that it can be used for learning. Continuous Bag-of-Words(CBOW) The ‘continuous bag-of-words model’ (CBOW) adds inputs from words within short window to predict the current word. http://…/1301.3781.pdf Continuous Computation Language(CCL) For Sybase Complex Event Procesing (CEP), developers create CEP applications using the Continuous Computation Language (CCL). Introduced in 2005, CCL was the first commercial, declarative SQL-based CEP language and remains the most extensive SQL-based CEP language on the market. Because the Continuous Computation Language (CCL) is a SQL-based language, it gives programmers a huge head start in creating CEP applications. The Sybase CEP Studio helps manage all aspects of the application development process, further increasing programmer productivity. Continuous Lagrangian Reachability(CLRT) We introduce continuous Lagrangian reachability (CLRT), a new algorithm for the computation of a tight and continuous-time reachtube for the solution flows of a nonlinear, time-variant dynamical system. CLRT employs finite strain theory to determine the deformation of the solution set from time $t_i$ to time $t_{i+1}$. We have developed simple explicit analytic formulas for the optimal metric for this deformation; this is superior to prior work, which used semi-definite programming. CLRT also uses infinitesimal strain theory to derive an optimal time increment $h_i$ between $t_i$ and $t_{i+1}$, nonlinear optimization to minimally bloat (i.e., using a minimal radius) the state set at time $t_i$ such that it includes all the states of the solution flow in the interval $[t_i,t_{i+1}]$. We use $\delta$-satisfiability to ensure the correctness of the bloating. Our results on a series of benchmarks show that CLRT performs favorably compared to state-of-the-art tools such as CAPD in terms of the continuous reachtube volumes they compute. Continuous Semantic Topic Embedding Model(CSTEM) This paper proposes the continuous semantic topic embedding model (CSTEM) which finds latent topic variables in documents using continuous semantic distance function between the topics and the words by means of the variational autoencoder(VAE). The semantic distance could be represented by any symmetric bell-shaped geometric distance function on the Euclidean space, for which the Mahalanobis distance is used in this paper. In order for the semantic distance to perform more properly, we newly introduce an additional model parameter for each word to take out the global factor from this distance indicating how likely it occurs regardless of its topic. It certainly improves the problem that the Gaussian distribution which is used in previous topic model with continuous word embedding could not explain the semantic relation correctly and helps to obtain the higher topic coherence. Through the experiments with the dataset of 20 Newsgroup, NIPS papers and CNN/Dailymail corpus, the performance of the recent state-of-the-art models is accomplished by our model as well as generating topic embedding vectors which makes possible to observe where the topic vectors are embedded with the word vectors in the real Euclidean space and how the topics are related each other semantically. Continuous Skip-gram(Skip-gram) The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w1,w2,w3, … ,wT , the objective of the Skip-gram model is to maximize the average log probability, where c is the size of the training context (which can be a function of the center word wt). Larger c results in more training examples and thus can lead to a higher accuracy, at the expense of the 2 training time. http://…/1301.3781.pdf Continuous Time Autoregressive Moving Average(CARMA) We introduce the class of continuous-time autoregressive moving-average (CARMA) processes in Hilbert spaces. As driving noises of these processes we consider Levy processes in Hilbert space. We provide the basic definitions, show relevant properties of these processes and establish the equivalents of CARMA processes on the real line. Finally, CARMA processes in Hilbert space are linked to the stochastic wave equation and functional autoregressive processes. Multivariate stochastic delay differential equations and CAR representations of CARMA processes Continuous Time Stochastic Modelling(CTSM) In probability theory and statistics, a continuous-time stochastic process, or a continuous-space-time stochastic process is a stochastic process for which the index variable takes a continuous set of values, as contrasted with a discrete-time process for which the index variable takes only distinct values. An alternative terminology uses continuous parameter as being more inclusive. A more restricted class of processes are the continuous stochastic processes: here the term often (but not always) implies both that the index variable is continuous and that sample paths of the process are continuous. Given the possible confusion, caution is needed. Continuous-time stochastic processes that are constructed from discrete-time processes via a waiting time distribution are called continuous-time random walks. ctsmr Continuous-Time Dynamic Network Embedding(CTDNE) Networks evolve continuously over time with the addition, deletion, and changing of links and nodes. Such temporal networks (or edge streams) consist of a sequence of timestamped edges and are seemingly ubiquitous. Despite the importance of accurately modeling the temporal information, most embedding methods ignore it entirely or approximate the temporal network using a sequence of static snapshot graphs. In this work, we introduce the notion of \emph{temporal walks} for learning dynamic embeddings from temporal networks. Temporal walks capture the temporally valid interactions (\eg, flow of information, spread of disease) in the dynamic network in a lossless fashion. Based on the notion of temporal walks, we describe a general class of embeddings called continuous-time dynamic network embeddings (CTDNEs) that completely avoid the issues and problems that arise when approximating the temporal network as a sequence of static snapshot graphs. Unlike previous work, CTDNEs learn dynamic node embeddings directly from the temporal network at the finest temporal granularity and thus use only temporally valid information. As such CTDNEs naturally support online learning of the node embeddings in a streaming real-time fashion. The experiments demonstrate the effectiveness of this class of embedding methods for prediction in temporal networks. Continuous-Time Fractionally Integrated ARMA(CARFIMA) carfima Continuous-Time State Transition Model(CTSTM) Contour-Aware Informative Aggregation Network(CIA-Net) Accurate segmenting nuclei instances is a crucial step in computer-aided image analysis to extract rich features for cellular estimation and following diagnosis as well as treatment. While it still remains challenging because the wide existence of nuclei clusters, along with the large morphological variances among different organs make nuclei instance segmentation susceptible to over-/under-segmentation. Additionally, the inevitably subjective annotating and mislabeling prevent the network learning from reliable samples and eventually reduce the generalization capability for robustly segmenting unseen organ nuclei. To address these issues, we propose a novel deep neural network, namely Contour-aware Informative Aggregation Network (CIA-Net) with multi-level information aggregation module between two task-specific decoders. Rather than independent decoders, it leverages the merit of spatial and texture dependencies between nuclei and contour by bi-directionally aggregating task-specific features. Furthermore, we proposed a novel smooth truncated loss that modulates losses to reduce the perturbation from outliers. Consequently, the network can focus on learning from reliable and informative samples, which inherently improves the generalization capability. Experiments on the 2018 MICCAI challenge of Multi-Organ-Nuclei-Segmentation validated the effectiveness of our proposed method, surpassing all the other 35 competitive teams by a significant margin. Contrast In statistics, particularly analysis of variance and linear regression, an orthogonal contrast is a linear combination of two or more factor level means (averages) whose coefficients add up to zero. Non-orthogonal contrasts do not necessarily sum to 0. Contrasts should be constructed “to answer specific research questions”, and do not necessarily have to be orthogonal. Contrast Analysis ➚ “Contrast” Contrast Analysis What is a contrast matrix (a term, pertaining to an analysis with categorical predictors)? Contrastive Divergence(CD) Contrastive Divergence (CD), an approximate Maximum-Likelihood (ML) learning algorithm proposed by Geoffrey Hinton. Contrastive Divergence is basically a funky term for “approximate gradient descent”. Contrastive Fairness We present contrastive fairness, a new direction in causal inference applied to algorithmic fairness. Earlier methods dealt with the ‘what if?’ question (counterfactual fairness, NeurIPS’17). We establish the theoretical and mathematical implications of the contrastive question ‘why this and not that?’ in context of algorithmic fairness in machine learning. This is essential to defend the fairness of algorithmic decisions in tasks where a person or sub-group of people is chosen over another (job recruitment, university admission, company layovers, etc). This development is also helpful to institutions to ensure or defend the fairness of their automated decision making processes. A test case of employee job location allocation is provided as an illustrative example. Contrastive Multivariate Singular Spectrum Analysis We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data. By utilizing an appropriate background dataset, the method transforms a target time series dataset in a way that evinces the sub-signals that are enhanced in the target dataset, as opposed to only those that account for the greatest variance. This shifts the goal from finding signals that explain the most variance to signals that matter the most to the analyst. We demonstrate our method on an illustrative synthetic example, as well as show the utility of our method in the downstream clustering of electrocardiogram signals from the public MHEALTH dataset. Contrastive Predictive Coding While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments. Contrastive Principal Component Analysis(cPCA) We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data. The technique is a generalization of standard PCA, for the setting where multiple datasets are available — e.g. a treatment and a control group, or a mixed versus a homogeneous population — and the goal is to explore patterns that are specific to one of the datasets. We conduct a wide variety of experiments in which cPCA identifies important dataset-specific patterns that are missed by PCA, demonstrating that it is useful for many applications: subgroup discovery, visualizing trends, feature selection, denoising, and data-dependent standardization. We provide geometrical interpretations of cPCA and show that it satisfies desirable theoretical guarantees. We also extend cPCA to nonlinear settings in the form of kernel cPCA. We have released our code as a python package and documentation is on Github. Contrastive Variational Autoencoder(cVAE) Variational autoencoders are powerful algorithms for identifying dominant latent structure in a single dataset. In many applications, however, we are interested in modeling latent structure and variation that are enriched in a target dataset compared to some background—e.g. enriched in patients compared to the general population. Contrastive learning is a principled framework to capture such enriched variation between the target and background, but state-of-the-art contrastive methods are limited to linear models. In this paper, we introduce the contrastive variational autoencoder (cVAE), which combines the benefits of contrastive learning with the power of deep generative models. The cVAE is designed to identify and enhance salient latent features. The cVAE is trained on two related but unpaired datasets, one of which has minimal contribution from the salient latent features. The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features. The algorithm is straightforward to implement, has a similar run-time to the standard VAE, and is robust to noise and dataset purity. We conduct experiments across diverse types of data, including gene expression and facial images, showing that the cVAE effectively uncovers latent structure that is salient in a particular analysis. Contrastivecenter Loss The deep convolutional neural network(CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastivecenter loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1)the distances of training samples to their corresponding class centers, and (2)the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss. Control Toolbox(CT) We introduce the Control Toolbox (CT), an open-source C++ library for efficient modelling, control, estimation, trajectory optimization and model predictive control. The CT is applicable to a broad class of dynamic systems, but features additional modelling tools specially designed for robotics. This paper outlines its general concept, its major building blocks and highlights selected application examples. The CT was designed for intuitive modelling of systems governed by ordinary differential- or difference equations. It supports rapid prototyping of cost functions and constraints and provides common interfaces for different optimal control solvers. To date, we support Single Shooting, the iterative Linear-Quadratic Regulator, Gauss-Newton Multiple Shooting and classical Direct Multiple Shooting. We provide interfaces to different NLP and linear-quadratic solvers, such as IPOPT, SNOPT, HPIPM, or a custom Riccati solver. The CT was designed with performance for online control in mind and allows to solve large-scale optimal control problems highly efficiently. Some of the key features enabling fast run-time performance are full support for Automatic Differentiation, derivative code generation and thorough multi-threading. For robotics problems, the we offer an interface to a fully auto-differentiable rigid-body dynamics modelling engine. In combination with derivative code generation, this allows for an unprecedented performance in solving optimal control problems for complex articulated robotic systems. Contropedia Collaborative content creation inevitably reaches situations where different points of view lead to conflict. We focus on Wikipedia, the free encyclopedia anyone may edit, where disputes about content in controversial articles often reflect larger societal debates. While Wikipedia has a public edit history and discussion section for every article, the substance of these sections is difficult to phantom for Wikipedia users interested in the development of an article and in locating which topics were most controversial. In this paper we present Contropedia, a tool that augments Wikipedia articles and gives insight into the development of controversial topics. Contropedia uses an efficient language agnostic measure based on the edit history that focuses on wiki links to easily identify which topics within a Wikipedia article have been most controversial and when. conu There has been a need for a simple, easy-to-use handler for writing tests and other code around containers that would implement helpful methods and utilities. For this we introduce conu, a low-level Python library. This project has been driven from the start by the requirements of container maintainers and testers. In addition to basic image and container management methods, it provides other often used functions, such as container mount, shortcut methods for getting an IP address, exposed ports, logs, name, image extending using source-to-image, and many others. conu aims for stable engine-agnostic APIs that would be implemented by several container runtime back-ends. Switching between two different container engines should require only minimum effort. When used for testing, one set of tests could be executed for multiple back-ends. ConvCSNet Compressive sensing (CS), aiming to reconstruct an image/signal from a small set of random measurements has attracted considerable attentions in recent years. Due to the high dimensionality of images, previous CS methods mainly work on image blocks to avoid the huge requirements of memory and computation, i.e., image blocks are measured with Gaussian random matrices, and the whole images are recovered from the reconstructed image blocks. Though efficient, such methods suffer from serious blocking artifacts. In this paper, we propose a convolutional CS framework that senses the whole image using a set of convolutional filters. Instead of reconstructing individual blocks, the whole image is reconstructed from the linear convolutional measurements. Specifically, the convolutional CS is implemented based on a convolutional neural network (CNN), which performs both the convolutional CS and nonlinear reconstruction. Through end-to-end training, the sensing filters and the reconstruction network can be jointly optimized. To facilitate the design of the CS reconstruction network, a novel two-branch CNN inspired from a sparsity-based CS reconstruction model is developed. Experimental results show that the proposed method substantially outperforms previous state-of-the-art CS methods in term of both PSNR and visual quality. CONVERGE-FAST-AUXNET In this paper, we introduce an innovative method to improve the convergence speed and accuracy of object detection neural networks. Our approach, CONVERGE-FAST-AUXNET, is based on employing multiple, dependent loss metrics and weighting them optimally using an on-line trained auxiliary network. Experiments are performed in the well-known RoboCup@Work challenge environment. A fully convolutional segmentation network is trained on detecting objects’ pickup points. We empirically obtain an approximate measure for the rate of success of a robotic pickup operation based on the accuracy of the object detection network. Our experiments show that adding an optimally weighted Euclidean distance loss to a network trained on the commonly used Intersection over Union (IoU) metric reduces the convergence time by 42.48%. The estimated pickup rate is improved by 39.90%. Compared to state-of-the-art task weighting methods, the improvement is 24.5% in convergence, and 15.8% on the estimated pickup rate. Convergence Clubs Clustering Regions that form Convergence Clubs, according to the definition of Phillips and Sul (2009) . ConvergenceClubs CONvergence of iterated CORrelations(CONCOR) Given an adjacency matrix, or a set of adjacency matrices for different relations, a correlation matrix can be formed by the following procedure. Form a profile vector for a vertex i by concatenating the ith row in every adjacency matrix; the i,jth element of the correlation matrix is the Pearson correlation coefficient of the profile vectors of i and j. This (square, symmetric) matrix is called the first correlation matrix. The procedure can be performed iteratively on the correlation matrix until convergence. Each entry is now 1 or -1. This matrix is used to split the data into two blocks such that members of the same block are positively correlated, members of different blocks are negatively correlated. CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then applied to the separate blocks. At each iteration all blocks are submitted for analysis, however blocks containing two vertices are not split. Consequently n-partitions of the binary tree can produce up to 2n blocks. Note that any similarity matrix can be used as input. http://…/concor-in-r Convergence of Random Variables In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied. The different possible notions of convergence relate to how such a behaviour can be characterised: two readily understood behaviours are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution. http://…ty_theory#Convergence_of_random_variables http://…-of-convergence-in-probability-theory.jpg COnvergent Actor-Critic by Humans(COACH) To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent’s current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent’s policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional observations, while also satisfying the constraint of obtaining reduced sample complexity. We demonstrate the effectiveness of our Deep COACH algorithm in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction. Convergent Cross Mapping(CCM) Convergent cross mapping (CCM) is a statistical test for a cause-and-effect relationship between two time series variables that, like the Granger causality test, seeks to resolve the problem that correlation does not imply causation. While Granger causality is best suited for purely stochastic systems where the influence of the causal variables are separable (independent of each other), CCM is based on the theory of Dynamical systems and can be applied to systems where causal variables have synergistic effects. The test was developed in 2012 by the lab of George Sugihara of the Scripps Institution of Oceanography, La Jolla, California, USA. Parallelizing Convergent Cross Mapping Using Apache Spark Convex Banding of the Covariance Matrix We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. Convex Feasibility Problem(CFP) The convex feasibility problem (CFP) is to find a feasible point in the intersection of finitely many convex and closed sets. If the intersection is empty then the CFP is inconsistent and a feasible point does not exist. Convex Function In mathematics, a real-valued function f(x) defined on an interval is called convex (or convex downward or concave upward) if the line segment between any two points on the graph of the function lies above the graph, in a Euclidean space (or more generally a vector space) of at least two dimensions. Equivalently, a function is convex if its epigraph (the set of points on or above the graph of the function) is a convex set. Well-known examples of convex functions are the quadratic function f(x)=x^2 and the exponential function f(x)=e^x for any real number x. Convex functions play an important role in many areas of mathematics. They are especially important in the study of optimization problems where they are distinguished by a number of convenient properties. For instance, a (strictly) convex function on an open set has no more than one minimum. Even in infinite-dimensional spaces, under suitable additional hypotheses, convex functions continue to satisfy such properties and, as a result, they are the most well-understood functionals in the calculus of variations. In probability theory, a convex function applied to the expected value of a random variable is always less than or equal to the expected value of the convex function of the random variable. This result, known as Jensen’s inequality, underlies many important inequalities (including, for instance, the arithmetic-geometric mean inequality and Hölder’s inequality). Exponential growth is a special case of convexity. Exponential growth narrowly means “increasing at a rate proportional to the current value”, while convex growth generally means “increasing at an increasing rate (but not necessarily proportionally to current value)”. Convex Hierarchical Testing(CHT) We consider the testing of all pairwise interactions in a two-class problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together. Convex Optimization Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense “easier” than the general case – for example, any local minimum must be a global minimum. Convexified Convolutional Neural Networks(CCNN) We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel Hilbert space, the CNN parameters can be represented as a low-rank matrix, which can be relaxed to obtain a convex optimization problem. For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. For learning deeper networks, we train CCNNs in a layer-wise manner. Empirically, CCNNs achieve performance competitive with CNNs trained by backpropagation, SVMs, fully-connected neural networks, stacked denoising auto-encoders, and other baseline methods. ConvFlow Bayesian posterior inference is prevalent in various machine learning problems. Variational inference provides one way to approximate the posterior distribution, however its expressive power is limited and so is the accuracy of resulting approximation. Recently, there has a trend of using neural networks to approximate the variational posterior distribution due to the flexibility of neural network architecture. One way to construct flexible variational distribution is to warp a simple density into a complex by normalizing flows, where the resulting density can be analytically evaluated. However, there is a trade-off between the flexibility of normalizing flow and computation cost for efficient transformation. In this paper, we propose a simple yet effective architecture of normalizing flows, ConvFlow, based on convolution over the dimensions of random input vector. Experiments on synthetic and real world posterior inference problems demonstrate the effectiveness and efficiency of the proposed method. ConvLab We present ConvLab, an open-source multi-domain end-to-end dialog system platform, that enables researchers to quickly set up experiments with reusable components and compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in common environments. ConvLab offers a set of fully annotated datasets and associated pre-trained reference models. As a showcase, we extend the MultiWOZ dataset with user dialog act annotations to train all component models and demonstrate how ConvLab makes it easy and effortless to conduct complicated experiments in multi-domain end-to-end dialog settings. ConvNetJS ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you’re training. No software requirements, no compilers, no installations, no GPUs, no sweat. Convolution In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions, giving the area overlap between the two functions as a function of the amount that one of the original functions is translated. Convolution is similar to cross-correlation. It has applications that include probability, statistics, computer vision, image and signal processing, electrical engineering, and differential equations. Convolutional Analysis Operator Learning(CAOL) Convolutional analysis operator learning (CAOL) enables the unsupervised training of (hierachical) convolutional sparsifying operators or autoencoders from large datasets. One can use many training images for CAOL, but a precise understanding of the impact of doing so has remained an open question. This paper presents a series of results that lend insight into the impact of dataset size on the filter update in CAOL. The first result is a general deterministic bound on errors in the estimated filters that then leads to two specific bounds under particular random models. The first bound illustrates a decrease in the expected filter estimation error as the number of training samples increases, and the second bound provides high probability analogues. The bounds depend on properties of the training data, and we investigate their empirical values with real data. Taken together, these results provide evidence for the potential benefit of using more training data in CAOL. Convolutional Block Attention Module(CBAM) We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS~COCO detection, and VOC~2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available. Convolutional Cluster Pooling We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer exploiting a multi-scale clustering in order to highlight, at different resolutions, locally connected regions on the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) on irregular and complex domains, by means of the exploitation of the weight sharing property in a graph-oriented architecture. In this work, such property is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data out of different domains. Convolutional Deep Averaging Network(CDAN) Unordered feature sets are a nonstandard data structure that traditional neural networks are incapable of addressing in a principled manner. Providing a concatenation of features in an arbitrary order may lead to the learning of spurious patterns or biases that do not actually exist. Another complication is introduced if the number of features varies between each set. We propose convolutional deep averaging networks (CDANs) for classifying and learning representations of datasets whose instances comprise variable-size, unordered feature sets. CDANs are efficient, permutation-invariant, and capable of accepting sets of arbitrary size. We emphasize the importance of nonlinear feature embeddings for obtaining effective CDAN classifiers and illustrate their advantages in experiments versus linear embeddings and alternative permutation-invariant and -equivariant architectures. Convolutional Dictionary Learning Convolutional sparse representations are a form of sparse representation with a dictionary that has a structure that is equivalent to convolution with a set of linear filters. While effective algorithms have recently been developed for the convolutional sparse coding problem, the corresponding dictionary learning problem is substantially more challenging. Furthermore, although a number of different approaches have been proposed, the absence of thorough comparisons between them makes it difficult to determine which of them represents the current state of the art. The present work both addresses this deficiency and proposes some new approaches that outperform existing ones in certain contexts. A thorough set of performance comparisons indicates a very wide range of performance differences among the existing and proposed methods, and clearly identifies those that are the most effective. Convolutional Gaussian Processes We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models. Convolutional Geometric Matrix Completion(CGMC) Geometric matrix completion~(GMC) has been proposed for recommendation by integrating the relationship~(link) graphs among users/items into matrix completion~(MC) . Traditional \mbox{GMC} methods typically adopt graph regularization to impose smoothness priors for \mbox{MC}. Recently, geometric deep learning on graphs~(\mbox{GDLG}) is proposed to solve the GMC problem, showing better performance than existing GMC methods including traditional graph regularization based methods. To the best of our knowledge, there exists only one GDLG method for GMC, which is called \mbox{RMGCNN}. RMGCNN combines graph convolutional network~(GCN) and recurrent neural network~(RNN) together for GMC. In the original work of RMGCNN, RMGCNN demonstrates better performance than pure GCN-based method. In this paper, we propose a new \mbox{GMC} method, called \underline{c}onvolutional \underline{g}eometric \underline{m}atrix \underline{c}ompletion~(CGMC), for recommendation with graphs among users/items. CGMC is a pure GCN-based method with a newly designed graph convolutional network. Experimental results on real datasets show that CGMC can outperform other state-of-the-art methods including RMGCNN. Convolutional Highways Convolutional highways are deep networks based on multiple stacked convolutional layers for feature preprocessing. Convolutional Neural Knowledge Graph Learning Previous models for learning entity and relationship embeddings of knowledge graphs such as TransE, TransH, and TransR aim to explore new links based on learned representations. However, these models interpret relationships as simple translations on entity embeddings. In this paper, we try to learn more complex connections between entities and relationships. In particular, we use a Convolutional Neural Network (CNN) to learn entity and relationship representations in knowledge graphs. In our model, we treat entities and relationships as one-dimensional numerical sequences with the same length. After that, we combine each triplet of head, relationship, and tail together as a matrix with height 3. CNN is applied to the triplets to get confidence scores. Positive and manually corrupted negative triplets are used to train the embeddings and the CNN model simultaneously. Experimental results on public benchmark datasets show that the proposed model outperforms state-of-the-art models on exploring unseen relationships, which proves that CNN is effective to learn complex interactive patterns between entities and relationships. Convolutional Neural Network In computer science, a convolutional neural network is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. Convolutional networks were inspired by biological processes and are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing. They are widely used models for image recognition. http://…oduction_to_Convolutional_Neural_Networks Convolutional Neural Network – Support Vector Machine(CNN-SVM) Convolutional neural networks (CNNs) are similar to ‘ordinary’ neural networks in the sense that they are made up of hidden layers consisting of neurons with ‘learnable’ parameters. These neurons receive inputs, performs a dot product, and then follows it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data has shown that the CNN-SVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recently-published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is suppose to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case as CNN-SVM reached a test accuracy of ~90.72%, while the CNN-Softmax reached a test accuracy of ~91.86%. The said results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model was a relatively more sophisticated than the one used in this study. Convolutional Neural Network with Alternately Updated Clique(CliqueNet) Improving information flow in deep networks helps to ease the training difficulties and utilize parameters more efficiently. Here we propose a new convolutional neural network architecture with alternately updated clique (CliqueNet). In contrast to prior networks, there are both forward and backward connections between any two layers in the same block. The layers are constructed as a loop and are updated alternately. The CliqueNet has some unique properties. For each layer, it is both the input and output of any other layer in the same block, so that the information flow among layers is maximized. During propagation, the newly updated layers are concatenated to re-update previously updated layer, and parameters are reused for multiple times. This recurrent feedback structure is able to bring higher level visual information back to refine low-level filters and achieve spatial attention. We analyze the features generated at different stages and observe that using refined features leads to a better result. We adopt a multi-scale feature strategy that effectively avoids the progressive growth of parameters. Experiments on image recognition datasets including CIFAR-10, CIFAR-100, SVHN and ImageNet show that our proposed models achieve the state-of-the-art performance with fewer parameters. Convolutional Neural Network with Bar Images(CNN-BI) Even though computational intelligence techniques have been extensively utilized in financial trading systems, almost all developed models use the time series data for price prediction or identifying buy-sell points. However, in this study we decided to use 2-D stock bar chart images directly without introducing any additional time series associated with the underlying stock. We propose a novel algorithmic trading model CNN-BI (Convolutional Neural Network with Bar Images) using a 2-D Convolutional Neural Network. We generated 2-D images of sliding windows of 30-day bar charts for Dow 30 stocks and trained a deep Convolutional Neural Network (CNN) model for our algorithmic trading model. We tested our model separately between 2007-2012 and 2012-2017 for representing different market conditions. The results indicate that the model was able to outperform Buy and Hold strategy, especially in trendless or bear markets. Since this is a preliminary study and probably one of the first attempts using such an unconventional approach, there is always potential for improvement. Overall, the results are promising and the model might be integrated as part of an ensemble trading model combined with different strategies. Convolutional Neural Tangent Kernel(CNTK) On Exact Computation with an Infinitely Wide Neural Net Convolutional Poisson Gamma Belief Network(CPGBN) For text analysis, one often resorts to a lossy representation that either completely ignores word order or embeds each word as a low-dimensional dense feature vector. In this paper, we propose convolutional Poisson factor analysis (CPFA) that directly operates on a lossless representation that processes the words in each document as a sequence of high-dimensional one-hot vectors. To boost its performance, we further propose the convolutional Poisson gamma belief network (CPGBN) that couples CPFA with the gamma belief network via a novel probabilistic pooling layer. CPFA forms words into phrases and captures very specific phrase-level topics, and CPGBN further builds a hierarchy of increasingly more general phrase-level topics. For efficient inference, we develop both a Gibbs sampler and a Weibull distribution based convolutional variational auto-encoder. Experimental results demonstrate that CPGBN can extract high-quality text latent representations that capture the word order information, and hence can be leveraged as a building block to enrich a wide variety of existing latent variable models that ignore word order. Convolutional Recurrent Neural Network(CRNN) This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike traditional singleton neural network, our model incorporates character-aware convolutional neural network (Char-CNN) with character-aware recurrent neural network (Char-RNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from Char-CNN in that only salient features are selected and fed into the integrated Char-RNN. Char-RNN effectively learns long sequence semantics via sophisticated update mechanism. We compare our framework against the state-of-the-art text classification algorithms on four popular benchmarking corpus. For instance, our model achieves competing precision rate, recall ratio, and F1 score on the Google-news data-set. For twenty-news-groups data stream, our algorithm obtains the optimum on precision rate, recall ratio, and F1 score. For Brown Corpus, our framework obtains the best F1 score and almost equivalent precision rate and recall ratio over the top competitor. For the question classification collection, CRNN produces the optimal recall rate and F1 score and comparable precision rate. We also analyse three different RNN hidden recurrent cells’ impact on performance and their runtime efficiency. We observe that MGU achieves the optimal runtime and comparable performance against GRU and LSTM. For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences. Convolutional Relational Machine(CRM) We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relations between individual persons in image or video. It learns to produce an intermediate spatial representation (activity map) based on individual and group activities. A multi-stage refinement component is responsible for decreasing the incorrect predictions in the activity map. Finally, an aggregation component uses the refined information to recognize group activities. Experimental results demonstrate the constructive contribution of the information extracted and represented in the form of the activity map. CRM shows advantages over state-of-the-art models on Volleyball and Collective Activity datasets. Convolutional Self-Attention Network(CSAN) Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend to information from different representation subspaces. In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. Experimental results of machine translation on different language pairs and model settings show that our approach outperforms both the strong Transformer baseline and other existing models on enhancing the locality of SANs. Comparing with prior studies, the proposed model is parameter free in terms of introducing no more parameters. Convolutional Sequence Embedding Recommendation Model(Caser) Top-$N$ sequential recommendation models each user as a sequence of items interacted in the past and aims to predict top-$N$ ranked items that a user will likely interact in a near future’. The order of interaction implies that sequential patterns play an important role where more recent items in a sequence have a larger impact on the next item. In this paper, we propose a Convolutional Sequence Embedding Recommendation Model (\emph{Caser}) as a solution to address this requirement. The idea is to embed a sequence of recent items into an image’ in the time and latent spaces and learn sequential patterns as local features of the image using convolutional filters. This approach provides a unified and flexible network structure for capturing both general preferences and sequential patterns. The experiments on public datasets demonstrated that Caser consistently outperforms state-of-the-art sequential recommendation methods on a variety of common evaluation metrics. Convolutional Spiking Neural Network Spiking neural networks are motivated from principles of neural systems and may possess unexplored advantages in the context of machine learning. A class of \textit{convolutional spiking neural networks} is introduced, trained to detect image features with an unsupervised, competitive learning mechanism. Image features can be shared within subpopulations of neurons, or each may evolve independently to capture different features in different regions of input space. We analyze the time and memory requirements of learning with and operating such networks. The MNIST dataset is used as an experimental testbed, and comparisons are made between the performance and convergence speed of a baseline spiking neural network. Convolutional Tsetlin Machine(CTM) Deep neural networks have obtained astounding successes for important pattern recognition tasks, but they suffer from high computational complexity and the lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy in several benchmarks, while keeping the important property of interpretability. It further facilitates hardware-near implementation since inputs, patterns, and outputs are expressed as bits, while recognition and learning rely on straightforward bit manipulation. In this paper, we exploit the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an interpretable alternative to convolutional neural networks (CNNs). Whereas the TM categorizes an image by employing each clause once to the whole image, the CTM uses each clause as a convolution filter. That is, a clause is evaluated multiple times, once per image patch taking part in the convolution. To make the clauses location-aware, each patch is further augmented with its coordinates within the image. The output of a convolution clause is obtained simply by ORing the outcome of evaluating the clause on each patch. In the learning phase of the TM, clauses that evaluate to 1 are contrasted against the input. For the CTM, we instead contrast against one of the patches, randomly selected among the patches that made the clause evaluate to 1. Accordingly, the standard Type I and Type II feedback of the classic TM can be employed directly, without further modification. The CTM obtains a peak test accuracy of 99.51% on MNIST, 96.21% on Kuzushiji-MNIST, 89.56% on Fashion-MNIST, and 100.0% on the 2D Noisy XOR Problem, which is competitive with results reported for simple 4-layer CNNs, BinaryConnect, and a recent FPGA-accelerated Binary CNN. Convolutional Universal Text Information Extractor(CUTIE) Extracting key information from documents, such as receipts or invoices, and preserving the interested texts to structured data is crucial in the document-intensive streamline processes of office automation in areas that includes but not limited to accounting, financial, and taxation areas. To avoid designing expert rules for each specific type of document, some published works attempt to tackle the problem by learning a model to explore the semantic context in text sequences based on the Named Entity Recognition (NER) method in the NLP field. In this paper, we propose to harness the effective information from both semantic meaning and spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks on gridded texts where texts are embedded as features with semantical connotations. We further explore the effect of employing different structures of convolutional neural network and propose a fast and portable structure. We demonstrate the effectiveness of the proposed method on a dataset with up to 6,980 labelled receipts, without any pre-training or post-processing, achieving state of the art performance that is much higher than BERT but with only 1/10 parameters and without requiring the 3,300M word dataset for pre-training. Experimental results also demonstrate that the CUTIE being able to achieve state of the art performance with much smaller amount of training data. ConvTimeNet Training deep neural networks often requires careful hyper-parameter tuning and significant computational resources. In this paper, we propose ConvTimeNet (CTN): an off-the-shelf deep convolutional neural network (CNN) trained on diverse univariate time series classification (TSC) source tasks. Once trained, CTN can be easily adapted to new TSC target tasks via a small amount of fine-tuning using labeled instances from the target tasks. We note that the length of convolutional filters is a key aspect when building a pre-trained model that can generalize to time series of different lengths across datasets. To achieve this, we incorporate filters of multiple lengths in all convolutional layers of CTN to capture temporal features at multiple time scales. We consider all 65 datasets with time series of lengths up to 512 points from the UCR TSC Benchmark for training and testing transferability of CTN: We train CTN on a randomly chosen subset of 24 datasets using a multi-head approach with a different softmax layer for each training dataset, and study generalizability and transferability of the learned filters on the remaining 41 TSC datasets. We observe significant gains in classification accuracy as well as computational efficiency when using pre-trained CTN as a starting point for subsequent task-specific fine-tuning compared to existing state-of-the-art TSC approaches. We also provide qualitative insights into the working of CTN by: i) analyzing the activations and filters of first convolution layer suggesting the filters in CTN are generically useful, ii) analyzing the impact of the design decision to incorporate multiple length decisions, and iii) finding regions of time series that affect the final classification decision via occlusion sensitivity analysis. Conway-Maxwell Poisson(CMP) Count data are a popular outcome in many empirical studies, especially as big data has become available on human and social behavior. The Conway-Maxwell Poisson (CMP) distribution is popularly used for modeling count data due to its ability to handle both overdispersed and underdispersed data. Yet, current methods for estimating CMP regression models are not efficient, especially with high-dimensional data. Extant methods use either nonlinear optimization or MCMC methods. We propose a flexible estimation framework for CMP regression based on iterative reweighed least squares (IRLS). Because CMP belongs to the exponential family, convergence is guaranteed and is more efficient. We also extend this framework to allow estimation for additive models with smoothing splines. We illustrate the usefulness of this approach through simulation study and application to real data on speed dating. Cook’s Distance In statistics, Cook’s distance or Cook’s D is a commonly used estimate of the influence of a data point when performing least squares regression analysis. In a practical ordinary least squares analysis, Cook’s distance can be used in several ways: to indicate data points that are particularly worth checking for validity; to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977. Cooperative Game Theory In game theory, a cooperative game is a game where groups of players (‘coalitions’) may enforce cooperative behaviour, hence the game is a competition between coalitions of players, rather than between individual players. An example is a coordination game, when players choose the strategies by a consensus decision-making process. Recreational games are rarely cooperative, because they usually lack mechanisms by which coalitions may enforce coordinated behaviour on the members of the coalition. Such mechanisms, however, are abundant in real life situations (e.g. contract law). Cooperative theory starts with a formalization of games that abstracts away altogether from procedures and … concentrates, instead, on the possibilities for agreement. … There are several reasons that explain why cooperative games came to be treated separately. One is that when one does build negotiation and enforcement procedures explicitly into the model, then the results of a non-cooperative analysis depend very strongly on the precise form of the procedures, on the order of making offers and counter-offers and so on. This may be appropriate in voting situations in which precise rules of parliamentary order prevail, where a good strategist can indeed carry the day. But problems of negotiation are usually more amorphous; it is difficult to pin down just what the procedures are. More fundamentally, there is a feeling that procedures are not really all that relevant; that it is the possibilities for coalition forming, promising and threatening that are decisive, rather than whose turn it is to speak. … Detail distracts attention from essentials. Some things are seen better from a distance; the Roman camps around Metzada are indiscernible when one is in them, but easily visible from the top of the mountain. Cooperative Inverse Reinforcement Learning(CIRL) For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as {\em cooperative inverse reinforcement learning} (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm. Cooperative Learning Learning paradigms involving varying levels of supervision have received a lot of interest within the computer vision and machine learning communities. The supervisory information is typically considered to come from a human supervisor — a ‘teacher’ figure. In this paper, we consider an alternate source of supervision — a ‘peer’ — i.e. a different machine. We introduce cooperative learning, where two agents trying to learn the same visual concepts, but in potentially different environments using different sources of data (sensors), communicate their current knowledge of these concepts to each other. Given the distinct sources of data in both agents, the mode of communication between the two agents is not obvious. We propose the use of visual attributes — semantic mid-level visual properties such as furry, wooden, etc.– as the mode of communication between the agents. Our experiments in three domains — objects, scenes, and animals — demonstrate that our proposed cooperative learning approach improves the performance of both agents as compared to their performance if they were to learn in isolation. Our approach is particularly applicable in scenarios where privacy, security and/or bandwidth constraints restrict the amount and type of information the two agents can exchange. Cooperative SGD State-of-the-art distributed machine learning suffers from significant delays due to frequent communication and synchronizing between worker nodes. Emerging communication-efficient SGD algorithms that limit synchronization between locally trained models have been shown to be effective in speeding-up distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a new framework called Coooperative SGD that subsumes existing communication-efficient SGD algorithms such as federated-averaging, elastic-averaging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence. Cooperative Training(CoT) We propose Cooperative Training (CoT) for training generative models that measure a tractable density function for target data. CoT coordinately trains a generator $G$ and an auxiliary predictive mediator $M$. The training target of $M$ is to estimate a mixture density of the learned distribution $G$ and the target distribution $P$, and that of $G$ is to minimize the Jensen-Shannon divergence estimated through $M$. CoT achieves independent success without the necessity of pre-training via Maximum Likelihood Estimation or involving high-variance algorithms like REINFORCE. This low-variance algorithm is theoretically proved to be unbiased for both generative and predictive tasks. We also theoretically and empirically show the superiority of CoT over most previous algorithms, in terms of generative quality and diversity, predictive generalization ability and computational cost. Coopetititve Soft Gating Ensemble(CSGE) In this article, we proposed the Coopetititve Soft Gating Ensemble or CSGE for general machine learning tasks. The goal of machine learning is to create models which poses a high generalisation capability. But often problems are too complex to be solved by a single model. Therefore, ensemble methods combine predictions of multiple models. The CSGE comprises a comprehensible combination based on three different aspects relating to the overall global historical performance, the local-/situation-dependent and time-dependent performance of its ensemble members. The CSGE can be optimised according to arbitrary loss functions making it accessible for a wider range of problems. We introduce a novel training procedure including a hyper-parameter initialisation at its heart. We show that the CSGE approach reaches state-of-the-art performance for both classification and regression tasks. Still, the CSGE allows to quantify the influence of all base estimators by means of the three weighting aspects in a comprehensive way. In terms of Organic computing (OC), our CSGE approach combines multiple base models towards a self-organising complex system. Moreover, we provide a scikit-learn compatible implementation. CoordConv Uber uses convolutional neural networks in many domains that could potentially involve coordinate transforms, from designing self-driving vehicles to automating street sign detection to build maps and maximizing the efficiency of spatial movements in the Uber Marketplace. In deep learning, few ideas have experienced as much impact as convolution. Almost all state-of-the-art results in machine vision make use of stacks of convolutional layers as basic building blocks. Since such architectures are widespread, we should expect that they excel at simple tasks like painting a single pixel in a tiny image, right Surprisingly, it turns out that convolution often has difficulty completing seemingly trivial tasks. In our paper, An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, we expose and analyze a generic inability of convolutional neural networks (CNNs) to transform spatial representations between two different types: coordinates in (i, j) Cartesian space and coordinates in one-hot pixel space. It´s surprising because the task appears so simple, and it may be important because such coordinate transforms seem to be required to solve many common tasks, like detecting objects in images, training generative models of images, and training reinforcement learning (RL) agents from pixels. It turns out that these tasks may have subtly suffered from this failing of convolution all along, as suggested by performance improvements we demonstrate across several domains when using the solution we propose, a layer called CoordConv. Coordinate Architecture Search(CAS) The Transformer architecture is superior to RNN-based models in computational efficiency. Recently, GPT and BERT demonstrate the efficacy of Transformer models on various NLP tasks using pre-trained language models on large-scale corpora. Surprisingly, these Transformer architectures are suboptimal for language model itself. Neither self-attention nor the positional encoding in the Transformer is able to efficiently incorporate the word-level sequential context crucial to language modeling. In this paper, we explore effective Transformer architectures for language model, including adding additional LSTM layers to better capture the sequential context while still keeping the computation efficient. We propose Coordinate Architecture Search (CAS) to find an effective architecture through iterative refinement of the model. Experimental results on the PTB, WikiText-2, and WikiText-103 show that CAS achieves perplexities between 20.42 and 34.11 on all problems, i.e. on average an improvement of 12.0 perplexity units compared to state-of-the-art LSTMs. Coordinate Ascent Coordinate Descent(CD) Coordinate descent is a non-derivative optimization algorithm. To find a local minimum of a function, one does line search along one coordinate direction at the current point in each iteration. One uses different coordinate directions cyclically throughout the procedure. On non-separable functions the algorithm may fail to find the optimum in a reasonable number of function evaluations. To improve the convergence an appropriate coordinate system can be gradually learned, such that new search coordinates obtained using PCA are as decorrelated as possible with respect to the objective function ➚ “Adaptive Coordinate Descent” Coordinate Descent Algorithms(CDA) This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This particular class of algorithms has recently gained popularity due to their effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, which is ideal for parallelized and distributed computing. Avoiding detailed technicalities and proofs, this monograph gives relevant theory and examples for practitioners to effectively apply coordinate descent to modern problems in data science and engineering. To keep the primer up-to-date, we intend to publish this monograph only after no additional topics need to be added and we foresee no further major advances in the area. copCAR Regression Model(copCAR) Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM#s conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online. copCAR COPri Nowadays, most companies need to collect, store, and manage personal information in order to deliver their services. Accordingly, privacy has emerged as a key concern for these companies since they need to comply with privacy laws and regulations. To deal with them properly, such privacy concerns should be considered since the early phases of system design. Ontologies have proven to be a key factor for elaborating high-quality requirements models. However, most existing work deals with privacy as a special case of security requirements, thereby missing essential traits of this family of requirements. In this paper, we introduce COPri, a Core Ontology for Privacy requirements engineering that adopts and extends our previous work on privacy requirements engineering ontology that has been mined through a systematic literature review. Additionally, we implement, validate and then evaluate our ontology. Copula In probability theory and statistics, a copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables. They are named for their resemblance to grammatical copulas in linguistics. Copula Statistic(CoS) A new index based on empirical copulas, termed the Copula Statistic (CoS), is introduced for assessing the strength of multivariate dependence and for testing statistical independence. New properties of the copulas are proved. They allow us to define the CoS in terms of a relative distance function between the empirical copula, the Fr\’echet-Hoeffding bounds and the independence copula. Monte Carlo simulations reveal that for large sample sizes, the CoS is approximately normal. This property is utilised to develop a CoS-based statistical test of independence against various noisy functional dependencies. It is shown that this test exhibits higher statistical power than the Total Information Coefficient (TICe), the Distance Correlation (dCor), the Randomized Dependence Coefficient (RDC), and the Copula Correlation (Ccor) for monotonic and circular functional dependencies. Furthermore, the R2-equitability of the CoS is investigated for estimating the strength of a collection of functional dependencies with additive Gaussian noise. Finally, the CoS is applied to a real stock market data set from which we infer that a bivariate analysis is insufficient to unveil multivariate dependencies and to two gene expression data sets of the Yeast and of the E. Coli, which allow us to demonstrate the good performance of the CoS. CoQA Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.1%, which is 23.7 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://…/coqa Coral Reefs Optimization(CRO) This paper presents a novel bioinspired algorithm to tackle complex optimization problems: the coral reefs optimization (CRO) algorithm. The CRO algorithm artificially simulates a coral reef, where different corals (namely, solutions to the optimization problem considered) grow and reproduce in coral colonies, fighting by choking out other corals for space in the reef. This fight for space, along with the specific characteristics of the corals’ reproduction, produces a robust metaheuristic algorithm shown to be powerful for solving hard optimization problems. In this research the CRO algorithm is tested in several continuous and discrete benchmark problems, as well as in practical application scenarios (i.e., optimum mobile network deployment and off-shore wind farm design). The obtained results confirm the excellent performance of the proposed algorithm and open line of research for further application of the algorithm to real-world problems. Core Conflictual Relationship Theme(CCRT) Core Conflictual Relationship: Text Mining to Discover What and When Core2Vec Recent advances in the field of network representation learning are mostly attributed to the application of the skip-gram model in the context of graphs. State-of-the-art analogues of skip-gram model in graphs define a notion of neighbourhood and aim to find the vector representation for a node, which maximizes the likelihood of preserving this neighborhood. In this paper, we take a drastic departure from the existing notion of neighbourhood of a node by utilizing the idea of coreness. More specifically, we utilize the well-established idea that nodes with similar core numbers play equivalent roles in the network and hence induce a novel and an organic notion of neighbourhood. Based on this idea, we propose core2vec, a new algorithmic framework for learning low dimensional continuous feature mapping for a node. Consequently, the nodes having similar core numbers are relatively closer in the vector space that we learn. We further demonstrate the effectiveness of core2vec by comparing word similarity scores obtained by our method where the node representations are drawn from standard word association graphs against scores computed by other state-of-the-art network representation techniques like node2vec, DeepWalk and LINE. Our results always outperform these existing methods Coreset We study fair clustering problems as proposed by Chierichetti et al. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable and show how to compute them in a streaming setting. We also propose a novel combination of the coreset construction with a sketching technique due to Cohen et al. which may be of independent interest. We conclude with an empirical evaluation. Coreset-Based Neural Network Compression We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters. We exploit the redundancies extant in the space of CNN weights and neuronal activations (across samples) in order to obtain compression. Our method requires no retraining, is easy to implement, and obtains state-of-the-art compression performance across a wide variety of CNN architectures. Coupled with quantization and Huffman coding, we create networks that provide AlexNet-like accuracy, with a memory footprint that is $832\times$ smaller than the original AlexNet, while also introducing significant reductions in inference time as well. Additionally these compressed networks when fine-tuned, successfully generalize to other domains as well. CornerNet We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.1% AP on MS COCO, outperforming all existing one-stage detectors. CornerNet-Lite Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce CornerNet-Lite. CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture. Together these two variants address the two critical use cases in efficient object detection: improving efficiency without sacrificing accuracy, and improving accuracy at real-time efficiency. CornerNet-Saccade is suitable for offline processing, improving the efficiency of CornerNet by 6.0x and the AP by 1.0% on COCO. CornerNet-Squeeze is suitable for real-time detection, improving both the efficiency and accuracy of the popular real-time detector YOLOv3 (34.4% AP at 34ms for CornerNet-Squeeze compared to 33.0% AP at 39ms for YOLOv3 on COCO). Together these contributions for the first time reveal the potential of keypoint-based detection to be useful for applications requiring processing efficiency. Corpora Agnostic Word Vectorization Method(WordNet2Vec) A complex nature of big data resources demands new methods for structuring especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language as its good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordNet2Vec is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate general position – role of a given word towards all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its component within the whole language. Such word representation can be easily applied to many analytic tasks like classification or clustering. Corpus Linguistics Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process. Corpus linguistics adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair advocating minimal annotation and allowing texts to ‘speak for themselves’, to others, such as the Survey of English Usage team (based in University College, London) advocating annotation as a path to greater linguistic understanding and rigour. Correct Classification Percentage(CCP) Correct Classification Percentage (CCP) described in the paper: Jialiang Li (2013) . mcca Correct Eventual Consistency Tool(CEC) This tool helps programmers verify whether their distributed application design is safe. Correlated Anomaly Detection(CAD) Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in useful real-time data mining applications like botnet detection, financial event detection, industrial process monitor, etc. Correlated Component Analysis(CORRCA) ➘ “Correlated Components Analysis” Correlated Components Analysis(CORRCA) How does one find data dimensions that are reliably expressed across repetitions? For example, in neuroscience one may want to identify combinations of brain signals that are reliably activated across multiple trials or subjects. For a clinical assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. The approach proposed here — ‘correlated components analysis’ — is to identify components that maximally correlate between repetitions (e.g. trials, subjects, raters). This can be expressed as the maximization of the ratio of between-repetition to within-repetition covariance, resulting in a generalized eigenvalue problem. We show that covariances can be computed efficiently without explicitly considering all pairs of repetitions, that the result is equivalent to multi-class linear discriminant analysis for unbiased signals, and that the approach also maximize reliability, defined as the mean divided by the deviation across repetitions. We also extend the method to non-linear components using kernels, discuss regularization to improve numerical stability, present parametric and non-parametric tests to establish statistical significance, and provide code. Weighted correlated component analysis for frequency recognition in SSVEP-based BCI Correlated High-Utility Pattern Miner(CoUPM) In the field of data mining and analytics, the utility theory from Economic can bring benefits in many real-life applications. In recent decade, a new research field called utility-oriented mining has already attracted great attention. Previous studies have, however, the limitation that they rarely consider the inherent correlation of items among patterns. Consider the purchase behaviors of consumer, a high-utility group of products (w.r.t. multi-products) may contain several very high-utility products with some low-utility products. However, it is considered as a valuable pattern even if this behavior/pattern may be not highly correlated, or even happen by chance. In this paper, in light of these challenges, we propose an efficient utility mining approach namely non-redundant Correlated high-Utility Pattern Miner (CoUPM) by taking positive correlation and profitable value into account. The derived patterns with high utility and strong positive correlation can lead to more insightful availability than those patterns only have high profitable values. The utility-list structure is revised and applied to store necessary information of both correlation and utility. Several pruning strategies are further developed to improve the efficiency for discovering the desired patterns. Experimental results show that the non-redundant correlated high-utility patterns have more effectiveness than some other kinds of interesting patterns. Moreover, efficiency of the proposed CoUPM algorithm significantly outperforms the state-of-the-art algorithm. Correlated Topic Model(CTM) Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets. Correlated Variational Auto-Encoder(CVAE) Variational Auto-Encoders (VAEs) are capable of learning latent representations for high dimensional data. However, due to the i.i.d. assumption, VAEs only optimize the singleton variational distributions and fail to account for the correlations between data points, which might be crucial for learning latent representations from dataset where a priori we know correlations exist. We propose Correlated Variational Auto-Encoders (CVAEs) that can take the correlation structure into consideration when learning latent representations with VAEs. CVAEs apply a prior based on the correlation structure. To address the intractability introduced by the correlated prior, we develop an approximation by average of a set of tractable lower bounds over all maximal acyclic subgraphs of the undirected correlation graph. Experimental results on matching and link prediction on public benchmark rating datasets and spectral clustering on a synthetic dataset show the effectiveness of the proposed method over baseline algorithms. CORrelation ALignment(CORAL) In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at:~\url{https://…/CORAL} Correlation Assisted Private Estimation(CAPE) Many applications of machine learning, such as human health research, involve processing private or sensitive information. Privacy concerns may impose significant hurdles to collaboration in scenarios where there are multiple sites holding data and the goal is to estimate properties jointly across all datasets. Differentially private decentralized algorithms can provide strong privacy guarantees. However, the accuracy of the joint estimates may be poor when the datasets at each site are small. This paper proposes a new framework, Correlation Assisted Private Estimation (CAPE), for designing privacy-preserving decentralized algorithms with better accuracy guarantees in an honest-but-curious model. CAPE can be used in conjunction with the functional mechanism for statistical and machine learning optimization problems. A tighter characterization of the functional mechanism is provided that allows CAPE to achieve the same performance as a centralized algorithm in the decentralized setting using all datasets. Empirical results on regression and neural network problems for both synthetic and real datasets show that differentially private methods can be competitive with non-private algorithms in many scenarios of interest. Correlation Clustering Correlation Clustering is a powerful graph partitioning model that aims to cluster items based on the notion of similarity between items. An instance of the Correlation Clustering problem consists of a graph $G$ (not necessarily complete) whose edges are labeled by a binary classifier as ‘similar’ and ‘dissimilar’. Classically, we are tasked with producing a clustering that minimizes the number of disagreements: an edge is in disagreement if it is a ‘similar’ edge and is present across clusters or if it is a ‘dissimilar’ edge and is present within a cluster. Define the disagreements vector to be an $n$ dimensional vector indexed by the vertices, where the $v$-th index is the number of disagreements at vertex $v$. Recently, Puleo and Milenkovic (ICML ’16) initiated the study of the Correlation Clustering framework in which the objectives were more general functions of the disagreements vector. Correlation Congruence for Knowledge Distillation(CCKD) Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint on instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge transfer. In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only the instance-level information, but also the correlation between instances. Furthermore, a generalized kernel method based on Taylor series expansion is proposed to better capture the correlation between instances. Empirical experiments and ablation studies on image classification tasks (including CIFAR-100, ImageNet-1K) and metric learning tasks (including ReID and Face Recognition) show that the proposed CCKD substantially outperforms the original KD and achieves state-of-the-art accuracy compared with other SOTA KD-based methods. The CCKD can be easily deployed in the majority of the teacher-student framework such as KD and hint-based learning methods. CORrelation Differences(CORD) Given a zero mean random vector X=:(X1,…,Xp) ∈ R^p, we consider the problem of defining and estimating a partition G of {1,…,p} such that the components of X with indices in the same group of the partition have a similar, community-like behavior. We introduce a new model, the G-exchangeable model, to define group similarity. This model is a natural extension of the more commonly used G-latent model, for which the partition G is generally not identifiable, without additional restrictions on X. In contrast, we show that for any random vector X there exists an identifiable partition G according to which X is G-exchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the G-block covariance model, which generalizes the G-exchangeable model, and can be of interest in its own right for defining group similarity. We discuss connections between the three types of G-models. We exploit the connection with G-block covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which X is G-exchangeable with a G-block copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the G-latent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove to be minimax optimal, via lower bound calculations. Our procedure is fast and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two real-data examples. Correlation-Adjusted Regression Survival Scores(CARS) Contains functions to estimate the Correlation-Adjusted Regression Survival (CARS) Scores. The method is described in Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, . carSurv Correlation-Based Community Detection(CBCD) Mining community structures from the complex network is an important problem across a variety of fields. Many existing community detection methods detect communities through optimizing a community evaluation function. However, most of these functions even have high values on random graphs and may fail to detect small communities in the large-scale network (the so-called resolution limit problem). In this paper, we introduce two novel node-centric community evaluation functions by connecting correlation analysis with community detection. We will further show that the correlation analysis can provide a novel theoretical framework which unifies some existing evaluation functions in the context of a correlation-based optimization problem. In this framework, we can mitigate the resolution limit problem and eliminate the influence of random fluctuations by selecting the right correlation function. Furthermore, we introduce three key properties used in mining association rule into the context of community detection to help us choose the appropriate correlation function. Based on our introduced correlation functions, we propose a community detection algorithm called CBCD. Our proposed algorithm outperforms existing state-of-the-art algorithms on both synthetic benchmark networks and real-world networks. Correntropy Correntropy is a nonlinear similarity measure between two random variables. Learning with the Maximum Correntropy Criterion Induced Losses for Regression Correspondence Analysis(CA) Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variable. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables. CorrRNN In this paper, we study automatic keyphrase generation. Although conventional approaches to this task show promising results, they neglect correlation among keyphrases, resulting in duplication and coverage issues. To solve these problems, we propose a new sequence-to-sequence architecture for keyphrase generation named CorrRNN, which captures correlation among multiple keyphrases in two ways. First, we employ a coverage vector to indicate whether the word in the source document has been summarized by previous phrases to improve the coverage for keyphrases. Second, preceding phrases are taken into account to eliminate duplicate phrases and improve result coherence. Experiment results show that our model significantly outperforms the state-of-the-art method on benchmark datasets in terms of both accuracy and diversity. Cortana Analytics Cortana Analytics is a fully managed big data and advanced analytics suite that enables you to transform your data into intelligent action. Cortex Neural Network(CrtxNN) Neural Network has been successfully applied to many real-world problems, such as image recognition and machine translation. However, for the current architecture of neural networks, it is hard to perform complex cognitive tasks, for example, to process the image and audio inputs together. Cortex, as an important architecture in the brain, is important for animals to perform the complex cognitive task. We view the architecture of Cortex in the brain as a missing part in the design of the current artificial neural network. In this paper, we purpose Cortex Neural Network (CrtxNN). The Cortex Neural Network is an upper architecture of neural networks which motivated from cerebral cortex in the brain to handle different tasks in the same learning system. It is able to identify different tasks and solve them with different methods. In our implementation, the Cortex Neural Network is able to process different cognitive tasks and perform reflection to get a higher accuracy. We provide a series of experiments to examine the capability of the cortex architecture on traditional neural networks. Our experiments proved its ability on the Cortex Neural Network can reach accuracy by 98.32% on MNIST and 62% on CIFAR10 at the same time, which can promisingly reduce the loss by 40%. CortexNet In the past five years we have observed the rise of incredibly well performing feed-forward neural networks trained supervisedly for vision related tasks. These models have achieved super-human performance on object recognition, localisation, and detection in still images. However, there is a need to identify the best strategy to employ these networks with temporal visual inputs and obtain a robust and stable representation of video data. Inspired by the human visual system, we propose a deep neural network family, CortexNet, which features not only bottom-up feed-forward connections, but also it models the abundant top-down feedback and lateral connections, which are present in our visual cortex. We introduce two training schemes – the unsupervised MatchNet and weakly supervised TempoNet modes – where a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://…/. CoSegNet We introduce CoSegNet, a deep neural network architecture for co-segmentation of a set of 3D shapes represented as point clouds. CoSegNet takes as input a set of unsegmented shapes, proposes per-shape parts, and then jointly optimizes the part labelings across the set subjected to a novel group consistency loss expressed via matrix rank estimates. The proposals are refined in each iteration by an auxiliary network that acts as a weak regularizing prior, pre-trained to denoise noisy, unlabeled parts from a large collection of segmented 3D shapes, where the part compositions within the same object category can be highly inconsistent. The output is a consistent part labeling for the input set, with each shape segmented into up to K (a user-specified hyperparameter) parts. The overall pipeline is thus weakly supervised, producing consistent segmentations tailored to the test set, without consistent ground-truth segmentations. We show qualitative and quantitative results from CoSegNet and evaluate it via ablation studies and comparisons to state-of-the-art co-segmentation methods. Coset Weighted Potential Game In this paper we first define a new kind of potential games, called coset weighted potential game, which is a generalized form of weighted potential game. Using semi-tensor product of matrices, an algebraic method is provided to verify whether a finite game is a coset weighted potential game, and a simple formula is obtained to calculate the corresponding potential function. Then some properties of coset weighted potential games are revealed. Finally, by resorting to the vector space structure of finite games, a new orthogonal decomposition based on coset weights is proposed, the corresponding geometric and algebraic expressions of all the subspaces are given by providing their bases. Cosine Distance ➘ “Cosine Similarity” Cosine Similarity Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in. Note that these bounds apply for any number of dimensions, and cosine similarity is most commonly used in high-dimensional positive spaces. For example, in information retrieval and text mining, each term is notionally assigned a different dimension and a document is characterised by a vector where the value of each dimension corresponds to the number of times that term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. The technique is also used to measure cohesion within clusters in the field of data mining. Cosinor Analysis Cosinor analysis uses the least squares method to fit a sine wave to a time series. Cosinor analysis is often used in the analysis of biologic time series that demonstrate predictible rhythms. This method can be used with an unequally spaced time series. Cosmos DB Azure Cosmos DB is Microsoft’s proprietary globally-distributed, multi-model database service ‘for managing data at planet-scale’ launched in May 2017. It is schema-agnostic, horizontally scalable and generally classified as a NoSQL database. Internally, Cosmos DB stores ‘items’ in ‘containers’, with these 2 concepts being surfaced differently depending on the API used (these would be ‘documents’ in ‘collections’ when using the MongoDB-compatible API, for example). Containers are grouped in ‘databases’, which are analogous to namespaces above containers. Containers are schema-agnostic, which means that no schema is enforced when adding items. Cost-aware Cascading Upper Confidence Bound(CC-UCB) In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed ban- dits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until certain stopping condition is satisfied. Our objective is then to max- imize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost in- curred in examining the items, by deciding the or- dered list of items, as well as when to stop examina- tion. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the of- fline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in O(log T ). We also provide a lower bound for all {\alpha}-consistent policies, which scales in {\Omega}(log T ) and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data. Cost-Effective REgion-based Active Learning for Semantic Segmentation(CEREALS) State of the art methods for semantic image segmentation are trained in a supervised fashion using a large corpus of fully labeled training images. However, gathering such a corpus is expensive, due to human annotation effort, in contrast to gathering unlabeled data. We propose an active learning-based strategy, called CEREALS, in which a human only has to hand-label a few, automatically selected, regions within an unlabeled image corpus. This minimizes human annotation effort while maximizing the performance of a semantic image segmentation method. The automatic selection procedure is achieved by: a) using a suitable information measure combined with an estimate about human annotation effort, which is inferred from a learned cost model, and b) exploiting the spatial coherency of an image. The performance of CEREALS is demonstrated on Cityscapes, where we are able to reduce the annotation effort to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model that was trained with the fully annotated training set of Cityscapes. Cost-Sensitive Dynamic Principal Projection(CS-DPP) We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimensional reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP) that resolves all three issues. The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CS-DPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefully-designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously. Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence Models(CERTIFAI) As artificial intelligence plays an increasingly important role in our society, there are ethical and moral obligations for both businesses and researchers to ensure that their machine learning models are designed, deployed, and maintained responsibly. These models need to be rigorously audited for fairness, robustness, transparency, and interpretability. A variety of methods have been developed that focus on these issues in isolation, however, managing these methods in conjunction with model development can be cumbersome and timeconsuming. In this paper, we introduce a unified and model-agnostic approach to address these issues: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models (CERTIFAI). Unlike previous methods in this domain, CERTIFAI is a general tool that can be applied to any black-box model and any type of input data. Given a model and an input instance, CERTIFAI uses a custom genetic algorithm to generate counterfactuals: instances close to the input that change the prediction of the model. We demonstrate how these counterfactuals can be used to examine issues of robustness, interpretability, transparency, and fairness. Additionally, we introduce CERScore, the first black-box model robustness score that performs comparably to methods that have access to model internals. Counterfactual Fairness Machine learning has matured to the point to where it is now being considered to automate decisions in loan lending, employee hiring, and predictive policing. In many of these scenarios however, previous decisions have been made that are unfairly biased against certain subpopulations (e.g., those of a particular race, gender, or sexual orientation). Because this past data is often biased, machine learning predictors must account for this to avoid perpetuating discriminatory practices (or incidentally making new ones). In this paper, we develop a framework for modeling fairness in any dataset using tools from counterfactual inference. We propose a definition called counterfactual fairness that captures the intuition that a decision is fair towards an individual if it gives the same predictions in (a) the observed world and (b) a world where the individual had always belonged to a different demographic group, other background causes of the outcome being equal. We demonstrate our framework on two real-world problems: fair prediction of law school success, and fair modeling of an individual’s criminality in policing data. Counterfactual Inference Counterfactual Regret Minimization(CRM) CFR is a self-play algorithm: it learns to play a game by repeatedly playing against itself. The program starts off with a strategy that is uniformly random, where it will play every action at every decision point with an equal probability. It then simulates playing games against itself. After every game, it revisits its decisions, and finds ways to improve its strategy. It repeats this process for billions of games, improving its strategy each time. As it plays, it gets closer and closer towards an optimal strategy for the game: a strategy that can do no worse than tie against any opponent. The way it improves over time is by summing the total amount of regret it has for each action at each decision point, where regret means: how much better would I have done over all the games so far if I had just always played this one action at this decision, instead of choosing whatever mixture over actions that my strategy said I should use? Positive regret means that we would have done better if we had taken that action more often. Negative regret means that we would have done better by not taking that action at all. After each game that the program plays against itself, it computes and adds in the new regret values for all of its decisions it just made. It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret. If an action would have been good in the past, then it will choose it more often in the future. Counting Objects via Scale-Aware Adversarial Density Adaption(CODA) Recent advances in crowd counting have achieved promising results with increasingly complex convolutional neural network designs. However, due to the unpredictable domain shift, generalizing trained model to unseen scenarios is often suboptimal. Inspired by the observation that density maps of different scenarios share similar local structures, we propose a novel adversarial learning approach in this paper, i.e., CODA (\emph{Counting Objects via scale-aware adversarial Density Adaption}). To deal with different object scales and density distributions, we perform adversarial training with pyramid patches of multi-scales from both source- and target-domain. Along with a ranking constraint across levels of the pyramid input, consistent object counts can be produced for different scales. Extensive experiments demonstrate that our network produces much better results on unseen datasets compared with existing counting adaption models. Notably, the performance of our CODA is comparable with the state-of-the-art fully-supervised models that are trained on the target dataset. Further analysis indicates that our density adaption framework can effortlessly extend to scenarios with different objects. \emph{The code is available at https://…/CODA.} Count-Min Sketch In computing, the count-min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count-min sketch was invented in 2003 by Graham Count-min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multi-set policy are conceptually isomorphic to the count-min sketch. Coupled CycleGAN ➘ “Unsupervised Coupled Cycle Generative Adversarial Hashing Network” Coupled Evolutionary Network(CEN) Age estimation of unknown persons is a challenging pattern analysis task due to the lacking of training data and various aging mechanisms for different people. Label distribution learning-based methods usually make distribution assumptions to simplify age estimation. However, age label distributions are often complex and difficult to be modeled in a parameter way. Inspired by the biological evolutionary mechanism, we propose a Coupled Evolutionary Network (CEN) with two concurrent evolutionary processes: evolutionary label distribution learning and evolutionary slack regression. Evolutionary network learns and refines age label distributions in an iteratively learning way. Evolutionary label distribution learning adaptively learns and constantly refines the age label distributions without making strong assumptions on the distribution patterns. To further utilize the ordered and continuous information of age labels, we accordingly propose an evolutionary slack regression to convert the discrete age label regression into the continuous age interval regression. Experimental results on Morph, ChaLearn15 and MegaAge-Asian datasets show the superiority of our method. Coupled Graph-Tensor Factorization(CGTF) Joint analysis of data from multiple information repositories facilitates uncovering the underlying structure in heterogeneous datasets. Single and coupled matrix-tensor factorization (CMTF) has been widely used in this context for imputation-based recommendation from ratings, social network, and other user-item data. When this side information is in the form of item-item correlation matrices or graphs, existing CMTF algorithms may fall short. Alleviating current limitations, we introduce a novel model coined coupled graph-tensor factorization (CGTF) that judiciously accounts for graph-related side information. The CGTF model has the potential to overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices. A novel alternating direction method of multipliers (ADMM) is also developed that recovers the nonnegative factors of CGTF. Our algorithm enjoys closed-form updates that result in reduced computational complexity and allow for convergence claims. A novel direction is further explored by employing the interpretable factors to detect graph communities having the tensor as side information. The resulting community detection approach is successful even when some links in the graphs are missing. Results with real data sets corroborate the merits of the proposed methods relative to state-of-the-art competing factorization techniques in providing recommendations and detecting communities. Coupled Recurrent Network(CRN) Many semantic video analysis tasks can benefit from multiple, heterogenous signals. For example, in addition to the original RGB input sequences, sequences of optical flow are usually used to boost the performance of human action recognition in videos. To learn from these heterogenous input sources, existing methods reply on two-stream architectural designs that contain independent, parallel streams of Recurrent Neural Networks (RNNs). However, two-stream RNNs do not fully exploit the reciprocal information contained in the multiple signals, let alone exploit it in a recurrent manner. To this end, we propose in this paper a novel recurrent architecture, termed Coupled Recurrent Network (CRN), to deal with multiple input sources. In CRN, the parallel streams of RNNs are coupled together. Key design of CRN is a Recurrent Interpretation Block (RIB) that supports learning of reciprocal feature representations from multiple signals in a recurrent manner. Different from RNNs which stack the training loss at each time step or the last time step, we propose an effective and efficient training strategy for CRN. Experiments show the efficacy of the proposed CRN. In particular, we achieve the new state of the art on the benchmark datasets of human action recognition and multi-person pose estimation. Coupled Sparse Asymmetric Least Squares(COSALES) SALES Coupled U-Net(CU-Net) We design a new connectivity pattern for the U-Net architecture. Given several stacked U-Nets, we couple each U-Net pair through the connections of their semantic blocks, resulting in the coupled U-Nets (CU-Net). The coupling connections could make the information flow more efficiently across U-Nets. The feature reuse across U-Nets makes each U-Net very parameter efficient. We evaluate the coupled U-Nets on two benchmark datasets of human pose estimation. Both the accuracy and model parameter number are compared. The CU-Net obtains comparable accuracy as state-of-the-art methods. However, it only has at least 60% fewer parameters than other approaches. Courier Dispatching Problem(CDP) The courier dispatching problem (CDP) was raised from an online pickup-service platform of Alibaba. The courier dispatching problem aims to assign a set of couriers to serve pickup requests with stochastic spatial and temporal arrival rate among urban regions. The objective is to maximize the revenue of served requests given a limited number of couriers over a period of time. Many online algorithms such as dynamic matching and vehicle routing strategy from existing literature could be applied to tackle this problem. Covariance Matrix Adaptation Evolution Strategy(CMA-ES) CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). Like this, over the generation sequence, individuals with better and better f-values are generated. In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in the R^n. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise dependencies between the variables in the distribution are represented by a covariance matrix. The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this distribution. This is particularly useful, if the function f is ill-conditioned. Adaptation of the covariance matrix amounts to learning a second order model of the underlying objective function similar to the approximation of the inverse Hessian matrix in the Quasi-Newton method in classical optimization. In contrast to most classical methods, fewer assumptions on the nature of the underlying objective function are made. Only the ranking between candidate solutions is exploited for learning the sample distribution and neither derivatives nor even the function values themselves are required by the method. Covariant Compositional Network(CCN) Most existing neural networks for learning graphs address permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call Covariant Compositional Networks (CCNs). Here, covariance means that the activation of each neuron must transform in a specific way under permutations, similarly to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on standard graph learning benchmarks. Covariate Adaptive Clustering predkmeans Covariate Balancing Propensity Score(CBPS) Implements the covariate balancing propensity score (CBPS) proposed by Imai and Ratkovic (2014) . The propensity score is estimated such that it maximizes the resulting covariate balance as well as the prediction of treatment assignment. The method, therefore, avoids an iteration between model fitting and balance checking. The package also implements several extensions of the CBPS beyond the cross-sectional, binary treatment setting. The current version implements the CBPS for longitudinal settings so that it can be used in conjunction with marginal structural models from Imai and Ratkovic (2015) , treatments with three- and four- valued treatment variables, continuous-valued treatments from Fong, Hazlett, and Imai (2015) , and the situation with multiple distinct binary treatments administered simultaneously. In the future it will be extended to other settings including the generalization of experimental and instrumental variable estimates. Recently we have added the optimal CBPS which chooses the optimal balancing function and results in doubly robust and efficient estimator for the treatment effect as well as high dimensional CBPS when a large number of covariates exist. CBPS Covariate Gaussian Process Latent Variable Model(c-GPLVM) Gaussian Process Regression (GPR) and Gaussian Process Latent Variable Models (GPLVM) offer a principled way of performing probabilistic non-linear regression and dimensionality reduction. In this paper we propose a hybrid between the two, the covariate-GPLVM (c-GPLVM), to perform dimensionality reduction in the presence of covariate information (e.g. continuous covariates, class labels, or censored survival times). This construction lets us adjust for covariate effects and reveals meaningful latent structure which is not revealed when using GPLVM. Furthermore, we introduce structured decomposable kernels which will let us interpret how the fixed and latent inputs contribute to feature-level variation, e.g. identify the presence of a non-linear interaction. We demonstrate the utility of this model on applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. Coverage Probability In statistics, the coverage probability of a confidence interval is the proportion of the time that the interval contains the true value of interest. For example, suppose our interest is in the mean number of months that people with a particular type of cancer remain in remission following successful treatment with chemotherapy. The confidence interval aims to contain the unknown mean remission duration with a given probability. This is the “confidence level” or “confidence coefficient” of the constructed interval which is effectively the “nominal coverage probability” of the procedure for constructing confidence intervals. The “nominal coverage probability” is often set at 0.95. The coverage probability is the actual probability that the interval contains the true mean remission duration in this example. Covert Network Covert networks are social networks that often consist of harmful users. Social Network Analysis (SNA) has played an important role in reducing criminal activities (e.g., counter terrorism) via detecting the influential users in such networks. There are various popular measures to quantify how influential or central any vertex is in a network. As expected, strategic and influential miscreants in covert networks would try to hide herself and her partners (called {\em leaders}) from being detected via these measures by introducing new edges. Waniek et al. show that the corresponding computational problem, called Hiding Leader, is NP-Complete for the degree and closeness centrality measures. Cox Proportional-Hazards Regression Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h0(t). The hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is a loglinear regression. Cox Regression The term Cox regression model (omitting proportional hazards) is sometimes used to describe the extension of the Cox model to include time-dependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model. Coxcomb Plot / Polar Area Diagram The polar area diagram is similar to a usual pie chart, except sectors are equal angles and differ rather in how far each sector extends from the center of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month). For example, if the count of deaths in each month for a year are to be plotted then there will be 12 sectors (one per month) all with the same angle of 30 degrees each. The radius of each sector would be proportional to the square root of the death count for the month, so the area of a sector represents the number of deaths in a month. If the death count in each month is subdivided by cause of death, it is possible to make multiple comparisons on one diagram, as is seen in the polar area diagram famously developed by Florence Nightingale. Cozy Cozy is a tool that automatically writes data structure implementations from high-level specifications. The source code is available on GitHub. Documentation on using Cozy for your own projects can be found there as well. cPCA++ In this work, we propose a new data visualization and clustering technique for discovering discriminative structures in high-dimensional data. This technique, referred to as cPCA++, utilizes the fact that the interesting features of a ‘target’ dataset may be obscured by high variance components during traditional PCA. By analyzing what is referred to as a ‘background’ dataset (i.e., one that exhibits the high variance principal components but not the interesting structures), our technique is capable of efficiently highlighting the structure that is unique to the ‘target’ dataset. Similar to another recently proposed algorithm called ‘contrastive PCA’ (cPCA), the proposed cPCA++ method identifies important dataset specific patterns that are not detected by traditional PCA in a wide variety of settings. However, the proposed cPCA++ method is significantly more efficient than cPCA, because it does not require the parameter sweep in the latter approach. We applied the cPCA++ method to the problem of image splicing localization. In this application, we utilize authentic edges as the background dataset and the spliced edges as the target dataset. The proposed method is significantly more efficient than state-of-the-art methods, as the former does not require iterative updates of filter weights via stochastic gradient descent and backpropagation, nor the training of a classifier. Furthermore, the cPCA++ method is shown to provide performance scores comparable to the state-of-the-art Multi-task Fully Convolutional Network (MFCN). CPDist Preference are central to decision making by both machines and humans. Representing, learning, and reasoning with preferences is an important area of study both within computer science and across the sciences. When working with preferences it is necessary to understand and compute the distance between sets of objects, e.g., the preferences of a user and a the descriptions of objects to be recommended. We present CPDist, a novel neural network to address the problem of learning to measure the distance between structured preference representations. We use the popular CP-net formalism to represent preferences and then leverage deep neural networks to learn a recently proposed metric function that is computationally hard to compute directly. CPDist is a novel metric learning approach based on the use of deep siamese networks which learn the Kendal Tau distance between partial orders that are induced by compact preference representations. We find that CPDist is able to learn the distance function with high accuracy and outperform existing approximation algorithms on both the regression and classification task using less computation time. Performance remains good even when CPDist is trained with only a small number of samples compared to the dimension of the solution space, indicating the network generalizes well. CP-Net While several convolution-like operators have recently been proposed for extracting features out of point clouds, down-sampling an unordered point cloud in a deep neural network has not been rigorously studied. Existing methods down-sample the points regardless of their importance for the output. As a result, some important points in the point cloud may be removed, while less valuable points may be passed to the next layers. In contrast, adaptive down-sampling methods sample the points by taking into account the importance of each point, which varies based on the application, task and training data. In this paper, we propose a permutation-invariant learning-based adaptive down-sampling layer, called Critical Points Layer (CPL), which reduces the number of points in an unordered point cloud while retaining the important points. Unlike most graph-based point cloud down-sampling methods that use $k$-NN search algorithm to find the neighbouring points, CPL is a global down-sampling method, rendering it computationally very efficient. The proposed layer can be used along with any graph-based point cloud convolution layer to form a convolutional neural network, dubbed CP-Net in this paper. We introduce a CP-Net for $3$D object classification that achieves the best accuracy for the ModelNet$40$ dataset among point cloud-based methods, which validates the effectiveness of the CPL. CPSDebug Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, only the detection of a failure is insuffcient to know how to correct a faulty model. Faults can propagate in time and in space producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason of an observed failure is typically a challenging and laborious task left to the experience and domain knowledge of the designer. \n In this paper, we propose CPSDebug, a novel approach that by combining testing, specification mining, and failure analysis, can automatically explain failures in Simulink/Stateflow models. We evaluate CPSDebug on two case studies, involving two use scenarios and several classes of faults, demonstrating the potential value of our approach. Cramer-Wold ICA(CW-ICA) Non-linear source separation is a challenging open problem with many applications. We extend a recently proposed Adversarial Non-linear ICA (ANICA) model, and introduce Cramer-Wold ICA (CW-ICA). In contrast to ANICA we use a simple, closed–form optimization target instead of a discriminator–based independence measure. Our results show that CW-ICA achieves comparable results to ANICA, while foregoing the need for adversarial training. Credibility incorporating Semantic analysis and Temporal factor(CredSaT) The widespread use of big social data has pointed the research community in several significant directions. In particular, the notion of social trust has attracted a great deal of attention from information processors | computer scientists and information consumers | formal organizations. This is evident in various applications such as recommendation systems, viral marketing and expertise retrieval. Hence, it is essential to have frameworks that can temporally measure users credibility in all domains categorised under big social data. This paper presents CredSaT (Credibility incorporating Semantic analysis and Temporal factor): a fine-grained users credibility analysis framework for big social data. A novel metric that includes both new and current features, as well as the temporal factor, is harnessed to establish the credibility ranking of users. Experiments on real-world dataset demonstrate the effectiveness and applicability of our model to indicate highly domain-based trustworthy users. Further, CredSaT shows the capacity in capturing spammers and other anomalous users. Credible Interval In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis; Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. For example, in an experiment that determines the uncertainty distribution of parameter t, if the probability that t lies between 35 and 45 is 0.95, then 35 <= t <= 45 is a 95% credible interval. Credible Interval / Credibility Interval In Bayesian statistics, a credible interval (or Bayesian confidence interval) is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics. For example, in an experiment that determines the uncertainty distribution of parameter , if the probability that lies between 35 and 45 is 0.95, then is a 95% credible interval. CrescendoNet We introduce a new deep convolutional neural network, CrescendoNet, by stacking simple building blocks without residual connections. Each Crescendo block contains independent convolution paths with increased depths. The numbers of convolution layers and parameters are only increased linearly in Crescendo blocks. In experiments, CrescendoNet with only 15 layers outperforms almost all networks without residual connections on benchmark datasets, CIFAR10, CIFAR100, and SVHN. Given sufficient amount of data as in SVHN dataset, CrescendoNet with 15 layers and 4.1M parameters can match the performance of DenseNet-BC with 250 layers and 15.3M parameters. CrescendoNet provides a new way to construct high performance deep convolutional neural networks without residual connections. Moreover, through investigating the behavior and performance of subnetworks in CrescendoNet, we note that the high performance of CrescendoNet may come from its implicit ensemble behavior, which differs from the FractalNet that is also a deep convolutional neural network without residual connections. Furthermore, the independence between paths in CrescendoNet allows us to introduce a new path-wise training procedure, which can reduce the memory needed for training. Critical Line Algorithm(CLA) The critical line method developed by the Nobel Prize winner H. Markowitz is a classical technique for the construction of a minimum-variance frontier within the paradigm of ‘the expected return-risk’ (mean-variance) and finding minimum portfolios. Considerable interest has recently been attracted to the development of a fast algorithm for the construction of the minimum-variance frontier. In some works, such algorithms have been used to find statistically stable optimal portfoli.o An Open-Source Implementation of the Critical-Line Algorithm for Portfolio Optimization The Constrained Critical Line Algorithm The Critical Line Method Applying Markowitz’s Critical Line Algorithm Critical Points Layer(CPL) While several convolution-like operators have recently been proposed for extracting features out of point clouds, down-sampling an unordered point cloud in a deep neural network has not been rigorously studied. Existing methods down-sample the points regardless of their importance for the output. As a result, some important points in the point cloud may be removed, while less valuable points may be passed to the next layers. In contrast, adaptive down-sampling methods sample the points by taking into account the importance of each point, which varies based on the application, task and training data. In this paper, we propose a permutation-invariant learning-based adaptive down-sampling layer, called Critical Points Layer (CPL), which reduces the number of points in an unordered point cloud while retaining the important points. Unlike most graph-based point cloud down-sampling methods that use $k$-NN search algorithm to find the neighbouring points, CPL is a global down-sampling method, rendering it computationally very efficient. The proposed layer can be used along with any graph-based point cloud convolution layer to form a convolutional neural network, dubbed CP-Net in this paper. We introduce a CP-Net for $3$D object classification that achieves the best accuracy for the ModelNet$40$ dataset among point cloud-based methods, which validates the effectiveness of the CPL. CRN++ Synthetic biology is a rapidly emerging research area, with expected wide-ranging impact in biology, nanofabrication, and medicine. A key technical challenge lies in embedding computation in molecular contexts where electronic micro-controllers cannot be inserted. This necessitates effective representation of computation using molecular components. While previous work established the Turing-completeness of chemical reactions, defining representations that are faithful, efficient, and practical remains challenging. This paper introduces CRN++, a new language for programming deterministic (mass-action) chemical kinetics to perform computation. We present its syntax and semantics, and build a compiler translating CRN++ programs into chemical reactions, thereby laying the foundation of a comprehensive framework for molecular programming. Our language addresses the key challenge of embedding familiar imperative constructs into a set of chemical reactions happening simultaneously and manipulating real-valued concentrations. Although some deviation from ideal output value cannot be avoided, we develop methods to minimize the error, and implement error analysis tools. We demonstrate the feasibility of using CRN++ on a suite of well-known algorithms for discrete and real-valued computation. CRN++ can be easily extended to support new commands or chemical reaction implementations, and thus provides a foundation for developing more robust and practical molecular programs. Cromwell Cromwell is a Workflow Management System geared towards scientific workflows. cromwellDashboard Cronbach’s Alpha In statistics (classical test theory), Cronbach’s alpha is the trivial name used for tau-equivalent reliability \rho_T as a (lowerbound) estimate of the reliability of a psychometric test. Synonymous terms are: coefficient alpha, Guttman’s \lambda _{3}, Hoyt method and KR-20. Cronbach’s alpha reliability coefficient is one of the most widely used indicators of the scale reliability. It is used often without concern for the data (this will be a different text) because it is simple to calculate and it requires only one implementation of a single scale. The aim of this article is to provide some more insight into the functioning of this reliability coefficient without going into heavy mathematics. Cross Adversarial Source Separation(CASS) This paper introduces a cross adversarial source separation (CASS) framework via autoencoder, a new model that aims at separating an input signal consisting of a mixture of multiple components into individual components defined via adversarial learning and autoencoder fitting. CASS unifies popular generative networks like auto-encoders (AEs) and generative adversarial networks (GANs) in a single framework. The basic building block that filters the input signal and reconstructs the $i$-th target component is a pair of deep neural networks $\mathcal{EN}_i$ and $\mathcal{DE}_i$ as an encoder for dimension reduction and a decoder for component reconstruction, respectively. The decoder $\mathcal{DE}_i$ as a generator is enhanced by a discriminator network $\mathcal{D}_i$ that favors signal structures of the $i$-th component in the $i$-th given dataset as guidance through adversarial learning. In contrast with existing practices in AEs which trains each Auto-Encoder independently, or in GANs that share the same generator, we introduce cross adversarial training that emphasizes adversarial relation between any arbitrary network pairs $(\mathcal{DE}_i,\mathcal{D}_j)$, achieving state-of-the-art performance especially when target components share similar data structures. Cross Entropy In information theory, the cross entropy between two probability distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an ‘unnatural’ probability distribution q, rather than the ‘true’ distribution p. Cross Industry Standard Process for Data Mining(CRISP-DM) CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. Polls conducted in 2002, 2004, and 2007 show that it is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, 3-4 times as many people reported using CRISP-DM. A review and critique of data mining process models in 2009 called the CRISP-DM the “de facto standard for developing data mining and knowledge discovery projects.” Other reviews of CRISP-DM and data mining process models include Kurgan and Musilek’s 2006 review, and Azevedo and Santos’ 2008 comparison of CRISP-DM and SEMMA. Cross Local Intrinsic Dimensionality(LID) Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which manifolds of two data distributions coincide with each other. In experiments on 4 benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, is robust to small-scale noise and image transformations, and robust to sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality. Cross Validation Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It is worth highlighting that in a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (testing dataset). The goal of cross validation is to define a dataset to “test” the model in the training phase (i.e., the validation dataset), in order to limit problems like overfitting, give an insight on how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem), etc. CROSSBOW Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe CROSSBOW, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size – however small – while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. CROSSBOW achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3-4x compared to TensorFlow. CrossCat CrossCat is a domain-general, Bayesian method for analyzing high-dimensional data tables. CrossCat estimates the full joint distribution over the variables in the table from the data, via approximate inference in a hierarchical, nonparametric Bayesian model, and provides efficient samplers for every conditional distribution. CrossCat combines strengths of nonparametric mixture modeling and Bayesian network structure learning: it can model any joint distribution given enough data by positing latent variables, but also discovers independencies between the observable variables. A range of exploratory analysis and predictive modeling tasks can be addressed via CrossCat, including detecting predictive relationships between variables, finding multiple overlapping clusterings, imputing missing values, and simultaneously selecting features and classifying rows. Research on CrossCat has shown that it is suitable for analysis of real-world tables of up to 10 million cells, including hospital cost and quality measures, voting records, handwritten digits, and state-level unemployment time series. Cross-Dimensional Self-Attention(CDSA) Many real-world applications involve multivariate, geo-tagged time series data: at each location, multiple sensors record corresponding measurements. For example, air quality monitoring system records PM2.5, CO, etc. The resulting time-series data often has missing values due to device outages or communication errors. In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNN), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps. Recently, the self-attention mechanism has been proposed for sequence modeling tasks such as machine translation, significantly outperforming RNN because the relationship between each two time stamps can be modeled explicitly. In this paper, we are the first to adapt the self-attention mechanism for multivariate, geo-tagged time series data. In order to jointly capture the self-attention across multiple dimensions, including time, location and the sensor measurements, while maintain low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner. Our extensive experiments on four real-world datasets, including three standard benchmarks and our newly collected NYC-traffic dataset, demonstrate that our approach outperforms the state-of-the-art imputation and forecasting methods. A detailed systematic analysis confirms the effectiveness of our design choices. Cross-Domain Adversarial Auto-Encoder(CDAAE) In this paper, we propose the Cross-Domain Adversarial Auto-Encoder (CDAAE) to address the problem of cross-domain image inference, generation and transformation. We make the assumption that images from different domains share the same latent code space for content, while having separate latent code space for style. The proposed framework can map cross-domain data to a latent code vector consisting of a content part and a style part. The latent code vector is matched with a prior distribution so that we can generate meaningful samples from any part of the prior space. Consequently, given a sample of one domain, our framework can generate various samples of the other domain with the same content of the input. This makes the proposed framework different from the current work of cross-domain transformation. Besides, the proposed framework can be trained with both labeled and unlabeled data, which makes it also suitable for domain adaptation. Experimental results on data sets SVHN, MNIST and CASIA show the proposed framework achieved visually appealing performance for image generation task. Besides, we also demonstrate the proposed method achieved superior results for domain adaptation. Code of our experiments is available in https://…/CDAAE. Cross-Domain Collaborative Learning(CDCL) This paper introduces a novel heterogenous domain adaptation (HDA) method for hyperspectral image classification with a limited amount of labeled samples in both domains. The method is achieved in the way of cross-domain collaborative learning (CDCL), which is addressed via cluster canonical correlation analysis (C-CCA) and random walker (RW) algorithms. To be specific, the proposed CDCL method is an iterative process of three main stages, i.e. twice of RW-based pseudolabeling and cross domain learning via C-CCA. Firstly, given the initially labeled target samples as training set ($\mathbf{TS}$), the RW-based pseudolabeling is employed to update $\mathbf{TS}$ and extract target clusters ($\mathbf{TCs}$) by fusing the segmentation results obtained by RW and extended RW (ERW) classifiers. Secondly, cross domain learning via C-CCA is applied using labeled source samples and $\mathbf{TCs}$. The unlabeled target samples are then classified with the estimated probability maps using the model trained in the projected correlation subspace. Thirdly, both $\mathbf{TS}$ and estimated probability maps are used for updating $\mathbf{TS}$ again via RW-based pseudolabeling. When the iterative process finishes, the result obtained by the ERW classifier using the final $\mathbf{TS}$ and estimated probability maps is regarded as the final classification map. Experimental results on four real HSIs demonstrate that the proposed method can achieve better performance compared with the state-of-the-art HDA and ERW methods. Cross-Domain Labeled LDA(CDL-LDA) Cross-domain text classification aims at building a classifier for a target domain which leverages data from both source and target domain. One promising idea is to minimize the feature distribution differences of the two domains. Most existing studies explicitly minimize such differences by an exact alignment mechanism (aligning features by one-to-one feature alignment, projection matrix etc.). Such exact alignment, however, will restrict models’ learning ability and will further impair models’ performance on classification tasks when the semantic distributions of different domains are very different. To address this problem, we propose a novel group alignment which aligns the semantics at group level. In addition, to help the model learn better semantic groups and semantics within these groups, we also propose a partial supervision for model’s learning in source domain. To this end, we embed the group alignment and a partial supervision into a cross-domain topic model, and propose a Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and Reuters dataset, extensive quantitative (classification, perplexity etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision. Cross-Domain Latent Feature Mapping(CDLFM) Collaborative Filtering (CF) is a widely adopted technique in recommender systems. Traditional CF models mainly focus on predicting a user’s preference to the items in a single domain such as the movie domain or the music domain. A major challenge for such models is the data sparsity problem, and especially, CF cannot make accurate predictions for the cold-start users who have no ratings at all. Although Cross-Domain Collaborative Filtering (CDCF) is proposed for effectively transferring users’ rating preference across different domains, it is still difficult for existing CDCF models to tackle the cold-start users in the target domain due to the extreme data sparsity. In this paper, we propose a Cross-Domain Latent Feature Mapping (CDLFM) model for cold-start users in the target domain. Firstly, in order to better characterize users in sparse domains, we take the users’ similarity relationship on rating behaviors into consideration and propose the Matrix Factorization by incorporating User Similarities (MFUS) in which three similarity measures are proposed. Next, to perform knowledge transfer across domains, we propose a neighborhood based gradient boosting trees method to learn the cross-domain user latent feature mapping function. For each cold-start user, we learn his/her feature mapping function based on the latent feature pairs of those linked users who have similar rating behaviors with the cold-start user in the auxiliary domain. And the preference of the cold-start user in the target domain can be predicted based on the mapping function and his/her latent features in the auxiliary domain. Experimental results on two real data sets extracted from Amazon transaction data demonstrate the superiority of our proposed model against other state-of-the-art methods. CrossE Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions — bi-directional effects between entities and relations — help select related information when predicting a new triple, but haven’t been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective — giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions. Cross-Entropy Clustering We build a general and easily applicable clustering theory, which we call crossentropy clustering (shortly CEC), which joins the advantages of classical kmeans (easy implementation and speed) with those of EM (a ne invariance and ability to adapt to clusters of desired shapes). Moreover, contrary to k-means and EM, CEC nds the optimal number of clusters by automatically removing groups which have negative information cost. Although CEC, like EM, can be build on an arbitrary family of densities, in the most important case of Gaussian CEC the division into clusters is a ne invariant.