# WhatIs-K

## K* Nearest Neighbors Algorithm

Prediction with the k* nearest neighbor algorithm, based on a publication by Anava and Levy (2016).

ksNN

## Kaleido

Graph mining is one of the most important categories of graph algorithms. However, exploring the subgraphs of an input graph produces a huge amount of intermediate data. The ‘think like a vertex’ programming paradigm, pioneered by Pregel, is designed for graph computation problems like PageRank and cannot readily formulate mining problems. Existing mining systems like Arabesque and RStream need large amounts of computing and memory resources. In this paper, we present Kaleido, an efficient single-machine, out-of-core graph mining system which treats disks as an extension of memory. Kaleido treats intermediate data in graph mining tasks as a tensor and adopts a succinct data structure for the intermediate data. Kaleido uses the eigenvalues of the adjacency matrix of a subgraph to efficiently solve subgraph isomorphism problems, under the acceptable constraint that a subgraph has fewer than 9 vertices. Kaleido implements half-memory-half-disk storage for large intermediate data, which treats the disk as an extension of the memory. Compared with two state-of-the-art mining systems, Arabesque and RStream, Kaleido outperforms them by a GeoMean 12.3$\times$ and 40.0$\times$ respectively.

## Kalman Filter

Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone. More formally, the Kalman filter operates recursively on streams of noisy input data to produce a statistically optimal estimate of the underlying system state. The filter is named after Rudolf (Rudy) E. Kálmán, one of the primary developers of its theory.
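The recursive predict-and-update cycle described above can be illustrated with a minimal one-dimensional sketch. It assumes a nearly constant hidden state (a random-walk model) and hand-picked noise variances `q` and `r`; the class name, function name, and all parameter values are illustrative assumptions, not part of any particular library.

```python
import random
from dataclasses import dataclass


@dataclass
class ScalarKalmanFilter:
    """One-dimensional Kalman filter for a (nearly) constant hidden state."""
    x: float          # current state estimate
    p: float          # variance of the current estimate
    q: float = 1e-5   # process noise variance (assumed value)
    r: float = 1.0    # measurement noise variance (assumed value)

    def step(self, z: float) -> float:
        """Fold one noisy measurement z into the estimate and return it."""
        # Predict: the random-walk model keeps x, uncertainty grows by q.
        p_pred = self.p + self.q
        # Update: the Kalman gain weighs the prediction against the measurement.
        gain = p_pred / (p_pred + self.r)
        self.x += gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x


def filter_measurements(measurements, x0=0.0, p0=1000.0, q=1e-5, r=1.0):
    """Run the filter over a sequence; return all estimates and the final variance."""
    kf = ScalarKalmanFilter(x=x0, p=p0, q=q, r=r)
    return [kf.step(z) for z in measurements], kf.p


# Noisy readings of a constant value (synthetic data for illustration only).
rng = random.Random(0)
true_value = 5.0
noisy = [true_value + rng.gauss(0.0, 1.0) for _ in range(200)]
estimates, final_variance = filter_measurements(noisy)
print(f"last raw measurement: {noisy[-1]:.3f}, filtered estimate: {estimates[-1]:.3f}")
```

The large initial variance `p0` expresses that the starting guess is not trusted, so the first measurement dominates; as more measurements arrive, the gain shrinks and each new reading moves the estimate less.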
The Kalman filter has numerous applications in technology. A common application is guidance, navigation and control of vehicles, particularly aircraft and spacecraft. Furthermore, the Kalman filter is a widely applied concept in time series analysis, used in fields such as signal processing and econometrics. Kalman filters are also one of the main topics in robotic motion planning and control, and are sometimes included in trajectory optimization.

➚ “Extended Kalman Filter”

Kalman Filter For Dummies

Understanding the Basis of the Kalman Filter via a Simple and Intuitive Derivation

## Kalman Gradient Descent

We introduce Kalman Gradient Descent, a stochastic optimization algorithm that uses Kalman filtering to adaptively reduce gradient variance in stochastic gradient descent by filtering the gradient estimates. We present both a theoretical analysis of convergence in a non-convex setting and experimental results which demonstrate improved performance on a variety of machine learning areas including neural networks and black box variational inference. We also present a distributed version of our algorithm that enables large-dimensional optimization, and we extend our algorithm to SGD with momentum and RMSProp.

## Kalman Optimization for Value Approximation (KOVA)

Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When using a parameterized function to approximate the value, it is common to optimize the set of parameters by minimizing the sum of squared Bellman temporal difference errors. However, this approach ignores certain distributional properties of both the errors and value parameters. Taking these distributions into account in the optimization process can provide useful information on the amount of confidence in value estimation.
In this work we propose to optimize the value by minimizing a regularized objective function which forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function by adopting a Bayesian perspective over both the value parameters and noisy observed returns. This distributional property provides information on parameter uncertainty in addition to value estimates. We provide theoretical results of our approach and analyze the performance of our proposed optimizer on domains with large state and action spaces.

## Kalman Smoothing

The optimal fixed-interval smoother provides the optimal estimate using the measurements from a fixed interval $z_1$ to $z_n$. This is also called ‘Kalman Smoothing’. There are several smoothing algorithms in common use.

➘ “Kalman Filter”

## KAMILA Clustering (KAMILA)

KAMILA clustering is a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets.

kamila

## Kanerva Machine

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva’s sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets.
Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.

## k-Anonymity

k-anonymity is a property possessed by certain anonymized data. The concept of k-anonymity was first formulated by Latanya Sweeney in a paper published in 2002 as an attempt to solve the problem: “Given person-specific field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful.” A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release.

## Kanri Distance (KDC)

Kanri’s proprietary combination of patented statistical and process methods provides a uniquely powerful and insightful ability to evaluate large data sets with multiple variables. While many tools evaluate patterns and dynamics for large data, only the Kanri Distance Calculator allows users to understand where they stand with respect to a desired target state and the specific contribution of each variable toward the overall distance from the target state. The Kanri model not only calculates the relationship of variables within the overall data set, but more importantly mathematically teases out the interaction between each of them. This combination of relational insights fuels Kanri’s breakthrough distance calculator. It answers the question ‘In a world of exponentially expanding data, how do I find the variables that will solve my problem?’ and helps to reach that conclusion quickly. But the Kanri model does not stop there. Kanri tells you exactly, formulaically, how much each variable contributes.
The Kanri Distance Calculator opens a new world of solution development possibilities that can apply the power of massive data sets to an individual…or to an individualized objective.

Kanri Distance Calculator Free License Version with Demo

## Kantorovich Distance

➘ “Wasserstein Metric”

Kantorovich distance on a weighted graph

kantorovich

## Kaplan-Meier Estimator

The Kaplan-Meier estimator, also known as the product limit estimator, is a non-parametric statistic used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment. In other fields, Kaplan-Meier estimators may be used to measure the length of time people remain unemployed after a job loss, the time-to-failure of machine parts, or how long fleshy fruits remain on plants before they are removed by frugivores. The estimator is named after Edward L. Kaplan and Paul Meier, who each submitted similar manuscripts to the Journal of the American Statistical Association. The journal editor, John Tukey, convinced them to combine their work into one paper, which has been cited about 34,000 times since its publication.

Kaplan-Meier Plot

numKM

## Kaplan-Meier Survival Curves

In 1958, Edward L. Kaplan and Paul Meier collaborated to publish a seminal paper on how to deal with incomplete observations. Subsequently, the Kaplan-Meier curves and estimates of survival data have become a familiar way of dealing with differing survival times (times-to-event), especially when not all the subjects continue in the study. “Survival” times need not relate to actual survival with death being the event; the “event” may be any event of interest. Kaplan-Meier analyses are also used in nonmedical disciplines.

## Kardam

Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers.
In practice, however, some of the workers can exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce Kardam, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against $\frac{1}{3}$ Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme. We prove that Kardam guarantees almost sure convergence in the presence of asynchrony and Byzantine behavior, and we derive its convergence rate. We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than $f/n$, where $f$ is the number of Byzantine failures tolerated and $n$ the total number of workers. Interestingly, we also empirically observe that the dampening component is valuable in its own right, because it makes it possible to build an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers.

## Karger’s Algorithm

In computer science and graph theory, Karger’s algorithm is a randomized algorithm to compute a minimum cut of a connected graph. It was invented by David Karger and first published in 1993. The idea of the algorithm is based on the concept of contraction of an edge (u, v) in an undirected graph G = (V, E).
Informally speaking, the contraction of an edge merges the nodes u and v into one, reducing the total number of nodes of the graph by one. All other edges connecting either u or v are ‘reattached’ to the merged node, effectively producing a multigraph. Karger’s basic algorithm iteratively contracts randomly chosen edges until only two nodes remain; those nodes represent a cut in the original graph. By iterating this basic algorithm a sufficient number of times, a minimum cut can be found with high probability.

## Karlin-Rubin Theorem

The Karlin-Rubin theorem can be regarded as an extension of the Neyman-Pearson lemma for composite hypotheses.

Parametric Inference: Karlin-Rubin Theorem

## Karmarkar’s Algorithm

Karmarkar’s algorithm is an algorithm introduced by Narendra Karmarkar in 1984 for solving linear programming problems. It was the first reasonably efficient algorithm that solves these problems in polynomial time. The ellipsoid method is also polynomial time but proved to be inefficient in practice.

A simple introduction to Karmarkar’s Algorithm for Linear Programming

## Katz Centrality

In graph theory, the Katz centrality of a node is a measure of centrality in a network. It was introduced by Leo Katz in 1953 and is used to measure the relative degree of influence of an actor (or node) within a social network. Unlike typical centrality measures which consider only the shortest path (the geodesic) between a pair of actors, Katz centrality measures influence by taking into account the total number of walks between a pair of actors. It is similar to Google’s PageRank and to the eigenvector centrality.

· Katz centrality can be used to compute centrality in directed networks such as citation networks and the World Wide Web.

· Katz centrality is more suitable in the analysis of directed acyclic graphs where traditionally used measures like eigenvector centrality are rendered useless.
· Katz centrality can also be used in estimating the relative status or influence of actors in a social network.

· In neuroscience, it is found that Katz centrality correlates with the relative firing rate of neurons in a neural network.

## Kaufman’s Adaptive Moving Average (KAMA)

Kaufman’s Adaptive Moving Average (KAMA) was created by Perry J. Kaufman and presented in 1998 in his book “Trading Systems and Methods, 3rd Edition”. The main advantage of KAMA over other moving averages is that it takes into consideration not only the direction, but also the market volatility. KAMA adjusts its length according to the prevailing market conditions.

## Kayak

Kayak: Library for Deep Neural Networks. This is a library that implements some useful modules and provides automatic differentiation utilities for learning deep neural networks. It is similar in spirit to tools like Theano and Torch. The objective of Kayak is to be simple to use and extend, for rapid prototyping in Python. It is unlikely to be faster than these other tools, although it is competitive and sometimes faster in performance when the architectures are highly complex. It will certainly not be faster on convolutional architectures for visual object detection and recognition tasks than, e.g., Alex Krizhevsky’s CUDA Convnet or Caffe. The point of Kayak is to be able to experiment in Python with patterns that look a lot like what you’re already used to with Numpy. It makes it easy to manage batches of data and compute gradients with backpropagation.

## Kayenta

Kayenta is a platform for Automated Canary Analysis (ACA). It is used by Spinnaker to enable automated canary deployments. Please see the comprehensive canary documentation for more details. A canary release is a technique to reduce the risk from deploying a new version of software into production. A new version of software, referred to as the canary, is deployed to a small subset of users alongside the stable running version.
Traffic is split between these two versions such that a portion of incoming requests is diverted to the canary. This approach can quickly uncover any problems with the new version without impacting the majority of users. The quality of the canary version is assessed by comparing key metrics that describe the behavior of the old and new versions. If there is significant degradation in these metrics, the canary is aborted and all of the traffic is routed to the stable version in an effort to minimize the impact of unexpected behavior. Canaries are usually run against deployments containing changes to code, but they can also be used for operational changes, including changes to configuration.

## KB Reconstruction

We aim to automatically generate natural language narratives about an input structured knowledge base (KB). We build our generation framework based on a pointer network which can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new table position self-attention to capture the inter-dependencies among related slots. For evaluation, besides standard metrics including BLEU, METEOR, and ROUGE, we also propose a KB reconstruction based metric by extracting a KB from the generation output and comparing it with the input KB. We also create a new data set which includes 106,216 pairs of structured KBs and their corresponding natural language descriptions for two distinct entity types. Experiments show that our approach significantly outperforms state-of-the-art methods. The reconstructed KB achieves an F-score of 68.8% to 72.6%.

## KB4Rec

To develop a knowledge-aware recommender system, a key data problem is how we can obtain rich and structured knowledge information for recommender system (RS) items.
Existing datasets or methods either use side information from original recommender systems (containing very few kinds of useful information) or utilize a private knowledge base (KB). In this paper, we present the first public linked KB dataset for recommender systems, named KB4Rec v1.0, which has linked three widely used RS datasets with the popular KB Freebase. Based on our linked dataset, we first perform some interesting qualitative analysis experiments, in which we discuss the effect of two important factors (i.e. popularity and recency) on whether an RS item can be linked to a KB entity. Finally, we present the comparison of several knowledge-aware recommendation algorithms on our linked dataset.

## K-Beam Subgradient Descent

Minimax optimization plays a key role in adversarial training of machine learning algorithms, such as learning generative models, domain adaptation, privacy preservation, and robust learning. In this paper, we demonstrate the failure of alternating gradient descent in minimax optimization problems due to the discontinuity of solutions of the inner maximization. To address this, we propose a new epsilon-subgradient descent algorithm that simultaneously tracks K candidate solutions. Practically, the algorithm can find solutions that previous saddle-point algorithms cannot find, with only a sublinear increase of complexity in K. We analyze in detail the conditions under which the algorithm converges to the true solution. A significant improvement in stability and convergence speed of the algorithm is observed in simple representative problems, GAN training, and domain-adaptation problems.

## KBGAN

We introduce an adversarial learning framework, which we named KBGAN, to improve the performance of a wide range of existing knowledge graph embedding models. Because knowledge graph datasets typically only contain positive facts, sampling useful negative training examples is a non-trivial task.
Replacing the head or tail entity of a fact with a uniformly randomly selected entity is a conventional method for generating negative facts used by many previous works, but the majority of negative facts generated in this way can be easily discriminated from positive facts, and will contribute little towards the training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in GANs. The objective of the generator is to generate difficult negative samples that can maximize their likeliness determined by the discriminator, while the discriminator minimizes its training loss. This framework is independent of the concrete form of generator and discriminator, and therefore can utilize a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of the two probability-based models, DistMult and ComplEx. We evaluate the performance of KBGAN on the link prediction task, using three knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental results show that adversarial training substantially improves the performance of target embedding models under various settings.

## KBpedia

KBpedia is a comprehensive knowledge structure for promoting data interoperability and knowledge-based artificial intelligence, or KBAI. The KBpedia knowledge structure combines seven ‘core’ public knowledge bases (Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL) into an integrated whole. KBpedia’s upper structure, or knowledge graph, is the KBpedia Knowledge Ontology (KKO). We base KKO on the universal categories and knowledge representation theories of the great 19th century American logician, polymath and scientist, Charles Sanders Peirce.
KBpedia, written primarily in OWL 2, includes 55,000 reference concepts, about 30 million entities, and 5,000 relations and properties, all organized according to about 70 modular typologies that can be readily substituted or expanded. We test candidates added to KBpedia using a rigorous (but still fallible) suite of logic and consistency tests, together with best practices, before acceptance. The result is a flexible and computable knowledge graph that can be sliced-and-diced and configured for all sorts of machine learning tasks, including supervised, unsupervised and deep learning.

## K-Competitive Autoencoder for Text (KATE)

Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations of text documents due to their confounding properties such as high-dimensionality, sparsity and power-law word distributions. In this paper, we propose a novel k-competitive autoencoder, called KATE, for text documents. Due to the competition between the neurons in the hidden layer, each neuron becomes specialized in recognizing specific data patterns, and overall the model can learn meaningful representations of textual data. A comprehensive set of experiments shows that KATE can learn better representations than traditional autoencoders including denoising, contractive, variational, and k-sparse autoencoders. Our model also outperforms deep generative models, probabilistic topic models, and even word representation models (e.g., Word2Vec) in terms of several downstream tasks such as document classification, regression, and retrieval.

## K-Core Decomposition

The $k$-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate $k$-core decomposition with provable guarantees.
Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or number of rounds of computation. We do so by presenting a new powerful sketching technique for $k$-core decomposition, and then by showing it can be computed efficiently in both streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs.

## KDSL

We propose KDSL, a new word sense disambiguation (WSD) framework that utilizes knowledge to automatically generate sense-labeled data for supervised learning. First, from WordNet, we automatically construct a semantic knowledge base called DisDict, which provides refined feature words that highlight the differences among word senses, i.e., synsets. Second, we automatically generate new sense-labeled data by DisDict from unlabeled corpora. Third, these generated data, together with manually labeled data, are fed to a supervised learning neural network to model the semantic relations among synsets, feature words and their contexts. Jointly with the supervised learning process, we also implement unsupervised learning on unlabeled data as an auxiliary task. The experimental results show that KDSL outperforms several representative state-of-the-art methods on various major benchmarks. Interestingly, it performs relatively well even when manually labeled data is unavailable, thus providing a promising new backoff strategy for WSD.

## Kendall Distance

➘ “Kendall Tau Distance”

Kendall, rankdist

## Kendall Rank Correlation Coefficient

In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall’s tau coefficient (after the Greek letter τ), is a statistic used to measure the association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.

## Kendall Tau Distance

The Kendall tau rank distance is a metric that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are. Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would make to place one list in the same order as the other list. The Kendall tau distance was created by Maurice Kendall.

## Keras

Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Use Keras if you need a deep learning library that:

• Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).

• Supports both convolutional networks and recurrent networks, as well as combinations of the two.

• Supports arbitrary connectivity schemes (including multi-input and multi-output training).

• Runs seamlessly on CPU and GPU.

Deep Learning with Keras

## Kernel Canonical Correlation Analysis (KCCA)

Measures of association between two sets of random variables have long been of interest to statisticians. The classical canonical correlation analysis can characterize, but is also limited to, linear association. In this article we study nonlinear association measures using the kernel method. The introduction of the kernel method from the machine learning community has had a great impact on statistical analysis. The kernel canonical correlation analysis (KCCA) is a method that generalizes the classical linear canonical correlation analysis to the nonlinear setting. Such a generalization is nonparametric.
It allows us to depict the nonlinear relation of two sets of variables and enables applications of classical multivariate data analysis originally constrained to linear relations. Moreover, the kernel-based canonical correlation analysis no longer requires the Gaussian distributional assumption on observations, and therefore greatly enhances the applicability.

Kernel Canonical Correlation Analysis and its Applications to Nonlinear Measures of Association and Test of Independence

## Kernel Conditional Deviance for Causal Inference (KCDC)

Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tübingen Cause-Effect Pairs, where we outperform existing state-of-the-art methods.

## Kernel Convolution (kervolution)

Convolutional neural networks (CNNs) have enabled state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly rely on the activation layers, which can only provide point-wise non-linearity. To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems by leveraging the kernel trick.
It generalizes convolution, enhances the model capacity, and captures higher order interactions of features, via patch-wise kernel functions, but without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNN) achieve higher accuracy and faster convergence than baseline CNN.

## Kernel Density Estimation (KDE)

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen-Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form.

http://…/Scott2015.pdf

## Kernel Fisher Discriminant Analysis (KFD, KFDA)

In statistics, kernel Fisher discriminant analysis (KFD), also known as generalized discriminant analysis and kernel discriminant analysis, is a kernelized version of linear discriminant analysis. It is named after Ronald Fisher. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned.

➘ “Linear Discriminant Analysis”

lfda, kfda

## Kernel Graph Convolutional Neural Network

Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs.
We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.

## Kernel Machine Learning (KernelML)

I created a custom ‘particle optimizer’ and published a pip-installable Python package called kernelml. The motivation for making this algorithm was to give analysts and data scientists a generalized machine learning algorithm for complex loss functions and non-linear coefficients. The optimizer uses a combination of simple machine learning and probabilistic simulations to search for optimal parameters using a loss function, input and output matrices, and (optionally) a random sampler. I’m currently working on more features and hope to eventually make the project open source.

## Kernel Mean Embedding

A Hilbert space embedding of a distribution.

Book: Kernel Mean Embedding of Distributions

## Kernel Mean-p Power Error (KMPE)

Correntropy is a second-order statistical measure in kernel space, which has been successfully applied in robust learning and signal processing. In this paper, we define a non-second-order statistical measure in kernel space, called the kernel mean-p power error (KMPE), which includes the correntropic loss (CLoss) as a special case. Some basic properties of KMPE are presented. In particular, we apply the KMPE to extreme learning machine (ELM) and principal component analysis (PCA), and develop two robust learning algorithms, namely ELM-KMPE and PCA-KMPE. Experimental results on synthetic and benchmark data show that the developed algorithms can achieve consistently better performance when compared with some existing methods.

## Kernel Methods

In computer science, kernel methods are a class of algorithms for pattern analysis, whose best known member is the support vector machine (SVM).
The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many of these tasks, data have to be represented as feature vectors, but kernel methods replace this representation by similarities to other data points. Kernel Normalized Least-Mean Square(KNLMS) In the last decade, a considerable research effort has been devoted to developing adaptive algorithms based on kernel functions. One of the main features of these algorithms is that they form a family of universal approximation techniques, solving problems with nonlinearities elegantly. In this paper, we present data-selective adaptive kernel normalized least-mean square (KNLMS) algorithms that can increase their learning rate and reduce their computational complexity. In fact, these methods deal with kernel expansions, creating a growing structure also known as the dictionary, whose size depends on the number of observations and their innovation. The algorithms described herein use an adaptive step-size to accelerate the learning and can offer an excellent tradeoff between convergence speed and steady state, which allows them to solve nonlinear filtering and estimation problems with a large number of parameters without requiring a large computational cost. The data-selective update scheme also limits the number of operations performed and the size of the dictionary created by the kernel expansion, saving computational resources and dealing with one of the major problems of kernel adaptive algorithms. A statistical analysis is carried out along with a computational complexity analysis of the proposed algorithms. Simulations show that the proposed KNLMS algorithms outperform existing algorithms in examples of nonlinear system identification and prediction of a time series originating from a nonlinear difference equation. 
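To make the idea of a kernel expansion over a growing dictionary concrete, here is a minimal sketch of a kernel normalized LMS filter. It is not the data-selective algorithm from the abstract above: the Gaussian kernel, the coherence threshold `mu0` used to decide whether a sample joins the dictionary, the fixed step size `eta`, and all function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, centers, bandwidth):
    """Gaussian kernel between one input x and every dictionary element."""
    diff = centers - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * bandwidth ** 2))

def knlms_fit(inputs, targets, eta=0.5, eps=1e-3, mu0=0.9, bandwidth=0.5):
    """Online KNLMS sketch (assumed variant: coherence-based dictionary growth).

    inputs: array of shape (n_samples, dim); targets: array of shape (n_samples,).
    Returns the dictionary, its coefficients, and the a priori errors.
    """
    dictionary = inputs[:1].copy()   # start with the first sample
    alpha = np.zeros(1)              # one coefficient per dictionary element
    errors = []
    for x, d in zip(inputs, targets):
        k = gaussian_kernel(x, dictionary, bandwidth)
        # Admit x only if it is not well represented by the current dictionary.
        if np.max(np.abs(k)) <= mu0:
            dictionary = np.vstack([dictionary, x])
            alpha = np.append(alpha, 0.0)
            k = np.append(k, 1.0)    # Gaussian kernel of x with itself is 1
        e = d - k @ alpha            # a priori estimation error
        alpha = alpha + eta / (eps + k @ k) * e * k   # normalized LMS update
        errors.append(e)
    return dictionary, alpha, np.array(errors)

def knlms_predict(x, dictionary, alpha, bandwidth=0.5):
    """Evaluate the learned kernel expansion at a single input x."""
    return gaussian_kernel(x, dictionary, bandwidth) @ alpha
```

The dictionary size here depends only on how spread out the inputs are relative to `bandwidth` and `mu0`, which is the property that keeps the cost per update bounded.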
Kernel Principal Component Analysis(kPCA) In the field of multivariate statistics, kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping. Kernel Regression With Sparse Metric Learning(KR-SML) Kernel regression is a popular non-parametric fitting technique. It aims at learning a function which estimates the targets for test inputs as precisely as possible. Generally, the function value for a test input is estimated by a weighted average of the surrounding training examples. The weights are typically computed by a distance-based kernel function and they strongly depend on the distances between examples. In this paper, we first review the latest developments of sparse metric learning and kernel regression. Then a novel kernel regression method involving sparse metric learning, which is called kernel regression with sparse metric learning (KR-SML), is proposed. The sparse kernel regression model is established by enforcing a mixed $(2,1)$-norm regularization over the metric matrix. It learns a Mahalanobis distance metric by a gradient descent procedure, which can simultaneously conduct dimensionality reduction and lead to good prediction results. Our work is the first to combine kernel regression with sparse metric learning. To verify the effectiveness of the proposed method, it is evaluated on 19 data sets for regression. Furthermore, the new method is also applied to solving practical problems of forecasting short-term traffic flows. In the end, we compare the proposed method with three other related kernel regression methods on all test data sets under two criteria. Experimental results show that the proposed method is much more competitive. 
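The weighted-average estimate described in the kernel regression entry can be sketched as follows. This is a plain Nadaraya-Watson estimator with a Gaussian kernel, not the KR-SML method itself; the optional `metric` argument is a hypothetical stand-in for a Mahalanobis matrix that KR-SML would learn, and learning it is not shown.

```python
import numpy as np

def kernel_regression(train_x, train_y, query_x, bandwidth=0.5, metric=None):
    """Predict targets as kernel-weighted averages of the training targets.

    train_x: (n_train, dim), train_y: (n_train,), query_x: (n_query, dim).
    metric: optional (dim, dim) positive semi-definite matrix M (assumed given);
            squared distances become (x - z)^T M (x - z). Defaults to Euclidean.
    """
    diff = query_x[:, None, :] - train_x[None, :, :]   # (n_query, n_train, dim)
    if metric is None:
        sq_dist = np.sum(diff ** 2, axis=2)
    else:
        sq_dist = np.einsum("qtd,de,qte->qt", diff, metric, diff)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))  # closer points weigh more
    # Weights can underflow to zero for queries very far from all training data;
    # this sketch does not guard against that case.
    return (weights @ train_y) / weights.sum(axis=1)
```

A query halfway between two training points receives equal weights, so its prediction is the mean of their targets.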
Kernel Support Matrix Machine(KSMM) A tensor is a natural and compact representation for real-world data, which are often multi-dimensional. Meanwhile, problems of supervised tensor learning (STL) are commonly encountered in applications. Most existing classifiers based on tensor representation, such as the support tensor machine (STM), must be solved iteratively, which takes considerable time and may suffer from local minima. In this paper, we present a kernel support matrix machine (KSMM) connected with the matrix Hilbert space to perform supervised learning when data are represented as matrices. KSMM is a general framework for constructing a matrix-based hyperplane to exploit information. We analyze a unifying optimization problem for which we propose an asymptotically convergent algorithm. The goal is both to determine the hyperplane and to predict the unlabeled samples. Theoretical analysis for the generalization bounds is derived based on Rademacher complexity with respect to a probability distribution. We demonstrate the merits of the proposed method by exhaustive experiments on a simulation study and a number of real-world datasets from a variety of application domains. Kernel Treelets(KT) A new method for hierarchical clustering is presented. It combines treelets, a particular multiscale decomposition of data, with a projection on a reproducing kernel Hilbert space. The proposed approach, called kernel treelets (KT), effectively substitutes the correlation coefficient matrix used in treelets with a symmetric, positive semi-definite matrix efficiently constructed from a kernel function. Unlike most clustering methods, which require data sets to be numeric, KT can be applied to more general data and yield a multi-resolution sequence of bases on the data directly in feature space. The effectiveness and potential of KT in clustering analysis are illustrated with some examples. 
Kernel Wasserstein Distance The Wasserstein distance is a powerful metric based on the theory of optimal transport. It gives a natural measure of the distance between two distributions with a wide range of applications. In contrast to a number of the common divergences on distributions such as Kullback-Leibler or Jensen-Shannon, it is (weakly) continuous, and thus ideal for analyzing corrupted data. To date, however, no kernel methods for dealing with nonlinear data have been proposed via the Wasserstein distance. In this work, we develop a novel method to compute the L2-Wasserstein distance in a kernel space implemented using the kernel trick. The latter is a general method in machine learning employed to handle data in a nonlinear manner. We evaluate the proposed approach in identifying computerized tomography (CT) slices with dental artifacts in head and neck cancer, performing unsupervised hierarchical clustering on the resulting Wasserstein distance matrix that is computed on imaging texture features extracted from each CT slice. Our experiments show that the kernel approach outperforms classical non-kernel approaches in identifying CT slices with artifacts. Kernelized Movement Primitives(KMP) During the past few years, probabilistic approaches to imitation learning have earned a relevant place in the literature. One of their most prominent features, in addition to extracting a mean trajectory from task demonstrations, is that they provide a variance estimation. The intuitive meaning of this variance, however, changes across different techniques, indicating either variability or uncertainty. In this paper we leverage kernelized movement primitives (KMP) to provide a new perspective on imitation learning by predicting variability, correlations and uncertainty about robot actions. 
This rich set of information is used in combination with optimal controller fusion to learn actions from data, with two main advantages: i) robots become safe when uncertain about their actions and ii) they are able to leverage partial demonstrations, given as elementary sub-tasks, to optimally perform a higher level, more complex task. We showcase our approach in a painting task, where a human user and a KUKA robot collaborate to paint a wooden board. The task is divided into two sub-tasks and we show that using our approach the robot becomes compliant (hence safe) outside the training regions and executes the two sub-tasks with optimal gains. KernelQC Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications to bio-informatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques, and make the following contributions: (1) We show that even the problem of detecting if a given quasi-clique is maximal (i.e. not contained within another quasi-clique) is NP-hard. (2) We present a novel heuristic algorithm, KernelQC, to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, followed by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities. (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods. 
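For readers unfamiliar with degree-based quasi-cliques, the sketch below checks the usual definition: a vertex set S is a γ-quasi-clique if every vertex in S is adjacent to at least γ·(|S| − 1) other vertices of S. It only tests a given set; it is not the KernelQC enumeration heuristic, and the adjacency-dictionary representation and function name are assumptions for illustration.

```python
def is_degree_quasi_clique(adjacency, vertices, gamma):
    """Return True if `vertices` induces a degree-based gamma-quasi-clique.

    adjacency: dict mapping each vertex to the set of its neighbours
               (undirected graph, assumed representation).
    gamma: density threshold in (0, 1]; gamma = 1 means an ordinary clique.
    """
    members = set(vertices)
    if not members:
        return False
    required = gamma * (len(members) - 1)
    for v in members:
        # Count neighbours of v inside the candidate set, ignoring self-loops.
        inside_degree = len((adjacency[v] & members) - {v})
        if inside_degree < required:
            return False
    return True
```

On a 4-cycle every vertex has two of the three possible neighbours, so the whole cycle qualifies for γ = 0.5 but not for γ = 1.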
Kervolutional Neural Network(KNN) ➚ “Kernel Convolution” Key Performance Variable(KPV) Keyhole Markup Language(KML) Keyhole Markup Language (KML) is an XML notation for expressing geographic annotation and visualization within Internet-based, two-dimensional maps and three-dimensional Earth browsers. KML was developed for use with Google Earth, which was originally named Keyhole Earth Viewer. It was created by Keyhole, Inc, which was acquired by Google in 2004. KML became an international standard of the Open Geospatial Consortium in 2008. Google Earth was the first program able to view and graphically edit KML files. Other projects such as Marble have also started to develop KML support. https://…/shapeFileToKML http://…/9781482234817 plotKML Keyphrase Extraction ➘ “Keyphrase Extraction Algorithm” Keyphrase Extraction Algorithm(KEA,KEA++) Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful for a wide variety of purposes. The task of assigning keyphrases to a document is called keyphrase indexing. For example, academic papers are often accompanied by a set of keyphrases freely chosen by the author. In libraries professional indexers select keyphrases from a controlled vocabulary (also called Subject Headings) according to defined cataloguing rules. On the Internet, digital libraries, or any depositories of data (flickr, del.icio.us, blog articles etc.) also use keyphrases (or here called content tags or content labels) to organize and provide a thematic access to their data. KEA is an algorithm for extracting keyphrases from text documents. It can be either used for free indexing or for indexing with a controlled vocabulary. KEA is implemented in Java and is platform independent. It is an open-source software distributed under the GNU General Public License. 
http://…/06-OM-IHW-Thesaurus-auto-keyphrase.pdf Keyphrase Indexing Keyphrases represent a brief but precise summary of documents. They are widely used for organizing library holdings and providing thematic access to them. Manual assignment of high-quality keyphrases is expensive and time-consuming, therefore automatic techniques are in great demand. There are two existing approaches. In keyphrase extraction, the phrases occurring in the document are analyzed to identify apparently significant ones, on the basis of properties such as frequency and length. In term assignment, keyphrases are chosen from a controlled vocabulary of terms, and documents are classified according to their content into classes that correspond to elements of the vocabulary. One serious disadvantage of the former approach is that the extracted phrases are often ill formed or inappropriate. The assignment approach circumvents this problem, but for satisfactory results a vast and accurate manually created corpus of training material is needed. This paper describes keyphrase indexing, an intermediate approach between keyphrase extraction and term assignment that combines the advantages of both and avoids their shortcomings. Keypoint Attended Visual Attention Network(KAVAN) As an intuitive way of expressing emotion, animated Graphics Interchange Format (GIF) images have been widely used on social media. Most previous studies on automated GIF emotion recognition fail to effectively utilize GIF’s unique properties, and this potentially limits the recognition performance. In this study, we demonstrate the importance of human related information in GIFs and conduct human-centered GIF emotion recognition with a proposed Keypoint Attended Visual Attention Network (KAVAN). The framework consists of a facial attention module and a hierarchical segment temporal module. 
The facial attention module exploits the strong relationship between GIF contents and human characters, and extracts frame-level visual features with a focus on human faces. The Hierarchical Segment LSTM (HS-LSTM) module is then proposed to better learn global GIF representations. Our proposed framework outperforms the state-of-the-art on the MIT GIFGIF dataset. Furthermore, the facial attention module provides reliable facial region mask predictions, which improves the model’s interpretability. KeystoneML KeystoneML is a software framework, written in Scala, from the UC Berkeley AMPLab designed to simplify the construction of large scale, end-to-end, machine learning pipelines with Apache Spark. 6 reasons why I like KeystoneML KeyVec Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms take as input for a number of NLP tasks. We propose a neural network model, KeyVec, which learns document representations with the goal of preserving key semantics of the input text. It enables the learned low-dimensional vectors to retain the topics and important information from the documents that will flow to downstream tasks. Our empirical evaluations show the superior quality of KeyVec representations in two different document understanding tasks. KFHE-HOMER Multi-label classification allows a datapoint to be labelled with more than one class at the same time. Ensemble methods generally perform much better than single classifiers. Apart from bagging-style ensembles such as ECC and RAkEL, ensemble methods have not been explored much in multi-label classification. KFHE (Kalman Filter-based Heuristic Ensemble) is a recent ensemble method which uses the Kalman filter to combine several models. 
KFHE views the final ensemble to be learned as a state to be estimated which it estimates using multiple noisy ‘measurements’. These ‘measurements’ are essentially component classifiers trained under different settings. This work extends KFHE to multi-label domain by proposing KFHE-HOMER which enhances the performance of HOMER using the KFHE framework. KFHE-HOMER sequentially trains multiple HOMER classifiers using weighted training datapoints and random hyperparameters. These models are considered as measurements and their related error as the uncertainty of the measurements. Then the Kalman filter framework is used to combine these measurements to get a more accurate estimate. The method was tested on 10 multi-label datasets and compared with other multi-label classification algorithms. Results show that KFHE-HOMER performs consistently better than similar multi-label ensemble methods. K-fold Cross Validation In k-fold cross-validation, the original sample is randomly partitioned into k equal size subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k – 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimation. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. KG-AUTOENCODER In the last years, deep learning has shown to be a game-changing technology in artificial intelligence thanks to the numerous successes it reached in diverse application fields. 
Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items, a possible configuration of a deep neural network is that of autoencoders, typically used to produce a lower dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph, thus providing a label for neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing with state of the art recommendation algorithms. KGCleaner KGCleaner is a framework to identify and correct errors in data produced and delivered by an information extraction system. These tasks have been understudied and KGCleaner is the first to address both. We introduce a multi-task model that jointly learns to predict if an extracted relation is credible and repair it if not. We evaluate our approach and other models as instances of our framework on two collections: a Wikidata corpus of nearly 700K facts and 5M fact-relevant sentences and a collection of 30K facts from the 2015 TAC Knowledge Base Population task. For credibility classification, a parameter-efficient, simple shallow neural network can achieve an absolute performance gain of 30 $F_1$ points on Wikidata and comparable performance on TAC. For the repair task, a significant performance gain (more than twofold) can be obtained, depending on the nature of the dataset and the models. K-Groups We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. 
The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by second variation. The implementation of k-groups is partly based on Hartigan and Wong’s algorithm for k-means. The algorithm is generalized from moving one point on each iteration (first variation) to moving $m$ $(m > 1)$ points. For univariate data, we prove that Hartigan and Wong’s k-means algorithm is a special case of k-groups by first variation. The simulation results from univariate and multivariate cases show that our k-groups algorithms perform as well as Hartigan and Wong’s k-means algorithm when clusters are well-separated and normally distributed. Moreover, both k-groups algorithms perform better than k-means when data does not have a finite first moment or data has strong skewness and heavy tails. For non–spherical clusters, both k-groups algorithms performed better than k-means in high dimension, and k-groups by first variation is consistent as dimension increases. In a case study on dermatology data with 34 features, both k-groups algorithms performed better than k-means. KI, KR Robustness Indicators The KI statistic falls between 0 and 1, gives a value of 1 for a perfect model, and gives 0 for a completely random model. This gives it an intuitive feel for a good model metric as Marcade (KXEN) suggests it should. KI is calculated as a “percent of perfect”. 
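One common reading of “percent of perfect” can be sketched with cumulative gains curves: the area between the model's gains curve and the random diagonal, divided by the same area for a perfect ranking. This formula is an assumption made for illustration and is not taken from the text above; the function names are hypothetical, binary labels are assumed, and ties in the scores are not treated specially.

```python
import numpy as np

def gains_area(labels_in_rank_order):
    """Area under the cumulative gains curve for labels sorted by model rank."""
    labels = np.asarray(labels_in_rank_order, dtype=float)
    # Fraction of all positives captured after each additional observation.
    captured = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    population = np.linspace(0.0, 1.0, len(labels) + 1)
    # Trapezoidal rule over equally spaced population fractions.
    return float(np.sum((captured[1:] + captured[:-1]) / 2.0 * np.diff(population)))

def ki_statistic(scores, labels):
    """Assumed KI: (model area - random area) / (perfect area - random area)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    model_order = np.argsort(-scores, kind="stable")   # highest score first
    model_area = gains_area(labels[model_order])
    perfect_area = gains_area(np.sort(labels)[::-1])   # all positives first
    random_area = 0.5                                   # diagonal gains curve
    return (model_area - random_area) / (perfect_area - random_area)
```

Under this definition a perfect ranking scores 1 and a ranking no better than random scores about 0; a model that ranks worse than random would produce a negative value, which the sketch does not clip.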
Kinesic-Proxemic-Message Gate(KPM-Gate) In crowded social scenarios with a myriad of external stimuli, human brains exhibit a natural ability to filter out irrelevant information and narrowly focus their attention. In the midst of multiple groups of people, humans use such sensory gating to effectively further their own group’s interactional goals. In this work, we consider the design of a policy network to model multi-group multi-person communication. Our policy takes as input the state of the world such as an agent’s gaze direction, body pose of other agents or history of past actions, and outputs an optimal action such as speaking, listening or responding (communication modes). Inspired by humans’ natural neurobiological filtering process, a central component of our policy network design is an information gating function, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that models the ability of an agent to selectively gather information from specific neighboring agents. The degree of influence of a neighbor is based on dynamic non-verbal cues such as body motion, head pose (kinesics) and interpersonal space (proxemics). We further show that the KPM-Gate can be used to discover social groups using its natural interpretation as a social attention mechanism. We pose the communication policy learning problem as a multi-agent imitation learning problem. We learn a single policy shared by all agents under the assumption of a decentralized Markov decision process. We term our policy network as the Multi-Agent Group Discovery and Communication Mode Network (MAGDAM network), as it learns social group structure in addition to the dynamics of group communication. Our experimental validation on both synthetic and real world data shows that our model is able to both discover social group structure and learn an accurate multi-agent communication policy. 
Kinetic Compressive Sensing(KCS) Parametric images provide insight into the spatial distribution of physiological parameters, but they are often extremely noisy, due to low SNR of tomographic data. Direct estimation from projections allows accurate noise modeling, improving the results of post-reconstruction fitting. We propose a method, which we name kinetic compressive sensing (KCS), based on a hierarchical Bayesian model and on a novel reconstruction algorithm, that encodes sparsity of kinetic parameters. Parametric maps are reconstructed by maximizing the joint probability, with an Iterated Conditional Modes (ICM) approach, alternating the optimization of activity time series (OS-MAP-OSL), and kinetic parameters (MAP-LM). We evaluated the proposed algorithm on a simulated dynamic phantom: a bias/variance study confirmed how direct estimates can improve the quality of parametric maps over a post-reconstruction fitting, and showed how the novel sparsity prior can further reduce their variance, without affecting bias. Real FDG PET human brain data (Siemens mMR, 40min) images were also processed. Results enforced how the proposed KCS-regularized direct method can produce spatially coherent images and parametric maps, with lower spatial noise and better tissue contrast. A GPU-based open source implementation of the algorithm is provided. Kinetic Euclidean Distance Matrix(KEDM) Euclidean distance matrices (EDMs) are a major tool for localization from distances, with applications ranging from protein structure determination to global positioning and manifold learning. They are, however, static objects which serve to localize points from a snapshot of distances. If the objects move, one expects to do better by modeling the motion. In this paper, we introduce Kinetic Euclidean Distance Matrices (KEDMs)—a new kind of time-dependent distance matrices that incorporate motion. The entries of KEDMs become functions of time, the squared time-varying distances. 
We study two smooth trajectory models, polynomial and bandlimited trajectories, and show that these trajectories can be reconstructed from incomplete, noisy distance observations, scattered over multiple time instants. Our main contribution is a semidefinite relaxation (SDR), inspired by SDRs for static EDMs. Similarly to the static case, the SDR is followed by a spectral factorization step; however, because spectral factorization of polynomial matrices is more challenging than for constant matrices, we propose a new factorization method that uses anchor measurements. Extensive numerical experiments show that KEDMs and the new semidefinite relaxation accurately reconstruct trajectories from noisy, incomplete distance data and that, in fact, motion improves rather than degrades localization if properly modeled. This makes KEDMs a promising tool for problems in geometry of dynamic point sets. kinn A graph-based regression model built from a flat, unstructured dataset. Each line in the input data set is treated as a node from which an edge to another line (node) can be formed. In the training process, a model is created which contains a sparse graph adjacency matrix. Prediction takes a predictor and the model as inputs and outputs a prediction that is an average of the most similar node and its neighbours in the model graph. kinn Kitematic Kitematic is an open source project built to simplify and streamline using Docker on a Mac or Windows (coming soon) PC. Kitematic automates the Docker installation and setup process and provides an intuitive graphical user interface (GUI) for running Docker containers. Kitematic integrates with Docker Machine to provision a VirtualBox VM and install the Docker Engine locally on your machine. Once installed, the Kitematic GUI launches and from the home screen you will be presented with curated images that you can run instantly. 
You can search for any public images on Docker Hub from Kitematic just by typing in the search bar. You can use the GUI to create, run and manage your containers just by clicking on buttons. Kitematic allows you to switch back and forth between the Docker CLI and the GUI. Kitematic also automates advanced features such as managing ports and configuring volumes. You can use Kitematic to change environment variables, stream logs, and single click terminal into your Docker container all from the GUI. KITTI Benchmark We take advantage of our autonomous driving platform Annieway to develop novel challenging real-world computer vision benchmarks. Our tasks of interest are: stereo, optical flow, visual odometry, 3D object detection and 3D tracking. For this purpose, we equipped a standard station wagon with two high-resolution color and grayscale video cameras. Accurate ground truth is provided by a Velodyne laser scanner and a GPS localization system. Our datasets are captured by driving around the mid-size city of Karlsruhe, in rural areas and on highways. Up to 15 cars and 30 pedestrians are visible per image. Besides providing all data in raw format, we extract benchmarks for each task. For each of our benchmarks, we also provide an evaluation metric and this evaluation website. Preliminary experiments show that methods ranking high on established benchmarks such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias and complement existing benchmarks by providing real-world benchmarks with novel difficulties to the community. Kleinberg’s Impossibility Theorem Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. 
Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. KL-Hardness We introduce KL-hardness, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible-entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, KL-hardness unifies the latter two notions of computational entropy and sheds light on the apparent ‘duality’ between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC ’12). KloakDB A private data federation enables data owners to pool their information for querying without disclosing their secret tuples to one another. Here, a client queries the union of the records of all data owners. The data owners work together to answer the query using privacy-preserving algorithms that prevent them from learning unauthorized information about the inputs of their peers. Only the client, and a federation coordinator, learn the query’s output. KloakDB is a private data federation that uses trusted hardware to process SQL queries over the inputs of two or more parties. 
Currently private data federations compute their queries fully-obliviously, guaranteeing that no information is revealed about the sensitive inputs of a data owner to their peers by observing the query’s instruction traces and memory access patterns. Oblivious querying almost always exacts multiple orders of magnitude slowdown in query runtimes compared to plaintext execution, making it impractical for many applications. KloakDB offers a semi-oblivious computing framework, $k$-anonymous query processing. We make the query’s observable transcript $k$-anonymous because it is a popular standard for data release in many domains including medicine, educational research, and government data. KloakDB’s queries run such that each data owner may deduce information about no fewer than $k$ individuals in the data of their peers. In addition, stakeholders set $k$, creating a novel trade-off between privacy and performance. Our results show that KloakDB enjoys speedups of up to $117$X using k-anonymous query processing over full-oblivious evaluation. Klout Score Klout is a website and mobile app that uses social media analytics to rank its users according to online social influence via the ‘Klout Score’, which is a numerical value between 1 and 100. In determining the user score, Klout measures the size of a user’s social media network and correlates the content created to measure how other users interact with that content. Klout Score: Measuring Influence Across Multiple Social Networks RKlout KlusTree Graph structured data on the web is now massive as well as diverse, ranging from social networks, web graphs to knowledge-bases. Effectively querying this graph structured data is non-trivial and has led to research in a variety of directions — structured queries, keyword and natural language queries, automatic translation of these queries to structured queries, etc. 
We are concerned with a class of queries called relationship queries, which are usually expressed as a set of keywords (each keyword denoting a named entity). The results returned are a set of ranked trees, each of which denotes relationships among the various keywords. The result list could consist of hundreds of answers. The problem of keyword search on graphs has been explored for over a decade now, but an important aspect that is not as extensively studied is that of user experience. We propose KlusTree, which presents clustered results to the users instead of a list of all the results. In our approach, the result trees are represented using language models and are clustered using JS divergence as a distance measure. We compare KlusTree with the well-known approaches based on isomorphism and tree-edit distance based clustering. The user evaluations show that KlusTree outperforms the other two in providing better clustering, thereby enriching user experience, revealing interesting patterns and improving result interpretation by the user. K-Means k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. Interactive visualisation clustering using k-means K-Means Batch Bayesian Optimization(KMBBO) We present K-Means Batch Bayesian Optimization (KMBBO), a novel batch sampling algorithm for Bayesian Optimization (BO). KMBBO uses unsupervised learning to efficiently estimate peaks of the model acquisition function. 
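As a rough illustration of the k-means procedure referred to above (and reused by KMBBO), the sketch below implements the classic assign-then-update loop, often called Lloyd's algorithm. The function names, the random initialization from the data points, and the toy data are assumptions made for this example, not details taken from the entries.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: assign each point to the nearest mean, then recompute means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize from k random data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable, so the means no longer move
        centers = new_centers
    labels = [min(range(k), key=lambda i: squared_distance(p, centers[i])) for p in points]
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
```

Each final center is the mean of its cluster, so the nearest-center rule partitions the plane into the Voronoi cells mentioned in the K-Means entry.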
We show in empirical experiments that our method outperforms the current state-of-the-art batch allocation algorithms on a variety of test problems including tuning of algorithm hyper-parameters and a challenging drug discovery problem. In order to accommodate the real-world problem of high dimensional data, we propose a modification to KMBBO by combining it with compressed sensing to project the optimization into a lower dimensional subspace. We demonstrate empirically that this 2-step method is competitive with algorithms where no dimensionality reduction has taken place. K-Means Hadoop MapReduce(KM-HMR) ➘ “K‑Means Modified Inter and Intra Clustering” K‑Means Modified Inter and Intra Clustering(KM-I2C) Big data has become popular for processing, storing and managing massive volumes of data. The clustering of datasets has become a challenging issue in the field of big data analytics. The K-means algorithm is best suited for finding similarities between entities based on distance measures with small datasets. Existing clustering algorithms require scalable solutions to manage large datasets. This study presents two approaches to the clustering of large datasets using MapReduce. The first approach, K-Means Hadoop MapReduce (KM-HMR), focuses on the MapReduce implementation of standard K-means. The second approach enhances the quality of clusters to produce clusters with maximum intra-cluster and minimum inter-cluster distances for large datasets. The results of the proposed approaches show significant improvements in the efficiency of clustering in terms of execution times. Experiments conducted on standard K-means and proposed solutions show that the KM-I2C approach is both effective and efficient. K-Means-Generative Adversarial Model(KM-GAN) Generative Adversarial Networks (GANs) have achieved great success in generating realistic images. Most of these are conditional models, although acquisition of class labels is expensive and time-consuming in practice. 
To reduce the dependence on labeled data, we propose an un-conditional generative adversarial model, called K-Means-generative adversarial model (KM-GAN), which incorporates the idea of updating centers in K-Means into GANs. Specifically, we redesign the framework of GANs by applying K-Means on the features extracted from the discriminator. With obtained labels from K-Means, we propose new objective functions from the perspective of deep metric learning (DML). Distinct from previous works, the discriminator is treated as a feature extractor rather than a classifier in KM-GAN, while the use of K-Means makes the features of the discriminator more representative. Experiments are conducted on various datasets, such as MNIST, Fashion-10, CIFAR-10 and CelebA, and show that the quality of samples generated by KM-GAN is comparable to some conditional generative adversarial models. k-meansNet In this paper, we study how to make clustering benefit from differentiable programming, whose basic idea is treating the neural network as a language instead of a machine learning method. To this end, we recast the vanilla $k$-means as a novel feedforward neural network in an elegant way. Our contribution is two-fold. On the one hand, the proposed k-meansNet is a neural network implementation of the vanilla k-means, which enjoys four highly desired advantages, i.e., robustness to initialization, fast inference speed, the capability of handling newly arriving data, and provable convergence. On the other hand, this work may provide novel insights into differentiable programming. More specifically, most existing differentiable programming works unroll an optimizer as a recurrent neural network, namely, the neural network is employed to solve an existing optimization problem.
In contrast, we reformulate the objective function of k-means as a feedforward neural network, namely, we employ the neural network to describe a problem. In such a way, we advance the boundary of differentiable programming by moving the role of the neural network from an alternative optimization approach to the problem formulation itself. Extensive experimental studies show that our method achieves promising performance compared with 12 clustering methods on some challenging datasets. k-medoids The k-medoids algorithm is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm. Both the k-means and k-medoids algorithms are partitional (breaking the dataset up into groups) and both attempt to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses datapoints as centers (medoids or exemplars) and works with an arbitrary matrix of distances between datapoints instead of l2. This method was proposed in 1987 for use with the l1 norm and other distances. k-mer The term k-mer typically refers to all the possible substrings, of length k, that are contained in a string. In Computational genomics, k-mers refer to all the possible subsequences (of length k) from a read obtained through DNA Sequencing. The number of k-mers in a string of length $L$ is $L-k+1$, whilst the number of possible k-mers given $n$ possibilities (4 in the case of DNA, e.g. ACTG) is $n^k$. K-mers are typically used during Sequence assembly, but can also be used in Sequence alignment. ➘ “n-gram” kmer km-means The $k$-means algorithm is the most popular nonparametric clustering method in use, but cannot generally be applied to data sets with missing observations.
The usual practice with such data sets is to either impute the values under an assumption of a missing-at-random mechanism or to ignore the incomplete records, and then to use the desired clustering method. We develop an efficient version of the $k$-means algorithm that allows for clustering cases where not all the features have observations recorded. Our extension is called $k_m$-means and reduces to the $k$-means algorithm when all records are complete. We also provide strategies to initialize our algorithm and to estimate the number of groups in the data set. Illustrations and simulations demonstrate the efficacy of our approach in a variety of settings and patterns of missing data. Our methods are also applied to the clustering of gamma-ray bursts and to the analysis of activation images obtained from a functional Magnetic Resonance Imaging experiment. K-Modes The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. 
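The three ingredients of k-modes named above (simple matching dissimilarity, modes in place of means, and a frequency-based mode update) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the choice of the initial modes, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Frequency-based mode: the most common category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def kmodes(records, initial_modes, iterations=20):
    """k-means-style loop with modes and matching dissimilarity."""
    modes = list(initial_modes)
    for _ in range(iterations):
        clusters = [[] for _ in modes]
        for r in records:
            nearest = min(range(len(modes)),
                          key=lambda i: matching_dissimilarity(r, modes[i]))
            clusters[nearest].append(r)
        # Keep the previous mode if a cluster is empty.
        new_modes = [cluster_mode(c) if c else modes[i] for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    labels = [min(range(len(modes)), key=lambda i: matching_dissimilarity(r, modes[i]))
              for r in records]
    return modes, labels

records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
# Assumption: the initial modes are simply two hand-picked records.
modes, labels = kmodes(records, initial_modes=[records[0], records[3]])
```

On this toy data the loop stops after one pass, because the recomputed modes equal the initial ones.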
https://…/k-modes https://…/kmodes Knapsack Problem The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where there are financial constraints and is studied in fields such as combinatorics, computer science, complexity theory, cryptography, applied mathematics, and daily fantasy sports. The knapsack problem has been studied for more than a century, with early works dating as far back as 1897. The name ‘knapsack problem’ dates back to the early works of mathematician Tobias Dantzig (1884-1956), and refers to the commonplace problem of packing your most valuable or useful items without overloading your luggage. Knative Knative is a new open source project started by engineers from Google, Pivotal, and other industry leaders. It’s a collection of components that extend Kubernetes. It includes three major parts: Serving, Build, and Eventing. How to use Knative to deploy a Serverless Application on Kubernetes k-nearest neighbors(kNN) In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: 1. In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). 
If k = 1, then the object is simply assigned to the class of that single nearest neighbor. 2. In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. K-Nearest Oracles Borderline(KNORA-B) Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES selects all classifiers that correctly classify all samples in the region of competence of the test sample, if such classifiers exist; otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can reduce the region of competence in such a way that only samples of a single class remain in the region of competence, leading to the selection of locally incompetent classifiers that classify all samples in the region of competence as being from the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalanced datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence.
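The KNORA-E selection rule described above can be sketched as below: the region of competence is the k nearest validation samples, and the farthest sample is dropped until some classifier is correct on the whole region. The toy classifiers, the data, and the fallback of returning the full pool when the region becomes empty are assumptions made for this illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def region_of_competence(query, validation_points, k):
    """Indices of the k validation points nearest to the query, nearest first."""
    order = sorted(range(len(validation_points)),
                   key=lambda i: squared_distance(query, validation_points[i]))
    return order[:k]

def knora_e(query, validation_points, validation_labels, classifiers, k):
    """Select classifiers that are correct on every sample in the region."""
    region = region_of_competence(query, validation_points, k)
    while region:
        selected = [clf for clf in classifiers
                    if all(clf(validation_points[i]) == validation_labels[i]
                           for i in region)]
        if selected:
            return selected, region
        region = region[:-1]  # remove the sample farthest from the query
    # Assumption: fall back to the whole pool if no classifier ever qualifies.
    return list(classifiers), region

# Hypothetical toy classifiers on 1-D inputs.
def always_zero(x):
    return 0

def always_one(x):
    return 1

def threshold_1_5(x):
    return int(x[0] > 1.5)

validation_points = [(1.0,), (2.0,), (3.0,), (6.0,)]
validation_labels = [0, 0, 1, 1]
pool = [always_zero, always_one, threshold_1_5]

selected, region = knora_e((2.2,), validation_points, validation_labels, pool, k=3)
```

In this example the region shrinks from three samples of mixed classes to a single class-0 sample, and the only classifier selected is the constant `always_zero`. This is the single-class collapse that KNORA-B is designed to avoid by keeping at least one sample of each class.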
Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets. The results show that the proposed techniques achieved interesting results, with KNORA-BI outperforming state-of-art techniques. K-Nearest Oracles Borderline Imbalanced(KNORA-BI) Dynamic Ensemble Selection (DES) techniques aim to select locally competent classifiers for the classification of each new test sample. Most DES techniques estimate the competence of classifiers using a given criterion over the region of competence of the test sample (its nearest neighbors in the validation set). The K-Nearest Oracles Eliminate (KNORA-E) DES selects all classifiers that correctly classify all samples in the region of competence of the test sample, if such classifiers exist; otherwise, it removes from the region of competence the sample that is furthest from the test sample, and the process repeats. When the region of competence has samples of different classes, KNORA-E can reduce the region of competence in such a way that only samples of a single class remain in the region of competence, leading to the selection of locally incompetent classifiers that classify all samples in the region of competence as being from the same class. In this paper, we propose two DES techniques: K-Nearest Oracles Borderline (KNORA-B) and K-Nearest Oracles Borderline Imbalanced (KNORA-BI). KNORA-B is a DES technique based on KNORA-E that reduces the region of competence but maintains at least one sample from each class that is in the original region of competence. KNORA-BI is a variation of KNORA-B for imbalanced datasets that reduces the region of competence but maintains at least one minority class sample if there is any in the original region of competence. Experiments are conducted comparing the proposed techniques with 19 DES techniques from the literature using 40 datasets.
The results show that the proposed techniques achieved interesting results, with KNORA-BI outperforming state-of-art techniques. KNet Knet (pronounced ‘kay-net’) is the Koç University machine learning framework implemented in Julia, a high-level, high-performance, dynamic programming language. Unlike gradient generating compilers like Theano and TensorFlow which restrict users into a modeling mini-language, Knet allows models to be defined by just describing their forward computation in plain Julia, allowing the use of loops, conditionals, recursion, closures, tuples, dictionaries, array indexing, concatenation and other high level language features. High performance is achieved by combining automatic differentiation of most of Julia with efficient GPU kernels and memory management. Several examples and benchmarks are provided to demonstrate that GPU support and automatic differentiation of a high level language are sufficient for concise definition and efficient training of sophisticated models. GitXiv Knockoff Filter In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) – the expected fraction of false discoveries among all discoveries – is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, and the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. 
As the name suggests, the method operates by manufacturing knockoff variables that are cheap – their construction does not require any new data – and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. Knockoff Net Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent can an adversary steal functionality of such ‘victim’ models based solely on blackbox interactions: image in, predictions out. In contrast to prior work, we present an adversary lacking knowledge of train/test data used by the model, its internals, and semantics over model outputs. We formulate model functionality stealing as a two-step approach: (i) querying a set of input images to the blackbox model to obtain predictions; and (ii) training a ‘knockoff’ with queried image-prediction pairs. We make multiple remarkable observations: (a) querying random images from a different distribution than that of the blackbox training data results in a well-performing knockoff; (b) this is possible even when the knockoff is represented using a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, as well as on a popular image analysis API where we create a reasonable knockoff for as little as $30. K-Norm Gradient(KNG) This paper presents a new mechanism for producing sanitized statistical summaries that achieve \emph{differential privacy}, called the \emph{K-Norm Gradient} Mechanism, or KNG. 
This new approach maintains the strong flexibility of the exponential mechanism, while achieving the powerful utility performance of objective perturbation. KNG starts with an inherent objective function (often an empirical risk), and promotes summaries that are close to minimizing the objective by weighting according to how far the gradient of the objective function is from zero. Working with the gradient instead of the original objective function allows for additional flexibility as one can penalize using different norms. We show that, unlike the exponential mechanism, the noise added by KNG is asymptotically negligible compared to the statistical error for many problems. In addition to theoretical guarantees on privacy and utility, we confirm the utility of KNG empirically in the settings of linear and quantile regression through simulations. KnowBias We introduce KnowBias, a system for detecting the degree of political bias in textual content such as social media posts and news articles. In the space of scalable text classification, a common problem is domain mismatch, where easily accessible training data (i.e., tweets) does not correspond in format to the desired testing domain (i.e., longer form article content). While universal text encoders such as word or sentence embeddings could be leveraged to train target agnostic classifiers, such schemes result in poor performance on long-form articles. Our key insight is that long-form articles are a mix of neutral and political sentences, while tweets are concentrated with opinion. We propose a two-step classification system that first automatically filters out neutral sentences from the input text document at evaluation time, and then the resulting text is input into a polarity classifier. 
We evaluate our two-step approach using a variety of test suites, including a set of tweets and long-form articles where annotations were crowd-sourced to decrease label noise, measuring accuracy and Spearman-rho rank correlation. In practice, KnowBias achieves a high accuracy of 86% (rho = 0.65) on these tweets and 75% (rho = 0.69) on long-form articles. Knowledge Aided Reader(KAR) To apply general knowledge to machine reading comprehension (MRC), we propose an innovative MRC approach, which consists of a WordNet-based data enrichment method and an MRC model named as Knowledge Aided Reader (KAR). The data enrichment method uses the semantic relations of WordNet to extract semantic level inter-word connections from each passage-question pair in the MRC dataset, and allows us to control the amount of the extraction results by setting a hyper-parameter. KAR uses the extraction results of the data enrichment method as explicit knowledge to assist the prediction of answer spans. According to the experimental results, the single model of KAR achieves an Exact Match (EM) of $72.4$ and an F1 Score of $81.1$ on the development set of SQuAD, and more importantly, by applying different settings in the data enrichment method to change the amount of the extraction results, there is a $2\%$ variation in the resulting performance of KAR, which implies that the explicit knowledge provided by the data enrichment method plays an effective role in the training of KAR. Knowledge and-or Graph(AOG) This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The primary challenges are how to model task-specific knowledge and how to integrate this knowledge into the learning procedure.
In this work, we propose training a recurrent long short-term memory (LSTM) network to address this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network generally requires large numbers of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To overcome this issue, we introduce a knowledge and-or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according to common sense) by training another auxiliary LSTM network with a small set of annotated samples. Furthermore, these generated samples (i.e., task-oriented action sequences) effectively facilitate training of the model for semantic task planning. In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach. Knowledge as a Service(KaaS) In this paper, we introduce and explore a new computing paradigm we call knowledge as a service, in which a knowledge service provider, via its knowledge server, answers queries presented by some knowledge consumers. The knowledge server’s answers are based on knowledge models that may be expensive or impossible to obtain for the knowledge consumers. Knowledge as a Service Actionable Knowledge As A Service (AKAAS) Knowledge Authoring Logic Machine(KALM) Knowledge representation and reasoning (KRR) is one of the key areas in artificial intelligence (AI) field. It is intended to represent the world knowledge in formal languages (e.g., Prolog, SPARQL) and then enhance the expert systems to perform querying and inference tasks. 
Currently, constructing large scale knowledge bases (KBs) with high quality is hindered by the fact that the construction process requires many qualified knowledge engineers who not only understand the domain-specific knowledge but also have sufficient skills in knowledge representation. Unfortunately, qualified knowledge engineers are in short supply. Therefore, it would be very useful to build a tool that allows the user to construct and query the KB simply via text. Although there are a number of systems developed for knowledge extraction and question answering, they mainly fail in that these systems do not achieve high enough accuracy, whereas KRR is highly sensitive to erroneous data. In this thesis proposal, I will present Knowledge Authoring Logic Machine (KALM), a rule-based system which allows the user to author knowledge and query the KB in text. The experimental results show that KALM achieved superior accuracy in knowledge authoring and question answering as compared to the state-of-the-art systems. Knowledge Base A knowledge base (KB) is a technology used to store complex structured and unstructured information used by a computer system. The initial use of the term was in connection with expert systems which were the first knowledge-based systems. The original use of the term knowledge-base was to describe one of the two sub-systems of a knowledge-based system. A knowledge-based system consists of a knowledge-base that represents facts about the world and an inference engine that can reason about those facts and use rules and other forms of logic to deduce new facts or highlight inconsistencies. Knowledge Base LSTM(KBLSTM) This paper focuses on how to take advantage of external knowledge bases (KBs) to improve recurrent neural networks for machine reading. Traditional methods that exploit knowledge from KBs encode knowledge as discrete indicator features.
Not only do these features generalize poorly, but they require task-specific feature engineering to achieve good performance. We propose KBLSTM, a novel neural model that leverages continuous representations of KBs to enhance the learning of recurrent neural networks for machine reading. To effectively integrate background knowledge with information from the currently processed text, our model employs an attention mechanism with a sentinel to adaptively decide whether to attend to background knowledge and which information from KBs is useful. Experimental results show that our model achieves accuracies that surpass the previous state-of-the-art results for both entity extraction and event extraction on the widely used ACE2005 dataset. Knowledge Based end-to-end Memory Network(KB-memN2N) End-to-end dialog systems have become very popular because they hold the promise of learning directly from human to human dialog interaction. Retrieval and Generative methods have been explored in this area with mixed results. A key element that is missing so far is the incorporation of a-priori knowledge about the task at hand. This knowledge may exist in the form of structured or unstructured information. As a first step towards this direction, we present a novel approach, Knowledge based end-to-end memory networks (KB-memN2N), which allows special handling of named entities for goal-oriented dialog tasks. We present results on two datasets, DSTC6 challenge dataset and dialog bAbI tasks. KnOwledge Based pErsonalized Product Description Generation Model(KOBE) Quality product descriptions are critical for providing competitive customer experience in an E-commerce platform. An accurate and attractive description not only helps customers make an informed decision but also improves the likelihood of purchase. However, crafting a successful product description is tedious and highly time-consuming.
Due to its importance, automating the product description generation has attracted considerable interest from both research and industrial communities. Existing methods mainly use templates or statistical methods, and their performance could be rather limited. In this paper, we explore a new way to generate the personalized product description by combining the power of neural networks and knowledge base. Specifically, we propose a KnOwledge Based pErsonalized (or KOBE) product description generation model in the context of E-commerce. In KOBE, we extend the encoder-decoder framework, the Transformer, to a sequence modeling formulation using self-attention. In order to make the description both informative and personalized, KOBE considers a variety of important factors during text generation, including product aspects, user categories, and knowledge base, etc. Experiments on real-world datasets demonstrate that the proposed method out-performs the baseline on various metrics. KOBE can achieve an improvement of 9.7% over the state of the art in terms of BLEU. We also present several case studies as the anecdotal evidence to further prove the effectiveness of the proposed approach. The framework has been deployed in Taobao, the largest online E-commerce platform in China. Knowledge Compilation Knowledge compilation is a family of approaches for addressing the intractability of a number of artificial intelligence problems. A propositional model is compiled in an off-line phase in order to support some queries in polytime. Many ways of compiling a propositional model exist. Among others: NNF, DNNF, d-DNNF, BDD, SDD, MDD, DNF and CNF. Different compiled representations have different properties.
The three main properties are: • The compactness of the representation • The queries that are supported in polytime • The transformations of the representations that can be performed in polytime Knowledge Discovery(KD / KDD) Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data. It is often described as deriving knowledge from the input data. Knowledge discovery developed out of the data mining domain, and is closely related to it both in terms of methodology and terminology. The most well-known branch of data mining is knowledge discovery, also known as knowledge discovery in databases (KDD). Just like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further usage and discovery. KnOwledge Discovery by Accuracy Maximization(KODAMA) Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold’s topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. http://www.kodama-project.com KODAMA Knowledge Distillation Knowledge distillation (KD) consists of transferring knowledge from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact.
By transferring knowledge, one hopes to benefit from the student’s compactness. Knowledge Extraction Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data. Knowledge Graph The Knowledge Graph is a knowledge base used by Google to enhance its search engine’s search results with semantic-search information gathered from a wide variety of sources. Knowledge Graph display was added to Google’s search engine in 2012, starting in the United States, having been announced on May 16, 2012. It provides structured and detailed information about the topic in addition to a list of links to other sites. The goal is that users would be able to use this information to resolve their query without having to navigate to other sites and assemble the information themselves. http://…/google-launches-knowledge-graph-121585 Knowledge Graph Attention Network(KGAT) To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded.
Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations — which connect two items with one or multiple linked attributes — are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node’s neighbors (which can be users, items, or attributes) to refine the node’s embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly modeling them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism. Knowledge Graph Completion(KGC) Knowledge Graphs (KGs) have been applied to many tasks including Web search, link prediction, recommendation, natural language processing, and entity linking. However, most KGs are far from complete and are growing at a rapid pace. To address these problems, Knowledge Graph Completion (KGC) has been proposed to improve KGs by filling in its missing connections. 
Unlike existing methods which hold a closed-world assumption, i.e., where KGs are fixed and new entities cannot be easily added, in the present work we relax this assumption and propose a new open-world KGC task. As a first attempt to solve this task we introduce an open-world KGC model called ConMask. This model learns embeddings of the entity’s name and parts of its text-description to connect unseen entities to the KG. To mitigate the presence of noisy text descriptions, ConMask uses a relationship-dependent content masking to extract relevant snippets and then trains a fully convolutional neural network to fuse the extracted snippets with entities in the KG. Experiments on large data sets, both old and new, show that ConMask performs well in the open-world KGC task and even outperforms existing KGC models on the standard closed-world KGC task. Knowledge Graph Convolutional Network(KGCN) Knowledge graphs capture interlinked information between entities and they represent an attractive source of structured information that can be harnessed for recommender systems. However, existing recommender engines use knowledge graphs by manually designing features, do not allow for end-to-end training, or provide poor scalability. Here we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end trainable framework that harnesses item relationships captured by the knowledge graph to provide better recommendations. Conceptually, KGCN computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relations for a given user and then transforming the knowledge graph into a user-specific weighted graph. Then, KGCN applies a graph convolutional neural network that computes an embedding of an item node by propagating and aggregating knowledge graph neighborhood information. 
Moreover, to provide better inductive bias, KGCN uses label smoothness (LS), which provides regularization over edge weights, and we prove that it is equivalent to a label propagation scheme on a graph. Finally, we unify KGCN and LS regularization, and present a scalable minibatch implementation for the KGCN-LS model. Experiments show that KGCN-LS outperforms strong baselines in four datasets. KGCN-LS also achieves great performance in sparse scenarios and is highly scalable with respect to the knowledge graph size. Knowledge Graph Embedding(KGE) Commonsense knowledge is paramount to enable intelligent systems. Typically, it is characterized as being implicit and ambiguous, thereby hindering the automation of its acquisition. To address these challenges, this paper presents semantically enhanced models to enable reasoning through resolving part of commonsense ambiguity. The proposed models enhance in a knowledge graph embedding (KGE) framework for knowledge base completion. Experimental results show the effectiveness of the new semantic models in commonsense reasoning. Knowledge graph embedding (KGE) aims to find low dimensional vector representations of entities and relations so that their similarities can be quantized. Scoring functions (SFs), which are used to build a model to measure the similarity between entities based on a given relation, have developed as the crux of KGE. AutoKGE: Searching Scoring Functions for Knowledge Graph Embedding Knowledge Graph Embedding bi-Vector Model Knowledge graph embedding (KGE) models have been proposed to improve the performance of knowledge graph reasoning. However, there is a general phenomenon in most KGEs: as training progresses, the symmetric relations tend to the zero vector if the symmetric triples ratio is high enough in the dataset. This phenomenon causes subsequent tasks on symmetric relations, e.g. link prediction, to fail. 
The root cause of the problem is that KGEs do not utilize the semantic information of symmetric relations. We propose Knowledge graph embedding bi-vector models, which represent the symmetric relations as vector pair, significantly increasing the processing capability of the symmetry relations. We generate the benchmark datasets based on FB15k and WN18 by completing the symmetric relation triples to verify models. The experiment results of our models clearly affirm the effectiveness and superiority of our models against baseline. Knowledge Intensive Business Services(KIBS) Knowledge Intensive Business Services (commonly known as KIBS) are services and business operations heavily reliant on professional knowledge. They are mainly concerned with providing knowledge-intensive support for the business processes of other organizations. As a result, their employment structures are heavily weighted towards scientists, engineers, and other experts. It is common to distinguish between T-KIBS, (those with high use of scientific and technological knowledge – R&D services, engineering services, computer services, etc.), and P-KIBS, who are more traditional professional services – legal, accountancy, and many management consultancy and marketing services. These services either supply products which are themselves primary sources of information and knowledge, or use their specialist knowledge to produce services which facilitate their clients own activities. Consequently, KIBS usually have other businesses as their main clients, though the public sector and sometimes voluntary organisations can be important customers, and to some extent households will feature as consumers of, for instance, legal and accountancy services. 
Knowledge Intensive Organization(KIO) ➘ “Knowledge Intensive Business Services” Book: Management of Knowledge-Intensive Organizations Knowledge Into the Network(KINN) The promise of ANNs to automatically discover and extract useful features/patterns from data without dwelling on domain expertise, although highly appealing, comes at the cost of high reliance on large amounts of accurately labeled data, which are often hard to acquire and formulate, especially in time-series domains like anomaly detection, natural disaster management, predictive maintenance and healthcare. As these networks completely rely on data and ignore a very important modality, i.e. the expert, they are unable to harvest any benefit from the expert knowledge, which in many cases is very useful. In this paper, we try to bridge the gap between these data driven and expert knowledge based systems by introducing a novel framework for incorporating expert knowledge into the network (KINN). Integrating expert knowledge into the network has three key advantages: (a) reduction in the amount of data needed to train the model, (b) provision of a lower bound on the performance of the resulting classifier by obtaining the best of both worlds, and (c) improved convergence of model parameters (the model converges in a smaller number of epochs). Although experts are extremely good at solving different tasks, there are some trends and patterns that are usually hidden only in the data. Therefore, KINN employs a novel residual knowledge incorporation scheme, which can automatically determine the quality of the predictions made by the expert and rectify it accordingly by learning the trends/patterns from data. Specifically, the method tries to use information contained in one modality to complement information missed by the other. We evaluated KINN on a real world traffic flow prediction problem. 
KINN significantly outperformed both the expert and the base network (LSTM in this case) when each was evaluated in isolation, highlighting its superiority for the task. Knowledge Management Knowledge management (KM) is the process of capturing, developing, sharing, and effectively using organisational knowledge. It refers to a multi-disciplined approach to achieving organisational objectives by making the best use of knowledge. An established discipline since 1991 (see Nonaka 1991), KM includes courses taught in the fields of business administration, information systems, management, and library and information sciences (Alavi & Leidner 1999). More recently, other fields have started contributing to KM research; these include information and media, computer science, public health, and public policy. Columbia University, Kent State University and the University of Haifa offer dedicated Master of Science degrees in Knowledge Management. Many large companies, public institutions and non-profit organisations have resources dedicated to internal KM efforts, often as a part of their business strategy, information technology, or human resource management departments. Several consulting companies provide strategy and advice regarding KM to these organisations. Knowledge management efforts typically focus on organisational objectives such as improved performance, competitive advantage, innovation, the sharing of lessons learned, integration and continuous improvement of the organisation. KM efforts overlap with organisational learning and may be distinguished from that by a greater focus on the management of knowledge as a strategic asset and a focus on encouraging the sharing of knowledge. It is an enabler of organisational learning. Knowledge of Preconditions Principle(KoP) The Knowledge of Preconditions principle (KoP) is proposed as a widely applicable connection between knowledge and action in multi-agent systems. 
Roughly speaking, it asserts that if some condition is a necessary condition for performing a given action A, then knowing that this condition holds is also a necessary condition for performing A. Since the specifications of tasks often involve necessary conditions for actions, the KoP principle shows that such specifications induce knowledge preconditions for the actions. Distributed protocols or multi-agent plans that satisfy the specifications must ensure that this knowledge be attained, and that it is detected by the agents as a condition for action. The knowledge of preconditions principle is formalised in the runs and systems framework, and is proven to hold in a wide class of settings. Well-known connections between knowledge and coordinated action are extended and shown to derive directly from the KoP principle: a ‘common knowledge of preconditions’ principle is established showing that common knowledge is a necessary condition for performing simultaneous actions, and a ‘nested knowledge of preconditions’ principle is proven, showing that coordinating actions to be performed in linear temporal order requires a corresponding form of nested knowledge. Knowledge Representation and Reasoning(KRR) Knowledge representation and reasoning (KRR) is one of the key areas in artificial intelligence (AI) field. It is intended to represent the world knowledge in formal languages (e.g., Prolog, SPARQL) and then enhance the expert systems to perform querying and inference tasks. Currently, constructing large scale knowledge bases (KBs) with high quality is prohibited by the fact that the construction process requires many qualified knowledge engineers who not only understand the domain-specific knowledge but also have sufficient skills in knowledge representation. Unfortunately, qualified knowledge engineers are in short supply. Therefore, it would be very useful to build a tool that allows the user to construct and query the KB simply via text. 
Although a number of systems have been developed for knowledge extraction and question answering, they mainly fail in that they do not achieve high enough accuracy, whereas KRR is highly sensitive to erroneous data. In this thesis proposal, I will present Knowledge Authoring Logic Machine (KALM), a rule-based system which allows the user to author knowledge and query the KB in text. The experimental results show that KALM achieved superior accuracy in knowledge authoring and question answering as compared to the state-of-the-art systems. Knowledge Representation Learning(KRL) Knowledge representation learning (KRL) aims to represent entities and relations in a knowledge graph in a low-dimensional semantic space, and has been widely used in massive knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL, and overview existing approaches for KRL. Afterwards, we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition including knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future directions for KRL. Defeats GAN: A Simpler Model Outperforms in Knowledge Representation Learning Knowledge Space Theory Knowledge space theory by Doignon and Falmagne (1999) is a set- and order-theoretical framework which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. kstIO Knowledge Tracing Machine Knowledge tracing is a sequence prediction problem where the goal is to predict the outcomes of students over questions as they are interacting with a learning platform. By tracking the evolution of the knowledge of some student, one can optimize instruction. 
Existing methods are either based on temporal latent variable models, or factor analysis with temporal features. We here show that factorization machines (FMs), a model for regression or classification, encompass several existing models in the educational literature as special cases, notably additive factor model, performance factor model, and multidimensional item response theory. We show, using several real datasets of tens of thousands of users and items, that FMs can estimate student knowledge accurately and fast even when student data is sparsely observed, and handle side information such as multiple knowledge components and number of attempts at item or skill level. Our approach allows to fit student models of higher dimension than existing models, and provides a testbed to try new combinations of features in order to improve existing models. Knowledge Transfer Adversarial Network(KTAN) To reduce the large computation and storage cost of a deep convolutional neural network, the knowledge distillation based methods have pioneered to transfer the generalization ability of a large (teacher) deep network to a light-weight (student) network. However, these methods mostly focus on transferring the probability distribution of the softmax layer in a teacher network and thus neglect the intermediate representations. In this paper, we propose a knowledge transfer adversarial network to better train a student network. Our technique holistically considers both intermediate representations and probability distributions of a teacher network. To transfer the knowledge of intermediate representations, we set high-level teacher feature maps as a target, toward which the student feature maps are trained. Specifically, we arrange a Teacher-to-Student layer for enabling our framework suitable for various student structures. The intermediate representation helps the student network better understand the transferred generalization as compared to the probability distribution only. 
Furthermore, we infuse an adversarial learning process by employing a discriminator network, which can fully exploit the spatial correlation of feature maps in training a student network. The experimental results demonstrate that the proposed method can significantly improve the performance of a student network on both image classification and object detection tasks. Knowledge Worker Knowledge workers are workers whose main capital is knowledge. Typical examples may include software engineers, doctors, architects, engineers, scientists, public accountants, lawyers, and academics, whose job is to “think for a living”. Knowledge-Augmented Column Network Recently, deep models have been successfully applied in several applications, especially with low-level representations. However, sparse, noisy samples and structured domains (with multiple objects and interactions) are some of the open challenges in most deep models. Column Networks, a deep architecture, can succinctly capture such domain structure and interactions, but may still be prone to sub-optimal learning from sparse and noisy samples. Inspired by the success of human-advice guided learning in AI, especially in data-scarce domains, we propose Knowledge-augmented Column Networks that leverage human advice/knowledge for better learning with noisy/sparse samples. Our experiments demonstrate that our approach leads to either superior overall performance or faster convergence (i.e., both effective and efficient). Knowledge-Augmented Language Model(KALM) Traditional language models are unable to efficiently model entity names observed in text. All but the most popular named entities appear infrequently in text providing insufficient context. Recent efforts have recognized that context can be generalized between entity names that share the same type (e.g., person or location) and have equipped language models with access to an external knowledge base (KB). 
Our Knowledge-Augmented Language Model (KALM) continues this line of work by augmenting a traditional model with a KB. Unlike previous methods, however, we train with an end-to-end predictive objective optimizing the perplexity of text. We do not require any additional information such as named entity tags. In addition to improving language modeling performance, KALM learns to recognize named entities in an entirely unsupervised way by using entity type information latent in the model. On a Named Entity Recognition (NER) task, KALM achieves performance comparable with state-of-the-art supervised models. Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information. Knowledge-aware Path Recurrent Network(KPRN) Incorporating knowledge graph into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user’s interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graph for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. 
Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movie and music, demonstrating significant improvements over state-of-the-art solutions Collaborative Knowledge Base Embedding and Neural Factorization Machine. Knowledge-Based Distant Regularization Framework Exploiting the appropriate inductive bias based on the knowledge of data is essential for achieving good performance in statistical machine learning. In practice, however, the domain knowledge of interest often provides information on the relationship of data attributes only distantly, which hinders direct utilization of such domain knowledge in popular regularization methods. In this paper, we propose the knowledge-based distant regularization framework, in which we utilize the distant information encoded in a knowledge graph for regularization of probabilistic model estimation. In particular, we propose to impose prior distributions on model parameters specified by knowledge graph embeddings. As an instance of the proposed framework, we present the factor analysis model with the knowledge-based distant regularization. We show the results of preliminary experiments on the improvement of the generalization capability of such model. Knowledge-Based Long Short Term Memory Learning to solve diagrammatic reasoning (DR) can be a challenging but interesting problem to the computer vision research community. It is believed that next generation pattern recognition applications should be able to simulate human brain to understand and analyze reasoning of images. However, due to the lack of benchmarks of diagrammatic reasoning, the present research primarily focuses on visual reasoning that can be applied to real-world objects. 
In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we also propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks have been used to compare with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues. Knowledge-Based MRC Machine reading comprehension (MRC) requires reasoning about both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be well solved by context matching, which fail to test this capability. To encourage the progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper, and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering the questions requires understanding and the knowledge involved in a document. We implement a framework consisting of both a question answering model and a question generation model, both of which take the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase/ProBase/Reverb/NELL. Results show that incorporating side information from external KB improves the accuracy of the baseline question answer system. We compare it with a standard MRC model BiDAF, and also provide the difficulty of the dataset and lay out remaining challenges. 
Knowledge-Driven Encode, Retrieve, Paraphrase(KERP) Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions. We propose a novel Knowledge-driven Encode, Retrieve, Paraphrase (KERP) approach which reconciles traditional knowledge- and retrieval-based methods with modern learning-based methods for accurate and robust medical report generation. Specifically, KERP decomposes medical report generation into explicit medical abnormality graph learning and subsequent natural language modeling. KERP first employs an Encode module that transforms visual features into a structured abnormality graph by incorporating prior medical knowledge; then a Retrieve module that retrieves text templates based on the detected abnormalities; and lastly, a Paraphrase module that rewrites the templates according to specific cases. The core of KERP is a proposed generic implementation unit—Graph Transformer (GTR) that dynamically transforms high-level semantics between graph-structured data of multiple domains such as knowledge graphs, images and sequences. Experiments show that the proposed approach generates structured and robust reports supported with accurate abnormality description and explainable attentive regions, achieving the state-of-the-art results on two medical report benchmarks, with the best medical abnormality and disease classification accuracy and improved human evaluation performance. Knowledge-routed Deep Q-network(KR-DQN) Beyond current conversational chatbots or task-oriented dialogue systems that have attracted increasing attention, we move forward to develop a dialogue system for automatic medical diagnosis that converses with patients to collect additional symptoms beyond their self-reports and automatically makes a diagnosis. 
Besides the challenges for conversational dialogue systems (e.g. topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements for the dialogue rationality in the context of medical knowledge and symptom-disease relations. Existing dialogue systems (Madotto, Wu, and Fung 2018; Wei et al. 2018; Li et al. 2017) mostly rely on data-driven learning and are unable to encode an additional expert knowledge graph. In this work, we propose an End-to-End Knowledge-routed Relational Dialogue System (KR-DS) that seamlessly incorporates a rich medical knowledge graph into the topic transition in dialogue management, and makes it cooperative with natural language understanding and natural language generation. A novel Knowledge-routed Deep Q-network (KR-DQN) is introduced to manage topic transitions, which integrates a relational refinement branch for encoding relations among different symptoms and symptom-disease pairs, and a knowledge-routed graph branch for topic decision-making. Extensive experiments on a public medical dialogue dataset show our KR-DS significantly beats state-of-the-art methods (by more than 8% in diagnosis accuracy). We further show the superiority of our KR-DS on a newly collected medical dialogue system dataset, which is more challenging, retaining original self-reports and conversational data between patients and doctors. Known-class Aware Adaptation(KAA) Existing domain adaptation methods generally assume that different domains share an identical label space, which is quite restrictive for real-world applications. In this paper, we focus on a more realistic and challenging case of open set domain adaptation. Particularly, in open set domain adaptation, we allow the classes from the source and target domains to be partially overlapped. In this case, the assumption of conventional distribution alignment does not hold anymore, due to the different label spaces in two domains. 
To tackle this challenge, we propose a new approach coined as Known-class Aware Self-Ensemble (KASE), which is built upon the recently developed self-ensemble model. In KASE, we first introduce a Known-class Aware Recognition (KAR) module to identify the known and unknown classes from the target domain, which is achieved by encouraging a low cross-entropy for known classes and a high entropy based on the source data from the unknown class. Then, we develop a Known-class Aware Adaptation (KAA) module to better adapt from the source domain to the target by reweighing the adaptation loss based on the likeliness to belong to known classes of unlabeled target samples as predicted by KAR. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our approach. Known-class Aware Self-Ensemble(KASE) Existing domain adaptation methods generally assume that different domains share an identical label space, which is quite restrictive for real-world applications. In this paper, we focus on a more realistic and challenging case of open set domain adaptation. Particularly, in open set domain adaptation, we allow the classes from the source and target domains to be partially overlapped. In this case, the assumption of conventional distribution alignment does not hold anymore, due to the different label spaces in two domains. To tackle this challenge, we propose a new approach coined as Known-class Aware Self-Ensemble (KASE), which is built upon the recently developed self-ensemble model. In KASE, we first introduce a Known-class Aware Recognition (KAR) module to identify the known and unknown classes from the target domain, which is achieved by encouraging a low cross-entropy for known classes and a high entropy based on the source data from the unknown class. 
Then, we develop a Known-class Aware Adaptation (KAA) module to better adapt from the source domain to the target by reweighing the adaptation loss based on the likeliness to belong to known classes of unlabeled target samples as predicted by KAR. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our approach. KnowNER KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish) performing amongst state-of-the art systems in all of them. K-NRM This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. 
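The K-NRM pipeline described above (translation matrix, kernel pooling, learning-to-rank layer) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the entry does not say which kernel form is used, so Gaussian (RBF) kernels are assumed here, and the function names, the kernel means `mus`, the width `sigma`, the clamp `eps`, and the `tanh` scoring layer are all hypothetical choices. No training is shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_matrix(query_vecs, doc_vecs):
    """Word-level similarities: one row per query word, one column per document word."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]


def kernel_pooling(matrix, mus, sigma=0.1, eps=1e-10):
    """Turn the translation matrix into one soft-match feature per kernel.

    Assumption: each kernel is a Gaussian centred at mu, so it softly counts
    the document words whose similarity to a query word is close to mu.
    """
    features = []
    for mu in mus:
        total = 0.0
        for row in matrix:
            soft_count = sum(
                math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row
            )
            # Clamp before the log so an empty kernel does not produce -inf.
            total += math.log(max(soft_count, eps))
        features.append(total)
    return features


def ranking_score(features, weights, bias=0.0):
    """Hypothetical ranking layer: a linear combination squashed with tanh."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)
```

In a trained model the weights, and through them the embeddings, would be learned from the pairwise ranking loss; here they would simply be passed in by the caller, for example `ranking_score(kernel_pooling(translation_matrix(q, d), mus=[1.0, 0.5, 0.0]), weights=[1.0, 0.5, 0.1])`.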
Experiments on a commercial search engine’s query log demonstrate the improvements of K-NRM over prior feature-based and neural-based state-of-the-art methods, and explain the source of K-NRM’s advantage: its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches. Kolmogorov Distance ➘ “Kolmogorov-Smirnov Test” Kolmogorov-Smirnov Test(KS) In statistics, the Kolmogorov-Smirnov test (K-S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test). The Kolmogorov-Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution (in the two-sample case) or that the sample is drawn from the reference distribution (in the one-sample case). In each case, the distributions considered under the null hypothesis are continuous distributions but are otherwise unrestricted. The two-sample K-S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. The Kolmogorov-Smirnov test can be modified to serve as a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. 
This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic. Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro-Wilk test or Anderson-Darling test. However, other tests have their own disadvantages. For instance, the Shapiro-Wilk test is known not to work well with many ties (many identical values). Konstanz Information Miner(KNIME) KNIME, the Konstanz Information Miner, is an open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. A graphical user interface allows assembly of nodes for data preprocessing (ETL: Extraction, Transformation, Loading), for modeling and data analysis and visualization. Since 2006, KNIME has been used in pharmaceutical research, but is also used in other areas like CRM customer data analysis, business intelligence and financial data analysis. http://www.knime.org K-optimal Pattern Discovery(KOPD) K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques. Frequent pattern discovery techniques find all patterns for which there are sufficiently frequent examples in the sample data. In contrast, k-optimal pattern discovery techniques find the k patterns that optimize a user-specified measure of interest. The parameter k is also specified by the user. K-optimal Rule Discovery(KORD) K-optimal rule discovery finds the k rules that optimize a user-specified measure of rule value with respect to a set of sample data and user-specified constraints. 
This approach avoids many limitations of the frequent itemset approach of association rule discovery. This paper presents a scalable algorithm applicable to a wide range of k-optimal rule discovery tasks and demonstrates its efficiency. Korkine Zolotarev(KZ) In mathematics, the goal of lattice basis reduction is, given an integer lattice basis as input, to find a basis with short, nearly orthogonal vectors. This is realized using different algorithms, whose running time is usually at least exponential in the dimension of the lattice. k-PDTM Analyzing the sub-level sets of the distance to a compact sub-manifold of $\mathbb{R}^d$ is a common method in TDA to understand its topology. The distance to measure (DTM) was introduced by Chazal, Cohen-Steiner and Mérigot in [7] to face the non-robustness of the distance to a compact set to noise and outliers. This function makes possible the inference of the topology of a compact subset of $\mathbb{R}^d$ from a noisy cloud of $n$ points lying nearby in the Wasserstein sense. In practice, these sub-level sets may be computed using approximations of the DTM such as the q-witnessed distance [10] or other power distance [6]. These approaches lead eventually to compute the homology of unions of $n$ growing balls, that might become intractable whenever $n$ is large. To simultaneously face the two problems of large number of points and noise, we introduce the k-power distance to measure (k-PDTM). This new approximation of the distance to measure may be thought of as a k-coreset based approximation of the DTM. Its sublevel sets consist of a union of $k$ balls, $k \ll n$, and this distance is also proved robust to noise. We assess the quality of this approximation for $k$ possibly dramatically smaller than $n$, for instance $k = n^{1/3}$ is proved to be optimal for 2-dimensional shapes. We also provide an algorithm to compute this k-PDTM. 
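To make the robustness claim concrete, here is a minimal sketch of the plain empirical distance to measure that the k-PDTM approximates, not of the k-PDTM itself. It assumes the usual empirical form: the root mean squared distance from a query point to its `q` nearest sample points. The function name `empirical_dtm` and the parameter name `q` are chosen here for illustration; `q` is deliberately not called `k`, since in the entry above $k$ counts the balls of the approximation.

```python
import math

def empirical_dtm(x, cloud, q):
    """Root mean squared distance from x to its q nearest points in cloud."""
    squared = sorted(
        sum((a - b) ** 2 for a, b in zip(x, point)) for point in cloud
    )
    return math.sqrt(sum(squared[:q]) / q)

# Three points near the origin plus one far outlier.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

near = empirical_dtm((0.0, 0.0), cloud, q=3)      # query inside the dense part
outlier = empirical_dtm((10.0, 10.0), cloud, q=3)  # query sitting on the outlier
```

The ordinary distance to the point set is zero at the outlier, because the outlier belongs to the set. The DTM stays large there, because it averages over several neighbours, so a single stray point does not create a spurious low-value region.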
K-Pg Many of the most popular scalable data-processing frameworks are fundamentally limited in the generality of computations they can express and efficiently execute. In particular, we observe that systems’ abstractions limit their ability to share and reuse indexed state within and across computations. These limitations result in an inability to express and efficiently implement algorithms in domains where the scales of data call for them most. In this paper, we present the design and implementation of K-Pg, a data-processing framework that provides high-throughput, low-latency incremental view maintenance for a general class of iterative data-parallel computations. This class includes SQL, stratified Datalog with negation and non-monotonic aggregates, and much of graph processing. Our evaluation indicates that K-Pg’s performance is either comparable to, or exceeds, that of specialized systems in multiple domains, while at the same time significantly generalizing their capabilities. k-POD k-POD, a novel method of k-means clustering on partially observed data that employs a majorization-minimization algorithm to identify a clustering that is consistent with the observed data. By bypassing the completely observed data formulation, k-POD retains all information in the data and avoids committing to distributional assumptions on the missingness patterns. kpodclustr K-Prototypes The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. 
The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. KPTransfer In this paper, we present a novel approach called KPTransfer for improving modeling performance for keypoint detection deep neural networks via domain transfer between different keypoint subsets. This approach is motivated by the notion that rich contextual knowledge can be transferred between different keypoint subsets representing separate domains. In particular, the proposed method takes into account various keypoint subsets/domains by sequentially adding and removing keypoints. Contextual knowledge is transferred between two separate domains via domain transfer. Experiments to demonstrate the efficacy of the proposed KPTransfer approach were performed for the task of human pose estimation on the MPII dataset, with comparisons against random initialization and frozen weight extraction configurations. Experimental results demonstrate the efficacy of performing domain transfer between two different joint subsets resulting in a PCKh improvement of up to 1.1 over random initialization on joints such as wrists and knee in certain joint splits with an overall PCKh improvement of 0.5. Domain transfer from a different set of joints not only results in improved accuracy but also results in faster convergence because of mutual co-adaptations of weights resulting from the contextual knowledge of the pose from a different set of joints. 
KPynq K-means is a popular but computation-intensive algorithm for unsupervised learning. To address this issue, we present KPynq, a work-efficient triangle-inequality based K-means on FPGA for handling large-size, high-dimension datasets. KPynq leverages an algorithm-level optimization to balance the performance and computation irregularity, and a hardware architecture design to fully exploit the pipeline and parallel processing capability of various FPGAs. In the experiment, KPynq consistently outperforms the CPU-based standard K-means in terms of its speedup (up to 4.2x) and significant energy-efficiency (up to 218x). K-Quantiles Clustering A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd’s algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although $K$-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace-distributions. The consistency of $K$-quantiles clustering is proved, and it is shown that $K$-quantiles clusters correspond to well separated mixture components in a nonparametric mixture. In a simulation, $K$-quantiles clustering is compared with a number of popular clustering methods with good results. A high-dimensional microarray dataset is clustered by $K$-quantiles. Kraljic Matrix The Kraljic Matrix works by mapping the profit impact of a product on one axis, and our vulnerability to the supplier’s disappearance on the other. It essentially provides a portfolio management approach to managing an organization’s many suppliers. 
This enables us to see which relationships are important so we can focus on strengthening these, as well as identifying less important relationships where we might employ traditional supplier management techniques such as offshoring. The Kraljic matrix helps us in the first step of supplier management – identifying important suppliers. How you then actually manage those suppliers is up to you. KraljicMatrix Krazy World We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call ‘Krazy World’ and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important. K-RelNet As engineered systems expand, become more interdependent, and operate in real-time, reliability assessment is indispensable to support investment and decision making. However, network reliability problems are known to be #P-complete, a computational complexity class largely believed to be intractable. The computational intractability of network reliability motivates our quest for reliable approximations. Based on their theoretical foundations, available methods can be grouped as follows: (i) exact or bounds, (ii) guarantee-less sampling, and (iii) probably approximately correct (PAC). Group (i) is well regarded due to its useful byproducts, but it does not scale in practice. Group (ii) scales well and verifies desirable properties, such as the bounded relative error, but it lacks error guarantees. Group (iii) is of great interest when precision and scalability are required, as it harbors computationally feasible approximation schemes with PAC-guarantees. We give a comprehensive review of classical methods before introducing modern techniques and our developments. 
We introduce K-RelNet, an extended counting-based estimation method that delivers PAC-guarantees for the K-terminal reliability problem. Then, we test methods’ performance using various benchmark systems. We highlight the range of application of algorithms and provide the foundation for future resilience engineering as it increasingly necessitates methods for uncertainty quantification in complex systems. Kriging In statistics, originally in geostatistics, Kriging or Gaussian process regression is a method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize smoothness of the fitted values. Under suitable assumptions on the priors, Kriging gives the best linear unbiased prediction of the intermediate values. Interpolating methods based on other criteria such as smoothness need not yield the most likely intermediate values. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener-Kolmogorov prediction (after Norbert Wiener and Andrey Kolmogorov). The theoretical basis for the method was developed by the French mathematician Georges Matheron based on the Master’s thesis of Danie G. Krige, the pioneering plotter of distance-weighted average gold grades at the Witwatersrand reef complex in South Africa. Krige sought to estimate the most likely distribution of gold based on samples from a few boreholes. The English verb is to krige and the most common noun is Kriging; both are often pronounced with a hard ‘g’, following the pronunciation of the name ‘Krige’. 
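As a small illustration of the Kriging entry above, the sketch below performs one-dimensional simple kriging in plain Python. It rests on assumptions not stated in the entry: a known zero mean, a squared-exponential covariance with hand-picked `length` and `variance`, distinct sample locations, and no measurement noise. The helper names are hypothetical, and a production implementation would use a numerical library rather than the hand-written solver.

```python
import math

def sq_exp_cov(a, b, length=1.0, variance=1.0):
    """Assumed prior covariance between the values at locations a and b."""
    return variance * math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def simple_krige(xs, ys, x_new, length=1.0):
    """Predict the value at x_new as a covariance-weighted sum of ys."""
    K = [[sq_exp_cov(a, b, length) for b in xs] for a in xs]
    k_new = [sq_exp_cov(a, x_new, length) for a in xs]
    weights = solve(K, k_new)  # kriging weights: K^{-1} k_new
    return sum(w * y for w, y in zip(weights, ys))

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]
at_sample = simple_krige(xs, ys, 1.0)   # reproduces the observed value
between = simple_krige(xs, ys, 1.5)     # interpolated value
far_away = simple_krige(xs, ys, 100.0)  # falls back to the assumed zero mean
```

Without a noise term the predictor passes exactly through the samples, and far from all data it returns the prior mean, which is why the choice of covariance and mean matters in practice.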
Spatio-Temporal Kriging in R moko Kriging Models In statistics, originally in geostatistics, Kriging or Gaussian process regression is a method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize smoothness of the fitted values. Under suitable assumptions on the priors, Kriging gives the best linear unbiased prediction of the intermediate values. Interpolating methods based on other criteria such as smoothness need not yield the most likely intermediate values. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Kolmogorov Wiener prediction. GPareto Kripke Structure A Kripke structure is a variation of the transition system, originally proposed by Saul Kripke, used in model checking to represent the behavior of a system. It is basically a graph whose nodes represent the reachable states of the system and whose edges represent state transitions. A labelling function maps each node to a set of properties that hold in the corresponding state. Temporal logics are traditionally interpreted in terms of Kripke structures. Krippendorff’s Alpha icr Kronecker Recurrent Units(KRU) Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. 
Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on five standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. These results in particular show that while there are advantages in having a high dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced. Kruskal’s Algorithm Kruskal’s algorithm is a greedy algorithm in graph theory that finds a minimum spanning tree for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). k-Same-Siamese-GAN(kSS-GAN) In recent years, advances in camera and computing hardware have made it easy to capture and store large amounts of image and video data. Consider a data holder, such as a hospital or a government entity, who has a privately held collection of personal data. Then, how can we ensure that the data holder does conceal the identity of each individual in the imagery of personal data while still preserving certain useful aspects of the data after de-identification? In this work, we propose a novel approach towards high-resolution facial image de-identification, called k-Same-Siamese-GAN (kSS-GAN), which leverages k-Same-Anonymity mechanism, Generative Adversarial Network (GAN), and hyperparameter tuning. To speed up training and reduce memory consumption, the mixed precision training (MPT) technique is also applied to make kSS-GAN provide guarantees regarding privacy protection on close-form identities and be trained much more efficiently as well. 
Finally, we applied our system to an actual dataset, the RafD dataset, for performance testing. Besides protecting the privacy of high-resolution facial images, the proposed system is also justified for its ability in automating parameter tuning and breaking through the limitation of the number of adjustable parameters. K-separable GGM In high-dimensional graph learning problems, some topological properties of the graph, such as bounded node degree or tree structure, are typically assumed to hold so that the sample complexity of recovering the graph structure can be reduced. With bounded degree or separability assumptions, quantified by a measure $k$, a $p$-dimensional Gaussian graphical model (GGM) can be learnt with sample complexity $\Omega(k \log p)$. Our work in this paper aims to do away with these assumptions by introducing an algorithm that can identify whether a GGM indeed has these topological properties without any initial topological assumptions. We show that we can check whether a GGM has node degree bounded by $k$ with sample complexity $\Omega(k \log p)$. More generally, we introduce the notion of a strongly $k$-separable GGM, and show that our algorithm can decide whether a GGM is strongly $k$-separable or not, with sample complexity $\Omega(k \log p)$. We introduce the notion of a generalized feedback vertex set (FVS), an extension of the typical FVS, and show that we can use this identification technique to learn GGMs with generalized FVSs. KSQL KSQL is an open source, Apache 2.0 licensed streaming SQL engine that enables stream processing against Apache Kafka®. KSQL makes it easy to read, write, and process streaming data in real-time, at scale, using SQL-like semantics. It offers an easy way to express stream processing transformations as an alternative to writing an application in a programming language such as Java or Python. 
Currently available as a developer preview, KSQL provides powerful stream processing capabilities such as joins, aggregations, event-time windowing, and more! K-SVD In applied mathematics, K-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. K-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary, and updating the atoms in the dictionary to better fit the data. K-SVD can be found widely in use in applications such as image processing, audio processing, biology, and document analysis. Analysis K-SVD: A Dictionary-Learning Algorithm for the Analysis Sparse Model k-SVRG In recent years, many variance reduced algorithms for empirical risk minimization have been introduced. In contrast to vanilla SGD, these methods converge linearly on strongly convex problems. To obtain the variance reduction, current methods either require frequent passes over the full data to recompute gradients, without making any progress during this time (like in SVRG), or they require memory of the same size as the input problem (like SAGA). In this work, we propose k-SVRG, an algorithm that interpolates between those two extremes: it makes best use of the available memory and in turn does avoid full passes over the data without making progress. We prove linear convergence of k-SVRG on strongly convex problems and convergence to stationary points on non-convex problems. Numerical experiments show the effectiveness of our method. KTBoost In this article, we introduce a novel boosting algorithm called ‘KTBoost’, which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. 
Intuitively, the idea is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of both continuous and discontinuous functions as well as functions that exhibit parts with varying degrees of regularity. We empirically show that KTBoost outperforms both tree and kernel boosting in terms of predictive accuracy on a wide array of data sets. KT-Speech-Crawler In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, containing an estimated 3.5% word error rate in the transcriptions. Automatically collected samples contain reading and spontaneous speech recorded in various conditions including background noise and music, distant microphone recordings, and a variety of accents and reverberation. When training a deep neural network on speech recognition, we observed around 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me ) and the crawler code (https://…/KTSpeechCrawler ) are publicly available. kubeCDN A self-hosted content delivery network based on Kubernetes. Easily setup Kubernetes clusters in multiple AWS regions and deploy resilient and reliable services to a global user base within minutes. This project was developed by Ilhaan Rasheed during his tenure as a DevOps Engineering Fellow at Insight. The capabilities of this project have been demonstrated using video streaming as an example. Kubeflow The Machine Learning Toolkit for Kubernetes. 
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow. Kullback-Leibler – Local Interpretable Model-agnostic Explanations(KL-LIME) We introduce a method, KL-LIME, for explaining predictions of Bayesian predictive models by projecting the information in the predictive distribution locally to a simpler, interpretable explanation model. The proposed approach combines the recent Local Interpretable Model-agnostic Explanations (LIME) method with ideas from Bayesian projection predictive variable selection methods. The information theoretic basis helps in navigating the trade-off between explanation fidelity and complexity. We demonstrate the method in explaining MNIST digit classifications made by a Bayesian deep convolutional neural network. Kullback-Leibler Divergence(KLIC, KLD) In probability theory and information theory, the Kullback-Leibler divergence (also information divergence, information gain, relative entropy, or KLIC; here abbreviated as KL divergence) is a non-symmetric measure of the difference between two probability distributions P and Q. Specifically, the Kullback-Leibler divergence of Q from P, denoted $D_{\mathrm{KL}}(P \| Q)$, is a measure of the information lost when Q is used to approximate P: the KL divergence measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the “true” distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P. 
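For discrete distributions the definition above reduces to a single sum, sketched here in plain Python. The function name `kl_divergence` and the example distributions are illustrative choices. Base-2 logarithms are used so that the result is in bits, matching the "extra bits" reading, and terms with zero probability under P are taken to contribute nothing.

```python
import math

def kl_divergence(p, q):
    """KL divergence of q from p, in bits, for discrete distributions."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if q_i == 0.0:
            return math.inf  # q rules out an outcome that p allows
        total += p_i * math.log2(p_i / q_i)
    return total

p = [0.5, 0.5]  # "true" distribution
q = [0.9, 0.1]  # approximating model

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
```

Here `d_pp` is zero because a distribution approximates itself without loss, while `d_pq` and `d_qp` are both positive and take different values.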
Although it is often intuited as a metric or distance, the KL divergence is not a true metric – for example, it is not symmetric: the KL divergence from P to Q is generally not the same as that from Q to P. However, its infinitesimal form, specifically its Hessian, is a metric tensor: it is the Fisher information metric. Kurtosis In probability theory and statistics, kurtosis (from the Greek word kurtos, meaning curved, arching) is any measure of the ‘peakedness’ of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. There are various interpretations of kurtosis, and of how particular measures should be interpreted; these are primarily peakedness (width of peak), tail weight, and lack of shoulders (distribution primarily peak and tails, not in between). ‘Student’, on Kurtosis Kurtosis as Peakedness, 1905-2014. R.I.P. The incorrect notion that kurtosis somehow measures ‘peakedness’ (flatness, pointiness, or modality) of a distribution is remarkably persistent, despite attempts by statisticians to set the record straight. This article puts the notion to rest once and for all. Kurtosis tells you virtually nothing about the shape of the peak – its only unambiguous interpretation is in terms of tail extremity, that is, either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution). To clarify this point, relevant literature is reviewed, counterexample distributions are given, and it is shown that the proportion of the kurtosis that is determined by the central μ ± σ range is usually quite small. 
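The tail-extremity reading of kurtosis is easy to check numerically. The sketch below computes the plain moment-based sample kurtosis (fourth central moment divided by the squared second central moment, without the "minus 3" excess correction or small-sample bias adjustments) in standard-library Python. The function name and the toy data are illustrative.

```python
def sample_kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (not excess, not bias-corrected)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)

evenly_spread = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = evenly_spread + [50.0]

k_base = sample_kurtosis(evenly_spread)
k_outlier = sample_kurtosis(with_outlier)
```

Adding a single extreme value raises the statistic sharply even though the bulk of the data is unchanged, which is consistent with the outlier-based interpretation given above.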
Kusto Azure Data Explorer is a fast, fully managed data analytics service for real-time analysis on large volumes of data streaming from applications, websites, IoT devices, and more. You can use Azure Data Explorer to collect, store, and analyze diverse data to improve products, enhance customer experiences, monitor devices, and boost operations. KV-Index Time series data have exploded due to the popularity of new applications, like data center management and IoT. The time series data management system (TSDB) has emerged to store and query the large volume of time series data. Subsequence matching is critical in many time series mining algorithms, and extensive approaches have been proposed. However, the shift to distributed storage systems and the performance gap make these approaches incompatible with TSDB. To fill this gap, we propose a new index structure, KV-index, and the corresponding matching algorithm, KV-match. KV-index is a file-based structure, which can be easily implemented on local files, HDFS or HBase tables. The KV-match algorithm probes the index efficiently with a few sequential scans. Moreover, two optimization techniques, window reduction and window reordering, are proposed to further accelerate the processing. To support queries of arbitrary lengths, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied length indexes to process the query simultaneously. A two-dimensional dynamic programming algorithm is proposed to find the optimal query segmentation. We implement our approach on both local files and HBase tables, and conduct extensive experiments on synthetic and real-world datasets. Results show that our index is of comparable size to the popular tree-style index while our query processing is orders of magnitude more efficient. KV-Match ➚ “KV-Index” Kyrix Scalable interactive visual data exploration is crucial in many domains due to increasingly large datasets generated at rapid rates. 
Details-on-demand provides a useful interaction paradigm for exploring large datasets, where users start at an overview, find regions of interest, zoom in to see detailed views, zoom out and then repeat. This paradigm is the primary user interaction mode of widely-used systems such as Google Maps, Aperture Tiles and ForeCache. These earlier systems, however, are highly customized with hardcoded visual representations and optimizations. A more general framework is needed to facilitate the development of visual data exploration systems at scale. In this paper, we present Kyrix, an end-to-end system for developing scalable details-on-demand data exploration applications. Kyrix provides developers with a declarative model for easy specification of general visualizations. Behind the scenes, Kyrix utilizes a suite of performance optimization techniques to achieve a response time within 500ms for various user interactions. We also report results from a performance study which shows that a novel dynamic fetching scheme adopted by Kyrix outperforms tile-based fetching used in earlier systems. Kyso Collaborative & Reproducible Data Science. Jupyter notebooks published as beautifully-rendered blogs and dashboards for effective data visualization, analysis and exploration