# WhatIs-I

 I Don’t Know – Prediction Cascades Framework Advances in deep learning have led to substantial increases in prediction accuracy as well as the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively ‘over-think’ on simple inputs. In this paper we revisit the classic idea of prediction cascades to reduce prediction costs. We introduce the ‘I Don’t Know’ (IDK) prediction cascades framework, a general framework for constructing prediction cascades for arbitrary multi-class prediction tasks. We propose two baseline methods for constructing cascades as well as a new objective within this framework and evaluate these techniques on a range of benchmark and real-world datasets to demonstrate the prediction cascades can achieve 1.7-10.5x speedups in image classification tasks while maintaining comparable accuracy to state-of-the-art models. When combined with human experts, prediction cascades can achieve nearly perfect accuracy(within 5%) while requiring human intervention on less than 30% of the queries. Ibis Ibis is a new Python data analysis framework with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop, without compromises in functionality, usability, or performance. Having spent much of the last decade improving the usability of the single-node Python experience (with pandas and other projects), we are looking to achieve: · 100% Python end-to-end user workflows · Native hardware speeds for a broad set of use cases · Full-fidelity data analysis without extractions or sampling · Scalability for big data · Integration with the existing Python data ecosystem (pandas, scikit-learn, NumPy, and so on) IBN-Net Convolutional neural networks (CNNs) have achieved great successes in many computer vision problems. Unlike existing works that designed CNN architectures to improve performance on a single task of a single domain and not generalizable, we present IBN-Net, a novel convolutional architecture, which remarkably enhances a CNN’s modeling ability on one domain (e.g. Cityscapes) as well as its generalization capacity on another domain (e.g. GTA5) without finetuning. IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performances. This work has three key contributions. (1) By delving into IN and BN, we disclose that IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content related information. (2) IBN-Net can be applied to many advanced deep architectures, such as DenseNet, ResNet, ResNeXt, and SENet, and consistently improve their performance without increasing computational cost. (3) When applying the trained networks to new domains, e.g. from GTA5 to Cityscapes, IBN-Net achieves comparable improvements as domain adaptation methods, even without using data from the target domain. With IBN-Net, we won the 1st place on the WAD 2018 Challenge Drivable Area track, with an mIoU of 86.18%. Iceberg Secret In a nutshell, the Iceberg secret speaks to the apparent gap between technical and non-technical stakeholders when it comes to evaluating the quality and progress of building an AI-based solution. Often the solution is judged on the visualization of the data or the scalar output of the predictions (the 10%), and little regard is given to the bulk of the work that is spent on the data preparation (the 90%). For the Data or ML Engineer, they understand the exponential value in carefully focussing on understanding the data, preparing the data and how important it is to ensure that a mature data pipeline is in place before any time is spent on the top 10%. IceBreaker Internet has brought about a tremendous increase in content of all forms and, in that, video content constitutes the major backbone of the total content being published as well as watched. Thus it becomes imperative for video recommendation engines such as Hulu to look for novel and innovative ways to recommend the newly added videos to their users. However, the problem with new videos is that they lack any sort of metadata and user interaction so as to be able to rate the videos for the consumers. To this effect, this paper introduces the several techniques we develop for the Content Based Video Relevance Prediction (CBVRP) Challenge being hosted by Hulu for the ACM Multimedia Conference 2018. We employ different architectures on the CBVRP dataset to make use of the provided frame and video level features and generate predictions of videos that are similar to the other videos. We also implement several ensemble strategies to explore complementarity between both the types of provided features. The obtained results are encouraging and will impel the boundaries of research for multimedia based video recommendation systems. ICMEN Prediction over edges and nodes in graphs requires appropriate and efficiently achieved data representation. Recent research on representation learning for dynamic networks resulted in a significant progress. However, the more precise and accurate methods, the greater computational and memory complexity. Here, we introduce ICMEN – the first-in-class incremental meta-embedding method that produces vector representations of nodes respecting temporal dependencies in the graph. ICMEN efficiently constructs nodes’ embedding from historical representations by linearly convex combinations making the process less memory demanding than state-of-the-art embedding algorithms. The method is capable of constructing representation for inactive and new nodes without a need to re-embed. The results of link prediction on several real-world datasets shown that applying ICMEN incremental meta-method to any base embedding approach, we receive similar results and save memory and computational power. Taken together, our work proposes a new way of efficient online representation learning in dynamic complex networks. IDEBench Existing benchmarks for analytical database systems such as TPC-DS and TPC-H are designed for static reporting scenarios. The main metric of these benchmarks is the performance of running individual SQL queries over a synthetic database. In this paper, we argue that such benchmarks are not suitable for evaluating database workloads originating from interactive data exploration (IDE) systems where most queries are ad-hoc, not based on predefined reports, and built incrementally. As a main contribution, we present a novel benchmark called IDEBench that can be used to evaluate the performance of database systems for IDE workloads. As opposed to traditional benchmarks for analytical database systems, our goal is to provide more meaningful workloads and datasets that can be used to benchmark IDE query engines, with a particular focus on metrics that capture the trade-off between query performance and quality of the result. As a second contribution, this paper evaluates and discusses the performance results of selected IDE query engines using our benchmark. The study includes two commercial systems, as well as two research prototypes (IDEA, approXimateDB/XDB), and one traditional analytical database system (MonetDB). iDocNADE We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., ‘networks’ used in the contexts artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named as iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named as DocNADE2 and iDocNADE2. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains. iDocNADE2 ➚ “iDocNADE” iFair People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. IFSAD Ensemble learning for anomaly detection of data structured into complex network has been barely studied due to the inconsistent performance of complex network characteristics and lack of inherent objective function. In this paper, we propose the IFSAD, a new two-phase ensemble method for anomaly detection based on intuitionistic fuzzy set, and applies it to the abnormal behavior detection problem in temporal complex networks. First, it constructs the intuitionistic fuzzy set of single network characteristic which quantifies the degree of membership, non-membership and hesitation of each of network characteristic to the defined linguistic variables so that makes the unuseful or noise characteristics become part of the detection. To build an objective intuitionistic fuzzy relationship, we propose an Gaussian distribution-based membership function which gives a variable hesitation degree. Then, for the fuzzification of multiple network characteristics, the intuitionistic fuzzy weighted geometric operator is adopted to fuse multiple IFSs and to avoid the inconsistent of multiple characteristics. Finally, the score function and precision function are used to sort the fused IFS. Finally we carried out extensive experiments on several complex network datasets for anomaly detection, and the results demonstrate the superiority of our method to state-of-the-art approaches, validating the effectiveness of our method. I-GOS Understanding and interpreting the decisions made by deep learning models is valuable in many domains. In computer vision, computing heatmaps from a deep network is a popular approach for visualizing and understanding deep networks. However, heatmaps that do not correlate with the network may mislead human, hence the performance of heatmaps in providing a faithful explanation to the underlying deep network is crucial. In this paper, we propose I-GOS, which optimizes for a heatmap so that the classification scores on the masked image would maximally decrease. The main novelty of the approach is to compute descent directions based on the integrated gradients instead of the normal gradient, which avoids local optima and speeds up convergence. Compared with previous approaches, our method can flexibly compute heatmaps at any resolution for different user needs. Extensive experiments on several benchmark datasets show that the heatmaps produced by our approach are more correlated with the decision of the underlying deep network, in comparison with other state-of-the-art approaches. IllinoisSL IllinoisSL is a Java library for learning structured prediction models. It supports structured Support Vector Machines and structured Perceptron. The library consists of a core learning module and several applications, which can be executed from command-lines. Documentation is provided to guide users. In Comparison to other structured learning libraries, IllinoisSL is efficient, general, and easy to use. IL-Net Deep neural networks (DNN) excel at extracting patterns. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. Designing optimal network architectures from a principled or rational approach however has been less than successful, with the best successful approaches utilizing an additional machine learning algorithm to tune the network hyperparameters. However, in many technical fields, there exist established domain knowledge and understanding about the subject matter. In this work, we develop a novel furcated neural network architecture that utilizes domain knowledge as high-level design principles of the network. We demonstrate proof-of-concept by developing IL-Net, a furcated network for predicting the properties of ionic liquids, which is a class of complex multi-chemicals entities. Compared to existing state-of-the-art approaches, we show that furcated networks can improve model accuracy by approximately 20-35%, without using additional labeled data. Lastly, we distill two key design principles for furcated networks that can be adapted to other domains. Image Enhancement Generative Adversarial Network(IEGAN) Despite the breakthroughs in quality of image enhancement, an end-to-end solution for simultaneous recovery of the finer texture details and sharpness for degraded images with low resolution is still unsolved. Some existing approaches focus on minimizing the pixel-wise reconstruction error which results in a high peak signal-to-noise ratio. The enhanced images fail to provide high-frequency details and are perceptually unsatisfying, i.e., they fail to match the quality expected in a photo-realistic image. In this paper, we present Image Enhancement Generative Adversarial Network (IEGAN), a versatile framework capable of inferring photo-realistic natural images for both artifact removal and super-resolution simultaneously. Moreover, we propose a new loss function consisting of a combination of reconstruction loss, feature loss and an edge loss counterpart. The feature loss helps to push the output image to the natural image manifold and the edge loss preserves the sharpness of the output image. The reconstruction loss provides low-level semantic information to the generator regarding the quality of the generated images compared to the original. Our approach has been experimentally proven to recover photo-realistic textures from heavily compressed low-resolution images on public benchmarks and our proposed high-resolution World100 dataset. Image Processing Language for Performance Portability on Heterogeneous Systems(ImageCL) Modern computer systems typically conbine multicore CPUs with accelerators like GPUs for inproved performance and energy efficiency. However, these sys- tems suffer from poor performance portability, code tuned for one device must be retuned to achieve high performance on another. Image processing is increas- ing in importance , with applications ranging from seismology and medicine to Photoshop. Based on our experience with medical image processing, we propose ImageCL, a high-level domain-specific language and source-to-source compiler, targeting heterogeneous hardware. ImageCL resembles OpenCL, but abstracts away per- formance optimization details, allowing the programmer to focus on algorithm development, rather than performance tuning. The latter is left to our source-to- source compiler and auto-tuner. From high-level ImageCL kernels, our source- to-source compiler can generate multiple OpenCL implementations with different optimizations applied. We rely on auto-tuning rather than machine models or ex- pert programmer knowledge to determine which optimizations to apply, making our tuning procedure highly robust. Furthermore, we can generate high perform- ing implementations for different devices from a single source code, thereby im- proving performance portability. We evaluate our approach on three image processing benchmarks, on different GPU and CPU devices, and are able to outperform other state of the art solutions in several cases, achieving speedups of up to 4.57x. Image Registration Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements. Image Score There has long been debates on how we could interpret neural networks and understand the decisions our models make. Specifically, why deep neural networks tend to be error-prone when dealing with samples that output low softmax scores. We present an efficient approach to measure the confidence of decision-making steps by statistically investigating each unit’s contribution to that decision. Instead of focusing on how the models react on datasets, we study the datasets themselves given a pre-trained model. Our approach is capable of assigning a score to each sample within a dataset that measures the frequency of occurrence of that sample’s chain of activation. We demonstrate with experiments that our method could select useful samples to improve deep neural networks in a semi-supervised leaning setting. ImageNet-C In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier’s robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize. ImageNet-P In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier’s robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize. Image-Text-Image(I2T2I) Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation begun to generate plausible images using datasets of specific categories like birds and flowers. We’ve even seen image generation from multi-category datasets such as the Microsoft Common Objects in Context (MSCOCO) through the use of generative adversarial networks (GANs). Synthesizing objects with a complex shape, however, is still challenging. For example, animals and humans have many degrees of freedom, which means that they can take on many complex shapes. We propose a new training method called Image-Text-Image (I2T2I) which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis. We demonstrate that %the capability of our method to understand the sentence descriptions, so as to I2T2I can generate better multi-categories images using MSCOCO than the state-of-the-art. We also demonstrate that I2T2I can achieve transfer learning by using a pre-trained image captioning module to generate human images on the MPII Human Pose Imagination-Augmented Agents(I2A) We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines. IMEXnet Deep convolutional neural networks have revolutionized many machine learning and computer vision tasks. Despite their enormous success, remaining key challenges limit their wider use. Pressing challenges include improving the network’s robustness to perturbations of the input images and simplifying the design of architectures that generalize. Another problem relates to the limited ‘field of view’ of convolution operators, which means that very deep networks are required to model nonlocal relations in high-resolution image data. We introduce the IMEXnet that addresses these challenges by adapting semi-implicit methods for partial differential equations. Compared to similar explicit networks such as the residual networks (ResNets) our network is more stable. This stability has been recently shown to reduce the sensitivity to small changes in the input features and improve generalization. The implicit step connects all pixels in the images and therefore addresses the field of view problem, while being comparable to standard convolutions in terms of the number of parameters and computational complexity. We also present a new dataset for semantic segmentation and demonstrate the effectiveness of our architecture using the NYU depth dataset. Imitation Learning Learning from Demonstration’: Imitation learning, a.k.a behavioral cloning, is learning from demonstration. In other words, in imitation learning, a machine learns how to behave by looking at what a teacher (or expert) does and then mimics that behavior. An example can be when we collect driving data from human and then use that data for a self driving car. Imitation Learning in Tensorflow Imitation Network In this paper, we propose imitation networks, a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation that transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic predictions of reference estimators that are much more robust against overfitting than the network we want to train. Different from almost all the previous work for knowledge distillation that requires a large amount of labeled training data, the proposed method requires only a small amount of training data. Instead, we introduce pseudo training examples that are optimized as a part of model parameters. Experimental results for several benchmark datasets demonstrate that the proposed method outperformed all the other baselines, such as naive training of the target model and standard knowledge distillation. Imitative Model Imitation learning provides an appealing framework for autonomous control: in many tasks, demonstrations of preferred behavior can be readily obtained from human experts, removing the need for costly and potentially dangerous online data collection in the real world. However, policies learned with imitation learning have limited flexibility to accommodate varied goals at test time. Model-based reinforcement learning (MBRL) offers considerably more flexibility, since a predictive model learned from data can be used to achieve various goals at test time. However, MBRL suffers from two shortcomings. First, the predictive model does not help to choose desired or safe outcomes — it reasons only about what is possible, not what is preferred. Second, MBRL typically requires additional online data collection to ensure that the model is accurate in those situations that are actually encountered when attempting to achieve test time goals. Collecting this data with a partially trained model can be dangerous and time-consuming. In this paper, we aim to combine the benefits of imitation learning and MBRL, and propose imitative models: probabilistic predictive models able to plan expert-like trajectories to achieve arbitrary goals. We find this method substantially outperforms both direct imitation and MBRL in a simulated autonomous driving task, and can be learned efficiently from a fixed set of expert demonstrations without additional online data collection. We also show our model can flexibly incorporate user-supplied costs as test-time, can plan to sequences of goals, and can even perform well with imprecise goals, including goals on the wrong side of the road. IMMIGRATE By balancing margin-quantity maximization and margin-quality maximization, the proposed IMMIGRATE algorithm considers both local and global information when using margin-based frameworks. We here derive a new mathematical interpretation of margin-based cost function by using the quadratic form distance (QFD) and applying both the large-margin and max-min entropy principles. We also design a new principle for classifying new samples and propose a Bayesian framework to iteratively minimize the cost function. We demonstrate the power of our new method by comparing it with 16 widely used classifiers (e.g. Support Vector Machine, k-nearest neighbors, RELIEF, etc.) including some classifiers that are capable of identifying interaction terms (e.g. SODA, hierNet, etc.) on synthetic dataset, five gene expression datasets, and twenty UCI machine learning datasets. Our method is able to outperform other methods in most cases. Imperialist Competitive Algorithm(ICA) In computer science, Imperialist Competitive Algorithm (ICA) is a computational method that is used to solve optimization problems of different types. Like most of the methods in the area of evolutionary computation, ICA does not need the gradient of the function in its optimization process. From a specific point of view, ICA can be thought of as the social counterpart of genetic algorithms (GAs). ICA is the mathematical model and the computer simulation of human social evolution, while GAs are based on the biological evolution of species. ICAFF,ICAOD Implicit Association Test(IAT) The implicit-association test (IAT) is a measure within social psychology designed to detect the strength of a person’s automatic association between mental representations of objects (concepts) in memory. The IAT was introduced in the scientific literature in 1998 by Anthony Greenwald, Debbie McGhee, Joyce Sherry, and Jordan Schwartz. The IAT is now widely used in social psychology research and is used to some extent in clinical, cognitive, and developmental psychology research. Although some controversy still exists regarding the IAT and what it measures, much research into its validity and psychometric properties has been conducted since its introduction into the literature. IATscores Implicit Kernel Learning(IKL) Kernels are powerful and versatile tools in machine learning and statistics. Although the notion of universal kernels and characteristic kernels has been studied, kernel selection still greatly influences the empirical performance. While learning the kernel in a data driven way has been investigated, in this paper we explore learning the spectral distribution of kernel via implicit generative models parametrized by deep neural networks. We called our method Implicit Kernel Learning (IKL). The proposed framework is simple to train and inference is performed via sampling random Fourier features. We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning. Empirically, MMD GAN with IKL outperforms vanilla predefined kernels on both image and text generation benchmarks; using IKL with Random Kitchen Sinks also leads to substantial improvement over existing state-of-the-art kernel learning algorithms on popular supervised learning benchmarks. Theory and conditions for using IKL in both applications are also studied as well as connections to previous state-of-the-art methods. Implicit Maximum Likelihood Estimation Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results. Implicit Policy We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients. We empirically show that, despite its simplicity in implementation, entropy regularization combined with a rich policy class can attain desirable properties displayed under maximum entropy reinforcement learning framework, such as robustness and multi-modality. Implicit Regression In 2011, Wooten introduced Non-Response Analysis the founding theory in Implicit Regression where Implicit Regression treats the variables implicitly as codependent variables and not as an explicit function with dependent or independent variables as in standard regression. The motivation of this paper is to introduce methods of implicit regression to determine the constant nature of a variable or the interactive term, and address inverse relationship among measured variables with random error present in both directions. Implicit Stochastic Gradient Descent(ISGD) Arguably the biggest challenge in applying neural networks is tuning the hyperparameters, in particular the learning rate. The sensitivity to the learning rate is due to the reliance on backpropagation to train the network. In this paper we present the first application of Implicit Stochastic Gradient Descent (ISGD) to train neural networks, a method known in convex optimization to be unconditionally stable and robust to the learning rate. Our key contribution is a novel layer-wise approximation of ISGD which makes its updates tractable for neural networks. Experiments show that our method is more robust to high learning rates and generally outperforms standard backpropagation on a variety of tasks. ImplicitCE Although modern recommendation systems can exploit the structure in users’ item feedback, most are powerless in the face of new users who provide no structure for them to exploit. In this paper we introduce ImplicitCE, an algorithm for recommending items to new users during their sign-up flow. ImplicitCE works by transforming users’ implicit feedback towards auxiliary domain items into an embedding in the target domain item embedding space. ImplicitCE learns these embedding spaces and transformation function in an end-to-end fashion and can co-embed users and items with any differentiable similarity function. To train ImplicitCE we explore methods for maximizing the correlations between model predictions and users’ affinities and introduce Sample Correlation Update, a novel and extremely simple training strategy. Finally, we show that ImplicitCE trained with Sample Correlation Update outperforms a variety of state of the art algorithms and loss functions on both a large scale Twitter dataset and the DBLP dataset. Import Vector Machines The Import Vector Machines (Zhu and Hastie 2005) are a sparse, discriminative and probabilistic classifier. The algorithm is based on the Kernel Logistic Regression model, but uses only a few data points to define the decision hyperplane in the feature space. These data points are called import vectors. The Import Vector Machine shows similar results to the widely used Support Vector Machine, but has a probabilistic output. Importance Sampling In statistics, importance sampling is a general technique for estimating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. It is related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both. Importance Weighted Autoencoder(IWAE) The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It makes two strong assumptions about posterior inference: that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network’s entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks. GitXiv Importance-Weighted Actor Learner Architecture(IMPALA) In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time, which is already a problem in single task learning. We have developed a new distributed agent IMPALA (Importance-Weighted Actor Learner Architecture) that can scale to thousands of machines and achieve a throughput rate of 250,000 frames per second. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace, which was critical for achieving learning stability. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents, use less data and crucially exhibits positive transfer between tasks as a result of its multi-task approach. Imputation In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting for a component of a data point, it is known as “item imputation”. Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. Imputation Regularized Optimization(IRO) Missing data are frequently encountered in high-dimensional data analysis, but they are usually difficult to deal with using standard algorithms, such as the EM algorithm and its variants. You can refer to Liang, F., Jia, B., Xue, J., Li, Q. and Luo, Y. (2018) at for detail. The publication ‘An Imputation Regularized Optimization Algorithm for High-Dimensional Missing Data Problems and Beyond’ will be appear on Journal of the Royal Statistical Society Series B soon. IROmiss In situ TensorView(TensorView) Convolutional Neural Networks(CNNs) are complex systems. They are trained so they can adapt their internal connections to recognize images, texts and more. It is both interesting and helpful to visualize the dynamics within such deep artificial neural networks so that people can understand how these artificial networks are learning and making predictions. In the field of scientific simulations, visualization tools like Paraview have long been utilized to provide insights and understandings. We present in situ TensorView to visualize the training and functioning of CNNs as if they are systems of scientific simulations. In situ TensorView is a loosely coupled in situ visualization open framework that provides multiple viewers to help users to visualize and understand their networks. It leverages the capability of co-processing from Paraview to provide real-time visualization during training and predicting phases. This avoid heavy I/O overhead for visualizing large dynamic systems. Only a small number of lines of codes are injected in TensorFlow framework. The visualization can provide guidance to adjust the architecture of networks, or compress the pre-trained networks. We showcase visualizing the training of LeNet-5 and VGG16 using in situ TensorView. Inception Fusion Network Object detection plays a vital role in natural scene and aerial scene and is full of challenges. Although many advanced algorithms have succeeded in the natural scene, the progress in the aerial scene has been slow due to the complexity of the aerial image and the large degree of freedom of remote sensing objects in scale, orientation, and density. In this paper, a novel multi-category rotation detector is proposed, which can efficiently detect small objects, arbitrary direction objects, and dense objects in complex remote sensing images. Specifically, the proposed model adopts a targeted feature fusion strategy called inception fusion network, which fully considers factors such as feature fusion, anchor sampling, and receptive field to improve the ability to handle small objects. Then we combine the pixel attention network and the channel attention network to weaken the noise information and highlight the objects feature. Finally, the rotational object detection algorithm is realized by redefining the rotating bounding box. Experiments on public datasets including DOTA, NWPU VHR-10 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods. The code and models will be available at https://…/R2CNN-Plus-Plus_Tensorflow. Inception Layer Inception Layer is a combination of all those layers (namely, 1×1 Convolutional layer, 3×3 Convolutional layer, 5×5 Convolutional layer) with their output filter banks concatenated into a single output vector forming the input of the next stage. ➘ “Inception Network” Inception Network We propose a deep convolutional neural network architecture codenamed ‘Inception’, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection. Incident Analytics http://…through-big-data-predictive-analytics.pdf Incremental Cascade Regression(ICR) Traditional face alignment based on machine learning usually tracks the localizations of facial landmarks employing a static model trained offline where all of the training data is available in advance. When new training samples arrive, the static model must be retrained from scratch, which is excessively time-consuming and memory-consuming. In many real-time applications, the training data is obtained one by one or batch by batch. It results in that the static model limits its performance on sequential images with extensive variations. Therefore, the most critical and challenging aspect in this field is dynamically updating the tracker’s models to enhance predictive and generalization capabilities continuously. In order to address this question, we develop a fast and accurate online learning algorithm for face alignment. Particularly, we incorporate on-line sequential extreme learning machine into a parallel cascaded regression framework, coined incremental cascade regression (ICR). To the best of our knowledge, this is the first incremental cascaded framework with the non-linear regressor. One main advantage of ICR is that the tracker model can be fast updated in an incremental way without the entire retraining process when a new input is incoming. Experimental results demonstrate that the proposed ICR is more accurate and efficient on still or sequential images compared with the recent state-of-the-art cascade approaches. Furthermore, the incremental learning proposed in this paper can update the trained model in real time. Incremental Classifier and Representation Learning(iCaRL) A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on the CIFAR-100 and ImageNet ILSVRC 2012 datasets that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail. Incremental Decision Tree An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (i.e. the data was not stored), the original data set is too large to process or the characteristics of the data change over time. Incremental IRL(I2RL) Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from the observations of its behavior on a task. While this problem has been well investigated, the related problem of {\em online} IRL—where the observations are incrementally accrued, yet the demands of the application often prohibit a full rerun of an IRL method—has received relatively less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), and a new method that advances maximum entropy IRL with hidden variables, to this setting. Our formal analysis shows that the new method has a monotonically improving performance with more demonstration data, as well as probabilistically bounded error, both under full and partial observability. Experiments in a simulated robotic application of penetrating a continuous patrol under occlusion shows the relatively improved performance and speed up of the new method and validates the utility of online IRL. Incremental Kernel PCA Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nystr\’om approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nystr\’om approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained. Incremental Nearest Neighbor(NN) Aggregating different image features for image retrieval has recently shown its effectiveness. While highly effective, though, the question of how to uplift the impact of the best features for a specific query image persists as an open computer vision problem. In this paper, we propose a computationally efficient approach to fuse several hand-crafted and deep features, based on the probabilistic distribution of a given membership score of a constrained cluster in an unsupervised manner. First, we introduce an incremental nearest neighbor (NN) selection method, whereby we dynamically select k-NN to the query. We then build several graphs from the obtained NN sets and employ constrained dominant sets (CDS) on each graph G to assign edge weights which consider the intrinsic manifold structure of the graph, and detect false matches to the query. Finally, we elaborate the computation of feature positive-impact weight (PIW) based on the dispersive degree of the characteristics vector. To this end, we exploit the entropy of a cluster membership-score distribution. In addition, the final NN set bypasses a heuristic voting scheme. Experiments on several retrieval benchmark datasets show that our method can improve the state-of-the-art result. Incremental Pruning Based on Less Training(IPLT) Pre-training of models in pruning algorithms plays an important role in pruning decision-making. We find that excessive pre-training is not necessary for pruning algorithms. According to this idea, we propose a pruning algorithm—Incremental pruning based on less training (IPLT). Compared with the traditional pruning algorithm based on a large number of pre-training, IPLT has competitive compression effect than the traditional pruning algorithm under the same simple pruning strategy. On the premise of ensuring accuracy, IPLT can achieve 8x-9x compression for VGG-19 on CIFAR-10 and only needs to pre-train few epochs. For VGG-19 on CIFAR-10, we can not only achieve 10 times test acceleration, but also about 10 times training acceleration. At present, the research mainly focuses on the compression and acceleration in the application stage of the model, while the compression and acceleration in the training stage are few. We newly proposed a pruning algorithm that can compress and accelerate in the training stage. It is novel to consider the amount of pre-training required by pruning algorithm. Our results have implications: Too much pre-training may be not necessary for pruning algorithms. Incremental Regularization(IncReg) Parameter pruning is a promising approach for CNN compression and acceleration by eliminating redundant model parameters with tolerable performance loss. Despite its effectiveness, existing regularization-based parameter pruning methods usually drive weights towards zero with large and constant regularization factors, which neglects the fact that the expressiveness of CNNs is fragile and needs a more gentle way of regularization for the networks to adapt during pruning. To solve this problem, we propose a new regularization-based pruning method (named IncReg) to incrementally assign different regularization factors to different weight groups based on their relative importance, whose effectiveness is proved on popular CNNs compared with state-of-the-art methods. Incremental Sequence Learning Deep learning research over the past years has shown that by increasing the scope or difficulty of the learning problem over time, increasingly complex learning problems can be addressed. We study incremental learning in the context of sequence learning, using generative RNNs in the form of multi-layer recurrent Mixture Density Networks. We introduce Incremental Sequence Learning, a simple incremental approach to sequence learning. Incremental Sequence Learning starts out by using only the first few steps of each sequence as training data. Each time a performance criterion has been reached, the length of the parts of the sequences used for training is increased. To evaluate Incremental Sequence Learning and comparison methods, we introduce and make available a novel sequence learning task and data set: predicting and classifying MNIST pen stroke sequences, where the familiar handwritten digit images have been transformed to pen stroke sequences representing the skeletons of the digits. We find that Incremental Sequence Learning greatly speeds up sequence learning and reaches the best test performance level of regular sequence learning 20 times faster, reduces the test error by 74%, and in general performs more robustly; it displays lower variance and achieves sustained progress after all three comparison method have stopped improving. A trained sequence prediction model is also used in transfer learning to the task of sequence classification, where it is found that transfer learning realizes improved classification performance compared to methods that learn to classify from scratch. Incremental Sparse Bayesian Ordinal Regression(ISBOR) Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high dimensional non-linear space. However, most of the basis function-based algorithms are time consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR can avoid big matrix inverses, which is the main bottleneck in applying basis function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with parsimonious basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real word datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis function-based OR approaches. IncSQL We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over the model trained with traditional static oracles assuming a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model sets a new state-of-the-art performance at an execution accuracy of 87.1%. This is a work-in-progress technical report. In-Database Entity Linking(IDEL) We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first defacto implemented system integrating entity-linking in a database. We leverage the ability of MonetDB to support in-database-analytics with user defined functions (UDFs) implemented in Python. These functions call machine learning libraries for neural text mining, such as TensorFlow. The system achieves zero cost for data shipping and transformation by utilizing MonetDB’s ability to embed Python processes in the database kernel and exchange data in NumPy arrays. IDEL represents text and relational data in a joint vector space with neural embeddings and can compensate errors with ambiguous entity representations. For detecting matching entities, we propose a novel similarity function based on joint neural embeddings which are learned via minimizing pairwise contrastive ranking loss. This function utilizes a high dimensional index structures for fast retrieval of matching entities. Our first implementation and experiments using the WebNLG corpus show the effectiveness and the potentials of IDEL. Independence Gap Mind the Independence Gap Independent and identically distributed(iid, i.i.d.) In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent. The abbreviation i.i.d. is particularly common in statistics (often as iid, sometimes written IID), where observations in a sample are often assumed to be effectively i.i.d. for the purposes of statistical inference. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods. However, in practical applications of statistical modeling the assumption may or may not be realistic. To test how realistic the assumption is on a given data set the autocorrelation can be computed, lag plots drawn or turning point test performed. The generalization of exchangeable random variables is often sufficient and more easily met. ➘ “Mathematical Statistics” ➘ “Statistical Theory” Independent and Periodically Identically Distributed Processes(i.p.i.d.) A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to capture periodically varying statistical behavior. Algorithms are proposed to detect changes in such i.p.i.d. processes. It is shown that the algorithms can be computed recursively and are asymptotically optimal. This problem has applications in anomaly detection in traffic data, social network data, and neural data, where periodic statistical behavior has been observed. Independent Component Analysis(ICA) In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the ‘cocktail party problem’ of listening in on one person’s speech in a noisy room. ➘ “Principal Component Analysis” ica,PGICA Independent-Component Layer(IC) In this work, we propose a novel technique to boost training efficiency of a neural network. Our work is based on an excellent idea that whitening the inputs of neural networks can achieve a fast convergence speed. Given the well-known fact that independent components must be whitened, we introduce a novel Independent-Component (IC) layer before each weight layer, whose inputs would be made more independent. However, determining independent components is a computationally intensive task. To overcome this challenge, we propose to implement an IC layer by combining two popular techniques, Batch Normalization and Dropout, in a new manner that we can rigorously prove that Dropout can quadratically reduce the mutual information and linearly reduce the correlation between any pair of neurons with respect to the dropout layer parameter $p$. As demonstrated experimentally, the IC layer consistently outperforms the baseline approaches with more stable training process, faster convergence speed and better convergence limit on CIFAR10/100 and ILSVRC2012 datasets. The implementation of our IC layer makes us rethink the common practices in the design of neural networks. For example, we should not place Batch Normalization before ReLU since the non-negative responses of ReLU will make the weight layer updated in a suboptimal way, and we can achieve better performance by combining Batch Normalization and Dropout together as an IC layer. Independently Interpretable Lasso(IILasso) Sparse regularization such as $\ell_1$ regularization is a quite powerful and widely used strategy for high dimensional learning problems. The effectiveness of sparse regularization have been supported practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary $\ell_1$ regularization often selects variables correlated with each other, which results in deterioration of not only its generalization error but also interpretability. In this paper, we propose a new regularization method, ‘Independently Interpretable Lasso’ (IILasso for short). Our proposed regularizer suppresses selecting correlated variables, and thus each active variables independently affect the objective variable in the model. Hence, we can interpret regression coefficients intuitively and also improve the performance by avoiding overfitting. We analyze theoretical property of IILasso and show that the proposed method is much advantageous for its sign recovery and achieves almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of IILasso. Independently Recurrent Long Short-Term Memory(IndyLSTM) We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e.\ the output and state of each LSTM cell depends on the inputs and its own output/state, as opposed to the input and the outputs/states of all the cells in the layer. The number of parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is linear in the number of nodes in the layer, as opposed to quadratic for regular LSTM layers, resulting in potentially both smaller and faster models. We evaluate their performance experimentally by training several models on the popular \iamondb and CASIA online handwriting datasets, as well as on several of our in-house datasets. We show that IndyLSTMs, despite their smaller size, consistently outperform regular LSTMs both in terms of accuracy per parameter, and in best accuracy overall. We attribute this improved performance to the IndyLSTMs being less prone to overfitting. Independently Recurrent Neural Network(IndRNN) Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems and hard to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and the sigmoid action functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as relu (rectified linear unit) and be still trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM. Index of Sensitivity to Nonignorability(ISNI) Standard methods of analysis can give misleading results when some observations are nonignorably missing. Analysts currently assess nonignorability by performing sensitivity analyses using models with and without a nonignorable component. Because this approach can involve complicated modeling and arduous computation, and can yield results that are highly sensitive to untestable model assumptions, there is a need for a simple screening tool that measures the potential impact of nonignorability on an analysis. We propose a measure based on a Taylor-series approximation to the nonignorable likelihood, evaluated at the parameter estimates under the assumption of ignorability. From this approximate likelihood, we derive an index of sensitivity to nonignorability, or ISNI. One can compute ISNI without estimating a nonignorable model or positing specific values of a nonignorability parameter. We interpret ISNI in terms of an intuitive parameter that captures the extent of sensitivity. We derive a general expression for ISNI in the generalized linear model with fully observed predictors and potentially missing outcomes. isni Indexation Indexation is a technique to adjust income payments by means of a price index, in order to maintain the purchasing power of the public after inflation, while Deindexation refers to the unwinding of indexation. From a macroeconomics standpoint there are four main categories of indexation: wage indexation, financial instruments rate indexation, tax rate indexation, and exchange rate indexation. The first three are indexed to inflation. The last one is typically indexed to a foreign currency mainly the US dollar. Any of these different types of indexation can be reversed (deindexation). Indirect Inference Indirect inference is a simulation-based method for estimating the parameters of economic models. Its hallmark is the use of an auxiliary model to capture aspects of the data upon which to base the estimation. The parameters of the auxiliary model can be estimated using either the observed data or data simulated from the economic model. Indirect inference chooses the parameters of the economic model so that these two estimates of the parameters of the auxiliary model are as close as possible. The auxiliary model need not be correctly specified; when it is, indirect inference is equivalent to maximum likelihood. Individual Bayes Imbalance Impact Index(IBI^3) Recent studies have shown that imbalance ratio is not the only cause of the performance loss of a classifier in imbalanced data classification. In fact, other data factors, such as small disjuncts, noises and overlapping, also play the roles in tandem with imbalance ratio, which makes the problem difficult. Thus far, the empirical studies have demonstrated the relationship between the imbalance ratio and other data factors only. To the best of our knowledge, there is no any measurement about the extent of influence of class imbalance on the classification performance of imbalanced data. Further, it is also unknown for a dataset which data factor is actually the main barrier for classification. In this paper, we focus on Bayes optimal classifier and study the influence of class imbalance from a theoretical perspective. Accordingly, we propose an instance measure called Individual Bayes Imbalance Impact Index ($IBI^3$) and a data measure called Bayes Imbalance Impact Index ($BI^3$). $IBI^3$ and $BI^3$ reflect the extent of influence purely by the factor of imbalance in terms of each minority class sample and the whole dataset, respectively. Therefore, $IBI^3$ can be used as an instance complexity measure of imbalance and $BI^3$ is a criterion to show the degree of how imbalance deteriorates the classification. As a result, we can therefore use $BI^3$ to judge whether it is worth using imbalance recovery methods like sampling or cost-sensitive methods to recover the performance loss of a classifier. The experiments show that $IBI^3$ is highly consistent with the increase of prediction score made by the imbalance recovery methods and $BI^3$ is highly consistent with the improvement of F1 score made by the imbalance recovery methods on both synthetic and real benchmark datasets. Individual Survival Distribution(ISD) An accurate model of a patient’s individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities, single-time probability models (e.g., the Gail model, predicting 5 year probability) only provide for a single time point, and standard Kaplan-Meier survival curves provide only population averages for a large class of patients meaning they are not specific to individual patients. This motivates an alternative class of tools that can learn a model which provides an individual survival distribution which gives survival probabilities across all times – such as extensions to the Cox model, Accelerated Failure Time, an extension to Random Survival Forests, and Multi-Task Logistic Regression. This paper first motivates such ‘individual survival distribution’ (ISD) models, and explains how they differ from standard models. It then discusses ways to evaluate such models – namely Concordance, 1-Calibration, Brier score, and various versions of L1-loss – and then motivates and defines a novel approach ‘D-Calibration’, which determines whether a model’s probability estimates are meaningful. We also discuss how these measures differ, and use them to evaluate several ISD prediction tools, over a range of survival datasets. Individually-Private Information Retrieval with Side Information(IPIR-SI) We consider a multi-user variant of the private information retrieval problem described as follows. Suppose there are $D$ users, each of which wants to privately retrieve a distinct message from a server with the help of a trusted agent. We assume that the agent has a random subset of $M$ messages that is not known to the server. The goal of the agent is to collectively retrieve the users’ requests from the server. For protecting the privacy of users, we introduce the notion of individual-privacy — the agent is required to protect the privacy only for each individual user (but may leak some correlations among user requests). We refer to this problem as Individually-Private Information Retrieval with Side Information (IPIR-SI). We first establish a lower bound on the capacity, which is defined as the maximum achievable download rate, of the IPIR-SI problem by presenting a novel achievability protocol. Next, we characterize the capacity of IPIR-SI problem for $M = 1$ and $D = 2$. In the process of characterizing the capacity for arbitrary $M$ and $D$ we present a novel combinatorial conjecture, that may be of independent interest. Inducibility The quantity that captures the asymptotic value of the maximum number of appearances of a given topological tree (a rooted tree with no vertices of outdegree $1$) $S$ with $k$ leaves in an arbitrary tree with sufficiently large number of leaves is called the inducibility of $S$. Inductive Logic Programming(ILP) Inductive logic programming (ILP) is a subfield of machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples. Schema: positive examples + negative examples + background knowledge => hypothesis. Inductive logic programming is particularly useful in bioinformatics and natural language processing. Ehud Shapiro laid the theoretical foundation for inductive logic programming and built its first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples. The term Inductive Logic Programming was first introduced in a paper by Stephen Muggleton in 1991. The term ‘inductive’ here refers to philosophical (i.e. suggesting a theory to explain observed facts) rather than mathematical (i.e. proving a property for all members of a well-ordered set) induction. Inductive Reasoning Inductive reasoning (as opposed to deductive reasoning or abductive reasoning) is a method of reasoning in which the premises are viewed as supplying some evidence for the truth of the conclusion. While the conclusion of a deductive argument is certain, the truth of the conclusion of an inductive argument may be probable, based upon the evidence given. Many dictionaries define inductive reasoning as the derivation of general principles from specific observations, though some sources disagree with this usage. The philosophical definition of inductive reasoning is more nuanced than simple progression from particular/individual instances to broader generalizations. Rather, the premises of an inductive logical argument indicate some degree of support (inductive probability) for the conclusion but do not entail it; that is, they suggest truth but do not ensure it. In this manner, there is the possibility of moving from general statements to individual instances (for example, statistical syllogisms, discussed below). Inductive Statistical Inference An Introduction to Inductive Statistical Inference — from Parameter Estimation to Decision-Making Industry 4.0 Industry 4.0 is a project in the high-tech strategy of the German government, which promotes the computerization of the manufacturing industry. The goal is the intelligent factory (Smart Factory), which is characterized by adaptability, resource efficiency and ergonomics as well as the integration of customers and business partners in business and value processes. Technological basis are cyber-physical systems and the Internet of Things. Experts believe that Industry 4.0 or the fourth industrial revolution could be a reality in about 10 to 20 years. In-Edge AI Recently, along with the rapid development of mobile communication technology, edge computing theory and techniques have been attracting more and more attentions from global researchers and engineers, which can significantly bridge the capacity of cloud and requirement of devices by the network edges, and thus can accelerate the content deliveries and improve the quality of mobile services. In order to bring more intelligence to the edge systems, compared to traditional optimization methodology, and driven by the current deep learning techniques, we propose to integrate the Deep Reinforcement Learning techniques and Federated Learning framework with the mobile edge systems, for optimizing the mobile edge computing, caching and communication. And thus, we design the ‘In-Edge AI’ framework in order to intelligently utilize the collaboration among devices and edge nodes to exchange the learning parameters for a better training and inference of the models, and thus to carry out dynamic system-level optimization and application-level enhancement while reducing the unnecessary system communication load. ‘In-Edge AI’ is evaluated and proved to have near-optimal performance but relatively low overhead of learning, while the system is cognitive and adaptive to the mobile communication systems. Finally, we discuss several related challenges and opportunities for unveiling a promising upcoming future of ‘In-Edge AI’. Inertial Regularization and Selection(IRS) In this paper, we develop a new sequential regression modeling approach for data streams. Data streams are commonly found around us, e.g in a retail enterprise sales data is continuously collected every day. A demand forecasting model is an important outcome from the data that needs to be continuously updated with the new incoming data. The main challenge in such modeling arises when there is a) high dimensional and sparsity, b) need for an adaptive use of prior knowledge, and/or c) structural changes in the system. The proposed approach addresses these challenges by incorporating an adaptive L1-penalty and inertia terms in the loss function, and thus called Inertial Regularization and Selection (IRS). The former term performs model selection to handle the first challenge while the latter is shown to address the last two challenges. A recursive estimation algorithm is developed, and shown to outperform the commonly used state-space models, such as Kalman Filters, in experimental studies and real data. Inexact Variant of SARAH(iSARAH) We develop and analyze a variant of variance reducing stochastic gradient algorithm, known as SARAH, which does not require computation of the exact gradient. Thus this new method can be applied to general expectation minimization problems rather than only finite sum problems. While the original SARAH algorithm, as well as its predecessor, SVRG, require an exact gradient computation on each outer iteration, the inexact variant of SARAH (iSARAH), which we develop here, requires only stochastic gradient computed on a mini-batch of sufficient size. The proposed method combines variance reduction via sample size selection and iterative stochastic gradient updates. We analyze the convergence rate of the algorithms for strongly convex, convex, and nonconvex cases with appropriate mini-batch size selected for each case. We show that with an additional, reasonable, assumption iSARAH achieves the best known complexity among stochastic methods in the case of general convex case stochastic value functions. Inferactive Data Analysis We describe inferactive data analysis, so-named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey’s exploratory (roughly speaking ‘model free’) and confirmatory data analysis (roughly speaking classical and ‘model based’), also allowing for Bayesian data analysis. We view this approach as close in spirit to current practice of applied statisticians and data scientists while allowing frequentist guarantees for results to be reported in the scientific literature, or Bayesian results where the data scientist may choose the statistical model (and hence the prior) after some initial exploratory analysis. While this approach to data analysis does not cover every scenario, and every possible algorithm data scientists may use, we see this as a useful step in concrete providing tools (with frequentist statistical guarantees) for current data scientists. The basis of inference we use is selective inference [Lee et al., 2016, Fithian et al., 2014], in particular its randomized form [Tian and Taylor, 2015a]. The randomized framework, besides providing additional power and shorter confidence intervals, also provides explicit forms for relevant reference distributions (up to normalization) through the {\em selective sampler} of Tian et al. [2016]. The reference distributions are constructed from a particular conditional distribution formed from what we call a DAG-DAG — a Data Analysis Generative DAG. As sampling conditional distributions in DAGs is generally complex, the selective sampler is crucial to any practical implementation of inferactive data analysis. Our principal goal is in reviewing the recent developments in selective inference as well as describing the general philosophy of selective inference. Inference Enterprise Model An Inference enterprise is an entity within an organization that uses data, tools, people, and processes to make inferences about variables that are critical to the success of the organization. Inference Tree(IT) We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone. Inferential Model(IM) Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. But existing probabilistic inference methods struggle to provide a meaningful interpretation of the probabilities across experiments in sufficient generality. In this paper we further develop a promising new approach based on what are called inferential models (IMs). The fundamental idea behind IMs is that there is an unobservable auxiliary variable that itself describes the inherent uncertainty about the parameter of interest, and that posterior probabilistic inference can be accomplished by predicting this unobserved quantity. We describe a simple and intuitive threestep construction of a random set of candidate parameter values, each being consistent with the model, the observed data, and a auxiliary variable prediction. Then prior-free posterior summaries of the available statistical evidence for and against a hypothesis of interest are obtained by calculating the probability that this random set falls completely in and completely out of the hypothesis, respectively. We prove that these IM-based measures of evidence are calibrated in a frequentist sense, showing that IMs give easily-interpretable results both within and across experiments. Inferential Statistics In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation. More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation, such as observational errors, random sampling, or random experimentation. Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. Inferential statistics are used to test hypotheses and make estimations using sample data. InferLine The dominant cost in production machine learning workloads is not training individual models but serving predictions from increasingly complex prediction pipelines spanning multiple models, machine learning frameworks, and parallel hardware accelerators. Due to the complex interaction between model configurations and parallel hardware, prediction pipelines are challenging to provision and costly to execute when serving interactive latency-sensitive applications. This challenge is exacerbated by the unpredictable dynamics of bursty workloads. In this paper we introduce InferLine, a system which efficiently provisions and executes ML inference pipelines subject to end-to-end latency constraints by proactively optimizing and reactively controlling per-model configuration in a fine-grained fashion. Unpredictable changes in the serving workload are dynamically and cost-optimally accommodated with minimal service level degradation. InferLine introduces (1) automated model profiling and pipeline lineage extraction, (2) a fine-grain, cost-minimizing pipeline configuration planner, and (3) a fine-grain reactive controller. InferLine is able to configure and deploy prediction pipelines across a wide range of workload patterns and latency goals. It outperforms coarse-grained configuration alternatives by up 7.6x in cost while achieving up to 32x lower SLO miss rate on real workloads and generalizes across state-of-the-art model serving frameworks. InferSent We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models. We evaluate three alignment frameworks applied to these models: joint modeling, representation transfer learning, and sentence mapping, using parallel text to guide the alignment. Our results support representation transfer as a scalable approach for modular cross-lingual alignment of neural sentence embeddings, where we observe better performance compared to joint models in intrinsic and extrinsic evaluations, particularly with smaller sets of parallel data. Infinite Factorial Finite State Machine Model New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the data-generating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signal-to-noise ratio. Infinite Feature Selection(IFS) Supervised Infinite Feature Selection Infinite Gaussian Mixture Model Coupled With (bi-Directional) Generative Adversarial Network(IGMM-GAN) Detecting anomalous activity in human mobility data has a number of applications including road hazard sensing, telematic based insurance, and fraud detection in taxi services and ride sharing. In this paper we address two challenges that arise in the study of anomalous human trajectories: 1) a lack of ground truth data on what defines an anomaly and 2) the dependence of existing methods on significant pre-processing and feature engineering. While generative adversarial networks seem like a natural fit for addressing these challenges, we find that existing GAN based anomaly detection algorithms perform poorly due to their inability to handle multimodal patterns. For this purpose we introduce an infinite Gaussian mixture model coupled with (bi-directional) generative adversarial networks, IGMM-GAN, that is able to generate synthetic, yet realistic, human mobility data and simultaneously facilitates multimodal anomaly detection. Through estimation of a generative probability density on the space of human trajectories, we are able to generate realistic synthetic datasets that can be used to benchmark existing anomaly detection methods. The estimated multimodal density also allows for a natural definition of outlier that we use for detecting anomalous trajectories. We illustrate our methodology and its improvement over existing GAN anomaly detection on several human mobility datasets, along with MNIST. Infinite Latent Feature Selection Feature selection is playing an increasingly significant role with respect to many computer vision applications spanning from object recognition to visual object tracking. However, most of the recent solutions in feature selection are not robust across different and heterogeneous set of data. In this paper, we address this issue proposing a robust probabilistic latent graph-based feature selection algorithm that performs the ranking step while considering all the possible subsets of features, as paths on a graph, bypassing the combinatorial problem analytically. An appealing characteristic of the approach is that it aims to discover an abstraction behind low-level sensory data, that is, relevancy. Relevancy is modelled as a latent variable in a PLSA-inspired generative process that allows the investigation of the importance of a feature when injected into an arbitrary set of cues. The proposed method has been tested on ten diverse benchmarks, and compared against eleven state of the art feature selection methods. Results show that the proposed approach attains the highest performance levels across many different scenarios and difficulties, thereby confirming its strong robustness while setting a new state of the art in feature selection domain. Infinite Layer Networks(ILN) Infinite Layer Networks (ILN) have recently been proposed as an architecture that mimics neural networks while enjoying some of the advantages of kernel methods. ILN are networks that integrate over infinitely many nodes within a single hidden layer. It has been demonstrated by several authors that the problem of learning ILN can be reduced to the kernel trick, implying that whenever a certain integral can be computed analytically they are efficiently learnable. In this work we give an online algorithm for ILN, which avoids the kernel trick assumption. More generally and of independent interest, we show that kernel methods in general can be exploited even when the kernel cannot be efficiently computed but can only be estimated via sampling. We provide a regret analysis for our algorithm, showing that it matches the sample complexity of methods which have access to kernel values. Thus, our method is the first to demonstrate that the kernel trick is not necessary as such, and random features suffice to obtain comparable performance. Infinite Variational Autoencoder(VAE) This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data. Experiments show the flexibility of our method, particularly for semi-supervised learning, where only a small number of training samples are available. InfiniteBoost In machine learning ensemble methods have demonstrated high accuracy for the variety of problems in different areas. The most known algorithms intensively used in practice are random forests and gradient boosting. In this paper we present InfiniteBoost – a novel algorithm, which combines the best properties of these two approaches. The algorithm constructs the ensemble of trees for which two properties hold: trees of the ensemble incorporate the mistakes done by others; at the same time the ensemble could contain the infinite number of trees without the over-fitting effect. The proposed algorithm is evaluated on the regression, classification, and ranking tasks using large scale, publicly available datasets. InfiniteInsight Function Library(IFL) InfiniteInsight function library (“IFL”) for SAP HANA to allow in-memory execution of InfiniteInsight-classic workflows. Infinitely Differentiable Monte-Carlo Estimator(DiCE) The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCG), eg. in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN. Influence Diagram(ID) Influence Diagrams (ID) are a flexible tool to represent discrete stochastic optimization problems, including Markov Decision Process (MDP) and Partially Observable MDP as standard examples. More precisely, given random variables considered as vertices of an acyclic digraph, a probabilistic graphical model defines a joint distribution via the conditional distributions of vertices given their parents. In ID, the random variables are represented by a probabilistic graphical model whose vertices are partitioned into three types : chance, decision and utility vertices. The user chooses the distribution of the decision vertices conditionally to their parents in order to maximize the expected utility. Leveraging the notion of rooted junction tree, we present a mixed integer linear formulation for solving an ID, as well as valid inequalities, which lead to a computationally efficient algorithm. We also show that the linear relaxation yields an optimal integer solution for instances that can be solved by the ‘single policy update’, the default algorithm for addressing IDs. Influence Dispersion Tree(IDT) Despite a long history of use of citation count as a measure to assess the impact or influence of a scientific paper, the evolution of follow-up work inspired by the paper and their interactions through citation links have rarely been explored to quantify how the paper enriches the depth and breadth of a research field. We propose a novel data structure, called Influence Dispersion Tree (IDT) to model the organization of follow-up papers and their dependencies through citations. We also propose the notion of an ideal IDT for every paper and show that an ideal (highly influential) paper should increase the knowledge of a field vertically and horizontally. Upon suitably exploring the structural properties of IDT, we derive a suite of metrics, namely Influence Dispersion Index (IDI), Normalized Influence Divergence (NID) to quantify the influence of a paper. Our theoretical analysis shows that an ideal IDT configuration should have equal depth and breadth (and thus minimize the NID value). We establish the superiority of NID as a better influence measure in two experimental settings. First, on a large real-world bibliographic dataset, we show that NID outperforms raw citation count as an early predictor of the number of new citations a paper will receive within a certain period after publication. Second, we show that NID is superior to the raw citation count at identifying the papers recognized as highly influential through Test of Time Award among all their contemporary papers (published in the same venue). We conclude that in order to quantify the influence of a paper, along with the total citation count, one should also consider how the citing papers are organized among themselves to better understand the influence of a paper on the research field. For reproducibility, the code and datasets used in this study are being made available to the community. Influence Maximization with INFluencer vECTORs(IMINFECTOR) Although influence maximization has been studied extensively in the past, the majority of works focus on the algorithmic aspect of the problem, overlooking several practical improvements that can be derived by data-driven observations or the inclusion of machine learning. The main challenges lie on the one hand on the computational demand of the algorithmic solution which restricts the scalability, and on the other the quality of the predicted influence spread. In this work, we propose IMINFECTOR (Influence Maximization with INFluencer vECTORs), a method that aspires to address both problems using representation learning. It comprises of two parts. The first is based on a multi-task neural network that uses logs of diffusion cascades to embed diffusion probabilities between nodes as well as the ability of a node to create massive cascades. The second part uses diffusion probabilities to reformulate influence maximization as a weighted bipartite matching problem and capitalizes on the learned representations to find a seed set using a greedy heuristic approach. We apply our method in three sizable networks accompanied by diffusion cascades and evaluate it using unseen diffusion cascades from future time steps. We observe that our method outperforms various competitive algorithms and metrics from the diverse landscape of influence maximization, in terms of prediction precision and seed set quality. Infobesity Information overload (also known as infobesity or infoxication) refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information. The term is popularized by Alvin Toffler in his bestselling 1970 book Future Shock, but is mentioned in a 1964 book by Bertram Gross, The Managing of Organizations. Speier et al. (1999) stated: ‘Information overload occurs when the amount of input to a system exceeds its processing capacity. Decision makers have fairly limited cognitive processing capacity. Consequently, when information overload occurs, it is likely that a reduction in decision quality will occur.’ In recent years, the term ‘information overload’ has evolved into phrases such as ‘information glut’ and ‘data smog’ (Shenk, 1997). What was once a term grounded in cognitive psychology has evolved into a rich metaphor used outside the world of academia. In many ways, the advent of information technology has increased the focus on information overload: information technology may be a primary reason for information overload due to its ability to produce more information more quickly and to disseminate this information to a wider audience than ever before (Evaristo, Adams, & Curley, 1995; Hiltz & Turoff, 1985). INFODENS The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering. While engineered representations are usually based on some linguistic understanding and are therefore more interpretable, learned representations are harder to interpret. Empirically studying the complementarity of both approaches can provide more linguistic insights that would help reach a better compromise between interpretability and performance. We present INFODENS, a framework for studying learned and engineered representations of text in the context of text classification tasks. It is designed to simplify the tasks of feature engineering as well as provide the groundwork for extracting learned features and combining both approaches. INFODENS is flexible, extensible, with a short learning curve, and is easy to integrate with many of the available and widely used natural language processing tools. Inforence In this paper, a novel approach, Inforence, is proposed to isolate the suspicious codes that likely contain faults. Inforence employs a feature selection method, based on mutual information, to identify those bug-related statements that may cause the program to fail. Because the majority of a program faults may be revealed as undesired joint effect of the program statements on each other and on program termination state, unlike the state-of-the-art methods, Inforence tries to identify and select groups of interdependent statements which altogether may affect the program failure. The interdependence amongst the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program static structure. Eventually, the resultant cause-effect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experiments with seven sets of programs include Siemens suite, gzip, grep, sed, space, make and bash are presented. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multi-fault programs. The experimental results prove the outperformance of the proposed method compared to the state-of-the-art techniques. Information Based Control(IBC) An information based method for solving stochastic control problems with partial observation has been proposed. First, the information-theoretic lower bounds of the cost function has been analysed. It has been shown, under rather weak assumptions, that reduction of the expected cost with closed-loop control compared to the best open-loop strategy is upper bounded by non-decreasing function of mutual information between control variables and the state trajectory. On the basis of this result, an \textit{Information Based Control} method has been developed. The main idea of the IBC consists in replacing the original control task by a sequence of control problems that are relatively easy to solve and such that information about the state of the system is actively generated. Two examples of the operation of the IBC are given. It has been shown that the IBC is able to find the optimal solution without using dynamic programming at least in these examples. Hence the computational complexity of the IBC is substantially smaller than complexity of dynamic programming, which is the main advantage of the proposed method. Information Coefficient(IC) The information coefficient (IC) is a measure of the merit of a predicted value. In finance, the information coefficient is used as a performance metric for the predictive skill of a financial analyst. The information coefficient is similar to correlation in that it can be seen to measure the linear relationship between two random variables, e.g. predicted stock returns and the actualized returns. The information coefficient ranges from 0 to 1, with 0 denoting no linear relationship between predictions and actual values (poor forecasting skills) and 1 denoting a perfect linear relationship (good forecasting skills). Information Extraction(IE) Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video could be seen as information extraction. Information Extraction Technology With rise of digital age, there is an explosion of information in the form of news, articles, social media, and so on. Much of this data lies in unstructured form and manually managing and effectively making use of it is tedious, boring and labor intensive. This explosion of information and need for more sophisticated and efficient information handling tools gives rise to Information Extraction(IE) and Information Retrieval(IR) technology. Information Extraction systems takes natural language text as input and produces structured information specified by certain criteria, that is relevant to a particular application. Various sub-tasks of IE such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction, Knowledge Base reasoning forms the building blocks of various high end Natural Language Processing (NLP) tasks such as Machine Translation, Question-Answering System, Natural Language Understanding, Text Summarization and Digital Assistants like Siri, Cortana and Google Now. This paper introduces Information Extraction technology, its various sub-tasks, highlights state-of-the-art research in various IE subtasks, current challenges and future research directions. Information Fusion Information integration (II) (also called deduplication and referential integrity) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich-media content. information fusion which is a related term involves the combination of information into a new set of information towards reducing uncertainty. Information Fuzzy Networks(IFN) Info Fuzzy Networks (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by the learning algorithm is also called Info Fuzzy Network. IFN construction is quite similar to decision trees’ construction. However, IFN constructs a directed graph and not a tree. IFN also uses the conditional mutual information metric in order to choose features during the construction stage while decision trees usually use other metrics like entropy or gini. Information Gain In information theory and machine learning, information gain is a synonym for Kullback-Leibler divergence. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the expectation value of the Kullback-Leibler divergence of a conditional probability distribution. ➘ “Kullback-Leibler Divergence” Information Harvesting Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeds to generate rules, trading off generalization against memorization, that will infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if overfitting took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of machine learning. Information Integration Information integration (II) (also called deduplication and referential integrity) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich-media content. information fusion which is a related term involves the combination of information into a new set of information towards reducing uncertainty. Information Leakage Information leakage happens whenever a system that is designed to be closed to an eavesdropper reveals some information to unauthorized parties nonetheless. For example, when designing an encrypted instant messaging network, a network engineer without the capacity to crack encryption codes could see when messages are transmitted, even if he could not read them. During the Second World War, the Japanese for a while were using secret codes such as PURPLE; even before such codes were cracked, some basic information could be extracted about the content of the messages by looking at which relay stations sent a message onward. As another example of information leakage, GPU drivers do not erase their memories and thus, in shared/local/global memories, data values persist after deallocation. These data can be retrieved by a malicious agent. Information Maximization(Infomax) Infomax is an optimization principle for artificial neural networks and other information processing systems. It prescribes that a function that maps a set of input values I to a set of output values O should be chosen or learned so as to maximize the average Shannon mutual information between I and O, subject to a set of specified constraints and/or noise processes. Infomax algorithms are learning algorithms that perform this optimization process. The principle was described by Linsker in 1987. Infomax, in its zero-noise limit, is related to the principle of redundancy reduction proposed for biological sensory processing by Horace Barlow in 1961, and applied quantitatively to retinal processing by Atick and Redlich. One of the applications of infomax has been to an independent component analysis algorithm that finds independent signals by maximising entropy. Infomax-based ICA was described by Bell and Sejnowski in 1995. Information Potential Auto-Encoders In this paper, we suggest a framework to make use of mutual information as a regularization criterion to train Auto-Encoders (AEs). In the proposed framework, AEs are regularized by minimization of the mutual information between input and encoding variables of AEs during the training phase. In order to estimate the entropy of the encoding variables and the mutual information, we propose a non-parametric method. We also give an information theoretic view of Variational AEs (VAEs), which suggests that VAEs can be considered as parametric methods that estimate entropy. Experimental results show that the proposed non-parametric models have more degree of freedom in terms of representation learning of features drawn from complex distributions such as Mixture of Gaussians, compared to methods which estimate entropy using parametric approaches, such as Variational AEs. Information Power(P) A time series is uniquely represented by its geometric shape, which also carries information. A time series can be modelled as the trajectory of a particle moving in a force field with one degree of freedom. The force acting on the particle shapes the trajectory of its motion, which is made up of elementary shapes of infinitesimal neighborhoods of points in the trajectory. It has been proved that an infinitesimal neighborhood of a point in a continuous time series can have at least 29 different shapes or configurations. So information can be encoded in it in at least 29 different ways. A 3-point neighborhood (the smallest) in a discrete time series can have precisely 13 different shapes or configurations. In other words, a discrete time series can be expressed as a string of 13 symbols. Across diverse real as well as simulated data sets it has been observed that 6 of them occur more frequently and the remaining 7 occur less frequently. Based on frequency distribution of 13 configurations or 13 different ways of information encoding a novel entropy measure, called semantic entropy (E), has been defined. Following notion of power in Newtonian mechanics of the moving particle whose trajectory is the time series, a notion of information power (P) has been introduced for time series. E/P turned out to be an important indicator of synchronous behaviour of time series as observed in epileptic EEG signals. Information Retrieval(IR) Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing. Automated information retrieval systems are used to reduce what has been called “information overload”. Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy. Information Theoretic Metric Learning(ITML) Relational Constraints for Metric Learning on Relational Data Information Value(IV) In statistical data mining, sometimes we need to determine out of a set of variables which ones are best in capturing a desired behavior. For example, let’s say you have a pool of customers for your credit card company, and you want to determine who out of them are about to default (i.e. refuse to pay up after possibly making a huge expense). You need to then identify which of the attributes you have on the customer can potentially identify and alert you of such behavior. One of the popular ways in which this is done by analysts is by looking at something called ‘Information Value’. In the context of data mining is also sometimes referred to by the short form – InfoVal. Information Visualization Information visualization or information visualisation is the study of (interactive) visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information. However, information visualization differs from scientific visualization: “it’s infovis (information visualization) when the spatial representation is chosen, and it’s scivis (scientific visualization) when the spatial representation is given”. Information-Anchored Sensitivity Analysis Analysis of longitudinal randomised controlled trials is frequently complicated because patients deviate from the protocol. Where such deviations are relevant for the estimand, we are typically required to make an untestable assumption about post-deviation behaviour in order to perform our primary analysis and estimate the treatment effect. In such settings, it is now widely recognised that we should follow this with sensitivity analyses to explore the robustness of our inferences to alternative assumptions about post-deviation behaviour. Although there has been a lot of work on how to conduct such sensitivity analyses, little attention has been given to the appropriate loss of information due to missing data within sensitivity analysis. We argue more attention needs to be given to this issue, showing it is quite possible for sensitivity analysis to decrease and increase the information about the treatment effect. To address this critical issue, we introduce the concept of information-anchored sensitivity analysis. By this we mean sensitivity analysis in which the proportion of information about the treatment estimate lost due to missing data is the same as the proportion of information about the treatment estimate lost due to missing data in the primary analysis. We argue this forms a transparent, practical starting point for interpretation of sensitivity analysis. We then derive results showing that, for longitudinal continuous data, a broad class of controlled and reference-based sensitivity analyses performed by multiple imputation are information-anchored. We illustrate the theory with simulations and an analysis of a peer review trial, then discuss our work in the context of other recent work in this area. Our results give a theoretical basis for the use of controlled multiple imputation procedures for sensitivity analysis. Information-Based Optimal Subdata Selection(IBOSS) The information-based optimal subdata selection (IBOSS) is a computationally efficient method to select informative data points from large data sets through processing full data by columns. However, when the volume of a data set is too large to be processed in the available memory of a machine, it is infeasible to implement the IBOSS procedure. This paper develops a divide-and-conquer IBOSS approach to solving this problem, in which the full data set is divided into smaller partitions to be loaded into the memory and then subsets of data are selected from each partitions using the IBOSS algorithm. We derive both finite sample properties and asymptotic properties of the resulting estimator. Asymptotic results show that if the full data set is partitioned randomly and the number of partitions is not very large, then the resultant estimator has the same estimation efficiency as the original IBOSS estimator. We also carry out numerical experiments to evaluate the empirical performance of the proposed method. Information-Directed Exploration ➘ “Information-Directed Sampling” Information-Directed Sampling(IDS) Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches. Information-Theoretic Active Learning(ITAL) We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it for acquiring meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach does not only take the structure of the data but also the expected model output change caused by the user feedback into account. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet. Information-Theoretic Canonical Correlation Analysis(ITCCA) Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Non-linear CCA extends this notion to a broader family of transformations, which are more powerful for many real-world applications. Given the joint probability, the Alternating Conditional Expectation (ACE) provides an optimal solution to the non-linear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of observations is available. In this work we introduce an information-theoretic framework for the non-linear CCA problem (ITCCA), which extends the classical ACE approach. Our suggested framework seeks compressed representations of the data that allow a maximal level of correlation. This way we control the trade-off between the flexibility and the complexity of the representation. Our approach demonstrates favorable performance at a reduced computational burden, compared to non-linear alternatives, in a finite sample size regime. Further, ITCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it implies a ‘soft’ dimensionality reduction, as the compression level is measured (and governed) by the mutual information between the original noisy data and the signals that we extract. Informed Machine Learning Despite the great successes of machine learning, it can have its limits when dealing with insufficient training data. A potential solution is to incorporate additional knowledge into the training process which leads to the idea of informed machine learning. We present a research survey and structured overview of various approaches in this field. We aim to establish a taxonomy which can serve as a classification framework that considers the kind of additional knowledge, its representation,and its integration into the machine learning pipeline. The evaluation of numerous papers on the bases of the taxonomy uncovers key methods in this field. InfoSSM The goal of system identification is to learn about underlying physics dynamics behind the observed time-series data. To model the nonparametric and probabilistic dynamics model, Gaussian process state-space models (GPSSMs) have been widely studied; GPs are not only capable to represent nonlinear dynamics, but estimate the uncertainty of prediction and avoid over-fitting. Traditional GPSSMs, however, are based on Gaussian transition model, thus often have difficulty in describing multi-modal motions. To resolve the challenge, this thesis proposes a model using multiple GPs and extends the GPSSM to information-theoretic framework by introducing a mutual information regularizer helping the model to learn interpretable and disentangled representation of multi-modal transition dynamics model. Experiment results show that the proposed model not only successfully represents the observed system but distinguishes the dynamics mode that governs the given observation sequence. Inhomogeneous Self-Exciting Process(IHSEP) IHSEP Initial Data Analysis(IDA) The most important distinction between the initial data analysis phase and the main analysis phase, is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions: · Quality of data · Quality of measurements · Initial transformations · Did the implementation of the study fulfill the intentions of the research design? Inner Average Ensemble(IAE) Ensemble learning is a method of combining multiple trained models to improve the model accuracy. We introduce the usage of such methods, specifically ensemble average inside Convolutional Neural Networks (CNNs) architectures. By Inner Average Ensemble (IEA) of multiple convolutional neural layers (CNLs) replacing the single CNLs inside the CNN architecture, the accuracy of the CNN increased. A visual and a similarity score analysis of the features generated from IEA explains why it boosts the model performance. Empirical results using different benchmarking datasets and well-known deep model architectures shows that IEA outperforms the ordinary CNL used in CNNs. Inner Source Inner source is the use of open source software development best practices and the establishment of an open source-like culture within organizations. The organization may still develop proprietary software, but internally opens up its development. The term was coined by Tim O’Reilly in 2000. Inner-Imaging Architecture Despite the tremendous success in computer vision, deep convolutional networks suffer from serious computation cost and redundancies. Although previous works address this issue by enhancing diversities of filters, they ignore that both complementarity and completeness are required in the internal structure of convolutional network. In this setting, we propose a novel Inner-Imaging architecture, which allows relationships between channels to meet the above requirement. Specifically, we organize the filter signal points in groups using convolutional kernels to model both the intra- and inter-group relationships simultaneously. Consequently, we not only increase diversities of channels but also explicitly enhance the complementarity and completeness. Our proposed architecture is lightweight and easy to be implemented for improving the modelling efficiency and performance. We conduct extensive experiments on CIFAR, SVHN and ImageNet and verify the effectiveness of our inner-imaging architecture with residual networks as the backbone. Innovation Management Innovation management is the management of innovation processes. It refers both to product and organizational innovation. Innovation management includes a set of tools that allow managers and engineers to cooperate with a common understanding of processes and goals. Innovation management allows the organization to respond to external or internal opportunities, and use its creativity to introduce new ideas, processes or products. It is not relegated to R&D; it involves workers at every level in contributing creatively to a company’s product development, manufacturing and marketing. Innovation Pursuit(iPursuit) In subspace clustering, a group of data points belonging to a union of subspaces are assigned membership to their respective subspaces. This paper presents a new approach dubbed Innovation Pursuit (iPursuit) to the problem of subspace clustering using a new geometrical idea whereby each subspace is identified based on its novelty with respect to the other subspaces. The proposed approach finds the subspaces consecutively by solving a series of simple linear optimization problems, each searching for some direction in the span of the data that is potentially orthogonal to all subspaces except for the one to be identified in one step of the algorithm. A detailed mathematical analysis is provided establishing sufficient conditions for the proposed approach to correctly cluster the data points. Remarkably, the proposed approach can provably yield exact clustering even when the subspaces have significant intersections under mild conditions on the distribution of the data points in the subspaces. Moreover, It is shown that the complexity of iPursuit is almost independent of the dimension of the data. The numerical simulations demonstrate that iPursuit can often outperform the state-of-the-art subspace clustering algorithms, more so for subspaces with significant intersections. iNNvestigate In recent years, deep neural networks have revolutionized many application domains of machine learning and are key components of many critical decision or predictive processes. Therefore, it is crucial that domain specialists can understand and analyze actions and predictions, even of the most complex neural network architectures. Despite these arguments neural networks are often treated as black boxes. In the attempt to alleviate this short- coming many analysis methods were proposed, yet the lack of reference implementations often makes a systematic comparison between the methods a major effort. The presented library iNNvestigate addresses this by providing a common interface and out-of-the- box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods. To demonstrate the versatility of iNNvestigate, we provide an analysis of image classifications for variety of state-of-the-art neural network architectures. Inoculation by Fine-Tuning Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks. While model performance on these challenge datasets is significantly lower compared to the original benchmark, it is unclear what particular weaknesses they reveal. For example, a challenge dataset may be difficult because it targets phenomena that current models cannot capture, or because it simply exploits blind spots in a model’s specific training set. We introduce inoculation by fine-tuning, a new analysis method for studying challenge datasets by exposing models (the metaphorical patient) to a small amount of data from the challenge dataset (a metaphorical pathogen) and assessing how well they can adapt. We apply our method to analyze the NLI ‘stress tests’ (Naik et al., 2018) and the Adversarial SQuAD dataset (Jia and Liang, 2017). We show that after slight exposure, some of these datasets are no longer challenging, while others remain difficult. Our results indicate that failures on challenge datasets may lead to very different conclusions about models, training datasets, and the challenge datasets themselves. Input Fast-Forwarding This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from ‘deep supervision’ in which the loss layer is re-introduced to earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community Input-Free Attack Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial attacks, even in a black-box scenario. However, most of the existing black-box attack algorithms need to make a huge amount of queries to perform attacks, which is not practical in the real world. We note one of the main reasons for the massive queries is that the adversarial example is required to be visually similar to the original image, but in many cases, how adversarial examples look like does not matter much. It inspires us to introduce a new attack called \emph{input-free} attack, under which an adversary can choose an arbitrary image to start with and is allowed to add perceptible perturbations on it. Following this approach, we propose two techniques to significantly reduce the query complexity. First, we initialize an adversarial example with a gray color image on which every pixel has roughly the same importance for the target model. Then we shrink the dimension of the attack space by perturbing a small region and tiling it to cover the input image. To make our algorithm more effective, we stabilize a projected gradient ascent algorithm with momentum, and also propose a heuristic approach for region size selection. Through extensive experiments, we show that with only 1,701 queries on average, we can perturb a gray image to any target class of ImageNet with a 100\% success rate on InceptionV3. Besides, our algorithm has successfully defeated two real-world systems, the Clarifai food detection API and the Baidu Animal Identification API. Insertion Transformer We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. Unlike typical autoregressive models which rely on a fixed, often left-to-right ordering of the output, our approach accommodates arbitrary orderings by allowing for tokens to be inserted anywhere in the sequence during decoding. This flexibility confers a number of advantages: for instance, not only can our model be trained to follow specific orderings such as left-to-right generation or a binary tree traversal, but it can also be trained to maximize entropy over all valid insertions for robustness. In addition, our model seamlessly accommodates both fully autoregressive generation (one insertion at a time) and partially autoregressive generation (simultaneous insertions at multiple locations). We validate our approach by analyzing its performance on the WMT 2014 English-German machine translation task under various settings for training and decoding. We find that the Insertion Transformer outperforms many prior non-autoregressive approaches to translation at comparable or better levels of parallelism, and successfully recovers the performance of the original Transformer while requiring only logarithmically many iterations during decoding. inst2vec With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program semantics robustly, due to structural features such as function calls, branching, and interchangeable order of statements. In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings qualitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that with a single RNN architecture and pre-trained fixed embeddings, inst2vec outperforms specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art. InstaNAS Neural Architecture Search (NAS) aims at finding one ‘single’ architecture that achieves the best accuracy for a given task such as image recognition.In this paper, we study the instance-level variation,and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS for searching toward instance-level architectures;the controller is trained to search and form a ‘distribution of architectures’ instead of a single final architecture. Then during the inference phase, the controller selects an architecture from the distribution, tailored for each unseen image to achieve both high accuracy and short latency. The experimental results show that InstaNAS reduces the inference latency without compromising classification accuracy. On average, InstaNAS achieves 48.9% latency reduction on CIFAR-10 and 40.2% latency reduction on CIFAR-100 with respect to MobileNetV2 architecture. Instance Segmentation Instance segmentation is the problem of detecting and delineating each object of interest appearing in an image. Current instance segmentation approaches consist of ensembles of modules that are trained independently of each other, thus missing learning opportunities. Instance Selection(IS) In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, several instances are stored in the training set but some of them are not useful for classifying therefore it is possible to get acceptable classification rates ignoring non useful cases; this process is known as instance selection. Through instance selection the training set is reduced which allows reducing runtimes in the classification and/or training stages of classifiers. Instance-Aware GAN(InstaGAN) Unsupervised image-to-image translation has gained considerable attention due to the recent impressive progress based on generative adversarial networks (GANs). However, previous methods often fail in challenging cases, in particular, when an image has multiple target instances and a translation task involves significant changes in shape, e.g., translating pants to skirts in fashion images. To tackle the issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates the instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. The proposed method translates both an image and the corresponding set of instance attributes while maintaining the permutation invariance property of the instances. To this end, we introduce a context preserving loss that encourages the network to learn the identity function outside of target instances. We also propose a sequential mini-batch inference/training technique that handles multiple instances with a limited GPU memory and enhances the network to generalize better for multiple instances. Our comparative evaluation demonstrates the effectiveness of the proposed method on different image datasets, in particular, in the aforementioned challenging cases. Instance-Based Entropy Fuzzy Support Vector Machine(IEFSVM) Imbalanced classification has been a major challenge for machine learning because many standard classifiers mainly focus on balanced datasets and tend to have biased results towards the majority class. We modify entropy fuzzy support vector machine (EFSVM) and introduce instance-based entropy fuzzy support vector machine (IEFSVM). Both EFSVM and IEFSVM use the entropy information of k-nearest neighbors to determine the fuzzy membership value for each sample which prioritizes the importance of each sample. IEFSVM considers the diversity of entropy patterns for each sample when increasing the size of neighbors, k, while EFSVM uses single entropy information of the fixed size of neighbors for all samples. By varying k, we can reflect the component change of sample’s neighbors from near to far distance in the determination of fuzzy value membership. Numerical experiments on 35 public and 12 real-world imbalanced datasets are performed to validate IEFSVM and area under the receiver operating characteristic curve (AUC) is used to compare its performance with other SVMs and machine learning methods. IEFSVM shows a much higher AUC value for datasets with high imbalance ratio, implying that IEFSVM is effective in dealing with the class imbalance problem. Instance-Based Learning(IBL) In machine learning, instance-based learning or memory-based learning is a family of learning algorithms that, instead of performing explicit generalization, compares new problem instances with instances seen in training, which have been stored in memory. Instance-based learning is a kind of lazy learning. Instance-Level Meta Normalization(ILM~Norm) This paper presents a normalization mechanism called Instance-Level Meta Normalization (ILM~Norm) to address a learning-to-normalize problem. ILM~Norm learns to predict the normalization parameters via both the feature feed-forward and the gradient back-propagation paths. ILM~Norm provides a meta normalization mechanism and has several good properties. It can be easily plugged into existing instance-level normalization schemes such as Instance Normalization, Layer Normalization, or Group Normalization. ILM~Norm normalizes each instance individually and therefore maintains high performance even when small mini-batch is used. The experimental results show that ILM~Norm well adapts to different network architectures and tasks, and it consistently improves the performance of the original models. The code is available at url{https://…/ILM-Norm. Instancewise Feature Selection We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation. Instantaneous Rates(IRATE) The Instantaneous Rates (IRATE) model is used to analyze tagging data. It is based on the Hoenig et al. (1998) alternate formulation of the Brownie et al. (1985) band recovery models that allow fishing and natural mortality to be derived from the exploitation rate and survival rate estimates of a Type II (continuous) fishery. IRATE allows both age-independent and age-dependent instantaneous rates models (Hoenig et al., 1998; Jiang et al., 2007) to be fitted to multi-year fish tag return data. IRATE allows model development with either age-dependent harvest-only or harvest and catch-release tag returns or similar age independent models. The software, developed by Dr. Gary Nelson of the Massachusetts Division of Marine Fisheries, also allows estimation of harvest reporting rates, catch and release reporting rates, and tag retention of harvested and/or released fish. However, not all parameters in the model can be estimated simultaneously with tag data alone. Some parameters must be fixed and assumed known (usually reporting rate and tag loss) to obtain good estimates of remaining parameters. Additionally, the model can account for non-mixing of the tagged fish in the first release year and adjust for harvest and M selectivity in the age-based models. The negative log likelihood is used as the objective function to obtain maximum likelihood estimates of parameters. Several model fit statistics are provided that can be used to select the best model formulation; these include the Akaike Information Criterion (AIC), c-hat (a measure of overdispersion) and standard residuals. The calculation engine is written in AD Model Builder. IRATER Instruction-based Behavior Explanation(IBE) In cooperation, the workers must know how co-workers behave. However, an agent’s policy, which is embedded in a statistical machine learning model, is hard to understand, and requires much time and knowledge to comprehend. Therefore, it is difficult for people to predict the behavior of machine learning robots, which makes Human Robot Cooperation challenging. In this paper, we propose Instruction-based Behavior Explanation (IBE), a method to explain an autonomous agent’s future behavior. In IBE, an agent can autonomously acquire the expressions to explain its own behavior by reusing the instructions given by a human expert to accelerate the learning of the agent’s policy. IBE also enables a developmental agent, whose policy may change during the cooperation, to explain its own behavior with sufficient time granularity. Instrumental Panel Data Models ivpanel Instrumental Variable(IV) In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Instrumental variable methods allow consistent estimation when the explanatory variables (covariates) are correlated with the error terms of a regression relationship. Such correlation may occur when the dependent variable causes at least one of the covariates (‘reverse’ causation), when there are relevant explanatory variables which are omitted from the model, or when the covariates are subject to measurement error. In this situation, ordinary linear regression generally produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation and is correlated with the endogenous explanatory variables, conditional on the other covariates. In linear models, there are two main requirements for using an IV: · The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates. · The instrument cannot be correlated with the error term in the explanatory equation (conditional on the other covariates), that is, the instrument cannot suffer from the same problem as the original predicting variable. ivmodel Instrumental Variables Estimation In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment.[1] Intuitively, IV is used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA gives biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable. Instrumental variable methods allow for consistent estimation when the explanatory variables (covariates) are correlated with the error terms in a regression model. Such correlation may occur 1) when changes in the dependent variable change the value of at least one of the covariates (‘reverse’ causation), 2) when there are omitted variables that affect both the dependent and independent variables, or 3) when the covariates are subject to non-random measurement error. Explanatory variables which suffer from one or more of these issues in the context of a regression are sometimes referred to as endogenous. In this situation, ordinary least squares produces biased and inconsistent estimates.[2] However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation but is correlated with the endogenous explanatory variables, conditional on the value of other covariates. Integer Echo State Network(intESN) We propose an integer approximation of Echo State Networks (ESN) based on the mathematics of hyperdimensional computing. The reservoir of the proposed Integer Echo State Network (intESN) contains only n-bits integers and replaces the recurrent matrix multiply with an efficient cyclic shift operation. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss. Our architecture naturally supports the usage of the trained reservoir in symbolic processing tasks of analogy making and logical inference. Integer Linear Programming(ILP) This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on paths in decision trees. We show that the associated pricing problem is NP-hard and propose a random procedure for column selection. In addition, to speed up column generation, we use a restricted parameter set via a sampling procedure using the well-known CART algorithm. Extensive numerical experiments show that our approach outperforms the state-of-the-art ILP-based algorithms in the recent literature both in computation time and solution quality. We also find better solutions that have higher training and testing accuracy than an optimized version of CART. Furthermore, our approach is capable of handling big data sets with tens of thousands of data rows, unlike other ILP-based algorithms. In addition, our approach has the advantage of being able to easily incorporate different objectives. Integrated Discrimination Improvement(IDI) Integrated Discrimination Improvement (IDI) described in the paper: Jialiang Li (2013) . mcca Integrated Nested Laplace Approximation(INLA) A fully automatic approach for approximate inference in latent Gaussian models. INLA,meta4diag Integrated Principal Components Analysis(iPCA) Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new statistical data integration method named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple datasets. The key idea driving iPCA is the matrix-variate normal model, whose Kronecker product covariance structure captures both individual patterns within each dataset and joint patterns shared by multiple datasets. Building upon this model, we develop several penalized (sparse and non-sparse) covariance estimators for iPCA and study their theoretical properties. We show that our sparse iPCA estimator consistently estimates the underlying joint subspace, and using geodesic convexity, we prove that our non-sparse iPCA estimator converges to the global solution of a non-convex problem. We also demonstrate the practical advantages of iPCA through simulations and a case study application to integrative genomics for Alzheimer’s Disease. In particular, we show that the joint patterns extracted via iPCA are highly predictive of a patient’s cognition and Alzheimer’s diagnosis. Integrated Rational Prediction and Motionless TextbfANalysis(IRON-MAN) Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. This also would inaccurately represent the traffic over a longer period of time such as the case of video. We propose a novel bio-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless textbfANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction and results also show that our proposed approach outperforms the state-of-the-art for the chosen test case. We also introduce a new metric named, Energy Consumption per Training Image (ECTI). Since, different CNN based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than the others, and ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs) with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs. Integrative Connectionist Learning Systems(ICOS) The so far developed and widely utilized connectionist systems (artificial neural networks) are mainly based on a single brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. This paper extends this paradigm of connectionist systems to a new trend-integrative connectionist learning systems (ICOS) that integrate in their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including neuronal-, genetic-, quantum. Spiking neural networks (SNN) are used as a basic connectionist learning model which is further extended with other information learning principles to create different ICOS. For example, evolving SNN for multitask learning are presented and illustrated on a case study of person authentification based on multimodal auditory and visual information. Integrative gene-SNN are presented, where gene interactions are included in the functioning of a spiking neuron. They are applied on a case study of computational neurogenetic modeling. Integrative quantum-SNN are introduced with a quantum Hebbian learning, where input features as well as information spikes are represented by quantum bits that result in exponentially faster feature selection and model learning. ICOS can be used to solve more efficiently challenging biological and engineering problems when fast adaptive learning systems are needed to incrementally learn in a large dimensional space. They can also help to better understand complex information processes in the brain especially how information processes at different information levels interact. Open questions, challenges and directions for further research are presented. Intel Machine Learning Scalability Library(MLSL) The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today’s state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) is still a challenging problem and requires continued research/development. This entails innovations spanning algorithms, frameworks, communication libraries, and system design. In this paper, we describe the philosophy, design, and implementation of Intel Machine Learning Scalability Library (MLSL) and present proof-points demonstrating scaling DL training on 100s to 1000s of nodes across Cloud and HPC systems. Intel nGraph The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call ‘direct optimization’, requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations). Intelligence Amplification Intelligence amplification (IA) (also referred to as cognitive augmentation and machine augmented intelligence) refers to the effective use of information technology in augmenting human intelligence. The idea was first proposed in the 1950s and 1960s by cybernetics and early computer pioneers. IA is sometimes contrasted with AI (Artificial Intelligence), that is, the project of building a human-like intelligence in the form of an autonomous technological system such as a computer or robot. AI has encountered many fundamental obstacles, practical as well as theoretical, which for IA seem moot, as it needs technology merely as an extra support for an autonomous intelligence that has already proven to function. Moreover, IA has a long history of success, since all forms of information technology, from the abacus to writing to the Internet, have been developed basically to extend the information processing capabilities of the human mind (see extended mind and distributed cognition). Intelligence Graph In fact, there exist three genres of intelligence architectures: logics (e.g. \textit{Random Forest, A$^*$ Searching}), neurons (e.g. \textit{CNN, LSTM}) and probabilities (e.g. \textit{Naive Bayes, HMM}), all of which are incompatible to each other. However, to construct powerful intelligence systems with various methods, we propose the intelligence graph (short as \textbf{\textit{iGraph}}), which is composed by both of neural and probabilistic graph, under the framework of forward-backward propagation. By the paradigm of iGraph, we design a recommendation model with semantic principle. First, the probabilistic distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for the recommendation diversity, we perform an expectation computation then conduct a logic judgment, in the manner of logics. Experimentally, we beat the state-of-the-art baselines and verify our conclusions. Intelligent Data Analytics(IDA) The art of Conquering Data with Intelligent Systems includes all areas of Research and Development in Intelligent Data Analytics , the area including Data Analytics and Intelligent Systems, that focus on computational, mathematical, statistical, cognitive, and algorithmic techniques for modeling high dimensional data with the ultimate goal of extracting meaning from (raw) data. This requires methods ranging from learning, inference, prediction, knowledge discovery and visualisation that are applicable on both small and large volumes of mostly dynamic data sets collected and integrated from multiple sources, across multiple modalities. These methods and techniques trigger the need for assessment and evaluation: automated and by humans. Intelligent Data Analytics enables automated hypothesis generation, event correlation, and anomaly detection and helps in explaining phenomena and inferring results that would otherwise remain hidden. Intelligent Data Analytics is a cornerstone in modern Big Data, amplifying perhaps its most important aspect: Value. Intelligent K-Means(ik-Means) Intelligent K-Means (iK-Means) is an K-Means initialization algorithm. It is a simple algorithm based on the concept of anomalous patterns, its of easy implementation and may even help you to find how many clusters there are in a dataset (remember, you need to know this in order to run K-Means!). Intelligent Choice of the Number of Clusters in K -Means Clustering: An Experimental Study with Different Cluster Spreads Intelligent Personal Agent(IPA) An Intelligent Personal Agent (IPA) is an agent that has the purpose of helping the user to gain information through reliable resources with the help of knowledge navigation techniques and saving time to search the best content. The agent is also responsible for responding to the chat-based queries with the help of Conversation Corpus. Intelligent Software Christopher Bishop: “Software that can adapt, learn and reason” IntelligentCrowd The prosperity of smart mobile devices has made mobile crowdsensing (MCS) a promising paradigm for completing complex sensing and computation tasks. In the past, great efforts have been made on the design of incentive mechanisms and task allocation strategies from MCS platform’s perspective to motivate mobile users’ participation. However, in practice, MCS participants face many uncertainties coming from their sensing environment as well as other participants’ strategies, and how do they interact with each other and make sensing decisions is not well understood. In this paper, we take MCS participants’ perspective to derive an online sensing policy to maximize their payoffs via MCS participation. Specifically, we model the interactions of mobile users and sensing environments as a multi-agent Markov decision process. Each participant cannot observe others’ decisions, but needs to decide her effort level in sensing tasks only based on local information, e.g., its own record of sensed signals’ quality. To cope with the stochastic sensing environment, we develop an intelligent crowdsensing algorithm IntelligentCrowd by leveraging the power of multi-agent reinforcement learning (MARL). Our algorithm leads to the optimal sensing policy for each user to maximize the expected payoff against stochastic sensing environments, and can be implemented at individual participant’s level in a distributed fashion. Numerical simulations demonstrate that IntelligentCrowd significantly improves users’ payoffs in sequential MCS tasks under various sensing dynamics. Intensive Principal Component Analysis(InPCA) Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the curse of dimensionality’ in high-dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter ({\Lambda}CDM) model as applied to the Cosmic Microwave Background. Intent Classification A brief introduction to Intent Classification Intent-Aware Multi-Agent Reinforcement Learning(IAMARL) This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents’ intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that for humans, other people’s intents will pose an influence on our utility for a goal. The proposed framework has several major advantages: i) it is computationally feasible and guaranteed to converge. ii) It can easily integrate existing intent prediction and low-level planning algorithms. iii) It does not suffer from sparse feedbacks in the action space. We experiment our algorithm in a real-world problem that is non-episodic, and the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and human-like behaviors emerge during the dynamic process. INTENT-CAPSNET User intent detection plays a critical role in question-answering and dialog systems. Most previous works treat intent detection as a classification problem where utterances are labeled with predefined intents. However, it is labor-intensive and time-consuming to label users’ utterances as intents are diversely expressed and novel intents will continually be involved. Instead, we study the zero-shot intent detection problem, which aims to detect emerging user intents where no labeled utterances are currently available. We propose two capsule-based architectures: INTENT-CAPSNET that extracts semantic features from utterances and aggregates them to discriminate existing intents, and INTENTCAPSNET-ZSL which gives INTENTCAPSNET the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents. Experiments on two real-world datasets show that our model not only can better discriminate diversely expressed existing intents, but is also able to discriminate emerging intents when no labeled utterances are available. INTENTCAPSNET-ZSL ➚ “INTENT-CAPSNET” Intention Analysis Intention Analysis is the identification of intentions from text, be it the intention to purchase or the intention to sell or to complain, accuse, inquire, opine, advocate or to quit, in incoming customer messages or in call center transcripts. Intention analysis using topic models Inter Rater Reliability(IRR) In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. There are a number of statistics which can be used to determine inter-rater reliability. Different statistics are appropriate for different types of measurement. Some options are: joint-probability of agreement, Cohen’s kappa and the related Fleiss’ kappa, inter-rater correlation, concordance correlation coefficient and intra-class correlation. rhoR Interaction Measure(IM) Recovering pairwise interactions, i.e. pairs of input features whose joint effect on an output is different from the sum of their marginal effects, is central in many scientific applications. We conceptualize a solution to this problem as a two-stage procedure: first, we model the relationship between the features and the output using a flexible hybrid neural network; second, we detect feature interactions from the trained model. For the second step we propose a simple and intuitive interaction measure (IM), which has no specific requirements on the machine learning model used in the first step, only that it defines a mapping from an input to an output. And in a special case it reduces to the averaged Hessian of the input-output mapping. Importantly, our method upper bounds the interaction recovery error with the error of the learning model, which ensures that we can improve the recovered interactions by training a more accurate model. We present analyses of simulated and real-world data which demonstrate the benefits of our method compared to available alternatives, and theoretically analyse its properties and relation to other methods. Interaction Network Self-organization is a natural phenomenon that emerges in systems with a large number of interacting components. Self-organized systems show robustness, scalability, and flexibility, which are essential properties when handling real-world problems. Swarm intelligence seeks to design nature-inspired algorithms with a high degree of self-organization. Yet, we do not know why swarm-based algorithms work well and neither we can compare the different approaches in the literature. The lack of a common framework capable of characterizing these several swarm-based algorithms, transcending their particularities, has led to a stream of publications inspired by different aspects of nature without much regard as to whether they are similar to already existing approaches. We address this gap by introducing a network-based framework – the interaction network – to examine computational swarm-based systems via the optics of social dynamics. We discuss the social dimension of several swarm classes and provide a case study of the Particle Swarm Optimization. The interaction network enables a better understanding of the plethora of approaches currently available by looking at them from a general perspective focusing on the structure of the social interactions. Interaction-Aware Factorization Machine(IFM) Factorization Machine (FM) is a widely used supervised learning approach by effectively modeling of feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction fairly may degrade the performance. For example, the interactions of a useless feature may introduce noises; the importance of a feature may also differ when interacting with different features. In this work, we propose a novel model named \emph{Interaction-aware Factorization Machine} (IFM) by introducing Interaction-Aware Mechanism (IAM), which comprises the \emph{feature aspect} and the \emph{field aspect}, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network while the field aspect learns the feature interaction effect as a parametric similarity of the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows for more leverage in tweaking the interactions on both feature-wise and field-wise levels. Besides, we give a more generalized architecture and propose Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on the field aspect importance. The experimental results from two well-known datasets show the superiority of the proposed models over the state-of-the-art methods. Interaction-Aware Mechanism(IAM) ➘ “Interaction-Aware Factorization Machine” Interactive Growing Hierarchical SOM(interactive GHSOM) Self Organizing Map is trained using unsupervised learning to produce a two-dimensional discretized representation of input space of the training cases. Growing Hierarchical SOM is an architecture which grows both in a hierarchical way representing the structure of data distribution and in a horizontal way representation the size of each individual maps. The control method of the growing degree of GHSOM by pruning off the redundant branch of hierarchy in SOM is proposed in this paper. Moreover, the interface tool for the proposed method called interactive GHSOM is developed. We discuss the computation results of Iris data by using the developed tool. Interactive Matching Network(IMN) In this paper, we propose an interactive matching network (IMN) to enhance the representations of contexts and responses at both the word level and sentence level for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN significantly outperforms the baseline models by large margins on all metrics, achieving new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection. Interactive Reinforcement Learning Interactive reinforcement learning (IRL) extends traditional reinforcement learning (RL) by allowing an agent to interact with parent-like trainers during a task. Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning Interactive Report An “Interactive Report” provides a new paradigm to fill the gap between Static Report and BI Tool. It has the following characteristics … 1. Like a static report, “Interactive Report” is still based on “static data”, which is a fixed set of data generated in a periodic batch fashion. 2. Unlike static report, this pre-generated “static data” is much larger and wider that covers a broader scope of questions that the execs may ask. 3. Because the “static data” is large and wide, it is impossible to visualize all aspects in the report. Therefore, only one perspective of the static data (based on the exec’s pre-specified requirement) is shown in the report. 4. However, if the exec wants to ask a different question, he/she can switch to a different perspective of the same “static data”. Interactive Similarity Projection(iSP) Recent advances in machine learning allow us to analyze and describe the content of high-dimensional data like text, audio, images or other signals. In order to visualize that data in 2D or 3D, usually Dimensionality Reduction (DR) techniques are employed. Most of these techniques, e.g., PCA or t-SNE, produce static projections without taking into account corrections from humans or other data exploration scenarios. In this work, we propose the interactive Similarity Projection (iSP), a novel interactive DR framework based on similarity embeddings, where we form a differentiable objective based on the user interactions and perform learning using gradient descent, with an end-to-end trainable architecture. Two interaction scenarios are evaluated. First, a common methodology in multidimensional projection is to project a subset of data, arrange them in classes or clusters, and project the rest unseen dataset based on that manipulation, in a kind of semi-supervised interpolation. We report results that outperform competitive baselines in a wide range of metrics and datasets. Second, we explore the scenario of manipulating some classes, while enriching the optimization with high-dimensional neighbor information. Apart from improving classification precision and clustering on images and text documents, the new emerging structure of the projection unveils semantic manifolds. For example, on the Head Pose dataset, by just dragging the faces looking far left to the left and those looking far right to the right, all faces are re-arranged on a continuum even on the vertical axis (face up and down). This end-to-end framework can be used for fast, visual semi-supervised learning, manifold exploration, interactive domain adaptation of neural embeddings and transfer learning. Inter-Annotator Agreement Network This work develops a simple information theoretic framework that captures the dynamic of the inter-annotator agreement process and unifies a wide range of approaches in unsupervised learning. Our model consists of a pair of annotators whose goal is to maximize the mutual information between their annotations. Training the model with standard stochastic gradient descent is challenging, but we find an ablation of the model that admits variational approximation to be empirically effective. We illustrate the strength our framework by achieving new state-of-the-art accuracy on unsupervised part-of-speech tagging, in particular 78.7% on the 45-tag Penn WSJ dataset. We also show clear performance improvement in unsupervised entity typing. Interest Narrowness The number of posts made by a single user account on a social media platform Twitter in any given time interval is usually quite low. However, there is a subset of users whose volume of posts is much higher than the median. In this paper, we investigate the content diversity and the social neighborhood of these extreme users and others. We define a metric called ‘interest narrowness’, and identify that a subset of extreme users, termed anomalous users, write posts with very low topic diversity, including posts with no text content. Using a few interaction patterns we show that anomalous groups have the strongest within-group interactions, compared to their interaction with others. Further, they exhibit different information sharing behaviors with other anomalous users compared to non-anomalous extreme tweeters. Interevent Time(IET) Analytically Solvable Autocorrelation Function for Correlated Interevent Times Interior Point(IP) Interior point methods (also referred to as barrier methods) are a certain class of algorithms to solve linear and nonlinear convex optimization problems. Example solution John von Neumann suggested an interior point method of linear programming which was neither a polynomial time method nor an efficient method in practice. In fact, it turned out to be slower in practice compared to simplex method which is not a polynomial time method. In 1984, Narendra Karmarkar developed a method for linear programming called Karmarkar’s algorithm which runs in provably polynomial time and is also very efficient in practice. It enabled solutions of linear programming problems which were beyond the capabilities of simplex method. Contrary to the simplex method, it reaches a best solution by traversing the interior of the feasible region. The method can be generalized to convex programming based on a self-concordant barrier function used to encode the convex set. Any convex optimization problem can be transformed into minimizing (or maximizing) a linear function over a convex set by converting to the epigraph form. The idea of encoding the feasible set using a barrier and designing barrier methods was studied by Anthony V. Fiacco, Garth P. McCormick, and others in the early 1960s. These ideas were mainly developed for general nonlinear programming, but they were later abandoned due to the presence of more competitive methods for this class of problems (e.g. sequential quadratic programming). Yurii Nesterov and Arkadi Nemirovski came up with a special class of such barriers that can be used to encode any convex set. They guarantee that the number of iterations of the algorithm is bounded by a polynomial in the dimension and accuracy of the solution. Karmarkar’s breakthrough revitalized the study of interior point methods and barrier problems, showing that it was possible to create an algorithm for linear programming characterized by polynomial complexity and, moreover, that was competitive with the simplex method. Already Khachiyan’s ellipsoid method was a polynomial time algorithm; however, it was too slow to be of practical interest. The class of primal-dual path-following interior point methods is considered the most successful. Mehrotra’s predictor-corrector algorithm provides the basis for most implementations of this class of methods. Interior Point Optimizer(Ipopt) Ipopt (Interior Point OPTimizer, pronounced eye-pea-Opt) is a software package for large-scale ​nonlinear optimization. It is designed to find (local) solutions of mathematical optimization problems of the form: min f(x) for x in R^n, so that gL <= g(x) <= gU; xL <= x <= xU. Ipopt is written in C++ and is released as open source code under the Eclipse Public License (EPL). It is available from the ​COIN-OR initiative. The code has been written by ​Andreas Wächter and ​Carl Laird. The COIN-OR project managers for Ipopt are ​Andreas Wächter und ​Stefan Vigerske. Intermediate Level Attack(ILA) Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples may be overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. This leads us to introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model. We show that our method can effectively achieve this goal and that we can decide a nearly-optimal layer of the source model to perturb without any knowledge of the target models. Intermediate Representation(IR) Domain specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) which enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to determine the possible ways a computation can be performed on a hardware system. Next, our scheduler chooses a specific mapping and determines the data movement and computation order. In order to manage the large search space of mappings and schedules, we developed a flexible framework that allows heuristics, cost models, and potentially machine learning to facilitate this search problem. With this system, we demonstrate the automated extraction of matrix multiplication kernels out of recent deep learning kernels such as depthwise-separable convolution. In addition, we demonstrate two to five times better performance on DeepBench sized GEMMs and GRU RNN execution when compared to state-of-the-art (SOTA) implementations on new hardware and up to 85% of the performance for SOTA implementations on existing hardware. INtermediate representations for FuturE pRediction(INFER) In urban driving scenarios, forecasting future trajectories of surrounding vehicles is of paramount importance. While several approaches for the problem have been proposed, the best-performing ones tend to require extremely detailed input representations (eg. image sequences). But, such methods do not generalize to datasets they have not been trained on. We propose intermediate representations that are particularly well-suited for future prediction. As opposed to using texture (color) information, we rely on semantics and train an autoregressive model to accurately predict future trajectories of traffic participants (vehicles) (see fig. above). We demonstrate that using semantics provides a significant boost over techniques that operate over raw pixel intensities/disparities. Uncharacteristic of state-of-the-art approaches, our representations and models generalize to completely different datasets, collected across several cities, and also across countries where people drive on opposite sides of the road (left-handed vs right-handed driving). Additionally, we demonstrate an application of our approach in multi-object tracking (data association). To foster further research in transferrable representations and ensure reproducibility, we release all our code and data. Intermittent Learning In this paper, we introduce the concept of intermittent learning, which enables energy harvested computing platforms to execute certain classes of machine learning tasks. We identify unique challenges to intermittent learning relating to the data and application semantics of machine learning tasks. To address these challenges, we devise an algorithm that determines a sequence of actions to achieve the desired learning objective under tight energy constraints. We further increase the energy efficiency of the system by proposing three heuristics that help an intermittent learner decide whether to learn or discard training examples at run-time. In order to provide a probabilistic bound on the completion of a learning task, we perform an energy event-based analysis that helps us analyze intermittent learning systems where the uncertainty lies in both energy and training example generation. We implement and evaluate three intermittent learning applications that learn the air quality, human presence, and vibration using solar, RF, and kinetic energy harvesters, respectively. We demonstrate that the proposed framework improves the energy efficiency of a learner by up to 100% and cuts down the number of learning examples by up to 50% when compared to state-of-the-art intermittent computing systems without our framework. Internal Node Bagging We introduce a novel view to understand how dropout works as an inexplicit ensemble learning method, which do not point out how many and which nodes to learn a certain feature. We propose a new training method named internal node bagging, this method explicitly force a group of nodes to learn a certain feature in training time, and combine those nodes to be one node in inference time. It means we can use much more parameters to improve model’s fitting ability in training time while keeping model small in inference time. We test our method on several benchmark datasets and find it significantly more efficiency than dropout on small model. International Conference on Data Mining(ICDM) The IEEE International Conference on Data Mining series (ICDM) has established itself as the world’s premier research conference in data mining. It provides an international forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of data mining, including algorithms, software and systems, and applications. ICDM draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high performance computing. By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference features workshops, tutorials, panels and, since 2007, the ICDM data mining contest. International Institute for Analytics(IIA) Founded in 2010 by CEO Jack Phillips and Research Director Thomas H. Davenport, the International Institute for Analytics is an independent research firm that works with organizations to build strong and competitive analytics programs. IIA offers unbiased advice in an industry dominated by hardware and software vendors, consultants and system integrators. With a vast network of analytics experts, academics and leaders at successful companies, we guide our clients as they build and grow successful analytics programs. International Mathematics and Statistics Library(IMSL) IMSL (International Mathematics and Statistics Library) is a commercial collection of software libraries of numerical analysis functionality that are implemented in the computer programming languages of C, Java, C#.NET, and Fortran. A Python interface is also available. The IMSL Libraries are provided by Rogue Wave Software. International Phonetic Alphabet(IPA) The International Phonetic Alphabet (unofficially-though commonly-abbreviated IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association as a standardized representation of the sounds of oral language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators. The IPA is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation, and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft palate, an extended set of symbols called the Extensions to the IPA may be used. IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter ⟨t⟩ may be transcribed in IPA with a single letter, , or with a letter plus diacritics, , depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, /t/ is less specific than, and could refer to, either or , depending on the context and language. Occasionally letters or diacritics are added, removed, or modified by the International Phonetic Association. As of the most recent change in 2005, there are 107 letters, 52 diacritics, and four prosodic marks in the IPA. These are shown in the current IPA chart, posted below in this article and at the website of the IPA. International Phonetic Association Internet of Everything(IoE) The Internet of Everything describes the networked connections between devices, people, processes and data. The Digitally Connected World. ➘ “Internet of Things” ➘ “Internet of Us” Internet of NanoThing(IoNT) This chapter focuses on Internet of Things from the nanoscale point of view. The chapter starts with section 1 which provides an introduction of nanothings and nanotechnologies. The nanoscale communication paradigms and the different approaches are discussed for nanodevices development. Nanodevice characteristics are discussed and the architecture of wireless nanodevices are outlined. Section 2 describes Internet of NanoThing(IoNT), its network architecture, and the challenges of nanoscale communication which is essential for enabling IoNT. Section 3 gives some practical applications of IoNT. The internet of Bio-NanoThing (IoBNT) and relevant biomedical applications are discussed. Other Applications such as military, industrial, and environmental applications are also outlined. Internet of Things(IoT) The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. Typically, IoT is expected to offer advanced connectivity of devices, systems, and services that goes beyond machine-to-machine communications (M2M) and covers a variety of protocols, domains, and applications. The interconnection of these embedded devices (including smart objects), is expected to usher in automation in nearly all fields, while also enabling advanced applications like a Smart Grid. Things, in the IoT, can refer to a wide variety of devices such as heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, or field operation devices that assist fire-fighters in search and rescue. Current market examples include smart thermostat systems and washer/dryers that utilize wifi for remote monitoring. Internet of Us(IoU) Call it the internet of bodies, call it emotionally intelligent wearable tech. Designers, engineers and artists want to wake the mainstream tech giants up to the realities of asking people to wear technology. https://…/111811056605813020209 Internet Shopping Problem Introduced by Blazewicz et al. (2010), where a customer wants to buy a list of products at the lowest possible total cost from shops which offer discounts when purchases exceed a certain threshold. The problem is NP-hard. InterpNET Humans are able to explain their reasoning. On the contrary, deep neural networks are not. This paper attempts to bridge this gap by introducing a new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system’s inner-workings. This paper proposes a neural network design paradigm, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications. The success of the module relies on the assumption that the network’s computation and reasoning is represented in its internal layer activations. While in principle InterpNET could be applied to any existing classification architecture, it is evaluated via an image classification and explanation task. Experiments on a CUB bird classification and explanation dataset show qualitatively and quantitatively that the model is able to generate high-quality explanations. While the current state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a much higher METEOR score of 37.9. Interpolation Consistency Training(ICT) We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets. Interpretable BoW Network The standard approach to providing interpretability to deep convolutional neural networks (CNNs) consists of visualizing either their feature maps, or the image regions that contribute the most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a Bag of visual Word representation within the network and associate a visual and semantic meaning to the corresponding codebook elements via the use of a generative adversarial network. The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build upon the intuition that, while adversarial samples look very similar to real images, to produce incorrect predictions, they should activate codewords with a significantly different visual representation. We therefore cast the adversarial example detection problem as that of comparing the input image with the most highly activated visual codeword. As evidenced by our experiments, this allows us to outperform the state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy. Interpretable Reasoning Network Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis. Interpretive Structural Modelling(ISM) The development of ISM was made by Warfield in 1974. ISM is the process of collaborating distinct or related essentials into a simplified and an organized format. Hence, ISM is a methodology that seeks the interrelationships among the various elements considered and endows with a hierarchical and multilevel structure. ISM Inter-Rater Inter-Rater quantifies the reliability between multiple raters who evaluate a group of subjects. It calculates the group quantity, Fleiss kappa, and it improves on existing software by keeping information about each user and quantifying how each user agreed with the rest of the group. This is accomplished through permutations of user pairs. The software was written in Python, can be run in Linux, and the code is deposited in Zenodo and GitHub. This software can be used for evaluation of inter-rater reliability in systematic reviews, medical diagnosis algorithms, education applications, and others. Inter-rater Reliability (Concordance) In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. Intersection over Union(IoU) Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. ➘ “Jaccard Index” Interval-based Prediction Uncertainty Bounding(IPUB) The problem of machine learning with missing values is common in many areas. A simple approach is to first construct a dataset without missing values simply by discarding instances with missing entries or by imputing a fixed value for each missing entry, and then train a prediction model with the new dataset. A drawback of this naive approach is that the uncertainty in the missing entries is not properly incorporated in the prediction. In order to evaluate prediction uncertainty, the multiple imputation (MI) approach has been studied, but the performance of MI is sensitive to the choice of the probabilistic model of the true values in the missing entries, and the computational cost of MI is high because multiple models must be trained. In this paper, we propose an alternative approach called the Interval-based Prediction Uncertainty Bounding (IPUB) method. The IPUB method represents the uncertainties due to missing entries as intervals, and efficiently computes the lower and upper bounds of the prediction results when all possible training sets constructed by imputing arbitrary values in the intervals are considered. The IPUB method can be applied to a wide class of convex learning algorithms including penalized least-squares regression, support vector machine (SVM), and logistic regression. We demonstrate the advantages of the IPUB method by comparing it with an existing method in numerical experiment with benchmark datasets. Intervention Analysis(IA) Intervention analysis is the application of modeling procedures for incorporating the effects of exogenous forces or interventions in time series analysis. These interventions, like policy changes, strikes, floods, and price changes, cause unusual changes in time series, resulting in unexpected, extraordinary observations known as outliers. Specifically, four types of outliers resulting from interventions, additive outliers (AO), innovational outliers (IO), temporary changes (TC), and level shifts (LS), have generated a lot of interest in literature. They pose nonstationarity challenges, which cannot be represented by the usual Box and Jenkins (1976) autoregressive integrated moving average (ARIMA) models alone. The most popular modeling procedures are those where ‘intervention’ detection and estimation is paramount. Box and Tiao (1975) pioneered this type of analysis in their quest to solve the Los Angeles pollution problem. Important extensions and contributions have been made by Chan … Intervention in Prediction Measure(IPM) Random forests are a popular method in many fields since they can be successfully applied to complex data, with a small sample size, complex interactions and correlations, mixed type predictors, etc. Furthermore, they provide variable importance measures that aid qualitative interpretation and also the selection of relevant predictors. However, most of these measures rely on the choice of a performance measure. But measures of prediction performance are not unique or there is not even a clear definition, as in the case of multivariate response random forests. A new alternative importance measure, called Intervention in Prediction Measure, is investigated. It depends on the structure of the trees, without depending on performance measures. It is compared with other well-known variable importance measures in different contexts, such as a classification problem with variables of different types, another classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types. IPMRF Intervention Time Series Analysis(ITSA) Intervention time series analysis (ITSA) is an important method for analysing the effect of sudden events on time series data. ITSA methods are quasi-experimental in nature and the validity of modelling with these methods depends upon assumptions about the timing of the intervention and the response of the process to it. Interventional Robustness Score The ability to learn disentangled representations that split underlying sources of variation in high dimensional, unstructured data is of central importance for data efficient and robust use of neural networks. Various approaches aiming towards this goal have been proposed in the recent time — validating existing work is hence a crucial task to guide further development. Previous validation methods focused on shared information between generative factors and learned features. The effects of rare events or cumulative influences from multiple factors on encodings, however, remain uncaptured. Our experiments show that this already becomes noticeable in a simple, noise free dataset. This is why we introduce the interventional robustness score, which provides a quantitative evaluation of robustness in learned representations with respect to interventions on generative factors and changing nuisance factors. We show how this score can be estimated from labeled observational data, that may be confounded, and further provide an efficient algorithm that scales linearly in the dataset size. The benefits of our causally motivated framework are illustrated in extensive experiments. Intra- and Inter-epoch Temporal Context Network(IITNet) This study proposes a novel deep learning model, called IITNet, to learn intra- and inter-epoch temporal contexts from a raw single channel electroencephalogram (EEG) for automatic sleep stage scoring. When sleep experts identify the sleep stage of a 30-second PSG data called an epoch, they investigate the sleep-related events such as sleep spindles, K-complex, and frequency components from local segments of an epoch (sub-epoch) and consider the relations between sleep-related events of successive epochs to follow the transition rules. Inspired by this, IITNet learns how to encode sub-epoch into representative feature via a deep residual network, then captures contextual information in the sequence of representative features via BiLSTM. Thus, IITNet can extract features in sub-epoch level and consider temporal context not only between epochs but also in an epoch. IITNet is an end-to-end architecture and does not need any preprocessing, handcrafted feature design, balanced sampling, pre-training, or fine-tuning. Our model was trained and evaluated in Sleep-EDF and MASS datasets and outperformed other state-of-the-art results on both the datasets with the overall accuracy (ACC) of 84.0% and 86.6%, macro F1-score (MF1) of 77.7 and 80.8, and Cohen’s kappa of 0.78 and 0.80 in Sleep-EDF and MASS, respectively. Intrablocks Correspondence Analysis(IBCA) We propose a new method to describe contingency tables with double partition structures in columns and rows. Furthermore, we propose new superimposed representations, based on the introduction of variable dilations for the partial clouds associated with the partitions of the columns and the rows. pamctdp Intra-Class Correlation(ICC) In statistics, the intraclass correlation (or the intraclass correlation coefficient, abbreviated ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations. The intraclass correlation is commonly used to quantify the degree to which individuals with a fixed degree of relatedness (e.g. full siblings) resemble each other in terms of a quantitative trait. Another prominent application is the assessment of consistency or reproducibility of quantitative measurements made by different observers measuring the same quantity. ICC.Sample.Size Intra-Class Variation Isolation in Conditional GAN(IVI-GAN) Current state-of-the-art conditional generative adversarial networks (C-GANs) require strong supervision via labeled datasets in order to generate images with continuously adjustable, disentangled semantics. In this paper we introduce a new formulation of the C-GAN that is able to learn realistic models with continuous, semantically meaningful input parameters and which has the advantage of requiring only the weak supervision of binary attribute labels. We coin the method intra-class variation isolation (IVI) and the resulting network the IVI-GAN. The method allows continuous control over the attributes in synthesised images where precise labels are not readily available. For example, given only labels found using a simple classifier of ambient / non-ambient lighting in images, IVI has enabled us to learn a generative face-image model with controllable lighting that is disentangled from other factors in the synthesised images, such as the identity. We evaluate IVI-GAN on the CelebA and CelebA-HQ datasets, learning to disentangle attributes such as lighting, pose, expression and age, and provide a quantitative comparison of IVI-GAN with a classical continuous C-GAN. Intra-Ensemble Improving model performance is always the key problem in machine learning including deep learning. However, stand-alone neural networks always suffer from marginal effect when stacking more layers. At the same time, ensemble is a useful technique to further enhance model performance. Nevertheless, training several independent stand-alone deep neural networks costs multiple resources. In this work, we propose Intra-Ensemble, an end-to-end strategy with stochastic training operations to train several sub-networks simultaneously within one neural network. Additional parameter size is marginal since the majority of parameters are mutually shared. Meanwhile, stochastic training increases the diversity of sub-networks with weight sharing, which significantly enhances intra-ensemble performance. Extensive experiments prove the applicability of intra-ensemble on various kinds of datasets and network architectures. Our models achieve comparable results with the state-of-the-art architectures on CIFAR-10 and CIFAR-100. Intrinsic Credible Regions This paper defines intrinsic credible regions, a method to produce objective Bayesian credible regions which only depends on the assumed model and the available data. Lowest posterior loss (LPL) regions are defined as Bayesian credible regions which contain values of minimum posterior expected loss: they depend both on the loss function and on the prior specification. An invariant, information-theory based loss function, the intrinsic discrepancy is argued to be appropriate for scientific communication. Intrinsic credible regions are the lowest posterior loss regions with respect to the intrinsic discrepancy loss and the appropriate reference prior. The proposed procedure is completely general, and it is invariant under both reparametrization and marginalization. The exact derivation of intrinsic credible regions often requires numerical integration, but good analytical approximations are provided. Special attention is given to one-dimensional intrinsic credible intervals; their coverage properties show that they are always approximate (and sometimes exact) frequentist confidence intervals. Intrinsic Dimension(ID) In signal processing of multidimensional signals, for example in computer vision, the intrinsic dimension of the signal describes how many variables are needed to represent the signal. For a signal of N variables, its intrinsic dimension M satisfies 0 = M = N. Usually the intrinsic dimension of a signal relates to variables defined in a Cartesian coordinate system. In general, however, it is also possible to describe the concept for non-Cartesian coordinates, for example, using polar coordinates. IDmining Intrinsic Style Transfer We explore neural painters, a generative model for brushstrokes learned from a real non-differentiable and non-deterministic painting program. We show that when training an agent to ‘paint’ images using brushstrokes, using a differentiable neural painter leads to much faster convergence. We propose a method for encouraging this agent to follow human-like strokes when reconstructing digits. We also explore the use of a neural painter as a differentiable image parameterization. By directly optimizing brushstrokes to activate neurons in a pre-trained convolutional network, we can directly visualize ImageNet categories and generate ‘ideal’ paintings of each class. Finally, we present a new concept called intrinsic style transfer. By minimizing only the content loss from neural style transfer, we allow the artistic medium, in this case, brushstrokes, to naturally dictate the resulting style. Introspection Learning Traditional reinforcement learning agents learn from experience, past or present, gained through interaction with their environment. Our approach synthesizes experience, without requiring an agent to interact with their environment, by asking the policy directly ‘Are there situations X, Y, and Z, such that in these situations you would select actions A, B, and C?’ In this paper we present Introspection Learning, an algorithm that allows for the asking of these types of questions of neural network policies. Introspection Learning is reinforcement learning algorithm agnostic and the states returned may be used as an indicator of the health of the policy or to shape the policy in a myriad of ways. We demonstrate the usefulness of this algorithm both in the context of speeding up training and improving robustness with respect to safety constraints. Invariance Induction Framework Data representations that contain all the information about target variables but are invariant to nuisance factors benefit supervised learning algorithms by preventing them from learning associations between these factors and the targets, thus reducing overfitting. We present a novel unsupervised invariance induction framework for neural networks that learns a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, without needing any labeled information about nuisance factors or domain knowledge. We describe an adversarial instantiation of this framework and provide analysis of its working. Our unsupervised model outperforms state-of-the-art methods, which are supervised, at inducing invariance to inherent nuisance factors, effectively using synthetic data augmentation to learn invariance, and domain adaptation. Our method can be applied to any prediction task, eg., binary/multi-class classification or regression, without loss of generality. Invariant Causal Prediction(ICP) InvariantCausalPrediction Invariant Coordinate Selection(ICS) A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based upon the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant coordinate system for the multivariate data. Consequently, we view this method as a method for invariant coordinate selection (ICS). By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant coordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant coordinates corresponds to Fisher’s linear discriminant subspace, even though the class identi cations of the data points are unknown. Invariant Co-Ordinate Selection Multivariate Outlier Detection With ICS ICS Invariant Encoding Generative Adversarial Network(IVE-GAN) Generative adversarial networks (GANs) are a powerful framework for generative tasks. However, they are difficult to train and tend to miss modes of the true data generation process. Although GANs can learn a rich representation of the covered modes of the data in their latent space, the framework misses an inverse mapping from data to this latent space. We propose Invariant Encoding Generative Adversarial Networks (IVE-GANs), a novel GAN framework that introduces such a mapping for individual samples from the data by utilizing features in the data which are invariant to certain transformations. Since the model maps individual samples to the latent space, it naturally encourages the generator to cover all modes. We demonstrate the effectiveness of our approach in terms of generative performance and learning rich representations on several datasets including common benchmark image generation tasks. Invariant Transformer Net Convolutional Neural Networks (CNNs) define an exceptionally powerful class of models for image classification, but the theoretical background and the understanding of how invariances to certain transformations are learned is limited. In a large scale screening with images modified by different affine and nonaffine transformations of varying magnitude, we analyzed the behavior of the CNN architectures AlexNet and ResNet. If the magnitude of different transformations does not exceed a class- and transformation dependent threshold, both architectures show invariant behavior. In this work we furthermore introduce a new learnable module, the Invariant Transformer Net, which enables us to learn differentiable parameters for a set of affine transformations. This allows us to extract the space of transformations to which the CNN is invariant and its class prediction robust. Inverse Autoregressive Flows(IAF) ➘ “Neural Autoregressive Flow” Inverse Classification Inverse classification is the process of perturbing an instance in a meaningful way such that it is more likely to conform to a specific class. Historical methods that address such a problem are often framed to leverage only a single classifier, or specific set of classifiers. These works are often accompanied by naive assumptions. In this work we propose generalized inverse classification (GIC), which avoids restricting the classification model that can be used. We incorporate this formulation into a refined framework in which GIC takes place. Under this framework, GIC operates on features that are immediately actionable. Each change incurs an individual cost, either linear or non-linear. Such changes are subjected to occur within a specified level of cumulative change (budget). Furthermore, our framework incorporates the estimation of features that change as a consequence of direct actions taken (indirectly changeable features). To solve such a problem, we propose three real-valued heuristic-based methods and two sensitivity analysis-based comparison methods, each of which is evaluated on two freely available real-world datasets. Our results demonstrate the validity and benefits of our formulation, framework, and methods. Inverse Conditional Probability Weighting(ICPW) Estimating the average treatment causal effect in clustered data often involves dealing with unmeasured cluster-specific confounding variables. Such variables may be correlated with the measured unit covariates and outcome. When the correlations are ignored, the causal effect estimation can be biased. By utilizing sufficient statistics, we propose an inverse conditional probability weighting (ICPW) method, which is robust to both (i) the correlation between the unmeasured cluster-specific confounding variable and the covariates and (ii) the correlation between the unmeasured cluster-specific confounding variable and the outcome. Assumptions and conditions for the ICPW method are presented. We establish the asymptotic properties of the proposed estimators. Simulation studies and a case study are presented for illustration. Inverse Distance Weighting(IDW) Inverse Distance Weighting (IDW) is a type of deterministic method for multivariate interpolation with a known scattered set of points. The assigned values to unknown points are calculated with a weighted average of the values available at the known points. The name given to this type of methods was motivated by the weighted average applied, since it resorts to the inverse of the distance to each known point (‘amount of proximity’) when assigning weights. geosptdb Inverse Propensity Scores(IPS) Offline Comparison of Ranking Functions using Randomized Data Inverse Reinforcement Learning(IRL) Inverse Reinforcement Learning (IRL) in Markov decision processes is the problem of extracting a reward function given observed, optimal behavior. Inverse Reinforcement Learning Method for Architecture Search(IRLAS) In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to learn to search network structures that are topologically inspired by human-designed network. Most existing architecture search approaches totally neglect the topological characteristics of architectures, which results in complicated architecture with a high inference latency. Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-design network (ResNeXt). To avoid raising a too strong prior over the search space, we introduce inverse reinforcement learning to train the mirror stimuli function and exploit it as a heuristic guidance for architecture search, easily generalized to different architecture search algorithms. On CIFAR-10, the best architecture searched by our proposed IRLAS achieves 2.60% error rate. For ImageNet mobile setting, our model achieves a state-of-the-art top-1 accuracy 75.28%, while being 2~4x faster than most auto-generated architectures. A fast version of this model achieves 10% faster than MobileNetV2, while maintaining a higher accuracy. Inverse Reward Design(IRD) Autonomous agents optimize the reward function we give them. What they don’t know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse behavior in test MDPs. Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking. Inverse Transport Network We introduce inverse transport networks as a learning architecture for inverse rendering problems where, given input image measurements, we seek to infer physical scene parameters such as shape, material, and illumination. During training, these networks are evaluated not only in terms of how close they can predict groundtruth parameters, but also in terms of whether the parameters they produce can be used, together with physically-accurate graphics renderers, to reproduce the input image measurements. To enable training of inverse transport networks using stochastic gradient descent, we additionally create a general-purpose, physically-accurate differentiable renderer, which can be used to estimate derivatives of images with respect to arbitrary physical scene parameters. Our experiments demonstrate that inverse transport networks can be trained efficiently using differentiable rendering, and that they generalize to scenes with completely unseen geometry and illumination better than networks trained without appearance- matching regularization. Inverse Visual Question Answering(iVQA) In recent years, visual question answering (VQA) has become topical as a long-term goal to drive computer vision and multi-disciplinary AI research. The premise of VQA’s significance, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps understand’ less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution. In this paper we propose the inverse problem of VQA (iVQA), and explore its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair. Since the answers are less informative than the questions, and the questions have less learnable bias, an iVQA model needs to better understand the image to be successful. We pose question generation as a multi-modal dynamic inference process and propose an iVQA model that can gradually adjust its focus of attention guided by both a partially generated question and the answer. For evaluation, apart from existing linguistic metrics, we propose a new ranking metric. This metric compares the ground truth question’s rank among a list of distractors, which allows the drawbacks of different algorithms and sources of error to be studied. Experimental results show that our model can generate diverse, grammatically correct and content correlated questions that match the given answer. Inverted Attention(IA) Improving object detectors against occlusion, blur and noise is a critical step to deploy detectors in real applications. Since it is not possible to exhaust all image defects through data collection, many researchers seek to generate hard samples in training. The generated hard samples are either images or feature maps with coarse patches dropped out in the spatial dimensions. Significant overheads are required in training the extra hard samples and/or estimating drop-out patches using extra network branches. In this paper, we improve object detectors using a highly efficient and fine-grain mechanism called Inverted Attention (IA). Different from the original detector network that only focuses on the dominant part of objects, the detector network with IA iteratively inverts attention on feature maps and puts more attention on complementary object parts, feature channels and even context. Our approach (1) operates along both the spatial and channels dimensions of the feature maps; (2) requires no extra training on hard samples, no extra network parameters for attention estimation, and no testing overheads. Experiments show that our approach consistently improved both two-stage and single-stage detectors on benchmark databases. Invertible Residual Network Reversible deep networks provide useful theoretical guarantees and have proven to be a powerful class of functions in many applications. Usually, they rely on analytical inverses using dimension splitting, fundamentally constraining their structure compared to common architectures. Based on recent links between ordinary differential equations and deep networks, we provide a sufficient condition when standard ResNets are invertible. This condition allows unconstrained architectures for residual blocks, while only requiring an adaption to their regularization scheme. We numerically compute their inverse, which has O(1) memory cost and computational cost of 5-20 forward passes. Finally, we show that invertible ResNets perform on par with standard ResNets on classifying MNIST and CIFAR10 images. Iodide Iodide lets you do data science entirely in your browser. Create, share, collaborate, and reproduce powerful reports and visualizations with tools you already know. IOHprofiler IOHprofiler is a new tool for analyzing and comparing iterative optimization heuristics. Given as input algorithms and problems written in C or Python, it provides as output a statistical evaluation of the algorithms’ performance by means of the distribution on the fixed-target running time and the fixed-budget function values. In addition, IOHprofiler also allows to track the evolution of algorithm parameters, making our tool particularly useful for the analysis, comparison, and design of (self-)adaptive algorithms. IOHprofiler is a ready-to-use software. It consists of two parts: an experimental part, which generates the running time data, and a post-processing part, which produces the summarizing comparisons and statistical evaluations. The experimental part is build on the COCO software, which has been adjusted to cope with optimization problems that are formulated as functions $f:\mathcal{S}^n \to \R$ with $\mathcal{S}$ being a discrete alphabet of integers. The post-processing part is our own work. It can be used as a stand-alone tool for the evaluation of running time data of arbitrary benchmark problems. It accepts as input files not only the output files of IOHprofiler, but also original COCO data files. The post-processing tool is designed for an interactive evaluation, allowing the user to chose the ranges and the precision of the displayed data according to his/her needs. IOHprofiler is available on GitHub at \url{https://…/IOHprofiler}. I-Optimality The generalized linear model plays an important role in statistical analysis and the related design issues are undoubtedly challenging. The state-of-the-art works mostly apply to design criteria on the estimates of regression coefficients. It is of importance to study optimal designs for generalized linear models, especially on the prediction aspects. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm of constructing I-optimal designs for generalized linear models. Through establishing the General Equivalence Theorem of the I-optimality for generalized linear models, we obtain an insightful understanding for the proposed algorithm on how to sequentially choose the support points and update the weights of support points of the design. The proposed algorithm is computationally efficient with guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm. IPMAN We present a new methodology, called IPMAN, that combines interior point methods and generative adversarial networks to solve constrained optimization problems with feasible sets that are non-convex or not explicitly defined. Our methodology produces {\epsilon}-optimal solutions and demonstrates that, when there are multiple global optima, it learns a distribution over the optimal set. We apply our approach to synthetic examples to demonstrate its effectiveness and to a problem in radiation therapy treatment optimization with a non-convex feasible set. IPOC The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about their current policy’s quality before executing it, and thus have limited use in high-stakes applications like healthcare. In this paper, we address such a lack of accountability by proposing that algorithms output policy certificates, which upper bound the suboptimality in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further present a new learning framework (IPOC) for finite-sample analysis with policy certificates, and develop two IPOC algorithms that enjoy guarantees for the quality of both their policies and certificates. IRGAN Generative adversarial nets (GANs) have been widely studied during the recent development of deep learning and unsupervised learning. With an adversarial training mechanism, GAN manages to train a generative model to fit the underlying unknown real data distribution under the guidance of the discriminative model estimating whether a data instance is real or generated. Such a framework is originally proposed for fitting continuous data distribution such as images, thus it is not straightforward to be directly applied to information retrieval scenarios where the data is mostly discrete, such as IDs, text and graphs. In this tutorial, we focus on discussing the GAN techniques and the variants on discrete data fitting in various information retrieval scenarios. (i) We introduce the fundamentals of GAN framework and its theoretic properties; (ii) we carefully study the promising solutions to extend GAN onto discrete data generation; (iii) we introduce IRGAN, the fundamental GAN framework of fitting single ID data distribution and the direct application on information retrieval; (iv) we further discuss the task of sequential discrete data generation tasks, e.g., text generation, and the corresponding GAN solutions; (v) we present the most recent work on graph/network data fitting with node embedding techniques by GANs. Meanwhile, we also introduce the relevant open-source platforms such as IRGAN and Texygen to help audience conduct research experiments on GANs in information retrieval. Finally, we conclude this tutorial with a comprehensive summarization and a prospect of further research directions for GANs in information retrieval. Iris Today’s conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to ‘plot a histogram’ with another to first ‘log-transform the data’. To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today’s agents, but empowers users to weave together these commands to accomplish complex goals. Iris R-CNN Despite the significant advances in iris segmentation, accomplishing accurate iris segmentation in non-cooperative environment remains a grand challenge. In this paper, we present a deep learning framework, referred to as Iris R-CNN, to offer superior accuracy for iris segmentation. The proposed framework is derived from Mask R-CNN, and several novel techniques are proposed to carefully explore the unique characteristics of iris. First, we propose two novel networks: (i) Double-Circle Region Proposal Network (DC-RPN), and (ii) Double-Circle Classification and Regression Network (DC-CRN) to take into account the iris and pupil circles to maximize the accuracy for iris segmentation. Second, we propose a novel normalization scheme for Regions of Interest (RoIs) to facilitate a radically new pooling operation over a double-circle region. Experimental results on two challenging iris databases, UBIRIS.v2 and MICHE, demonstrate the superior accuracy of the proposed approach over other state-of-the-art methods. IRNet We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard. Iroko Recent networking research has identified that data-driven congestion control (CC) can be more efficient than traditional CC in TCP. Deep reinforcement learning (RL), in particular, has the potential to learn optimal network policies. However, RL suffers from instability and over-fitting, deficiencies which so far render it unacceptable for use in datacenter networks. In this paper, we analyze the requirements for RL to succeed in the datacenter context. We present a new emulator, Iroko, which we developed to support different network topologies, congestion control algorithms, and deployment scenarios. Iroko interfaces with the OpenAI gym toolkit, which allows for fast and fair evaluation of different RL and traditional CC algorithms under the same conditions. We present initial benchmarks on three deep RL algorithms compared to TCP New Vegas and DCTCP. Our results show that these algorithms are able to learn a CC policy which exceeds the performance of TCP New Vegas on a dumbbell and fat-tree topology. We make our emulator open-source and publicly available: https://…/iroko Irregular Convolutional Neural Network(ICNN) Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNN). In this paper, we equip convolutional kernels with shape attributes to generate the deep Irregular Convolutional Neural Networks (ICNN). Compared to traditional CNN applying regular convolutional kernels like ${3\times3}$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments for semantic segmentation are implemented to validate the effectiveness of our proposed ICNN. Irrelevant Variability We say that data variability is correlated with a specific task “if the removal of this variability from the data deteriorates (on average) the results of clustering or retrieval”. Variability is irrelevant if it is “maintained in the data” but “not correlated with the specific task” ISA Mapper Domain specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) which enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to determine the possible ways a computation can be performed on a hardware system. Next, our scheduler chooses a specific mapping and determines the data movement and computation order. In order to manage the large search space of mappings and schedules, we developed a flexible framework that allows heuristics, cost models, and potentially machine learning to facilitate this search problem. With this system, we demonstrate the automated extraction of matrix multiplication kernels out of recent deep learning kernels such as depthwise-separable convolution. In addition, we demonstrate two to five times better performance on DeepBench sized GEMMs and GRU RNN execution when compared to state-of-the-art (SOTA) implementations on new hardware and up to 85% of the performance for SOTA implementations on existing hardware. ISBNet Recent years have witnessed growing interests in designing efficient neural networks and neural architecture search (NAS). Although remarkable efficiency and accuracy have been achieved, existing expert designed and NAS models neglect that input instances are of varying complexity thus different amount of computation is required. Therefore, inference with a fixed model that processes all instances through the same transformations would waste plenty of computational resources. Customizing the model capacity in an instance-aware manner is highly demanded. In this paper, we introduce a novel network ISBNet to address this issue, which supports efficient instance-level inference by selectively bypassing transformation branches of infinitesimal importance weight. We also propose lightweight hypernetworks SelectionNet to generate these importance weights instance-wisely. Extensive experiments have been conducted to evaluate the efficiency of ISBNet and the results show that ISBNet achieves extremely efficient inference comparing to existing networks. For example, ISBNet takes only 12.45% parameters and 45.79% FLOPs of the state-of-the-art efficient network ShuffleNetV2 with comparable accuracy. Ising Model The Ising model, named after the physicist Ernst Ising, is a mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that represent magnetic dipole moments of atomic spins that can be in one of two states (+1 or -1). The spins are arranged in a graph, usually a lattice, allowing each spin to interact with its neighbors. The model allows the identification of phase transitions, as a simplified model of reality. The two-dimensional square-lattice Ising model is one of the simplest statistical models to show a phase transition. The Ising model was invented by the physicist Wilhelm Lenz (1920), who gave it as a problem to his student Ernst Ising. The one-dimensional Ising model has no phase transition and was solved by Ising (1925) himself in his 1924 thesis. The two-dimensional square lattice Ising model is much harder, and was given an analytic description much later, by Lars Onsager (1944). It is usually solved by a transfer-matrix method, although there exist different approaches, more related to quantum field theory. In dimensions greater than four, the phase transition of the Ising model is described by mean field theory. Interpreting the Ising Model: The Input Matters Isolation Forest Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest performs favourably to ORCA, a near-linear time complexity distance-based method, LOF and Random Forests in terms of AUC and processing time, and especially in large data sets. iForest also works well in high dimensional problems which have a large number of irrelevant attributes, and in situations where training set does not contain any anomalies. ➚ “Extended Isolation Forest” Isomeric Condition Prevalent matrix completion theories reply on an assumption that the locations of missing data are distributed independently and randomly (i.e., uniform sampling). Nevertheless, the reason for an observation being missing often depends on the unseen observations themselves, and thus the locations of the missing data in practice usually occur in a correlated fashion (i.e., nonuniform sampling) rather than independently. To break through the limits of uniform sampling, we introduce in this work a new hypothesis called isomeric condition, which is provably weaker than the assumption of uniform sampling. Equipped with this new tool, we prove a collection of theorems for missing data recovery as well as matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used bilinear programs. Even more, when an extra condition called relative well-conditionedness is obeyed as well, we prove that the local optimality of the exact solutions is guaranteed in a deterministic fashion. Among other things, we study in detail a Schatten quasi-norm induced method termed isomeric dictionary pursuit (IsoDP), and we show that IsoDP exhibits some distinct behaviors absent in the traditional bilinear programs. Isometry Blind Dynamic Time Warping(IBDTW) In this work, we explore the problem of aligning two time-ordered point clouds which are spatially transformed and re-parameterized versions of each other. This has a diverse array of applications such as cross modal time series synchronization (e.g. MOCAP to video) and alignment of discretized curves in images. Most other works that address this problem attempt to jointly uncover a spatial alignment and correspondences between the two point clouds, or to derive local invariants to spatial transformations such as curvature before computing correspondences. By contrast, we sidestep spatial alignment completely by using self-similarity matrices (SSMs) as a proxy to the time-ordered point clouds, since self-similarity matrices are blind to isometries and respect global geometry. Our algorithm, dubbed ‘Isometry Blind Dynamic Time Warping’ (IBDTW), is simple and general, and we show that its associated dissimilarity measure lower bounds the L1 Gromov-Hausdorff distance between the two point sets when restricted to warping paths. We also present a local, partial alignment extension of IBDTW based on the Smith Waterman algorithm. This eliminates the need for tedious manual cropping of time series, which is ordinarily necessary for global alignment algorithms to function properly. Isotonic Proportional Hazards Model isoph Isotonic Regression(IR) General isotonic regression is approximating given series of values with values satisfying a given partial ordering. The idea is to fit a piecewise-constant non-decreasing function to the data. http://…/Isotonic_regression ISOTYPE Isotype (International System of TYpographic Picture Education) is a method of showing social, technological, biological and historical connections in pictorial form. It was first known as the Vienna Method of Pictorial Statistics (Wiener Methode der Bildstatistik), due to its having been developed at the Gesellschafts- und Wirtschaftsmuseum in Wien (Social and economic museum of Vienna) between 1925 and 1934. The founding director of this museum, Otto Neurath, was the initiator and chief theorist of the Vienna Method. The term Isotype was applied to the method around 1935, after its key practitioners were forced to leave Vienna by the rise of Austrian fascism. http://…/Haroz_CHI_2015.pdf IT Operations Analytics(ITOA) In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to the IT realm. In its Hype Cycle Report, Gartner rated the business impact of ITOA as being ‘high’, meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities. IT Operations Analytics (ITOA) (also known as Advanced Operational Analytics, or IT Data Analytics) technologies are primarily used to discover complex patterns in high volumes of often ‘noisy’ IT system availability and performance data. Forrester Research defines IT analytics as ‘The use of mathematical algorithms and other innovations to extract meaningful information from the sea of raw data collected by management and monitoring technologies.’ Taking a Horizontal Approach to Big Data for Better IT and Business Outcomes Item Explorer Item explorer is an approach to provide insights into a ubiquitous class of business questions like: · what kind of products do customers typically buy together? · what kind of web pages (on a web site) do users visit? · what combination of symptoms do patients have? · … For this class of business questions, the exponential number of combinations poses a severe practical challenge. Due to the explorative nature, visualization is well-suited for such business questions. More specifically, a visualization can provide a unique representation for both revealing insights and for intuitive user interaction based on business knowledge or own hypotheses. Item Factor Analysis ➘ “Item Response Theory” ifaTools Item Response Theory(IRT) Item response theory (IRT) models are a class of statistical models used to describe the response behaviors of individuals to a set of items having a certain number of options. They are adopted by researchers in social science, particularly in the analysis of performance or attitudinal data, in psychology, education, medicine, marketing and other fields where the aim is to measure latent constructs. Most IRT analyses use parametric models that rely on assumptions that often are not satisfied. In such cases, a nonparametric approach might be preferable; nevertheless, there are not many software implementations allowing to use that. MLCIRTwithin IteRank Neighbor-based collaborative ranking (NCR) techniques follow three consecutive steps to recommend items to each target user: first they calculate the similarities among users, then they estimate concordance of pairwise preferences to the target user based on the calculated similarities. Finally, they use estimated pairwise preferences to infer the total ranking of items for the target user. This general approach faces some problems as the rank data is usually sparse as users usually have compared only a few pairs of items and consequently, the similarities among users is calculated based on limited information and is not accurate enough for inferring true values of preference concordance and can lead to an invalid ranking of items. This article presents a novel framework, called IteRank, that models the data as a bipartite network containing users and pairwise preferences. It then simultaneously refines users’ similarities and preferences’ concordances using a random walk method on this graph structure. It uses the information in this first step in another network structure for simultaneously adjusting the concordances of preferences and rankings of items. Using this approach, IteRank can overcome some existing problems caused by the sparsity of the data. Experimental results show that IteRank improves the performance of recommendation compared to the state of the art NCR techniques that use the traditional NCR framework for recommendation. Iterated Filtering Iterated filtering algorithms are a tool for maximum likelihood inference on partially observed dynamical systems. Stochastic perturbations to the unknown parameters are used to explore the parameter space. Applying sequential Monte Carlo (the particle filter) to this extended model results in the selection of the parameter values that are more consistent with the data. Appropriately constructed procedures, iterating with successively diminished perturbations, converge to the maximum likelihood estimate. Iterated filtering methods have so far been used most extensively to study infectious disease transmission dynamics. Case studies include cholera, Ebola virus, influenza, malaria, HIV, pertussis, poliovirus and measles. Other areas which have been proposed to be suitable for these methods include ecological dynamics and finance. The perturbations to the parameter space play several different roles. Firstly, they smooth out the likelihood surface, enabling the algorithm to overcome small-scale features of the likelihood during early stages of the global search. Secondly, Monte Carlo variation allows the search to escape from local minima. Thirdly, the iterated filtering update uses the perturbed parameter values to construct an approximation to the derivative of the log likelihood even though this quantity is not typically available in closed form. Fourthly, the parameter perturbations help to overcome numerical difficulties that can arise during sequential Monte Carlo. Accelerate iterated filtering Iterated Local Model(ILM) On-line social networks, such as in Facebook and Twitter, are often studied from the perspective of friendship ties between agents in the network. Adversarial ties, however, also play an important role in the structure and function of social networks, but are often hidden. Underlying generative mechanisms of social networks are predicted by structural balance theory, which postulates that triads of agents, prefer to be transitive, where friends of friends are more likely friends, or anti-transitive, where adversaries of adversaries become friends. The previously proposed Iterated Local Transitivity (ILT) and Iterated Local Anti-Transitivity (ILAT) models incorporated transitivity and anti-transitivity, respectively, as evolutionary mechanisms. These models resulted in graphs with many observable properties of social networks, such as low diameter, high clustering, and densification. We propose a new, generative model, referred to as the Iterated Local Model (ILM) for social networks synthesizing both transitive and anti-transitive triads over time. In ILM, we are given a countably infinite binary sequence as input, and that sequence determines whether we apply a transitive or an anti-transitive step. The resulting model exhibits many properties of complex networks observed in the ILT and ILAT models. In particular, for any input binary sequence, we show that asymptotically the model generates finite graphs that densify, have clustering coefficient bounded away from 0, have diameter at most 3, and exhibit bad spectral expansion. We also give a thorough analysis of the chromatic number, domination number, Hamiltonicity, and isomorphism types of induced subgraphs of ILM graphs. Iterative Amortized Inference Inference models are a key component in scaling variational inference to deep latent variable models, most notably as encoder networks in variational auto-encoders (VAEs). By replacing conventional optimization-based inference with a learned model, inference is amortized over data examples and therefore more computationally efficient. However, standard inference models are restricted to direct mappings from data to approximate posterior estimates. The failure of these models to reach fully optimized approximate posterior estimates results in an amortization gap. We aim toward closing this gap by proposing iterative inference models, which learn to perform inference optimization through repeatedly encoding gradients. Our approach generalizes standard inference models in VAEs and provides insight into several empirical findings, including top-down inference techniques. We demonstrate the inference optimization capabilities of iterative inference models and show that they outperform standard inference models on several benchmark data sets of images and text. Iterative Classification Algorithm(ICA) see also ➘ “Recurrent Collective Classification” Iterative Classroom Teaching We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d,N} log(1/eps)) examples, where d is the ambient dimension of the problem, N is the number of learners, and eps is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners’ internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners’ workload and the teacher’s cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogenous groups provides a balance between these two objectives. Iterative Compressed-Thresholding and K-Means(IcTKM) In this paper we show that the computational complexity of the Iterative Thresholding and K-Residual-Means (ITKrM) algorithm for dictionary learning can be significantly reduced by using dimensionality reduction techniques based on the Johnson-Lindenstrauss Lemma. We introduce the Iterative Compressed-Thresholding and K-Means (IcTKM) algorithm for fast dictionary learning and study its convergence properties. We show that IcTKM can locally recover a generating dictionary with low computational complexity up to a target error $\tilde{\varepsilon}$ by compressing $d$-dimensional training data into $m < d$ dimensions, where $m$ is proportional to $\log d$ and inversely proportional to the distortion level $\delta$ incurred by compressing the data. Increasing the distortion level $\delta$ reduces the computational complexity of IcTKM at the cost of an increased recovery error and reduced admissible sparsity level for the training data. For generating dictionaries comprised of $K$ atoms, we show that IcTKM can stably recover the dictionary with distortion levels up to the order $\delta \leq O(1/\sqrt{\log K})$. The compression effectively shatters the data dimension bottleneck in the computational cost of the ITKrM algorithm. For training data with sparsity levels $S \leq O(K^{2/3})$, ITKrM can locally recover the dictionary with a computational cost that scales as $O(d K \log(\tilde{\varepsilon}^{-1}))$ per training signal. We show that for these same sparsity levels the computational cost can be brought down to $O(\log^5 (d) K \log(\tilde{\varepsilon}^{-1}))$ with IcTKM, a significant reduction when high-dimensional data is considered. Our theoretical results are complemented with numerical simulations which demonstrate that IcTKM is a powerful, low-cost algorithm for learning dictionaries from high-dimensional data sets. Iterative Dichotomiser 3(ID3) In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains. Iterative Gradient Attack(IGA) Deep neural network has shown remarkable performance in solving computer vision and some graph evolved tasks, such as node classification and link prediction. However, the vulnerability of deep model has also been revealed by carefully designed adversarial examples generated by various adversarial attack methods. With the wider application of deep model in complex network analysis, in this paper we define and formulate the link prediction adversarial attack problem and put forward a novel iterative gradient attack (IGA) based on the gradient information in trained graph auto-encoder (GAE). To our best knowledge, it is the first time link prediction adversarial attack problem is defined and attack method is brought up. Not surprisingly, GAE was easily fooled by adversarial network with only a few links perturbed on the clean network. By conducting comprehensive experiments on different real-world data sets, we can conclude that most deep model based and other state-of-art link prediction algorithms cannot escape the adversarial attack just like GAE. We can benefit the attack as an efficient privacy protection tool from link prediction unknown violation, on the other hand, link prediction attack can be a robustness evaluation metric for current link prediction algorithm in attack defensibility. Iterative Method In computational mathematics, an iterative method is a mathematical procedure that generates a sequence of improving approximate solutions for a class of problems. A specific implementation of an iterative method, including the termination criteria, is an algorithm of the iterative method. An iterative method is called convergent if the corresponding sequence converges for given initial approximations. A mathematically rigorous convergence analysis of an iterative method is usually performed; however, heuristic-based iterative methods are also common. In the problems of finding the root of an equation (or a solution of a system of equations), an iterative method uses an initial guess to generate successive approximations to a solution. In contrast, direct methods attempt to solve the problem by a finite sequence of operations. In the absence of rounding errors, direct methods would deliver an exact solution (like solving a linear system of equations Ax=b by Gaussian elimination). Iterative methods are often the only choice for nonlinear equations. However, iterative methods are often useful even for linear problems involving a large number of variables (sometimes of the order of millions), where direct methods would be prohibitively expensive (and in some cases impossible) even with the best available computing power. Iterative Nonnegative Matrix Factorization(INOM) Matrix decomposition is ubiquitous and has applications in various fields like speech processing, data mining and image processing to name a few. Under matrix decomposition, nonnegative matrix factorization is used to decompose a nonnegative matrix into a product of two nonnegative matrices which gives some meaningful interpretation of the data. Thus, nonnegative matrix factorization has an edge over the other decomposition techniques. In this paper, we propose two novel iterative algorithms based on Majorization Minimization (MM)-in which we formulate a novel upper bound and minimize it to get a closed form solution at every iteration. Since the algorithms are based on MM, it is ensured that the proposed methods will be monotonic. The proposed algorithms differ in the updating approach of the two nonnegative matrices. The first algorithm-Iterative Nonnegative Matrix Factorization (INOM) sequentially updates the two nonnegative matrices while the second algorithm-Parallel Iterative Nonnegative Matrix Factorization (PARINOM) parallely updates them. We also prove that the proposed algorithms converge to the stationary point of the problem. Simulations were conducted to compare the proposed methods with the existing ones and was found that the proposed algorithms performs better than the existing ones in terms of computational speed and convergence. KeyWords: Nonnegative matrix factorization, Majorization Minimization, Big Data, Parallel, Multiplicative Update Iterative Normalization(IterNorm) Batch Normalization (BN) is ubiquitously employed for accelerating neural network training and improving the generalization capability by performing standardization within mini-batches. Decorrelated Batch Normalization (DBN) further boosts the above effectiveness by whitening. However, DBN relies heavily on either a large batch size, or eigen-decomposition that suffers from poor efficiency on GPUs. We propose Iterative Normalization (IterNorm), which employs Newton’s iterations for much more efficient whitening, while simultaneously avoiding the eigen-decomposition. Furthermore, we develop a comprehensive study to show IterNorm has better trade-off between optimization and generalization, with theoretical and experimental support. To this end, we exclusively introduce Stochastic Normalization Disturbance (SND), which measures the inherent stochastic uncertainty of samples when applied to normalization operations. With the support of SND, we provide natural explanations to several phenomena from the perspective of optimization, e.g., why group-wise whitening of DBN generally outperforms full-whitening and why the accuracy of BN degenerates with reduced batch sizes. We demonstrate the consistently improved performance of IterNorm with extensive experiments on CIFAR-10 and ImageNet over BN and DBN. Iterative Proportional Fitting The iterative proportional fitting procedure (IPFP, also known as biproportional fitting in statistics, RAS algorithm in economics and matrix ranking or matrix scaling in computer science) is an iterative algorithm for estimating cell values of a contingency table such that the marginal totals remain fixed and the estimated table decomposes into an outer product. Iterative Proportional Fitting Procedure(IPFP) The iterative proportional fitting procedure (IPFP, also known as biproportional fitting in statistics, RAS algorithm in economics and matrix raking or matrix scaling in computer science) is an iterative algorithm for estimating cell values of a contingency table such that the marginal totals remain fixed and the estimated table decomposes into an outer product. mipfp Iterative Self-Organizing Data Analysis Technique(ISODATA) This is a more sophisticated algorithm which allows the number of clusters to be automatically adjusted during the iteration by merging similar clusters and splitting clusters with large standard deviations. Iterative Sequential Regression(ISR) Imputation of missing values is one of the major tasks for data pre-processing in many areas. Whenever imputation of data from o cial statistics comes into mind, several (additional) challenges almost always arise, like large data sets, data sets consisting of a mixture of di erent variable types, or data outliers. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, encountering for the mentioned challenges, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the \recommended software’ for imputations in international and national statistical institutions. Using arti cial data and real data sets from o cial statistics and other elds, the advantages of IRMI over IVEWARE { especially with respect to robustness { are demonstrated. ISR3 Iterative Supervised Principal Components(ISPC) In high-dimensional prediction problems, where the number of features may greatly exceed the number of training instances, fully Bayesian approach with a sparsifying prior is known to produce good results but is computationally challenging. To alleviate this computational burden, we propose to use a preprocessing step where we first apply a dimension reduction to the original data to reduce the number of features to something that is computationally conveniently handled by Bayesian methods. To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPC), which combines variable screening and dimension reduction and can be considered as an extension to the existing technique of supervised principal components (SPCs). Our empirical evaluations confirm that, although not foolproof, the proposed approach provides very good results on several microarray benchmark datasets with very affordable computation time, and can also be very useful for visualizing high-dimensional data. Iterative Sure Independence Screening(ISIS) On the sure screening property of the iterative sure independence screening algorithm Iterative Thresholding and K-Residual Means(ITKrM) Dictionary learning – from local towards global and adaptive Compressed Dictionary Learning Iterative Weighted Least Squares(IWLS) The Iterative Weighted Least Squares (IWLS) method is one of the estimation procedures in logistic regression modeling. ➘ “Iteratively Reweighted Least Squares” https://…/1176349165 Iteratively Reweighted Least Squares(IRLS) IRLS is used to find the maximum likelihood estimates of a generalized linear model, and in robust regression to find an M-estimator, as a way of mitigating the influence of outliers in an otherwise normally-distributed data set. For example, by minimizing the least absolute error rather than the least square error. Although not a linear regression problem, Weiszfeld’s algorithm for approximating the geometric median can also be viewed as a special case of iteratively reweighted least squares, in which the objective function is the sum of distances of the estimator from the samples. One of the advantages of IRLS over linear programming and convex programming is that it can be used with Gauss-Newton and Levenberg-Marquardt numerical algorithms. iTM-VAE This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtained by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. Actually, the hyper-prior technique is quite general and we show that it can be applied to other AEVB based models to alleviate the {\it collapse-to-prior} problem elegantly. Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on 20News and Reuters RCV1-V2 datasets show that the proposed models outperform the state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments. IvaNet Driven by Convolutional Neural Networks, object detection and semantic segmentation have gained significant improvements. However, existing methods on the basis of a full top-down module have limited robustness in handling those two tasks simultaneously. To this end, we present a joint multi-task framework, termed IvaNet. Different from existing methods, our IvaNet backwards abstract semantic information from higher layers to augment lower layers using local top-down modules. The comparisons against some counterparts on the PASCAL VOC and MS COCO datasets demonstrate the functionality of IvaNet.