Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper, we describe TonY, an open-source orchestrator for distributed ML jobs built at LinkedIn to address these challenges.
The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development.
This paper investigates the integration of gradient boosted decision trees and varying coefficient models. We introduce the tree boosted varying coefficient framework which justifies the implementation of decision tree boosting as the nonparametric effect modifiers in varying coefficient models. This framework requires no structural assumptions in the space containing the varying coefficient covariates, is easy to implement, and keeps a balance between model complexity and interpretability. To provide statistical guarantees, we prove the asymptotic consistency of the proposed method under the regression settings with $L^2$ loss. We further conduct a thorough empirical study to show that the proposed method is capable of providing accurate predictions as well as intelligible visual explanations.
We consider the problem of obfuscating sensitive information while preserving utility. Given that an analytical solution is often not feasible because of un-scalability and because the background knowledge may be too complicated to determine, we propose an approach based on machine learning, inspired by the GANs (Generative Adversarial Networks) paradigm. The idea is to set up two nets: the generator, that tries to produce an optimal obfuscation mechanism to protect the data, and the classifier, that tries to de-obfuscate the data. By letting the two nets compete against each other, the mechanism improves its degree of protection, until an equilibrium is reached. We apply our method to the case of location privacy, and we perform experiments on synthetic data and on real data from the Gowalla dataset. We evaluate the privacy of the mechanism not only by its capacity to defeat the classificator, but also in terms of the Bayes error, which represents the strongest possible adversary. We compare the privacy-utility tradeoff of our method with that of the planar Laplace mechanism used in geo-indistinguishability, showing favorable results.
Machine learning (ML) has progressed rapidly during the past decade and the major factor that drives such development is the unprecedented large-scale data. As data generation is a continuous process, this leads to ML service providers updating their models frequently with newly-collected data in an online learning scenario. In consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results. In this paper, we investigate whether the change in the output of a black-box ML model before and after being updated can leak information of the dataset used to perform the update. This constitutes a new attack surface against black-box ML models and such information leakage severely damages the intellectual property and data privacy of the ML model owner/provider. In contrast to membership inference attacks, we use an encoder-decoder formulation that allows inferring diverse information ranging from detailed characteristics to full reconstruction of the dataset. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (BM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows generating accurate samples. Our experiments show effective prediction of dataset characteristics and even full reconstruction in challenging conditions.
We develop an error-free, nonuniform phase-stepping algorithm (nPSA) based on principal component analysis (PCA). PCA-based algorithms typically give phase-demodulation errors when applied to nonuniform phase-shifted interferograms. We present a straightforward way to correct those PCA phase-demodulation errors. We give mathematical formulas to fully analyze PCA-based nPSA (PCA-nPSA). These formulas give a) the PCA-nPSA frequency transfer function (FTF), b) its corrected Lissajous figure, c) the corrected PCA-nPSA formula, d) its harmonic robustness, and e) its signal-to-noise-ratio (SNR). We show that the PCA-nPSA can be seen as a linear quadrature filter, and as consequence, one can find its FTF. Using the FTF, we show why plain PCA often fails to demodulate nonuniform phase-shifted fringes. Previous works on PCA-nPSA (without FTF), give specific numerical/experimental fringe data to ‘visually demonstrate’ that their new nPSA works better than competitors. This often leads to biased/favorable fringe pattern selections which ‘visually demonstrate’ the superior performance of their new nPSA. This biasing is herein totally avoided because we provide figures-of-merit formulas based on linear systems and stochastic process theories. However, and for illustrative purposes only, we provide specific fringe data phase-demodulation, including comprehensive analysis and comparisons.
Zero-shot learning (ZSL) aims at understanding unseen categories with no training examples from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories with an inspiration from the psychology of human creativity for producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen and creativity is about creating a likable unseen. We introduce a learning signal inspired by creativity literature that explores the unseen space with hallucinated class-descriptions and encourages careful deviation of their visual feature generations from seen classes while allowing knowledge transfer from seen to unseen classes. Empirically, we show consistent improvement over the state of the art of several percents on the largest available benchmarks on the challenging task or generalized ZSL from a noisy text that we focus on, using the CUB and NABirds datasets. We also show the advantage of our approach on Attribute-based ZSL on three additional datasets (AwA2, aPY, and SUN).
The theory of deep learning is now considered largely solved, and is well understood by researchers and influencers alike. To maintain our relevance, we therefore seek to apply our skills to under-explored, lucrative applications of this technology. To this end, we propose and Deep Industrial Espionage, an efficient end-to-end framework for industrial information propagation and productisation. Specifically, given a single image of a product or service, we aim to reverse-engineer, rebrand and distribute a copycat of the product at a profitable price-point to consumers in an emerging market—all within in a single forward pass of a Neural Network. Differently from prior work in machine perception which has been restricted to classifying, detecting and reasoning about object instances, our method offers tangible business value in a wide range of corporate settings. Our approach draws heavily on a promising recent arxiv paper until its original authors’ names can no longer be read (we use felt tip pen). We then rephrase the anonymised paper, add the word ‘novel’ to the title, and submit it a prestigious, closed-access espionage journal who assure us that someday, we will be entitled to some fraction of their extortionate readership fees.
Generative models often use human evaluations to determine and justify progress. Unfortunately, existing human evaluation methods are ad-hoc: there is currently no standardized, validated evaluation that: (1) measures perceptual fidelity, (2) is reliable, (3) separates models into clear rank order, and (4) ensures high-quality measurement without intractable cost. In response, we construct Human-eYe Perceptual Evaluation (HYPE), a human metric that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) results in separable model performances, and (4) efficient in cost and time. We introduce two methods. The first, HYPE-Time, measures visual perception under adaptive time constraints to determine the minimum length of time (e.g., 250ms) that model output such as a generated face needs to be visible for people to distinguish it as real or fake. The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost. We test HYPE across four state-of-the-art generative adversarial networks (GANs) on unconditional image generation using two datasets, the popular CelebA and the newer higher-resolution FFHQ, and two sampling techniques of model outputs. By simulating HYPE’s evaluation multiple times, we demonstrate consistent ranking of different models, identifying StyleGAN with truncation trick sampling (27.6% HYPE-Infinity deception rate, with roughly one quarter of images being misclassified by humans) as superior to StyleGAN without truncation (19.0%) on FFHQ. See https://hype.stanford.edu for details.
Since the use of computers in the business world, data collection has become one of the most important issues due to the available knowledge in the data; such data has been stored in the database. The database system was developed which led to the evolvement of hierarchical and relational database followed by Standard Query Language (SQL). As data size increases, the need for more control and information retrieval increase. These increases lead to the development of data mining systems and data warehouses. This paper focuses on the use of a data warehouse as a supporting tool in decision making. We to study the effectiveness of data warehouse techniques in the sense of time and flexibility in our case study (Manpower Employment). The study will conclude with a comparison of traditional relational database and the use of data warehouse. The fundamental role of a data warehouse is to provide data for supporting the decision-making process. Data in a data warehouse environment is a multidimensional data store. We can simply say that data warehouse is a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single detailed view of part or all an establishment. The data warehouse concept has changed the nature of the decision support system, by adding new benefits for improving and expanding the scope, accuracy, and accessibility of data.
It is often necessary to disclose training data to the public domain, while protecting privacy of certain sensitive labels. We use information theoretic measures to develop such privacy preserving data disclosure mechanisms. Our mechanism involves perturbing the data vectors in a manner that strikes a balance in the privacy-utility trade-off. We use maximal information leakage between the output data vector and the confidential label as our privacy metric. We first study the theoretical Bernoulli-Gaussian model and study the privacy-utility trade-off when only the mean of the Gaussian distributions can be perturbed. We show that the optimal solution is the same as the case when the utility is measured using probability of error at the adversary. We then consider an application of this framework to a data driven setting and provide an empirical approximation to the Sibson mutual information. By performing experiments on the MNIST and FERG data-sets, we show that our proposed framework achieves equivalent or better privacy than previous methods based on mutual information.
Image classifiers based on deep neural networks suffer from harassment caused by adversarial examples. Two defects exist in black-box iterative attacks that generate adversarial examples by incrementally adjusting the noise-adding direction for each step. On the one hand, existing iterative attacks add noises monotonically along the direction of gradient ascent, resulting in a lack of diversity and adaptability of the generated iterative trajectories. On the other hand, it is trivial to perform adversarial attack by adding excessive noises, but currently there is no refinement mechanism to squeeze redundant noises. In this work, we propose Curls & Whey black-box attack to fix the above two defects. During Curls iteration, by combining gradient ascent and descent, we curl’ up iterative trajectories to integrate more diversity and transferability into adversarial examples. Curls iteration also alleviates the diminishing marginal effect in existing iterative attacks. The Whey optimization further squeezes the whey’ of noises by exploiting the robustness of adversarial perturbation. Extensive experiments on Imagenet and Tiny-Imagenet demonstrate that our approach achieves impressive decrease on noise magnitude in l2 norm. Curls & Whey attack also shows promising transferability against ensemble models as well as adversarially trained models. In addition, we extend our attack to the targeted misclassification, effectively reducing the difficulty of targeted attacks under black-box condition.
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models will be made publicly available.
Fueled by the rapid development of communication networks and sensors in portable devices, today many mobile users are invited by content providers to sense and send back real-time useful information (e.g., traffic observations and sensor data) to keep the freshness of the providers’ content updates. However, due to the sampling cost in sensing and transmission, an individual may not have the incentive to contribute the real-time information to help a content provider reduce the age of information (AoI). Accordingly, we propose dynamic pricing for the provider to offer age-dependent monetary returns and encourage users to sample information at different rates over time. This dynamic pricing design problem needs to balance the monetary payments to users and the AoI evolution over time, and is challenging to solve especially under the incomplete information about users’ arrivals and their private sampling costs. For analysis tractability, we linearize the nonlinear AoI evolution in the constrained dynamic programming problem, by approximating the dynamic AoI reduction as a time-average term and solving the approximate dynamic pricing in closed-form. Then, we estimate this approximate term based on Brouwer’s fixed-point theorem. Finally, we provide the steady-state analysis of the optimized approximate dynamic pricing scheme for an infinite time horizon, and show that the pricing scheme can be further simplified to an e-optimal version without recursive computing over time.
Learning portable neural networks is very essential for computer vision for the purpose that pre-trained heavy deep models can be well applied on edge devices such as mobile phones and micro sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models, when we can directly access the training dataset. However, training data for the given deep network are often unavailable due to some practice problems (e.g. privacy, legal issue, and transmission), and the architecture of the given network are also unknown except some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher networks are regarded as a fixed discriminator and the generator is utilized for derivating training samples which can obtain the maximum response on the discriminator. Then, an efficient network with smaller model size and computational complexity is trained using the generated data and the teacher network, simultaneously. Efficient student networks learned using the proposed Data-Free Learning (DFL) method achieve 92.22% and 74.47% accuracies without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark.
Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning as they only partially characterize a distribution. In this paper, we propose a sound way of using approximate expectation models for MBRL. In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
We present Habitat, a new platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation, before transferring the learned skills to reality. Specifically, Habitat consists of the following: 1. Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, multiple sensors, and generic 3D dataset handling (with built-in support for SUNCG, Matterport3D, Gibson datasets). Habitat-Sim is fast — when rendering a scene from the Matterport3D dataset, Habitat-Sim achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU, which is orders of magnitude faster than the closest simulator. 2. Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms — defining embodied AI tasks (e.g. navigation, instruction following, question answering), configuring and training embodied agents (via imitation or reinforcement learning, or via classic SLAM), and benchmarking using standard metrics. These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or `merely’ impractical. Specifically, in the context of point-goal navigation (1) we revisit the comparison between learning and SLAM approaches from two recent works and find evidence for the opposite conclusion — that learning outperforms SLAM, if scaled to total experience far surpassing that of previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} x {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
Anomaly detection is a classical problem where the aim is to detect anomalous data that do not belong to the normal data distribution. Current state-of-the-art methods for anomaly detection on complex high-dimensional data are based on the generative adversarial network (GAN). However, the traditional GAN loss is not directly aligned with the anomaly detection objective: it encourages the distribution of the generated samples to overlap with the real data and so the resulting discriminator has been found to be ineffective as an anomaly detector. In this paper, we propose simple modifications to the GAN loss such that the generated samples lie at the boundary of the real data distribution. With our modified GAN loss, our anomaly detection method, called Fence GAN (FGAN), directly uses the discriminator score as an anomaly threshold. Our experimental results using the MNIST, CIFAR10 and KDD99 datasets show that Fence GAN yields the best anomaly classification accuracy compared to state-of-the-art methods.
An energy based approach for stabilizing a mechanical system has offered a simple yet powerful control scheme. However, since it does not impose such strong constraints on parameter space of the controller, finding appropriate parameter values for an optimal controller is known to be hard. This paper intends to generate an optimal energy-based controller for swinging up a rotary inverted pendulum, also known as the Furuta pendulum, by applying the Bayesian optimization called Entropy Search. Simulations and experiments show that the optimal controller has an improved performance compared to a nominal controller for various initial conditions.
An autoencoder is a neural network which data projects to and from a lower dimensional latent space, where this data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck : the latent space. This tool bears significant ressemblance to Principal Component Analysis (PCA), with two main differences. Firstly, the autoencoder is a non-linear transformation, contrary to PCA, which makes the autoencoder more flexible and powerful. Secondly, the axes found by a PCA are orthogonal, and are ordered in terms of the amount of variability which the data presents along these axes. This makes the interpretability of the PCA much greater than that of the autoencoder, which does not have these attributes. Ideally, then, we would like an autoencoder whose latent space consists of independent components, ordered by decreasing importance to the data. In this paper, we propose an algorithm to create such a network. We create an iterative algorithm which progressively increases the size of the latent space, learning a new dimension at each step. Secondly, we propose a covariance loss term to add to the standard autoencoder loss function, as well as a normalisation layer just before the latent space, which encourages the latent space components to be statistically independent. We demonstrate the results of this autoencoder on simple geometric shapes, and find that the algorithm indeed finds a meaningful representation in the latent space. This means that subsequent interpolation in the latent space has meaning with respect to the geometric properties of the images.
Modern web applications can now offer desktop-like experiences from within the browser, thanks to technologies such as WebSockets, which enable low-latency duplex communication between the browser and the server. While these advances are great for the user experience, they represent a new responsibility for web developers who now need to manage and verify the correctness of more complex and potentially stateful interactions in their application. In this paper, we present a technique for developing interactive web applications that are statically guaranteed to communicate following a given protocol. First, the global interaction protocol is described in the Scribble protocol language — based on multiparty session types. Scribble protocols are checked for well-formedness, and then each role is projected to a Finite State Machine representing the structure of communication from the perspective of the role. We use source code generation and a novel type-level encoding of FSMs using multi-parameter type classes to leverage the type system of the target language and guarantee only programs that communicate following the protocol will type check. Our work targets PureScript — a functional language that compiles to JavaScript — which crucially has an expressive enough type system to provide static linearity guarantees. We demonstrate the effectiveness of our approach through a web-based Battleship game where communication is performed through WebSocket connections.
With the explosive development of mobile Internet, short text has been applied extensively. The difference between classifying short text and long documents is that short text is of shortness and sparsity. Thus, it is challenging to deal with short text classification owing to its less semantic information. In this paper, we propose a novel topic-based convolutional neural network (TB-CNN) based on Latent Dirichlet Allocation (LDA) model and convolutional neural network. Comparing to traditional CNN methods, TB-CNN generates topic words with LDA model to reduce the sparseness and combines the embedding vectors of topic words and input words to extend feature space of short text. The validation results on IMDB movie review dataset show the improvement and effectiveness of TB-CNN.
As deep reinforcement learning driven by visual perception becomes more widely used there is a growing need to better understand and probe the learned agents. Understanding the decision making process and its relationship to visual inputs can be very valuable to identify problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting to understand the situational awareness of the system as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications.
A method for change point detection is proposed. In a sequence of independent and piecewise identically distributed random variables we aim at detecting both, changes in the expectation as well as changes in the variance. We propose a statistical test for the null hypothesis of the absence of change points, and an algorithm for change point detection. For that we exploit the joint dynamics of the mean and the empirical variance in the context of bivariate moving sum statistics. The joint consideration helps to improve change point inference as compared to separate univariate approaches. We infer on the effects, i.e., on the strength and the type of the changes, with confidence. Non-parametric methodology allows for a high variety of data to be analyzed. A multi-scale aspect addresses the detection of complex patterns in change points and effects. We demonstrate the performance through theoretical results and simulations studies. A companion R-package jcp (available on CRAN) is discussed.
In this article a novel approach for training deep neural networks using Bayesian techniques is presented. The Bayesian methodology allows for an easy evaluation of model uncertainty and additionally is robust to overfitting. These are commonly the two main problems classical, i.e. non-Bayesian, architectures have to struggle with. The proposed approach applies variational inference in order to approximate the intractable posterior distribution. In particular, the variational distribution is defined as product of multiple multivariate normal distributions with tridiagonal covariance matrices. Each single normal distribution belongs either to the weights, or to the biases corresponding to one network layer. The layer-wise a posteriori variances are defined based on the corresponding expectation values and further the correlations are assumed to be identical. Therefore, only a few additional parameters need to be optimized compared to non-Bayesian settings. The novel approach is successfully evaluated on basis of the popular benchmark datasets MNIST and CIFAR-10.
Since Internet is so popular and prevailing in human life, countering cyber threats, especially attack detection, is a challenging area of research in the field of cyber security. Intrusion detection systems (IDSs) are essential entities in a network topology aiming to safeguard the integrity and availability of sensitive assets in the protected systems. Although many supervised and unsupervised learning approaches from the field of machine learning and pattern recognition have been used to increase the efficacy of IDSs, it is still a problem to deal with lots of redundant and irrelevant features in high-dimension datasets for network anomaly detection. To this end, we propose a novel methodology combining the benefits of correlation-based feature selection(CFS) and bat algorithm(BA) with an ensemble classifier based on C4.5, Random Forest(RF), and Forest by Penalizing Attributes(Forest PA), which can be able to classify both common and rare types of attacks with high accuracy and efficiency. The experimental results, using a novel intrusion detection dataset, namely CIC-IDS2017, reveal that our CFS-BA-Ensemble method is able to contribute more critical features and significantly outperforms individual approaches, achieving high accuracy and low false alarm rate. Moreover, compared with the majority of the existing state-of-the-art and legacy techniques, our approach exhibits better performance under several classification metrics in the context of classification accuracy, f-measure, attack detection rate, and false alarm rate.
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor-box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training and significantly reduces the training memory footprint. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), our detector FCOS outperforms previous anchor-based one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.
With massive explosion of social media such as Twitter and Instagram, people daily share billions of multimedia posts, containing images and text. Typically, text in these posts is short, informal and noisy, leading to ambiguities which can be resolved using images. In this paper we explore text-centric Named Entity Recognition task on these multimedia posts. We propose an end to end model which learns a joint representation of a text and an image. Our model extends multi-dimensional self attention technique, where now image helps to enhance relationship between words. Experiments show that our model is capable of capturing both textual and visual contexts with greater accuracy, achieving state-of-the-art results on Twitter multimodal Named Entity Recognition dataset.
We study the general integer programming problem where the number of variables $n$ is a variable part of the input. We consider two natural parameters of the constraint matrix $A$: its numeric measure $a$ and its sparsity measure $d$. We show that integer programming can be solved in time $g(a,d)\textrm{poly}(n,L)$, where $g$ is some computable function of the parameters $a$ and $d$, and $L$ is the binary encoding length of the input. In particular, integer programming is fixed-parameter tractable parameterized by $a$ and $d$, and is solvable in polynomial time for every fixed $a$ and $d$. Our results also extend to nonlinear separable convex objective functions. Moreover, for linear objectives, we derive a strongly-polynomial algorithm, that is, with running time $g(a,d)\textrm{poly}(n)$, independent of the rest of the input data. We obtain these results by developing an algorithmic framework based on the idea of iterative augmentation: starting from an initial feasible solution, we show how to quickly find augmenting steps which rapidly converge to an optimum. A central notion in this framework is the Graver basis of the matrix $A$, which constitutes a set of fundamental augmenting steps. The iterative augmentation idea is then enhanced via the use of other techniques such as new and improved bounds on the Graver basis, rapid solution of integer programs with bounded variables, proximity theorems and a new proximity-scaling algorithm, the notion of a reduced objective function, and others. As a consequence of our work, we advance the state of the art of solving block-structured integer programs. In particular, we develop near-linear time algorithms for $n$-fold, tree-fold, and $2$-stage stochastic integer programs. We also discuss some of the many applications of these classes.
Residual connections significantly boost the performance of deep neural networks. However, there are few theoretical results that address the influence of residuals on the hypothesis complexity and the generalization ability of deep neural networks. This paper studies the influence of residual connections on the hypothesis complexity of the neural network in terms of the covering number of its hypothesis space. We prove that the upper bound of the covering number is the same as chain-like neural networks, if the total numbers of the weight matrices and nonlinearities are fixed, no matter whether they are in the residuals or not. This result demonstrates that residual connections may not increase the hypothesis complexity of the neural network compared with the chain-like counterpart. Based on the upper bound of the covering number, we then obtain an $\mathcal O(1 / \sqrt{N})$ margin-based multi-class generalization bound for ResNet, as an exemplary case of any deep neural network with residual connections. Generalization guarantees for similar state-of-the-art neural network architectures, such as DenseNet and ResNeXt, are straight-forward. From our generalization bound, a practical implementation is summarized: to approach a good generalization ability, we need to use regularization terms to control the magnitude of the norms of weight matrices not to increase too much, which justifies the standard technique of weight decay.
Transfer learning aims at transferring knowledge from a well-labeled domain to a similar but different domain with limited or no labels. Unfortunately, existing learning-based methods often involve intensive model selection and hyperparameter tuning to obtain good results. Moreover, cross-validation is not possible for tuning hyperparameters since there are often no labels in the target domain. This would restrict wide applicability of transfer learning especially in computationally-constraint devices such as wearables. In this paper, we propose a practically Easy Transfer Learning (EasyTL) approach which requires no model selection and hyperparameter tuning, while achieving competitive performance. By exploiting intra-domain structures, EasyTL is able to learn both non-parametric transfer features and classifiers. Extensive experiments demonstrate that, compared to state-of-the-art traditional and deep methods, EasyTL satisfies the Occam’s Razor principle: it is extremely easy to implement and use while achieving comparable or better performance in classification accuracy and much better computational efficiency. Additionally, it is shown that EasyTL can increase the performance of existing transfer feature learning methods.
We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, the currently largest aesthetics database. While previous approaches miss some of the information in the original images, due to taking small crops, down-scaling or warping the originals during training, we propose the first method that efficiently supports full resolution images as an input, and can be trained on variable input sizes. This allows us to significantly improve upon the state of the art, increasing the Spearman rank-order correlation coefficient (SRCC) of ground-truth mean opinion scores (MOS) from the existing best reported of 0.612 to 0.756. To achieve this performance, we extract multi-level spatially pooled (MLSP) features from all convolutional blocks of a pre-trained InceptionResNet-v2 network, and train a custom shallow Convolutional Neural Network (CNN) architecture on these new features.
Missing data are a concern in many real world data sets and imputation methods are often needed to estimate the values of missing data, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for uncertainty due to missingness when selecting features we can reduce the degree of missingness while also limiting the number of uninformative features being used to make predictive models. We introduce a method to perform uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that across a variety of data sets and levels of missingness we can improve the accuracy of imputations. Improved imputation due to UAFS also results in improved prediction accuracy when performing supervised learning using these imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.
Real-world semantic or knowledge-based systems, e.g., in the biomedical domain, can become large and complex. Tool support for the localization and repair of faults within knowledge bases of such systems can therefore be essential for their practical success. Correspondingly, a number of knowledge base debugging approaches, in particular for ontology-based systems, were proposed throughout recent years. Query-based debugging is a comparably recent interactive approach that localizes the true cause of an observed problem by asking knowledge engineers a series of questions. Concrete implementations of this approach exist, such as the OntoDebug plug-in for the ontology editor Prot\’eg\’e. To validate that a newly proposed method is favorable over an existing one, researchers often rely on simulation-based comparisons. Such an evaluation approach however has certain limitations and often cannot fully inform us about a method’s true usefulness. We therefore conducted different user studies to assess the practical value of query-based ontology debugging. One main insight from the studies is that the considered interactive approach is indeed more efficient than an alternative algorithmic debugging based on test cases. We also observed that users frequently made errors in the process, which highlights the importance of a careful design of the queries that users need to answer.
Distributed word vector spaces are considered hard to interpret which hinders the understanding of natural language processing (NLP) models. In this work, we introduce a new method to interpret arbitrary samples from a word vector space. To this end, we train a neural model to conceptualize word vectors, which means that it activates higher order concepts it recognizes in a given vector. Contrary to prior approaches, our model operates in the original vector space and is capable of learning non-linear relations between word vectors and concepts. Furthermore, we show that it produces considerably less entropic concept activation profiles than the popular cosine similarity.
In this paper, we analyze content sharing between news sources in the alternative and mainstream media using a dataset of 713K articles and 194 sources. We find that content sharing happens in tightly formed communities, and these communities represent relatively homogeneous portions of the media landscape. Through a mix-method analysis, we find several primary content sharing behaviors. First, we find that the vast majority of shared articles are only shared with similar news sources (i.e. same community). Second, we find that despite these echo-chambers of sharing, specific sources, such as The Drudge Report, mix content from both mainstream and conspiracy communities. Third, we show that while these differing communities do not always share news articles, they do report on the same events, but often with competing and counter-narratives. Overall, we find that the news is homogeneous within communities and diverse in between, creating different spirals of sameness.
We propose a novel learning paradigm for Deep Neural Networks (DNN) by using Boolean logic algebra. We first present the basic differentiable operators of a Boolean system such as conjunction, disjunction and exclusive-OR and show how these elementary operators can be combined in a simple and meaningful way to form Neural Logic Networks (NLNs). We examine the effectiveness of the proposed NLN framework in learning Boolean functions and discrete-algorithmic tasks. We demonstrate that, in contrast to the implicit learning in MLP approach, the proposed neural logic networks can learn the logical functions explicitly that can be verified and interpreted by human. In particular, we propose a new framework for learning the inductive logic programming (ILP) problems by exploiting the explicit representational power of NLN. We show the proposed neural ILP solver is capable of feats such as predicate invention and recursion and can outperform the current state of the art neural ILP solvers using a variety of benchmark tasks such as decimal addition and multiplication, and sorting on ordered list.
Mathematical reasoning—a core ability within human intelligence—presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar system, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.