An Instance Transfer based Approach Using Enhanced Recurrent Neural Network for Domain Named Entity Recognition

Recently, neural networks have shown promising results for named entity recognition (NER), but they need a substantial amount of labeled data for model training. In a new domain (the target domain), there is little or no labeled data, which makes domain NER much more difficult. As NER has been researched for a long time, a similar domain (the source domain) often already has well-labelled data. Therefore, in this paper, we focus on domain NER by studying how to utilize the labelled data from such a similar source domain for the new target domain. We design a kernel-function-based instance transfer strategy that retrieves similar labelled sentences from a source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) by adding an additional layer that combines the source domain labelled data into the traditional RNN structure. Comprehensive experiments are conducted on two datasets. The comparison among HMM, CRF and RNN shows that RNN performs better than the others. When there is no labelled data in the target domain, compared to directly using the source domain labelled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.

Adversarial Text Generation Without Reinforcement Learning

Generative Adversarial Networks (GANs) have experienced a recent surge in popularity, performing competitively in a variety of tasks, especially in computer vision. However, GAN training has shown limited success in natural language processing. This is largely because sequences of text are discrete, and thus gradients cannot propagate from the discriminator to the generator. Recent solutions use reinforcement learning to propagate approximate gradients to the generator, but this is inefficient to train. We propose to utilize an autoencoder to learn a low-dimensional representation of sentences. A GAN is then trained to generate its own vectors in this space, which decode to realistic utterances. We report both random and interpolated samples from the generator. Visualization of sentence vectors indicates that our model correctly learns the latent space of the autoencoder. Both human ratings and BLEU scores show that our model generates realistic text against competitive baselines.

U-Net: Machine Reading Comprehension with Unanswerable Questions

Machine reading comprehension with unanswerable questions is a new challenging task for natural language processing. A key subtask is to reliably predict whether the question is unanswerable. In this paper, we propose a unified model, called U-Net, with three important components: answer pointer, no-answer pointer, and answer verifier. We introduce a universal node and thus process the question and its context passage as a single contiguous sequence of tokens. The universal node encodes the fused information from both the question and passage, plays an important role in predicting whether the question is answerable, and also greatly improves the conciseness of the U-Net. Unlike the state-of-the-art pipeline models, U-Net can be learned in an end-to-end fashion. The experimental results on the SQuAD 2.0 dataset show that U-Net can effectively predict the unanswerability of questions and achieves an F1 score of 71.7 on SQuAD 2.0.

Rough Concept Analysis

The theory introduced, presented and developed in this paper is concerned with Rough Concept Analysis. This theory is a synthesis of the theory of Rough Sets pioneered by Zdzislaw Pawlak with the theory of Formal Concept Analysis pioneered by Rudolf Wille. The central notion in this paper, that of a rough formal concept, combines in a natural fashion the notion of a rough set with the notion of a formal concept: ‘rough set + formal concept = rough formal concept’. A follow-up paper will provide a synthesis of the two important data modeling techniques: conceptual scaling of Formal Concept Analysis and Entity-Relationship database modeling.
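As a rough illustration of the two ingredients named in this abstract (and not of the paper's own construction of a rough formal concept), the sketch below computes Pawlak's lower and upper approximations of a set with respect to a partition, together with the two derivation operators of Formal Concept Analysis. The function names and the toy data are assumptions made for this example.

```python
def approximations(partition, target):
    """Return (lower, upper) approximations of `target` for a partition of the universe."""
    lower, upper = set(), set()
    for block in partition:
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def intent(objects, context):
    """Attributes shared by all given objects; `context` maps object -> set of attributes."""
    if not objects:
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objects))


def extent(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {o for o, attrs in context.items() if attributes <= attrs}
```

A pair (A, B) is a formal concept when `intent(A, context) == B` and `extent(B, context) == A`; a set is rough when its lower and upper approximations differ.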

Why We Do Not Evolve Software? Analysis of Evolutionary Algorithms

In this paper, we review the state-of-the-art results in evolutionary computation and observe that we do not evolve non-trivial software from scratch and with no human intervention. A number of possible explanations are considered, but we conclude that the computational complexity of the problem prevents it from being solved as currently attempted. A detailed analysis of necessary and available computational resources is provided to support our findings.

Non-computability of human intelligence

We revisit the question (most famously) initiated by Turing: Can human intelligence be completely modelled by a Turing machine? To give away the ending, we show here that the answer is no. More specifically, we show that at least some thought processes of the brain cannot be Turing computable. In particular, some physical processes are not Turing computable, which is not entirely expected. The main difference between our argument and the well-known Lucas-Penrose argument is that we do not use Gödel’s incompleteness theorem (although our argument seems related to Gödel’s), and we do not need to assume fundamental consistency of human reasoning powers (which is controversial). We also side-step some meta-logical issues with their argument, which have also been controversial. The argument is via a thought experiment and at least partly physical, but no serious physical assumptions are made. Furthermore, the argument can be reformulated as an actual (likely future) experiment.

Very Short Term Time-Series Forecasting of Solar Irradiance Without Exogenous Inputs

This paper compares different forecast methods and models to predict average values of solar irradiance with a sampling time of 15 min over a prediction horizon of up to 3 h. The methods considered only require historic solar irradiance values, the current time and geographical location, i.e., no exogenous inputs are required. Nearest neighbor regression (NNR) and autoregressive integrated moving average (ARIMA) models are tested using different hyperparameters (e.g., the number of autoregressive lags, or the size of the training data set) and data from different locations and seasons. Based on a high number of different models, NNR is identified to be the more promising approach. The hyperparameters and their effect on the forecast quality are analyzed to identify properties which are likely to lead to good forecasts. Using these properties, a reduced search space is derived which can be used to identify good forecast models much faster. In a case study, the use of this search space is demonstrated by finding forecast models for different climatic situations.
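To make the nearest neighbor regression idea concrete, here is a minimal sketch of a one-step forecast that uses only historic values. It assumes a plain Euclidean distance between lag windows and an unweighted mean of the successors; the function name `nnr_forecast` and the parameters `n_lags` and `k` are illustrative choices, not the paper's configuration.

```python
import math


def nnr_forecast(history, n_lags, k):
    """Forecast the next value as the mean successor of the k most similar past windows."""
    if len(history) <= n_lags:
        raise ValueError("history must be longer than n_lags")
    query = history[-n_lags:]  # the most recent window, whose successor is unknown
    candidates = []
    # Every earlier window history[start:start + n_lags] has a known successor.
    for start in range(len(history) - n_lags):
        window = history[start:start + n_lags]
        distance = math.dist(window, query)
        candidates.append((distance, history[start + n_lags]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)
```

With 15 min sampling, a 3 h horizon corresponds to 12 steps; one way to reach it with this sketch would be to append each forecast to `history` and call the function again, although other multi-step strategies are possible.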

Tentacular Artificial Intelligence, and the Architecture Thereof, Introduced

We briefly introduce herein a new form of distributed, multi-agent artificial intelligence, which we refer to as ‘tentacular.’ Tentacular AI is distinguished by six attributes, which among other things entail a capacity for reasoning and planning based in highly expressive calculi (logics), and which enlists subsidiary agents across distances circumscribed only by the reach of one or more given networks.

Adversarial Learning and Explainability in Structured Datasets

We theoretically and empirically explore the explainability benefits of adversarial learning in logistic regression models on structured datasets. In particular, we focus on improved explainability due to significantly higher feature concentration in adversarially-learned models: compared to natural training, adversarial training tends to shrink the weights of non-predictive and weakly-predictive features much more efficiently, while model performance on natural test data degrades only slightly (and sometimes even improves) relative to that of a naturally trained model. We provide a theoretical insight into this phenomenon via an analysis of the expectation of the logistic model weight updates by an SGD-based adversarial learning algorithm, where examples are drawn from a random binary data-generation process. We empirically demonstrate the feature-pruning effect on a synthetic dataset, some datasets from the UCI repository, and real-world large-scale advertising response-prediction datasets from MediaMath. Several of the MediaMath datasets contain tens of millions of data points and on the order of 100,000 sparse categorical features; adversarial learning often reduces model size by a factor of 20 or more, and yet the model performance on natural test data (measured by AUC) is comparable to (and sometimes even better than) that of the naturally trained model. We also show that traditional ℓ1 regularization does not even come close to achieving this level of feature concentration. We measure ‘feature concentration’ using the Integrated Gradients-based feature-attribution method of Sundararajan et al. (2017), and derive a new closed-form expression for 1-layer networks, which substantially speeds up computation of aggregate feature attributions across a large dataset.

Optimizing Heuristics for Tableau-based OWL Reasoners

Optimization techniques play a significant role in improving description logic reasoners covering the Web Ontology Language (OWL). These techniques are essential to speed up these reasoners. Many of the optimization techniques are based on heuristic choices. Optimal heuristic selection makes these techniques more effective. The FaCT++ OWL reasoner and its Java version JFact implement an optimization technique called ToDo list, which is a substitute for a traditional top-down approach in tableau-based reasoners. The ToDo list mechanism allows one to arrange the order of applying different rules by giving each a priority. Compared to a top-down approach, the ToDo list technique has better control over the application of expansion rules. Learning the proper heuristic order for applying rules in the ToDo list will have a great impact on reasoning speed. We use a binary SVM technique to build our learning model. The model can help to choose ontology-specific order sets to speed up OWL reasoning. On average, our learning approach tested with 40 selected ontologies achieves a speedup of two orders of magnitude when compared to the worst rule ordering choice.

Stop Illegal Comments: A Multi-Task Deep Learning Approach

Deep learning methods are often difficult to apply in the legal domain because of the large amount of labeled data they require. A recent trend in the deep learning community is the application of multi-task models that enable single deep neural networks to perform more than one task at the same time, for example classification and translation tasks. These powerful novel models are capable of transferring knowledge among different tasks or training sets and therefore could open up the legal domain for many deep learning applications. In this paper, we investigate the transfer learning capabilities of such a multi-task model on a classification task on the publicly available Kaggle toxic comment dataset for classifying illegal comments, and we report promising results.

Deep Transfer Reinforcement Learning for Text Summarization

Deep neural networks are data-hungry models and thus face difficulties when trained on small datasets. Transfer learning is a method that could potentially help in such situations. Although transfer learning has achieved great success in image processing, its effect in the text domain is yet to be well established, especially due to several intricacies that arise in the context of document analysis and understanding. In this paper, we study the problem of transfer learning for text summarization and discuss why the existing state-of-the-art models for this problem fail to generalize well on other (unseen) datasets. We propose a reinforcement learning framework based on a self-critic policy gradient method which solves this problem and achieves good generalization and state-of-the-art results on a variety of datasets. Through an extensive set of experiments, we also show the ability of our proposed framework to fine-tune the text summarization model with only a few training samples. To the best of our knowledge, this is the first work that studies transfer learning in text summarization and provides a generic solution that works well on unseen data.

Named-Entity Linking Using Deep Learning For Legal Documents: A Transfer Learning Approach

In the legal domain it is important to differentiate between words in general, and afterwards to link the occurrences of the same entities. The task that addresses these challenges is called Named-Entity Linking (NEL). Current supervised neural networks designed for NEL use publicly available datasets for training and testing. This paper, however, focuses on applying a transfer learning approach, using networks trained for NEL, to legal documents. Experiments show consistent improvement on the legal datasets that were created from European Union law in the scope of this research. Using the transfer learning approach, we reached F1-scores of 98.90% and 98.01% on the small and large legal test datasets, respectively.

Trellis Networks for Sequence Modeling

We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art on a variety of challenging benchmarks, including word-level language modeling on Penn Treebank and WikiText-103, character-level language modeling on Penn Treebank, and stress tests designed to evaluate long-term memory retention. The code is available at https://…/trellisnet .

Neural Styling for Interpretable Fair Representations

We observe a rapid increase in machine learning models for learning data representations that remove the semantics of protected characteristics, and are therefore able to mitigate unfair prediction outcomes. This is indeed a positive proliferation. All available models however learn latent embeddings, therefore the produced representations do not have the semantic meaning of the input. Our aim here is to learn fair representations that are directly interpretable in the original input domain. We cast this problem as a data-to-data translation; to learn a mapping from data in a source domain to a target domain such that data in the target domain enforces fairness definitions, such as statistical parity or equality of opportunity. Unavailability of fair data in the target domain is the crux of the problem. This paper provides the first approach to learn a highly unconstrained mapping from source to target by maximizing (conditional) dependence of residuals – the difference between data and its translated version – and protected characteristics. The usage of residual statistics ensures that our generated fair data should only be an adjustment of the input data, and this adjustment should reveal the main difference between protected characteristic groups. When applied to CelebA face image dataset with gender as protected characteristic, our model enforces equality of opportunity by adjusting eyes and lips regions. In Adult income dataset, also with gender as protected characteristic, our model achieves equality of opportunity by, among others, obfuscating wife and husband relationship. Visualizing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and the prediction performance.

Push-Pull Gradient Methods for Distributed Optimization in Networks

In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents’ cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents’ objective functions. From the viewpoint of an agent, the information about the decision variable is pushed to the neighbors, while the information about the gradients is pulled from the neighbors hence giving the name ‘push-pull gradient methods’. This name is also due to the consideration of the implementation aspect: the push-communication-protocol and the pull-communication-protocol are respectively employed to implement certain steps in the numerical schemes. The methods utilize two different graphs for the information exchange among agents, and as such, unify the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the proposed algorithms and their many variants converge linearly for strongly convex and smooth objective functions over a network (possibly with unidirectional data links) in both synchronous and asynchronous random-gossip settings. We numerically evaluate our proposed algorithm for both static and time-varying graphs, and find that the algorithms are competitive as compared to other linearly convergent schemes.
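The two-estimate structure described in this abstract can be sketched as follows. This is a toy under stated assumptions, not the paper's exact scheme: each agent i holds the scalar cost f_i(x) = 0.5 * (x - b_i)^2, the matrix `R` (row-stochastic) mixes the decision estimates, the matrix `C` (column-stochastic) mixes the gradient trackers, and the update form is the usual gradient-tracking recursion. All names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def push_pull(R, C, targets, step, iterations):
    """Run the two-estimate recursion for toy costs f_i(x) = 0.5 * (x - targets[i])**2."""

    def grad(x):
        # Local gradients: agent i only evaluates its own cost at its own estimate.
        return [xi - bi for xi, bi in zip(x, targets)]

    x = [0.0] * len(targets)   # estimates of the optimal decision variable
    y = grad(x)                # gradient trackers start at the local gradients
    for _ in range(iterations):
        # Decision update: take a step along the tracker, then mix with R.
        x_new = mat_vec(R, [xi - step * yi for xi, yi in zip(x, y)])
        # Tracker update: mix with C, then add the change in local gradients.
        g_old, g_new = grad(x), grad(x_new)
        y = [cy + gn - go for cy, gn, go in zip(mat_vec(C, y), g_new, g_old)]
        x = x_new
    return x, y
```

For example, with three agents on a directed ring, `R = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]`, `C` equal to the transpose of `R`, `targets = [1.0, 2.0, 6.0]` and `step = 0.1`, every entry of `x` approaches the minimizer of the summed cost, which is the mean of the targets (3.0). These particular matrices happen to be doubly stochastic for simplicity; the method as described only needs the row- and column-stochastic properties.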

Discriminator Rejection Sampling

We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm – called Discriminator Rejection Sampling (DRS) – that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the state of the art SAGAN model. On ImageNet, we train an improved baseline that increases the best published Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75.
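The idealized case mentioned in this abstract can be sketched in a few lines. The sketch assumes an optimal discriminator, so that its logit equals log(p_data(x) / p_gen(x)), and a known upper bound `max_logit` on that logit; the practical algorithm that handles the failure of these assumptions is not reproduced here. The function names and the callable interfaces are assumptions for illustration.

```python
import math
import random


def acceptance_probability(logit, max_logit):
    """exp(logit - max_logit), i.e. the density ratio divided by its upper bound, capped at 1."""
    return math.exp(min(0.0, logit - max_logit))


def rejection_sample(generate, discriminator_logit, max_logit, n_samples, rng):
    """Draw from the generator and keep each sample with its acceptance probability."""
    accepted = []
    while len(accepted) < n_samples:
        x = generate(rng)
        if rng.random() < acceptance_probability(discriminator_logit(x), max_logit):
            accepted.append(x)
    return accepted
```

As a sanity check on a two-outcome toy problem: if the generator emits 0 and 1 with equal probability while the data distribution puts mass 0.2 on 0 and 0.8 on 1, the ideal logits are log(0.4) and log(1.6), and the accepted samples contain about 80% ones.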

DN-ResNet: Efficient Deep Residual Network for Image Denoising

A deep learning approach to blind denoising of images without complete knowledge of the noise statistics is considered. We propose DN-ResNet, which is a deep convolutional neural network (CNN) consisting of several residual blocks (ResBlocks). With cascade training, DN-ResNet is more accurate and more computationally efficient than the state-of-the-art denoising networks. An edge-aware loss function is further utilized in training DN-ResNet, so that the denoising results have better perceptual quality than with a conventional loss function. Next, we introduce the depthwise separable DN-ResNet (DS-DN-ResNet), which uses the proposed Depthwise Separable ResBlock (DS-ResBlock) instead of the standard ResBlock and has a much lower computational cost. DS-DN-ResNet is incrementally evolved by replacing the ResBlocks in DN-ResNet with DS-ResBlocks stage by stage. As a result, high accuracy and good computational efficiency are achieved concurrently. Whereas previous state-of-the-art deep learning methods focused on denoising either Gaussian or Poisson corrupted images, we also consider denoising images with the more practical Poisson noise with additive Gaussian noise. The results show that DN-ResNets are more efficient and robust, and perform better denoising, than current state-of-the-art deep learning methods, as well as the popular variants of the BM3D algorithm, in cases of blind and non-blind denoising of images corrupted with Poisson, Gaussian or Poisson-Gaussian noise. Our network also works well for other image enhancement tasks such as compressed image restoration.

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNet) with respect to training convergence and generalisation as a function of mini-batch size and learning rate for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be efficiently computed even for high-capacity deep models. Our proposed measurements can help practitioners to monitor and control the training process (by actively tuning the mini-batch size and learning rate) to allow for good training convergence and generalisation. Furthermore, the proposed measurements also allow us to show that it is possible to optimise the training process with a new dynamic sampling training approach that continuously and automatically changes the mini-batch size and learning rate during the training process. Finally, we show that the proposed dynamic sampling training approach has a faster training time and a competitive classification accuracy compared to the current state of the art.

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.
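The alternation between an SGD step with back-off and an evolution step with elitism can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the scalar objective (w - 3)^2 used as fitness (lower is better), the Gaussian gradient noise and mutation scales, and the function names. Learning rates are attached to population slots rather than to individuals, which is a simplification.

```python
import random


def fitness(w):
    """Toy objective; lower is better."""
    return (w - 3.0) ** 2


def sgd_step(w, lr, rng):
    """One noisy gradient step with back-off: keep the parent if the update is worse."""
    noisy_grad = 2.0 * (w - 3.0) + rng.gauss(0.0, 0.5)
    candidate = w - lr * noisy_grad
    return candidate if fitness(candidate) <= fitness(w) else w


def evolution_step(population, n_elites, rng):
    """Elitist selection: the best individuals survive unchanged, the rest are mutated copies."""
    ranked = sorted(population, key=fitness)
    elites = ranked[:n_elites]
    n_offspring = len(population) - n_elites
    offspring = [rng.choice(elites) + rng.gauss(0.0, 0.1) for _ in range(n_offspring)]
    return elites + offspring


def esgd(population, learning_rates, generations, n_elites, rng):
    """Alternate SGD and evolution steps; return the final population and best-fitness history."""
    best_history = [min(fitness(w) for w in population)]
    for _ in range(generations):
        population = [sgd_step(w, lr, rng) for w, lr in zip(population, learning_rates)]
        population = evolution_step(population, n_elites, rng)
        best_history.append(min(fitness(w) for w in population))
    return population, best_history
```

Because the back-off never accepts a worse individual and the elites are carried over unchanged, the entries of `best_history` are non-increasing, which mirrors the guarantee stated in the abstract.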

Named Entity Analysis and Extraction with Uncommon Words

Most previous research treats named entity extraction and classification as an end-to-end task. We argue that the two sub-tasks should be addressed separately. Entity extraction lies at the level of syntactic analysis while entity classification lies at the level of semantic analysis. According to Noam Chomsky’s ‘Syntactic Structures,’ pp. 93-94 (Chomsky 1957), syntax does not appeal to semantics and semantics does not affect syntax. We analyze two benchmark datasets for the characteristics of named entities, finding that uncommon words can distinguish named entities from common text, where uncommon words are words that hardly appear in common text and are mainly proper nouns. Experiments validate that lexical and syntactic features achieve state-of-the-art performance on entity extraction and that semantic features do not further improve the extraction performance, in both our model and the state-of-the-art baselines. With Chomsky’s view, we also explain the failure of joint syntactic and semantic parsing in other works.

Fast Randomized PCA for Sparse Data

Principal component analysis (PCA) is widely used for dimension reduction and embedding of real data in social network analysis, information retrieval, natural language processing, etc. In this work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm has similar accuracy to the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is largely optimized for sparse data. It also has good flexibility to trade off runtime against accuracy for practical usage. Experiments on real data show that the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm without accuracy loss, and is up to 20X faster than svds in Matlab with little error. The algorithm computes the first 100 principal components of a large information retrieval dataset with 12,869,521 persons and 323,899 keywords in less than 400 seconds on a 24-core machine, while all conventional methods fail due to out-of-memory issues.

Classes of treebased networks

Recently, so-called treebased phylogenetic networks have gained considerable interest in the literature, where a treebased network is a network that can be constructed from a phylogenetic tree, called the base tree, by adding additional edges. The main aim of this manuscript is to provide some sufficient criteria for treebasedness by reducing phylogenetic networks to related graph structures. While it is generally known that deciding whether a network is treebased is NP-complete, one of these criteria, namely edgebasedness, can be verified in polynomial time. Next to these edgebased networks, we introduce further classes of treebased networks and analyze their relationships.

Collaborative Deep Learning Across Multiple Data Centers

Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing the multi-datacenter data for performance reasons. In practice, however, it is often infeasible to transfer all data to a centralized data center due to not only bandwidth limitations but also the constraints of privacy regulations. Model averaging is a conventional choice for data-parallel training, but previous studies claim it is ineffective because deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely, the cyclical learning rate and the increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared to the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method.
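A minimal sketch of the two strategies combined with model averaging is given below. It assumes a triangular cyclical schedule, plain element-wise averaging of weight lists, and a toy quadratic loss per data center; none of these specifics come from the abstract, and all function names are hypothetical.

```python
def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Learning rate that rises linearly to max_lr and falls back to base_lr, repeatedly."""
    cycle_pos = step % (2 * half_cycle)
    if cycle_pos < half_cycle:
        frac = cycle_pos / half_cycle
    else:
        frac = 2.0 - cycle_pos / half_cycle
    return base_lr + (max_lr - base_lr) * frac


def local_train(weights, target, epochs, base_lr, max_lr, half_cycle):
    """Local training on a toy loss sum((w - target)**2), one gradient step per epoch."""
    w = list(weights)
    for step in range(epochs):
        lr = triangular_lr(step, base_lr, max_lr, half_cycle)
        w = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, target)]
    return w


def average_models(models):
    """Element-wise mean of several weight lists of equal length."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def averaging_round(global_weights, local_targets, local_epochs, base_lr, max_lr, half_cycle):
    """Each data center trains locally from the shared weights; only the models are averaged."""
    local_models = [
        local_train(global_weights, target, local_epochs, base_lr, max_lr, half_cycle)
        for target in local_targets
    ]
    return average_models(local_models)
```

Only model weights cross data-center boundaries in `averaging_round`; the raw data (represented here by `local_targets`) stays local. Raising `local_epochs` corresponds to the second strategy, more local training between averaging steps.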

Cyber-Physical Systems, a new formal paradigm to model redundancy and resiliency

Cyber-Physical Systems (CPS) are systems composed of a physical component that is controlled or monitored by a cyber component, a computer-based algorithm. Advances in CPS technologies and science are enabling capability, adaptability, scalability, resiliency, safety, security, and usability that will far exceed the simple embedded systems of today. CPS technologies are transforming the way people interact with engineered systems. New smart CPS are driving innovation in various sectors such as agriculture, energy, transportation, healthcare, and manufacturing. They are leading the fourth Industrial Revolution (Industry 4.0), which benefits from the high flexibility of production. The Industry 4.0 production paradigm is characterized by highly intercommunicating production elements in all manufacturing processes. For this reason, a core question is how the systems should be structurally optimized to have an adequate level of redundancy and thus be satisfactorily resilient. This goal can benefit from formal methods well known in various scientific domains such as artificial intelligence. The current research therefore proposes a CPS meta-model and its instantiation, which lists all kinds of relationships that may occur between the CPSs themselves and between their (cyber and physical) components. Using the CPS meta-model formalization, with an adaptation of the Formal Concept Analysis (FCA) formal approach, this paper presents a way to optimize the modelling of CPS systems, emphasizing their redundancy and their resiliency.

TNE: A Latent Model for Representation Learning on Networks

Network representation learning (NRL) methods aim to map each vertex into a low-dimensional space by preserving the local and global structure of a given network, and in recent years they have received significant attention thanks to their success in several challenging problems. Although various approaches have been proposed to compute node embeddings, many successful methods rely on random walks to transform a given network into a collection of sequences of nodes, and then learn the representation of nodes by predicting the context of each vertex within the sequence. In this paper, we introduce a general framework to enhance the embeddings of nodes acquired by means of random walk-based approaches. Similar to the notion of topical word embeddings in NLP, the proposed method assigns each vertex to a topic with the help of various statistical models and community detection methods, and then generates the enhanced community representations. We evaluate our method on two downstream tasks: node classification and link prediction. The experimental results demonstrate that the incorporation of vertex and topic embeddings outperforms widely-known baseline NRL methods.

Biologically Plausible Online Principal Component Analysis Without Recurrent Neural Dynamics

Artificial neural networks that learn to perform Principal Component Analysis (PCA) and related tasks using strictly local learning rules have been previously derived based on the principle of similarity matching: similar pairs of inputs should map to similar pairs of outputs. However, the operation of these networks (and of similar networks) requires a fixed-point iteration to determine the output corresponding to a given input, which means that dynamics must operate on a faster time scale than the variation of the input. Further, during these fast dynamics such networks typically ‘disable’ learning, updating synaptic weights only once the fixed-point iteration has been resolved. Here, we derive a network for PCA-based dimensionality reduction that avoids this fast fixed-point iteration. The key novelty of our approach is a modification of the similarity matching objective to encourage near-diagonality of a synaptic weight matrix. We then approximately invert this matrix using a Taylor series approximation, replacing the previous fast iterations. In the offline setting, our algorithm corresponds to a dynamical system, the stability of which we rigorously analyze. In the online setting (i.e., with stochastic gradients), we map our algorithm to a familiar neural network architecture and give numerical results showing that our method converges at a competitive rate. The computational complexity per iteration of our online algorithm is linear in the total degrees of freedom, which is in some sense optimal.

Opinion Dynamics via Search Engines

Ranking algorithms are the information gatekeepers of the Internet era. We develop a stylized model to study the effects of ranking algorithms on opinion dynamics. We consider a search engine that uses an algorithm based on popularity and on personalization. We find that popularity-based rankings generate an advantage of the fewer effect: fewer websites reporting a given signal attract relatively more traffic overall. This highlights a novel, ranking-driven channel that explains the diffusion of misinformation, as websites reporting incorrect information may attract an amplified amount of traffic precisely because they are few. Furthermore, when individuals provide sufficiently positive feedback to the ranking algorithm, popularity-based rankings tend to aggregate information while personalization acts in the opposite direction.

How to Stop Off-the-Shelf Deep Neural Networks from Overthinking

While deep neural networks (DNNs) can perform complex classification tasks, most of their natural inputs do not necessitate the depth of the modern architectures. This leads to wasted computation, as the network overthinks on the simpler inputs. The overthinking problem could be prevented if standard DNNs could produce early predictions. However, prior work suggests that this is challenging in existing architectures, such as ResNet, as their internal layers are not trained for classification and optimizing them for accurate predictions hurts the end performance. In this paper, we explore the overthinking problem, and, as a remedy, we propose a generic modification to off-the-shelf DNNs: the Shallow-Deep Network (SDN). With this modification, a DNN can efficiently produce predictions from either shallow or deep layers, as appropriate for the given input. We employ feature reduction and a layer-wise objective function to train these progressively deeper internal classifiers while preserving the end performance. We can apply the SDN modification either by training from scratch or by tuning a pre-trained model. Experiments on four architectures (VGG, ResNet, WideResNet, and MobileNet) and three image classification tasks suggest that, for an average input, an SDN can produce a correct prediction before its middle layer. By avoiding unnecessary computation, the SDN can reduce the required number of operations for an input by 41% over the original network. Finally, we observe that disagreements among the early classifiers reliably indicate inputs where the network is likely to make a mistake. Building on this observation, we propose an internal confusion metric and a method to diagnose misclassifications by visualizing these disagreements.
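
One way to picture predictions from either shallow or deep layers is an early-exit loop over internal classifiers. The sketch below is an assumption-laden illustration, not the paper's implementation: the confidence-threshold exit rule, the function name `early_exit_predict`, and the toy classifiers (plain functions returning class probabilities) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[object], Sequence[float]]

def early_exit_predict(
    x: object, classifiers: List[Classifier], threshold: float = 0.9
) -> Tuple[int, int]:
    """Return (predicted_class, exit_index).

    Classifiers are ordered from shallowest to deepest. The first one whose
    top probability reaches `threshold` decides; otherwise the deepest
    (final) classifier decides. The exit rule is an assumed heuristic.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    probs: List[float] = []
    for i, clf in enumerate(classifiers):
        probs = list(clf(x))
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence), i   # confident enough: stop early
    # No internal classifier was confident; use the final one's prediction.
    return probs.index(max(probs)), len(classifiers) - 1

# Toy stand-ins for internal classifiers attached at increasing depth.
toy_classifiers: List[Classifier] = [
    lambda x: [0.60, 0.40],   # shallow: unsure
    lambda x: [0.95, 0.05],   # middle: confident
    lambda x: [0.99, 0.01],   # final layer
]

label, exit_at = early_exit_predict(None, toy_classifiers, threshold=0.9)
print(label, exit_at)
```

With a stricter threshold no internal classifier qualifies, so the loop falls through to the final classifier and no computation is saved.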

Improving Data Quality through Deep Learning and Statistical Models

Traditional data quality control methods are based on users' experience or previously established business rules, which limits performance and makes the process very time-consuming, with lower-than-desirable accuracy. Utilizing deep learning, we can leverage computing resources and advanced techniques to overcome these challenges and provide greater value to users. In this paper, we first review relevant works and discuss machine learning techniques, tools, and statistical quality models. Second, we offer a creative data quality framework based on deep learning and statistical model algorithms for identifying data quality. Third, we use data involving salary levels from an open dataset published by the state of Arkansas to demonstrate how to identify outlier data and how to improve data quality via deep learning. Finally, we discuss future work.

Hunting for Discriminatory Proxies in Linear Regression Models

A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model’s constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a ‘business necessity’. Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
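
The two criteria in the abstract (close correlation with a protected variable, and influence on the model's output) can be illustrated with a much simpler screen than the second-order cone program the paper describes. The sketch below only checks single-feature components of a fitted linear model; the measures used (squared Pearson correlation for association, share of prediction variance for influence), the thresholds, and the function name are assumptions made for illustration, not the paper's definitions.

```python
import numpy as np

def screen_single_feature_proxies(X, beta, z, assoc_min=0.5, infl_min=0.1):
    """Flag features whose term beta[i] * X[:, i] is both associated with the
    protected variable z and influential in the prediction X @ beta.

    Simplified, hypothetical screen: it does not search over combinations of
    features as an optimization-based method would.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    z = np.asarray(z, dtype=float)
    y_hat = X @ beta
    var_y = y_hat.var()
    flagged = []
    for i in range(X.shape[1]):
        component = beta[i] * X[:, i]
        if component.var() == 0 or z.var() == 0 or var_y == 0:
            continue                                      # correlation undefined
        assoc = np.corrcoef(component, z)[0, 1] ** 2      # assumed association measure
        infl = component.var() / var_y                    # assumed influence measure
        if assoc >= assoc_min and infl >= infl_min:
            flagged.append((i, float(assoc), float(infl)))
    return flagged

# Toy data: feature 0 copies the protected attribute, feature 1 is unrelated.
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1]]
z_demo = [0, 1, 0, 1]
print(screen_single_feature_proxies(X_demo, beta=[2.0, 1.0], z=z_demo))
```

A screen like this misses proxies that only emerge from weighted combinations of several inputs, which is the case an optimization-based search is meant to cover.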

Multimodal Deep Gaussian Processes

We propose a novel Bayesian approach to modelling multimodal data generated by multiple independent processes, simultaneously solving the data association and induced supervised learning problems. Underpinning our approach is the use of Gaussian process priors which encode structure both on the functions and the associations themselves. The association of samples and functions is determined by taking both inputs and outputs into account while also obtaining a posterior belief about the relevance of the global components throughout the input space. We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors. We show results for an artificial data set, a noise separation problem, and a multimodal regression problem based on the cart-pole benchmark.

Packaging and Sharing Machine Learning Models via the Acumos AI Open Platform

Applying Machine Learning (ML) to business applications for automation usually faces difficulties when integrating diverse ML dependencies and services, mainly because of the lack of a common ML framework. In most cases, the ML models are developed for applications which are targeted for specific business domain use cases, leading to duplicated effort, and making reuse impossible. This paper presents Acumos, an open platform capable of packaging ML models into portable containerized microservices which can be easily shared via the platform’s catalog, and can be integrated into various business applications. We present a case study of packaging sentiment analysis and classification ML models via the Acumos platform, permitting easy sharing with others. We demonstrate that the Acumos platform reduces the technical burden on application developers when applying machine learning models to their business applications. Furthermore, the platform allows the reuse of readily available ML microservices in various business domains.

FlowQA: Grasping Flow in History for Conversational Machine Comprehension
A Machine Learning Approach to Persian Text Readability Assessment Using a Crowdsourced Dataset
Word Embeddings from Large-Scale Greek Web content
Using Sentiment Representation Learning to Enhance Gender Classification for User Profiling
Understanding and Predicting the Memorability of Natural Scene Images
Exploring the Use of Attention within an Neural Machine Translation Decoder States to Translate Idioms
Regularized shadowing-based data assimilation method for imperfect models and its comparison to the weak constraint 4DVar method
Observer Based Path Following for Underactuated Marine Vessels in the Presence of Ocean Currents: A Global Approach – With proofs
Supplementary Material to: Passive Controller Realization of a Biquadratic Impedance with Double Poles and Zeros as a Seven-Element Series-Parallel Network for Effective Mechanical Control
Detecting Strategic Manipulation in Distributed Optimisation of Electric Vehicle Aggregators
Formal Concept Analysis with Many-sorted Attributes
Network tomography: a new structure spectrum for studying network dynamics
Topographic Representation for Quantum Machine Learning
Bregman Divergence Bounds and the Universality of the Logarithmic Loss
Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
Estimating hourly population distribution change at high spatiotemporal resolution in urban areas using geo-tagged tweets, land use data, and dasymetric maps
The Trajectory of Voice Onset Time with Vocal Aging
Fractional Laplacians in bounded domains: Killed, reflected, censored and taboo Lévy flights
Multifaceted dynamics of Janus oscillator networks
Evaluating Sensitivity to the Stick Breaking Prior in Bayesian Nonparametrics
Modeling of nonlinear audio effects with end-to-end deep neural networks
Computer model calibration based on image warping metrics: an application for sea ice deformation
Deep learning-based super-resolution in coherent imaging systems
Learning to Segment Corneal Tissue Interfaces in OCT Images
Diacritization of Maghrebi Arabic Sub-Dialects
Adversarial Inpainting of Medical Image Modalities
UNIQUE: Unsupervised Image Quality Estimation
Design and Control of a Photonic Neural Network Applied to High-Bandwidth Classification
Rank Dynamics for Functional Data
CAVBench: A Benchmark Suite for Connected and Autonomous Vehicles
The chromatic index of strongly regular graphs
Constructing classification trees using column generation
To Kavanaugh or Not to Kavanaugh: That is the Polarizing Question
Lesion Focused Super-Resolution
Dynamically Stable 3D Quadrupedal Walking with Multi-Domain Hybrid System Models and Virtual Constraint Controllers
Learning by Unsupervised Nonlinear Diffusion
Colouring Graphs with Sparse Neighbourhoods: Bounds and Applications
Building Representative Matched Samples with Multi-valued Treatments in Large Observational Studies: Analysis of the Impact of an Earthquake on Educational Attainment
Bringing Order to the Cognitive Fallacy Zoo
A Chebyshev-Accelerated Primal-Dual Method for Distributed Optimization
Heteroclinic Dynamics of Localized Frequency Synchrony: Heteroclinic Cycles for Small Populations
Optimizing Agent Behavior over Long Time Scales by Transporting Value
Robust Neural Machine Translation with Joint Textual and Phonetic Embedding
SPRT Based Transceiver for Molecular Communications
Stochastic Resonance in neural network, noise color effects
Random clique covers for graphs with local density and global sparsity
An Illuminating Algorithm for the Light Bulb Problem
Achieving Covert Wireless Communications Using a Full-Duplex Receiver
Assessing and Remedying Coverage for a Given Dataset
Marrying Universal Dependencies and Universal Morphology
Can Euroscepticism Contribute to a European Public Sphere? The Europeanization of Media Discourses about Euroscepticism across Six Countries
Using Deep Reinforcement Learning for the Continuous Control of Robotic Arms
Assessing the Contribution of Semantic Congruency to Multisensory Integration and Conflict Resolution
Optimally rotated coordinate systems for adaptive least-squares regression on sparse grids
Dessins D’enfants, Surface Algebras, and Dessin Orders
On the Peaks of a Stochastic Heat Equation on a Sphere with a Large Radius
A Direct Method to Learn States and Parameters of Ordinary Differential Equations
Convex expansion for finite distributive lattices with applications
Simple Policy Evaluation for Data-Rich Iterative Tasks
A survey of automatic de-identification of longitudinal clinical narratives
SINE: Scalable Incomplete Network Embedding
Antisymmetry of the stochastic order on all ordered metric spaces
On the local pairing behavior of critical points and roots of random polynomials
ProMP: Proximal Meta-Policy Search
Supplementary Material for ‘Estimation of a Multiplicative Correlation Structure in the Large Dimensional Case’
Learning Two-layer Neural Networks with Symmetric Inputs
A Robust Local Binary Similarity Pattern for Foreground Object Detection
Quasi-hyperbolic momentum and Adam for deep learning
Co-manifold learning with missing data
Morph: Flexible Acceleration for 3D CNN-based Video Understanding
On Finding Dense Subgraphs in Bipartite Graphs: Linear Algorithms with Applications to Fraud Detection
OAM Mode Selection and Space-Time Coding for Turbulence Mitigation in FSO Communications
First-passage processes on a filamentous track in a dense traffic: optimizing diffusive search for a target in crowding conditions
A Time-domain Analog Weighted-sum Calculation Model for Extremely Low Power VLSI Implementation of Multi-layer Neural Networks
When are stars the largest cross intersecting families?
Multi-Source Neural Machine Translation with Data Augmentation
Combined Static and Motion Features for Deep-Networks Based Activity Recognition in Videos
Lattice consensus: A partial order on phylogenetic trees that induces an associatively stable consensus method
Maximizing Monotone DR-submodular Continuous Functions by Derivative-free Optimization
SpiNNTools: The Execution Engine for the SpiNNaker Platform
Capacity Enhanced Cooperative D2D Systems over Rayleigh Fading Channels with NOMA
Finite-sample Analysis of M-estimators using Self-concordance
Sharp Analysis of Learning with Discrete Losses
On projections of the supercritical contact process: uniform mixing and cutoff phenomenon
Multi-budgeted directed cuts
Convergence of the Fleming-Viot process toward the minimal quasi-stationary distribution
Non-binary treebased unrooted phylogenetic networks and their relations to binary and rooted ones
The critical Barkhausen avalanches in thin random-field ferromagnets with an open boundary
Semantic Aware Attention Based Deep Object Co-segmentation
Faster Matrix Completion Using Randomized SVD
Randomized contractions meet lean decompositions
Sharp Asymptotics for the Truncated Two-Point Function of the Ising Model with a Positive Field
A Roadmap Towards Resilient Internet of Things for Cyber-Physical Systems
Semitotal Domination: New hardness results and a polynomial-time algorithm for graphs of bounded mim-width
A rotary frequency converter model for electromechanical transient studies of 16$\frac{2}{3}$ Hz railway systems
On the Simulation of Polynomial NARMAX Models
Rotational 3D Texture Classification Using Group Equivariant CNNs
The LORACs prior for VAEs: Letting the Trees Speak for the Data
A Generative Model of Textures Using Hierarchical Probabilistic Principal Component Analysis
Analysis of the incurred but not reported/infinite server queue process with semi-Markovian multivariate discounted inputs
Asymptotics for infinite server queues with fast/slow Markov switching and fat tailed service times
Creating a New Persian Poet Based on Machine Learning
Optimal control of a fractional order epidemic model with application to human respiratory syncytial virus infection
Strongly vertex-reinforced jump process on a complete graph
Coordinating Multiple Sources for Service Restoration to Enhance Resilience of Distribution Systems
Neural Morphological Tagging for Estonian
How to share a cake with a secret agent
MoCaNA, un agent de négociation automatique utilisant la recherche arborescente de Monte-Carlo
Millimeter-Wave for Unmanned Aerial Vehicles Networks: Enabling Multi-Beam Multi-Stream Communications
CNN-based Preprocessing to Optimize Watershed-based Cell Segmentation in 3D Confocal Microscopy Images
Large Intelligent Surfaces for Energy Efficiency in Wireless Communication
Channel Attention and Multi-level Features Fusion for Single Image Super-Resolution
UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation
Wireless Access in Ultra-Reliable Low-Latency Communication (URLLC)
Estimation of the Spatial Weighting Matrix for Spatiotemporal Data under the Presence of Structural Breaks
The Deep Weight Prior. Modeling a prior distribution for CNNs using generative models
Transverse confinement of ultrasound through the Anderson transition in 3D mesoglasses
Bypassing sluggishness: SWAP algorithm and glassiness in high dimensions
Deep Metric Learning with Hierarchical Triplet Loss
Universal Painlevé VI Probability Distribution in Pfaffian Persistence and Gaussian First-Passage Problems with a sech-Kernel
Backward doubly stochastic differential equations with random coefficients and quasilinear stochastic PDEs
Wi-Fi Direct Based Mobile Ad hoc Network
Social Behavior Learning with Realistic Reward Shaping
Covariate Gaussian Process Latent Variable Models
Anti-$k$-labeling of graphs
Clustering in statistical ill-posed linear inverse problems
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
SCPNet: Spatial-Channel Parallelism Network for Joint Holistic and Partial Person Re-Identification
Efficient Greedy Coordinate Descent for Composite Problems
Influence of A-Posteriori Subcell Limiting on Fault Frequency in Higher-Order DG Schemes
Finer estimates on the 2-dimensional matching problem
Dense Multi-path U-Net for Ischemic Stroke Lesion Segmentation in Multiple Image Modalities
Density Deconvolution with Small Berkson Errors
IRA assisted MMC-based topology optimization method
Some local–global phenomena in locally finite graphs
Statistical time analysis for regular events with high count rate
Optimizing AIREBO: Navigating the Journey from Complex Legacy Code to High Performance
Extremal Value Theory for Long Range Dependent Stable Random Fields
Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions
Recent Advances in Mobile Grid and Cloud Computing
On free regular and Bondesson convolution semigroups
Generating Self-Guided Dense Annotations for Weakly Supervised Semantic Segmentation
A Multi-stage Framework with Context Information Fusion Structure for Skin Lesion Segmentation
Stochastic Negative Mining for Learning with Large Output Spaces
Always be Two Steps Ahead of Your Enemy
Caching at the Edge with LT codes
Real-Valued Evolutionary Multi-Modal Optimization driven by Hill-Valley Clustering
A note on backward stochastic differential equation with generator $f(y)|z|^2$
A Comparison of 1-D and 2-D Deep Convolutional Neural Networks in ECG Classification
INFODENS: An Open-source Framework for Learning Text Representations
Learning abstract planning domains and mappings to real world perceptions
Salient Object Detection in Video using Deep Non-Local Neural Networks
Liapunov Criteria for the Feller-Dynkin Property of Martingale Problems
Motifs corrélés rares : Caractérisation et nouvelles représentations concises
Universal Uhrig dynamical decoupling for bosonic systems
Lagrangian Approximations for Stochastic Reachability of a Target Tube
Multiple Interactions Made Easy (MIME): Large Scale Demonstrations Data for Imitation
Robust Gesture-Based Communication for Underwater Human-Robot Interaction in the context of Search and Rescue Diver Missions
The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection
Compatibility and attainability of matrices of correlation-based measures of concordance
High-dimensional Varying Index Coefficient Models via Stein’s Identity
KPZ equation tails for general initial data
A Substrate-Independent Framework to Characterise Reservoir Computers
Integral Transform Methods in Goodness-of-Fit Testing, I: The Gamma Distributions
Regularity and $h$-polynomials of edge ideals
$\Phi$-entropy inequalities and asymmetric covariance estimates for convex measures
Joint Nonparametric Precision Matrix Estimation with Confounding
Recent Trends in Quasisymmetric Functions
Subword Semantic Hashing for Intent Classification on Small Datasets
Metropolis-Hastings view on variational inference and adversarial training
Strategies for Language Identification in Code-Mixed Low Resource Languages
Critical probability on the product graph of a regular tree and a line
Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation
An empirical evaluation of imbalanced data strategies from a practitioner’s point of view
Particle velocity controls phase transitions in contagion dynamics
Logic Negation with Spiking Neural P Systems