Improving Context Modelling in Multimodal Dialogue Generation

In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system’s output.

Ruuh: A Deep Learning Based Conversational Social Agent

Dialogue systems and conversational agents are becoming increasingly popular in modern society, but building an agent capable of holding an intelligent conversation with its users is a challenging problem for artificial intelligence. In this demo, we demonstrate a deep learning based conversational social agent called ‘Ruuh’, designed by a team at Microsoft India to converse on a wide range of topics. Ruuh needs to think beyond the utilitarian notion of merely generating ‘relevant’ responses and meet a wider range of user social needs, like expressing happiness when the user’s favorite team wins, or sharing a cute comment when the user shows pictures of their pet. The agent also needs to detect and respond to abusive language, sensitive topics and trolling behavior. Many of these problems pose significant research challenges, which will be demonstrated in our demo. To date, our agent has interacted with over 2 million real-world users, generating over 150 million user conversations.

The FOLE Table

This paper continues the discussion of the representation of ontologies in the first-order logical environment FOLE. According to Gruber, an ontology defines the primitives with which to model the knowledge resources for a community of discourse. These primitives, consisting of classes, relationships and properties, are represented by the entity-relationship-attribute (ERA) data model of Chen. An ontology uses formal axioms to constrain the interpretation of these primitives. In short, an ontology specifies a logical theory. A series of three papers by the author provides a rigorous mathematical representation for the ERA data model in particular, and ontologies in general, within FOLE. The first two papers, which provide a foundation and superstructure for FOLE, represent the formalism and semantics of (many-sorted) first-order logic in a classification form corresponding to ideas discussed in the Information Flow Framework (IFF). The third paper will define an interpretation of FOLE in terms of the transformational passage, first described in (Kent, 2013), from the classification form of first-order logic to an equivalent interpretation form, thereby defining the formalism and semantics of first-order logical/relational database systems. Two papers will provide a precise mathematical basis for FOLE interpretation: the current paper develops the notion of a FOLE relational table following the relational model of Codd, and a follow-up paper will develop the notion of a FOLE relational database. Both of these papers expand on material found in (Kent, 2011). Whereas the classification form follows the entity-relationship-attribute data model of Chen, the interpretation form follows the relational data model of Codd. In general, the FOLE representation uses a conceptual structures approach that is completely compatible with formal concept analysis and information flow.

Accumulating Knowledge for Lifelong Online Learning

Lifelong learning can be viewed as a continuous transfer learning procedure over consecutive tasks, where learning a given task depends on accumulated knowledge (the so-called knowledge base). Most published work on lifelong learning processes each task in batch, which implies that a data collection step is required beforehand. We propose a new framework, lifelong online learning, in which the learning procedure for each task is interactive. This is achieved through a computationally efficient algorithm in which the prediction for a given task is made by combining two intermediate predictions: one that uses only information from the current task and one that relies on the accumulated knowledge. In this work, two challenges are tackled: making no assumption on the task generation distribution, and handling a possibly unknown number of instances for each task. We provide a theoretical analysis of this algorithm, with a cumulative error upper bound for each task. We find that under some mild conditions, the algorithm can still benefit from a small cumulative error even when facing few interactions. Moreover, we provide experimental results on both synthetic and real datasets that validate the correct behavior and practical usefulness of the proposed algorithm.
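
To give a feel for how two intermediate predictions can be combined online, the sketch below mixes a task-local predictor and a knowledge-base predictor with exponentially weighted averaging under squared loss. This is a standard online-learning rule. The weighting scheme, the learning rate `eta`, and the name `combine_online` are assumptions made for illustration. This is not the paper's algorithm.

```python
import math

def combine_online(local_preds, kb_preds, targets, eta=2.0):
    """Exponentially weighted average of two 'experts' under squared loss.

    local_preds: predictions using only the current task's data
    kb_preds:    predictions relying on the accumulated knowledge base
    Returns the combined predictions and the final normalized weights.
    """
    w_local, w_kb = 1.0, 1.0
    combined = []
    for p_loc, p_kb, y in zip(local_preds, kb_preds, targets):
        total = w_local + w_kb
        # Predict with the current normalized weights, before the label y is revealed.
        combined.append((w_local * p_loc + w_kb * p_kb) / total)
        # Multiplicatively down-weight each expert by its squared loss on y.
        w_local *= math.exp(-eta * (p_loc - y) ** 2)
        w_kb *= math.exp(-eta * (p_kb - y) ** 2)
    total = w_local + w_kb
    return combined, (w_local / total, w_kb / total)
```

If the knowledge-base predictor is consistently closer to the targets, its weight grows and the combined prediction drifts toward it within a few interactions.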

Integration of CUDA Processing within the C++ library for parallelism and concurrency (HPX)

Experience shows that on today’s high-performance systems, it is difficult to make use of different acceleration cards while keeping all other parts of the system highly utilized. Future architectures, like exascale clusters, are expected to aggravate this issue as the number of cores increases and memory hierarchies become deeper. A key concern for distributed applications is to guarantee high utilization of all available resources, including local or remote acceleration cards on a cluster, while fully using all available CPU resources and integrating GPU work into the overall programming model. To integrate CUDA code, we extended HPX, a general-purpose C++ runtime system for parallel and distributed applications of any scale, to support asynchronous data transfers to and from the GPU device and the asynchronous invocation of CUDA kernels on this data. Both operations are well integrated into the general programming model of HPX, which allows any GPU operation to be seamlessly overlapped with work on the main cores. Any user-defined CUDA kernel can be launched on any (local or remote) GPU device available to the distributed application. We present asynchronous implementations of the data transfers and kernel launches for CUDA code as part of an HPX asynchronous execution graph. Using this approach, we can combine all remotely and locally available acceleration cards on a cluster to utilize its full performance capabilities. Overhead measurements show that integrating the asynchronous operations (data transfers and kernel launches) into the HPX execution graph imposes no additional computational overhead. They also show that it significantly eases the orchestration of coordinated, concurrent work on the main cores and the GPU devices in use.
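
The core idea is that GPU transfers and kernel launches become futures in an execution graph, so CPU work can proceed while they run. The sketch below mimics that pattern with Python's `concurrent.futures` as a stand-in. It is an analogy only. The functions `copy_to_device`, `scale_kernel` and `copy_to_host` are hypothetical placeholders, not HPX or CUDA calls. Each GPU-side stage also blocks on its predecessor inside a worker thread, which simplifies HPX's non-blocking continuations.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for asynchronous GPU operations; no device is used.
def copy_to_device(host_data):
    return list(host_data)                      # pretend host -> device transfer

def scale_kernel(device_data, factor):
    return [factor * v for v in device_data]    # pretend CUDA kernel launch

def copy_to_host(device_data):
    return list(device_data)                    # pretend device -> host transfer

def cpu_work(n):
    return sum(i * i for i in range(n))         # independent work on the main cores

def run_pipeline(data, factor, n_cpu):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each GPU-side stage waits on its predecessor's future, forming a dependency chain.
        f_copy = pool.submit(copy_to_device, data)
        f_kernel = pool.submit(lambda: scale_kernel(f_copy.result(), factor))
        f_back = pool.submit(lambda: copy_to_host(f_kernel.result()))
        # CPU work is submitted without waiting for the GPU-side chain.
        f_cpu = pool.submit(cpu_work, n_cpu)
        # Synchronize only at the end, when both results are needed.
        return f_back.result(), f_cpu.result()
```

Tasks are submitted in dependency order and each waits only on earlier ones, so the FIFO pool cannot deadlock even with few workers.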

Information Bottleneck Methods for Distributed Learning

We study a distributed learning problem in which Alice sends a compressed distillation of a set of training data to Bob, who uses the distilled version to best solve an associated learning problem. We formalize this as a rate-distortion problem in which the training set is the source and Bob’s cross-entropy loss is the distortion measure. We consider this problem for unsupervised learning with batch and sequential data. In the batch setting, the problem is equivalent to the information bottleneck (IB), and we show that reduced-complexity versions of standard IB methods solve the associated rate-distortion problem. For streaming data, we present a new algorithm, which may be of independent interest, that solves the rate-distortion problem for Gaussian sources. Furthermore, to improve the results of this iterative algorithm for sequential data, we introduce a two-pass version of it. Finally, for Gaussian sources, we show how the rate must depend on the number of samples k to ensure a cross-entropy loss that scales optimally with the growth of the training set.
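
For reference, the classic iterative information bottleneck algorithm (Tishby, Pereira and Bialek) alternates three self-consistent updates for $p(t \mid x)$, $p(t)$ and $p(y \mid t)$. It is the kind of standard IB method the batch case builds on. The pure-Python sketch below runs it on a small discrete joint distribution. The function names, fixed iteration count, random initialization and small probability floor are illustrative choices. This is not the reduced-complexity variant the abstract refers to.

```python
import math
import random

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def iterative_ib(p_x, p_y_given_x, n_clusters, beta, iters=200, seed=0):
    """Return soft assignments p(t|x) from the classic iterative IB updates."""
    rng = random.Random(seed)
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    # Random positive initialization of p(t|x), each row normalized.
    q_t_x = []
    for _ in range(n_x):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(row)
        q_t_x.append([v / s for v in row])
    for _ in range(iters):
        # p(t) = sum_x p(x) p(t|x)
        q_t = [sum(p_x[x] * q_t_x[x][t] for x in range(n_x)) for t in range(n_clusters)]
        # p(y|t) = sum_x p(x) p(t|x) p(y|x) / p(t)
        q_y_t = [[sum(p_x[x] * q_t_x[x][t] * p_y_given_x[x][y] for x in range(n_x)) / q_t[t]
                  for y in range(n_y)] for t in range(n_clusters)]
        # p(t|x) proportional to p(t) * exp(-beta * KL(p(y|x) || p(y|t)))
        for x in range(n_x):
            logits = [math.log(q_t[t]) - beta * kl(p_y_given_x[x], q_y_t[t])
                      for t in range(n_clusters)]
            m = max(logits)
            # Small floor keeps every cluster non-empty (avoids log(0) and 0/0).
            w = [math.exp(lg - m) + 1e-12 for lg in logits]
            s = sum(w)
            q_t_x[x] = [v / s for v in w]
    return q_t_x
```

Larger `beta` trades compression for preserved information about $y$. At high `beta`, inputs with similar $p(y \mid x)$ end up in the same cluster.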

Learning and Interpreting Multi-Multi-Instance Learning Networks

We introduce an extension of the multi-instance learning problem in which examples are organized as nested bags of instances (e.g., a document could be represented as a bag of sentences, which in turn are bags of words). This framework can be useful in various scenarios, such as text and image classification, as well as supervised learning over graphs. As a further advantage, multi-multi-instance learning enables a particular way of interpreting predictions and the decision function. Our approach is based on a special neural network layer, called the bag-layer, whose units aggregate bags of inputs of arbitrary size. We prove theoretically that the associated class of functions contains all Boolean functions over sets of sets of instances, and we provide empirical evidence that functions of this kind can actually be learned on semi-synthetic datasets. We finally present experiments on text classification, citation graphs and social graph data, showing that our model obtains competitive results with respect to other approaches such as convolutional networks on graphs.
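
A bag-layer can be sketched in a few lines. Each unit transforms every element of a bag and then aggregates over the bag with a symmetric function, so bags of any size map to a fixed-size vector. The sketch below assumes ReLU units and max aggregation, since the abstract does not fix these choices. It stacks two bag-layers to handle a bag of bags. The weights are hand-picked placeholders, not learned values.

```python
def relu(v):
    return max(0.0, v)

def bag_layer(bag, weights, biases):
    """Map a non-empty bag (list of equal-length vectors) to a fixed-size vector.

    Unit j computes the max over the bag of relu(w_j . x + b_j).
    """
    out = []
    for w, b in zip(weights, biases):
        acts = [relu(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in bag]
        out.append(max(acts))
    return out

def multi_multi_forward(bag_of_bags, inner, outer):
    """Inner bag-layer summarizes each sub-bag; outer bag-layer summarizes the summaries."""
    sub_reps = [bag_layer(sub_bag, *inner) for sub_bag in bag_of_bags]
    return bag_layer(sub_reps, *outer)
```

For example, a "document" can be a list of sentences, where each sentence is a list of 2-dimensional word vectors. Because max is symmetric, reordering sentences or words leaves the output unchanged.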

A Theoretical Framework of Approximation Error Analysis of Evolutionary Algorithms

In the empirical study of evolutionary algorithms, solution quality is evaluated by either the fitness value or the approximation error. The latter measures the fitness difference between an approximate solution and the optimal solution. Since approximation error analysis is more convenient than direct estimation of the fitness value, this paper focuses on approximation error analysis; nevertheless, all related results extend straightforwardly from the approximation error to the fitness value. Although the evaluation of solution quality plays an essential role in practice, few rigorous analyses have been conducted on this topic. This paper aims to establish a novel theoretical framework for approximation error analysis of evolutionary algorithms for discrete optimization. The framework is divided into two parts. The first part concerns exact expressions of the approximation error: two methods, Jordan form and Schur’s triangularization, are presented to obtain an exact expression. The second part concerns upper bounds on the approximation error: two methods, convergence rate and auxiliary matrix iteration, are proposed to estimate the upper bound. The applicability of this framework is demonstrated through several examples.
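
When an EA is modelled as a Markov chain on fitness levels, the expected approximation error after $t$ generations is $e_t = \mathbf{p}_0 P^t \mathbf{d}$. Here $P$ is the transition matrix and $\mathbf{d}$ holds each level's fitness gap to the optimum. The sketch below evaluates this quantity numerically by repeated vector-matrix products. The three-level chain and its probabilities are made-up numbers. This direct iteration is not the Jordan-form or Schur-triangularization route the paper uses to obtain exact expressions.

```python
def approximation_errors(p0, P, gaps, generations):
    """Expected approximation error e_t = p_t . gaps for t = 0..generations.

    p0:   initial probability distribution over fitness levels
    P:    row-stochastic transition matrix, P[i][j] = Pr(level i -> level j)
    gaps: fitness difference between each level and the optimum (0 at the optimum)
    """
    p = list(p0)
    errors = []
    for _ in range(generations + 1):
        errors.append(sum(pi * gi for pi, gi in zip(p, gaps)))
        # Advance one generation: p_{t+1} = p_t P
        p = [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(p))]
    return errors
```

For an elitist EA, $P$ is upper triangular when levels are ordered from worst to best, so the error sequence is non-increasing.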

Mobile Sensor Data Anonymization

Data from motion sensors such as accelerometers and gyroscopes embedded in our devices can reveal secondary, undesired private information about our activities. This information can be used for malicious purposes, such as user identification by application developers. To address this problem, we propose a data transformation mechanism that enables a device to share data for specific applications (e.g., monitoring users’ daily activities) without revealing private user information (e.g., user identity). We formulate this anonymization process using an information-theoretic approach and propose a new multi-objective loss function for training convolutional autoencoders (CAEs) to provide a practical approximation to our anonymization problem. This loss function forces the transformed data to minimize both the information about the user’s identity and the data distortion, so that application-specific utility is preserved. Our training process regulates the encoder to disregard user-identifiable patterns and tunes the decoder to shape the final output independently of the users in the training set. A trained CAE can then be deployed on a user’s mobile device to anonymize sensor data before sharing it with an app, even for users who are not included in the training dataset. Results on a dataset of 24 users for activity recognition show a promising trade-off between utility and privacy on the transformed data, with activity recognition accuracy above 92% while reducing the chance of identifying a user to less than 7%.
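
A multi-objective loss of this kind can be sketched as a weighted sum of a distortion term and an identity-leakage term. Below, distortion is the mean squared error between a raw and a transformed sensor window. Identity leakage is measured, as an assumption, by the KL divergence between an identity classifier's posterior on the transformed data and the uniform distribution over users. That divergence is zero when the classifier cannot tell users apart. The weights `alpha` and `beta` and all function names are hypothetical. The paper's exact loss terms may differ.

```python
import math

def mse(raw, transformed):
    """Mean squared distortion between raw and transformed samples."""
    return sum((a - b) ** 2 for a, b in zip(raw, transformed)) / len(raw)

def identity_leakage(id_probs):
    """KL(q || uniform) = log K - H(q); zero when the identity posterior is uniform."""
    k = len(id_probs)
    entropy = -sum(q * math.log(q) for q in id_probs if q > 0)
    return math.log(k) - entropy

def anonymization_loss(raw, transformed, id_probs, alpha=1.0, beta=1.0):
    """Weighted sum of distortion (utility) and identity leakage (privacy)."""
    return alpha * mse(raw, transformed) + beta * identity_leakage(id_probs)
```

Raising `beta` pushes training toward outputs that hide identity at the cost of more distortion. Raising `alpha` does the opposite.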

Estimators for Multivariate Information Measures in General Probability Spaces

Information-theoretic quantities play an important role in various settings in machine learning, including causality testing, structure inference in graphical models, time-series problems, feature selection and providing privacy guarantees. A key quantity of interest is the mutual information and its generalizations, including conditional mutual information, multivariate mutual information, total correlation and directed information. While these information quantities are well defined in arbitrary probability spaces, existing estimators add or subtract entropies (we term them $\Sigma H$ methods). These methods work only in the purely discrete or purely continuous case, since entropy (or differential entropy) is well defined only in those regimes. In this paper, we define a general graph divergence measure ($\mathbb{GDM}$) as a measure of incompatibility between the observed distribution and a given graphical model structure. This generalizes the aforementioned information measures, and we construct a novel estimator via a coupling trick that directly estimates these multivariate information measures using the Radon-Nikodym derivative. These estimators are proven to be consistent in a general setting that includes several cases where the existing estimators fail. They are thus the only known estimators for the following settings: (1) the data has some discrete and some continuous-valued components; (2) some (or all) of the components are themselves discrete-continuous mixtures; (3) the data is real-valued but is supported on a low-dimensional manifold rather than having a joint density on the entire space. We show that our proposed estimators significantly outperform known estimators on synthetic and real datasets.
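
A graph divergence measure can be written as a KL divergence between the joint distribution and its factorization along a directed acyclic graph $\mathcal{G}$. The familiar quantities then fall out as special cases. The display below is a sketch consistent with the abstract's description. The symbols are ours, and the paper's exact definition, stated via Radon-Nikodym derivatives so that it covers mixed distributions, should be consulted.

```latex
\begin{aligned}
\mathbb{GDM}(P_X,\mathcal{G}) &= D_{\mathrm{KL}}\Big(P_X \,\Big\|\, \prod_{i=1}^{d} P_{X_i \mid X_{\mathrm{pa}(i)}}\Big),\\
I(X;Y) &= D_{\mathrm{KL}}\big(P_{XY} \,\|\, P_X P_Y\big) \quad (\text{no edges}),\\
I(X;Y \mid Z) &= D_{\mathrm{KL}}\big(P_{XYZ} \,\|\, P_Z\, P_{X \mid Z}\, P_{Y \mid Z}\big) \quad (X \leftarrow Z \rightarrow Y),\\
\mathrm{TC}(X_1,\dots,X_d) &= D_{\mathrm{KL}}\Big(P_X \,\Big\|\, \prod_{i=1}^{d} P_{X_i}\Big) \quad (\text{no edges}).
\end{aligned}
```

Writing mutual information this way, rather than as $H(X)+H(Y)-H(X,Y)$, avoids entropies altogether. That is why the divergence form stays meaningful for discrete-continuous mixtures.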

Gradient-Free Learning Based on the Kernel and the Range Space

In this article, we show that solving a system of linear equations by manipulating the kernel and the range space is equivalent to solving the least squares error approximation problem. This lays the groundwork for a gradient-free learning search whenever the system can be expressed as a linear matrix equation. When the nonlinear activation function is invertible, the learning problem of a fully connected multilayer feedforward neural network can easily be adapted to this novel learning framework. Through a series of kernel and range space manipulations, learning such a network turns out to boil down to solving a set of cross-coupled equations. When the weights are randomly initialized, the equations can be decoupled, and the resulting network solution shows relatively good learning capability on real-world data sets of small to moderate dimension. Based on the structural information of the matrix equation, the network representation is found to depend on the number of data samples and the output dimension.
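
The invertible-activation step can be illustrated on the simplest case, a single layer $y = \tanh(Xw)$. Applying $\tanh^{-1}$ to the targets turns learning into the linear least-squares problem $Xw \approx \tanh^{-1}(y)$. The sketch solves that problem through the normal equations, with no gradient steps. It assumes a single layer, a tanh activation, targets strictly inside $(-1, 1)$ and a full-column-rank $X$. It does not reproduce the paper's kernel and range space manipulations for multilayer networks.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_tanh_layer(X, y):
    """Least-squares weights for y ~ tanh(X w), without any gradient steps."""
    z = [math.atanh(t) for t in y]           # invert the activation on the targets
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(d)]
    return solve(XtX, Xtz)                    # normal equations: X^T X w = X^T z
```

A leading column of ones in `X` acts as a bias term. Noise-free targets generated from known weights are recovered to numerical precision.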

Learning Abstract Options

Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning. One popular approach to this challenge is the options framework (Sutton et al., 1999). However, only recently, in (Bacon et al., 2017), was a policy gradient theorem derived for learning general-purpose options online in an end-to-end fashion. Previous work on this topic focuses only on learning a two-level hierarchy of options and primitive actions. In this work, we extend it to enable learning simultaneously at multiple resolutions in time. We achieve this by considering an arbitrarily deep hierarchy of options, in which high-level temporally extended options are composed of lower-level options with finer resolutions in time. We extend results from (Bacon et al., 2017) and derive policy gradient theorems for a deep hierarchy of options. Our proposed hierarchical option-critic architecture is capable of learning internal policies, termination conditions and hierarchical compositions over options without the need for any intrinsic rewards or subgoals. Our empirical results in both discrete and continuous environments demonstrate the efficiency of our framework.

A New Loss Function for Temperature Scaling to have Better Calibrated Deep Networks

Although deep neural networks (DNNs) have recently achieved impressive results on many tasks, they suffer from poor uncertainty prediction. Temperature scaling (TS) is an efficient post-processing method for calibrating DNNs so that they produce more accurate uncertainty predictions. TS relies on a single parameter T that softens the logit layer of a DNN; its optimal value is found by minimizing the negative log-likelihood (NLL) loss. In this paper, we discuss the weaknesses of the NLL loss function, especially for DNNs with high accuracy, and propose a new loss function, called Attended-NLL, which can significantly improve the calibration ability of TS.
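
Standard temperature scaling is the baseline that the proposed loss modifies. It divides the logits by a scalar $T$ and picks the $T$ that minimizes validation NLL. The sketch below does this with a plain grid search over $T$. The grid and the function names are illustrative assumptions. Attended-NLL itself is not reproduced, because the abstract does not define it.

```python
import math

def softmax(logits, T):
    """Softmax of logits divided by temperature T (numerically stabilized)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    return -sum(math.log(softmax(z, T)[y]) for z, y in zip(logit_rows, labels)) / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Return the grid value of T with the lowest validation NLL."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]   # T in (0, 10]
    return min(grid, key=lambda T: nll(logit_rows, labels, T))
```

Consider an overconfident model that assigns roughly 0.99995 to its predicted class but is right only 75% of the time. Fitting $T$ softens the logits until the confidence is close to 0.75, which gives $T > 1$.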

Time series clustering based on the characterisation of segment typologies

Time series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series with a standard clustering method. However, these approaches do not take into account the similarity of the different subsequences of each time series, which can be used to better compare the time series objects of the dataset. In this paper, we propose a novel time series clustering technique based on two clustering stages. In the first step, a least squares polynomial segmentation procedure is applied to each time series; it is based on a growing window technique that returns segments of different lengths. Then, all the segments are projected into the same dimensional space, based on the coefficients of the model that approximates each segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time series objects are grouped. We use internal clustering quality to automatically adjust the main parameter of the algorithm, an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against two state-of-the-art methods, showing that the performance of this methodology is very promising.
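
The growing-window segmentation step works as follows. A window is extended one point at a time while a least-squares polynomial fit stays within an error threshold, and a new segment starts once the threshold is exceeded. For brevity, the sketch fits degree-1 polynomials (lines) in closed form and uses the maximum absolute residual as the error measure. The paper's polynomial degree, error measure and threshold tuning may differ, and the function names are hypothetical.

```python
def line_fit(xs, ys):
    """Closed-form least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def max_residual(xs, ys):
    """Largest absolute deviation of the points from their least-squares line."""
    a, b = line_fit(xs, ys)
    return max(abs(y - (a * x + b)) for x, y in zip(xs, ys))

def growing_window_segments(series, threshold):
    """Split a series into (start, end) index ranges, end exclusive."""
    segments, start, n = [], 0, len(series)
    while start < n:
        end = min(start + 2, n)           # a line needs at least two points
        while end < n:
            # Try to grow the window by one point; stop if the fit degrades too much.
            xs = list(range(start, end + 1))
            if max_residual(xs, series[start:end + 1]) > threshold:
                break
            end += 1
        segments.append((start, end))
        start = end
    return segments
```

In the pipeline described above, each resulting segment would then be mapped to its fit coefficients plus statistical features before the first clustering stage.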

Informative Features for Model Comparison

Given two candidate models and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests that are nonparametric, computationally efficient (runtime complexity is linear in the sample size) and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating the regions of the data domain where one model fits significantly better than the other. In a real-world problem of comparing GAN models, the test power of our new test matches that of the state-of-the-art test of relative goodness of fit, while being one order of magnitude faster.
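
One statistic in the spirit of these tests is a difference of unnormalized mean embedding (UME) statistics. Each UME statistic compares kernel mean embeddings of model samples and observed data at a few test locations. Their difference indicates which model is closer to the data, and the cost stays linear in the sample size. The sketch below is a minimal, assumption-laden illustration with a Gaussian kernel on scalar data and hand-picked locations. It omits the variance estimate, the null distribution and the optimization of test locations that an actual test needs, and the names are hypothetical.

```python
import math

def gauss(a, b, width=1.0):
    """Gaussian kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2 * width ** 2))

def ume2(model_samples, data, locations):
    """Unbiased, linear-time estimate of UME^2 between a model and the data.

    Pairs the i-th model sample x_i with the i-th observation z_i (equal sizes)
    and uses psi_i[j] = (k(x_i, v_j) - k(z_i, v_j)) / sqrt(J) for locations v_j.
    """
    J, n = len(locations), len(data)
    psis = [[(gauss(x, v) - gauss(z, v)) / math.sqrt(J) for v in locations]
            for x, z in zip(model_samples, data)]
    total = [sum(p[j] for p in psis) for j in range(J)]
    # sum_{i != l} <psi_i, psi_l> = ||sum_i psi_i||^2 - sum_i ||psi_i||^2
    cross = sum(t * t for t in total) - sum(sum(c * c for c in p) for p in psis)
    return cross / (n * (n - 1))

def relative_ume(samples_p, samples_q, data, locations):
    """Negative values favour model P; positive values favour model Q."""
    return ume2(samples_p, data, locations) - ume2(samples_q, data, locations)
```

Inspecting the per-location terms of each statistic shows where in the domain the two models differ. This mirrors the "informative features" idea.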

Fabrik: An Online Collaborative Neural Network Editor

We present Fabrik, an online neural network editor that provides tools to visualize, edit, and share neural networks from within a browser. Fabrik provides a simple and intuitive GUI to import neural networks written in popular deep learning frameworks such as Caffe, Keras, and TensorFlow, and allows users to interact with, build, and edit models via simple drag and drop. Fabrik is designed to be framework agnostic and support high interoperability, and can be used to export models back to any supported framework. Finally, it provides powerful collaborative features to enable users to iterate over model design remotely and at scale.

Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation

It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. Although this belief is highly intuitive, there is a lack of theory and of a systematic approach that quantitatively characterize what representations deep neural networks learn. In this work, we take a small step towards such a theory and a better understanding of the representations. Specifically, we study a simpler problem: how similar are the representations learned by two networks with identical architecture but trained from different initializations? We develop a rigorous theory based on the neuron activation subspace match model. The theory gives a complete characterization of the structure of neuron activation subspace matches. Its core concepts are the maximum match and the simple match, which describe the overall and the finest similarity between sets of neurons in two networks, respectively. We also propose efficient algorithms to find the maximum match and simple matches. Finally, we conduct extensive experiments using our algorithms. Experimental results suggest that, surprisingly, representations learned by the same convolutional layers of networks trained from different initializations are not as similar as prevalently expected, at least in terms of subspace match.
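
Roughly, two sets of neurons X and Y form an ε-approximate match when two conditions hold. Every neuron's activation vector (its outputs over a fixed set of inputs) must lie within relative distance ε of the span of the other set's activation vectors, and this must hold in both directions. The sketch below checks that condition with Gram-Schmidt projections. The definition is paraphrased from the paper's setting as we understand it, and the helper names and tolerance are our own. Treat it as illustrative, not as the authors' maximum-match algorithm.

```python
import math

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis of span(vectors), skipping dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def distance_to_span(v, basis):
    """Norm of the component of v orthogonal to the orthonormal basis."""
    r = list(v)
    for b in basis:
        dot = sum(ri * bi for ri, bi in zip(r, b))
        r = [ri - dot * bi for ri, bi in zip(r, b)]
    return math.sqrt(sum(ri * ri for ri in r))

def is_eps_match(acts_x, acts_y, eps):
    """acts_x, acts_y: one activation vector per neuron, over the same inputs."""
    def covered(src, other):
        basis = orthonormal_basis(other)
        return all(distance_to_span(v, basis) <= eps * math.sqrt(sum(c * c for c in v))
                   for v in src)
    return covered(acts_x, acts_y) and covered(acts_y, acts_x)
```

Two neuron sets can match even when no individual neurons correspond one-to-one. Only the spanned subspaces need to agree.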

P-MCGS: Parallel Monte Carlo Acyclic Graph Search

Recently, there has been great interest in Monte Carlo Tree Search (MCTS) in AI research. Although the sequential version of MCTS has been studied widely, its parallel counterpart still lacks systematic study. This leads us to the following questions: how can we design efficient parallel MCTS algorithms (or algorithms for more general cases) with rigorous theoretical guarantees, and is it possible to achieve linear speedup? In this paper, we consider the search problem on a more general acyclic one-root graph (namely, Monte Carlo Graph Search (MCGS)), which generalizes MCTS. We develop a parallel algorithm (P-MCGS) that assigns multiple workers to investigate appropriate leaf nodes simultaneously. Our analysis shows that the P-MCGS algorithm achieves linear speedup and that its sample complexity is comparable to that of its sequential counterpart.

Machine Learning in Network Centrality Measures: Tutorial and Outlook

Complex networks are ubiquitous in several Computer Science domains. Centrality measures are an important analysis mechanism for uncovering vital elements of complex networks. However, these metrics have high computational costs and requirements that hinder their application to large real-world networks. In this tutorial, we explain how neural network learning algorithms can make it feasible to apply these metrics to complex networks of arbitrary size. Moreover, the tutorial describes how to identify the best configuration for neural network training and learning for such tasks, and presents an easy way to generate and acquire training data. We do so by means of a general methodology, using complex network models adaptable to any application. We show that a regression model generated by the neural network successfully approximates the metric values and is therefore a robust, effective alternative in real-world applications. The methodology and proposed machine learning model require only a fraction of the time of other approximation algorithms, which is crucial in complex network applications.

Learning with Analytical Models

To understand and predict the performance of parallel and distributed programs, several analytical and machine learning approaches have been proposed, each with its own advantages and disadvantages. In this paper, we propose and validate a hybrid approach that exploits both analytical and machine learning models. The hybrid model is able to learn and correct the analytical models so that they better match the actual performance. Furthermore, the proposed hybrid model improves prediction accuracy compared with pure machine learning techniques while using small training datasets, which makes it suitable for adapting to hardware and workload changes.
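
One simple way to realize such a hybrid is to keep the analytical model as a baseline and learn a correction on top of it from a few measurements. The sketch below uses Amdahl's law as the analytical model. It fits a least-squares linear correction in the processor count to the residuals. Both choices are assumptions for illustration, not the paper's models, and so are the hypothetical function names and the synthetic timing data in the example.

```python
def amdahl_time(t_serial, parallel_fraction, p):
    """Analytical model: runtime on p processors under Amdahl's law."""
    return t_serial * ((1 - parallel_fraction) + parallel_fraction / p)

def fit_residual_correction(ps, measured, t_serial, parallel_fraction):
    """Least-squares line r(p) = a*p + b fitted to (measured - analytical)."""
    residuals = [m - amdahl_time(t_serial, parallel_fraction, p)
                 for p, m in zip(ps, measured)]
    n = len(ps)
    mp, mr = sum(ps) / n, sum(residuals) / n
    spp = sum((p - mp) ** 2 for p in ps)
    a = sum((p - mp) * (r - mr) for p, r in zip(ps, residuals)) / spp
    return a, mr - a * mp

def hybrid_predict(p, t_serial, parallel_fraction, correction):
    """Analytical prediction plus the learned residual correction."""
    a, b = correction
    return amdahl_time(t_serial, parallel_fraction, p) + a * p + b
```

Suppose the measured runtimes include an overhead that grows with the processor count, such as communication, which Amdahl's law ignores. A correction fitted on a few small runs can then extrapolate to larger ones, while the analytical part keeps the prediction anchored when data is scarce.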

Identifying Causal Structure in Large-Scale Kinetic Systems

In the natural sciences, differential equations are widely used to describe dynamical systems. The discovery and verification of such models from data has become a fundamental challenge of science today. From a statistical point of view, we distinguish two problems: parameter estimation and structure search. In parameter estimation, we start from a given differential equation and estimate the parameters from noisy data that are observed at discrete time points. The estimate depends nonlinearly on the parameters. This poses both statistical and computational challenges and makes the task of structure search even more ambitious. Existing methods use either standard model selection techniques or various types of sparsity-enforcing regularization, and hence focus on predictive performance. In this work, we develop a novel methodology for structure search in ordinary differential equation models. Exploiting ideas from causal inference, we propose to rank models not only by their predictive performance, but also by their stability, i.e., their ability to predict well in different experimental settings. Based on this model ranking, we also construct a ranking of individual variables reflecting causal importance. This provides researchers with a list of promising candidate variables that may be investigated further in interventional experiments. Our ranking methodology (both for models and variables) comes with theoretical asymptotic guarantees and is shown to outperform current state-of-the-art methods in an extensive experimental evaluation on simulated data. The practical applicability of the procedure is illustrated on a not-yet-published biological data set. Our methodology is fully implemented. Code will be provided online and will also be made available as an R package.

Active Anomaly Detection with Switching Cost

The problem of anomaly detection among multiple processes is considered within the framework of sequential design of experiments. The objective is an active inference strategy consisting of a selection rule governing which process to probe at each time, a stopping rule on when to terminate the detection, and a decision rule on the final detection outcome. The performance measure is the Bayes risk, which takes into account not only sample complexity and detection errors but also the costs associated with switching across processes. While the problem is a partially observable Markov decision process for which optimal solutions are generally intractable, a low-complexity deterministic policy is shown to be asymptotically optimal and to offer significant performance improvement over existing methods in the finite regime.

A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Intraday Seasonalities and Nonstationarity of Trading Volume in Financial Markets: Individual and Cross-Sectional Features
80-Channel WDM-MDM Transmission over 50-km Ring-Core Fiber Using a Compact OAM DEMUX and Modular 4×4 MIMO Equalization
Symmetry and Stability of Homogeneous Flocks. A Position Paper
Quantum Entanglement in Corpuses of Documents
Estimating grouped data models with a binary dependent variable and fixed effect via logit vs OLS: the impact of dropped units
A Stochastic Maximum Principle for Control Problems Constrained by the Stochastic Navier-Stokes Equations
Finding Answers from the Word of God: Domain Adaptation for Neural Networks in Biblical Question Answering
A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform
Dimension-wise Multivariate Orthogonal Polynomials in General Probability Spaces
Extractive Summarization of EHR Discharge Notes
Named Person Coreference in English News
Testing Exponentiality Against a Trend Change in Mean Time to Failure in Age Replacement
Can Entropy Explain Successor Surprisal Effects in Reading?
Transfer of Deep Reactive Policies for MDP Planning
Negative Representation and Instability in Democratic Elections
Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation
Parsing Coordination for Spoken Language Understanding
Automatic Identification and Ranking of Emergency Aids in Social Media Macro Community
Stability-certified reinforcement learning: A control-theoretic perspective
Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy
Neural Network-Based Approach to Phase Space Integration
Noise Sensitivity of Local Descriptors vs ConvNets: An application to Facial Recognition
An Acceleration Scheme to The Local Directional Pattern
Spectrogram-channels u-net: a source separation model viewing each channel as the spectrogram of each source
Whetstone: A Method for Training Deep Artificial Neural Networks for Binary Communication
Algebraic tests of general Gaussian latent tree models
Automatic differentiation in ML: Where we are and where we should be going
A simple controller for the transition maneuver of a tail-sitter drone
Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder
Human-Robot Trust Integrated Task Allocation and Symbolic Motion planning for Heterogeneous Multi-robot Systems
Revisiting CFR+ and Alternating Updates
Quantifying Learning Guarantees for Convex but Inconsistent Surrogates
Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach
On the Identifiability of the Influence Model for Stochastic Spatiotemporal Spread Processes
Deep Convolutional Neural Network Applied to Quality Assessment for Video Tracking
Coherent systems of probability measures on graphs for representations of free Frobenius towers
A Note on the Expected Number of Interviews When Talent is Uniformly Distributed
MCA-based Rule Mining Enables Interpretable Inference in Clinical Psychiatry
Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets
Distributed Stochastic Approximation for Solving Network Optimization Problems Under Random Quantization
Analysis of KNN Information Estimators for Smooth Distributions
Short-segment heart sound classification using an ensemble of deep convolutional neural networks
Sampling of Planar Curves: Theory and Fast Algorithms
Faber-Krahn type inequalities and uniqueness of positive solutions on metric measure spaces
$A^2$-Nets: Double Attention Networks
Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Applying Fourier Analysis to Judgment Aggregation
Study of Joint Automatic Gain Control and MMSE Receiver Design Techniques for Quantized Multiuser Multiple-Antenna Systems
Estimating Differential Entropy under Gaussian Convolutions
Sensitivity indices for output on a Riemannian manifold
Nonlocal flocking dynamics: Learning the fractional order of PDEs from particle simulations
Groupcast Index Coding Problem: Joint Extensions
Self-Supervised GAN to Counter Forgetting
Reduced-order Aggregate Dynamical Model for Wind Farms
A Miniaturized Semantic Segmentation Method for Remote Sensing Image
Generating Multi-Scroll Chua’s Attractors via Simplified Piecewise-Linear Chua’s Diode
On the linear static output feedback problem: the annihilating polynomial approach
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
Handling Imbalanced Dataset in Multi-label Text Categorization using Bagging and Adaptive Boosting
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Deep learning tutorial for denoising
Metric regularity under Gâteaux differentiability with applications to optimization and stochastic optimal control problems
On operations preserving semi-transitive orientability of graphs
Performance of Energy-Buffer Aided Incremental Relaying in Cooperative Networks
Agent-based models of collective intelligence
Newton method for finding a singularity of a special class of locally Lipschitz continuous vector fields on Riemannian manifolds
Inexact Newton method with feasible inexact projections for solving constrained smooth and nonsmooth equations
A Cross-Modal Distillation Network for Person Re-identification in RGB-Depth
Thermodynamic Limit of the Transition Rate of a Crystalline Defect
Removing Hidden Confounding by Experimental Grounding
On the Equivalence of Convolutional and Hadamard Networks using DFT
3D MRI brain tumor segmentation using autoencoder regularization
A privacy-preserving system for data ownership using blockchain and distributed databases
Minimizing Branching Vertices in Distance-preserving Subgraphs
A Biologically Motivated Asymmetric Exclusion Process: interplay of congestion in RNA polymerase traffic and slippage of nascent transcript
Suspicious News Detection Using Micro Blog Text
Calibration of imperfect mathematical models by multiple sources of data with measurement bias
Towards Smart City Innovation Under the Perspective of Software-Defined Networking, Artificial Intelligence and Big Data
Merging the A- and Q-spectral theories for digraphs
A no-regret generalization of hierarchical softmax to extreme multi-label classification
Average Convergence Rate of Evolutionary Algorithms II: Continuous Optimization
The Variational Deficiency Bottleneck
ERS approximation for solving Schrödinger’s equation and applications
Direct Quantitative Photoacoustic Tomography for realistic acoustic media
Accelerated Inference in Markov Random Fields via Smooth Riemannian Optimization
3D Terrain Segmentation in the SWIR Spectrum
Stein Variational Gradient Descent as Moment Matching
Resolutions of ideals associated to subspace arrangements
Designing Refund Bonus Schemes for Provision Point Mechanism in Civic Crowdfunding
Dealing with Uncertain Inputs in Regression Trees
Minimum Reload Cost Graph Factors
Hull Form Optimization with Principal Component Analysis and Deep Neural Network
Multi-Agent Common Knowledge Reinforcement Learning
Wi-Motion: A Robust Human Activity Recognition Using WiFi Signals
From Communication to Sensing: Recognizing and Counting Repetitive Motions with Wireless Backscattering
Flash Photography for Data-Driven Hidden Scene Recovery
Regularization Effect of Fast Gradient Sign Method and its Generalization
Training Frankenstein’s Creature to Stack: HyperTree Architecture Search
FBMC Prototype Filter Design via Convex Optimization
On the role of ML estimation and Bregman divergences in sparse representation of covariance and precision matrices
An algorithmically random family of MultiAspect Graphs and its topological properties
The B-Exponential Divergence and its Generalizations with Applications to Parametric Estimation
Nearly subadditive sequences
Using Fractional Programming for Zero-Norm Approximation
Towards Robust Deep Neural Networks
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks
Real-time Action Recognition with Dissimilarity-based Training of Specialized Module Networks
Post-prognostics decision in Cyber-Physical Systems
Middle-Out Decoding
Gaussian Process Prior Variational Autoencoders
Large triangle packings and Tuza’s conjecture in sparse random graphs
A Convex Duality Framework for GANs
Non-Commutative Integrability of the Grassmann Pentagram Map
Another Enumeration of Caterpillar Trees
On buffered double autoregressive time series models
Sample Complexity for Nonlinear Dynamics
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback
On Learning Markov Chains
Asymptotic Gilbert-Varshamov bound on Frequency Hopping
Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach
On hitting time, mixing time and geometric interpretations of Metropolis-Hastings reversiblizations
Learning Sparse Neural Networks via Sensitivity-Driven Regularization
Towards Human Pulse Rate Estimation from Face Video: Automatic Component Selection and Comparison of Blind Source Separation Methods
Deep Affinity Network for Multiple Object Tracking
Broadcasting Information subject to State Masking
RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications
A Hitchhiker’s Guide On Distributed Training of Deep Neural Networks
A Tube-based MPC Scheme for Interaction Control of Underwater Vehicle Manipulator Systems
Multilevel Path Simulation to Jump-Diffusion Process with Superlinear Drift
Conic Programming: Infeasibility Certificates and Projective Geometry
Robust Audio Adversarial Example for a Physical Attack
Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization
Collapsibility of simplicial complexes of hypergraphs
Robots Learning to Say ‘No’: Prohibition and Rejective Mechanisms in Acquisition of Linguistic Negation
Discrimination-aware Channel Pruning for Deep Neural Networks
Random walk on comb-type subsets of $\mathbb{Z}^2$
Object Tracking in Hyperspectral Videos with Convolutional Features and Kernelized Correlation Filter
Monochromatic $k$-edge-connection colorings of graphs
Multi-Spectral Imaging via Computed Tomography (MUSIC) – Comparing Unsupervised Spectral Segmentations for Material Differentiation
$m$-adic residue codes over $\mathbb{F}_q[v]/(v^s-v)$
On preserving non-discrimination when combining expert advice
Latency-Reliability Tradeoffs for State Estimation
VDMS: An Efficient Big-Visual-Data Access for Machine Learning Workloads
Enhanced CNN for image denoising