Detecting Adversarial Examples Based on Steganalysis

Deep Neural Networks (DNNs) have recently led to significant improvements in many fields, such as image classification. However, these machine learning models are vulnerable to adversarial examples, which can mislead classifiers into giving incorrect classifications. Adversarial examples pose security concerns in areas where privacy requirements are strict, such as face recognition, autonomous cars and malware detection. Moreover, they can be used to attack machine learning systems even if the adversary has no access to the underlying model. In this paper, we focus on detecting adversarial examples. We propose to augment deep neural networks with a detector. The detector is constructed by modeling the differences between adjacent pixels in natural images. We then identify deviations from this model and assume that such deviations are due to an adversarial attack. We construct the detector based on steganalysis, which can detect minor modifications to an image, because an adversarial attack can be treated as a sort of accidental steganography.
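
As a rough illustration of what "modeling the differences between adjacent pixels" can look like, the sketch below builds a clipped histogram of horizontal neighbor differences. It is a simplified stand-in and not the paper's detector: the function name, the clipping threshold `T` and the toy images are assumptions made for this example.

```python
from collections import Counter

def adjacent_difference_histogram(image, T=3):
    """Normalized histogram of horizontal neighbor differences, clipped to [-T, T]."""
    counts = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            d = max(-T, min(T, right - left))  # clip large differences
            counts[d] += 1
    total = sum(counts.values())
    return [counts[d] / total for d in range(-T, T + 1)]

# Toy 3x4 grayscale image (hypothetical data) with smooth rows.
smooth = [[10, 11, 12, 13], [20, 20, 21, 21], [30, 31, 31, 32]]
# The same image with a small checkerboard perturbation of +/-2.
perturbed = [
    [p + (2 if (i + j) % 2 else -2) for j, p in enumerate(row)]
    for i, row in enumerate(smooth)
]

h_smooth = adjacent_difference_histogram(smooth)
h_perturbed = adjacent_difference_histogram(perturbed)
```

In the smooth image the mass sits on differences 0 and 1, while the perturbed image pushes it to the clipped extremes. A detector in this spirit would flag inputs whose difference statistics deviate from those of natural images.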

Hierarchical Graph Representation Learning with Differentiable Pooling

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs—a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
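
The coarsening step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the two GNNs are replaced by a single propagation step `A @ X @ W`, the weights are random rather than learned, and the names `diffpool_step`, `W_embed` and `W_assign` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diffpool_step(A, X, W_embed, W_assign):
    """One pooling step: n nodes are softly assigned to k clusters."""
    Z = np.tanh(A @ X @ W_embed)            # node embeddings, shape (n, d)
    S = softmax(A @ X @ W_assign, axis=1)   # soft assignments, shape (n, k), rows sum to 1
    X_coarse = S.T @ Z                      # cluster features, shape (k, d)
    A_coarse = S.T @ A @ S                  # cluster connectivity, shape (k, k)
    return X_coarse, A_coarse, S

rng = np.random.default_rng(0)
n, f, d, k = 6, 4, 3, 2                     # nodes, input features, embedding size, clusters
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                      # make the toy graph undirected
np.fill_diagonal(A, 1.0)                    # add self-loops
X = rng.normal(size=(n, f))

X_coarse, A_coarse, S = diffpool_step(
    A, X, rng.normal(size=(f, d)), rng.normal(size=(f, k))
)
```

Because every operation here is a matrix product or a softmax, gradients can flow through the assignment matrix `S`, which is what allows the pooling to be trained end-to-end with the rest of the network.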

PCA of high dimensional random walks with comparison to neural network training

One technique to visualize the training of neural networks is to perform PCA on the parameters over the course of training and to project to the subspace spanned by the first few PCA components. In this paper we compare this technique to the PCA of a high dimensional random walk. We compute the eigenvalues and eigenvectors of the covariance of the trajectory and prove that in the long trajectory and high dimensional limit most of the variance is in the first few PCA components, and that the projection of the trajectory onto any subspace spanned by PCA components is a Lissajous curve. We generalize these results to a random walk with momentum and to an Ornstein-Uhlenbeck process (i.e., a random walk in a quadratic potential) and show that in high dimensions the walk is not mean reverting, but will instead be trapped at a fixed distance from the minimum. We finally compare the distribution of PCA variances and the PCA projected training trajectories of a linear model trained on CIFAR-10 and a ResNet-50-v2 trained on ImageNet and find that the distribution of PCA variances resembles a random walk with drift.
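
The random-walk baseline is easy to reproduce numerically. The sketch below is an illustration, not the paper's code: it simulates a Gaussian random walk, applies PCA through an SVD of the centered trajectory, and computes the fraction of variance per component. The step count, dimension and seed are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dim = 300, 1000                      # trajectory length and parameter dimension

steps = rng.normal(size=(n_steps, dim))       # i.i.d. Gaussian increments
walk = np.cumsum(steps, axis=0)               # random walk, one row per time step
centered = walk - walk.mean(axis=0)           # PCA requires mean-centered data

# Squared singular values of the centered trajectory are proportional
# to the variances along the PCA components.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Project the trajectory onto the first two PCA components.
projection = centered @ Vt[:2].T
```

Plotting `projection[:, 0]` against `projection[:, 1]` gives the curve that the abstract compares with a Lissajous figure, and `explained` shows how quickly the variance concentrates in the leading components.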

xGEMs: Generating Examplars to Explain Black-Box Models

This work proposes xGEMs or manifold guided exemplars, a framework to understand black-box classifier behavior by exploring the landscape of the underlying data manifold as data points cross decision boundaries. To do so, we train an unsupervised implicit generative model, treated as a proxy to the data manifold. We summarize black-box model behavior quantitatively by perturbing data samples along the manifold. We demonstrate xGEMs’ ability to detect and quantify bias in model learning and to explain the changes in model behavior as training progresses.

The Foundations of Deep Learning with a Path Towards General Intelligence

Like any field of empirical science, AI may be approached axiomatically. We formulate requirements for a general-purpose, human-level AI system in terms of postulates. We review the methodology of deep learning, examining the explicit and tacit assumptions in deep learning research. Deep learning methodology seeks to overcome limitations in traditional machine learning research as it combines facets of model richness, generality, and practical applicability. The methodology so far has produced outstanding results due to a productive synergy of function approximation, under plausible assumptions of irreducibility, and the efficiency of the back-propagation family of algorithms. We examine these winning traits of deep learning, and also observe the various known failure modes of deep learning. We conclude by giving recommendations on how to extend deep learning methodology to cover the postulates of general-purpose AI, including modularity and cognitive architecture. We also relate deep learning to advances in theoretical neuroscience research.

Deep Reinforcement Learning: An Overview

In recent years, a specific machine learning method called deep learning has attracted huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have been successfully combined with the reinforcement learning framework.

Towards Practical Visual Search Engine within Elasticsearch

In this paper, we describe our end-to-end content-based image retrieval system built upon Elasticsearch, a well-known and popular textual search engine. As far as we know, this is the first time such a system has been implemented in eCommerce, and our efforts have turned out to be highly worthwhile. We end up with a novel and exciting visual search solution that is extremely easy to deploy, distribute, scale and monitor in a cost-friendly manner. Moreover, our platform is intrinsically flexible in supporting multimodal searches, where visual and textual information can be jointly leveraged in retrieval. The core idea is to encode image feature vectors into a collection of string tokens in such a way that closer vectors share more string tokens. By doing that, we can utilize Elasticsearch to efficiently retrieve similar images based on similarities within encoded string tokens. As part of the development, we propose a novel vector to string encoding method, which is shown to substantially outperform the previous ones in terms of both precision and latency. First-hand experiences in implementing this Elasticsearch-based platform are extensively addressed, which should be valuable to practitioners also interested in building a visual search engine on top of Elasticsearch.
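
To make the core idea concrete, here is a toy encoding in which nearby vectors share more tokens. It is not the encoding proposed in the paper: the per-dimension bucketing scheme, the `step` parameter, the token format and the example vectors are all assumptions chosen for illustration.

```python
def encode_vector(vec, step=0.25):
    """Map each dimension to a token naming the bucket its value falls into."""
    return {f"d{i}_b{int(v // step)}" for i, v in enumerate(vec)}

def shared_tokens(a, b):
    """Token overlap, used here as a crude proxy for vector similarity."""
    return len(encode_vector(a) & encode_vector(b))

query = [0.10, 0.80, 0.55, 0.30]
near = [0.12, 0.78, 0.52, 0.45]   # small perturbation of the query
far = [0.90, 0.10, 0.05, 0.95]    # very different vector

overlap_near = shared_tokens(query, near)
overlap_far = shared_tokens(query, far)
```

Once vectors are stored as token sets like these, a text search engine can rank documents by token overlap with the query, which is the property the abstract relies on.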

DALEX: explainers for complex predictive models

Predictive modeling is invaded by elastic, yet complex methods such as neural networks or ensembles (model stacking, boosting or bagging). Such methods are usually described by a large number of parameters or hyperparameters – a price that one needs to pay for elasticity. The sheer number of parameters makes models hard to understand. This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. Each explainer is a technique for exploration of a black box model. The presented approaches are model-agnostic, which means that they extract useful information from any predictive method regardless of its internal structure. Each explainer is linked with a specific aspect of a model. Some are useful in decomposing predictions, some serve better in understanding performance, while others are useful in understanding importance and conditional responses of a particular variable. Every explainer presented in this paper works for a single model or for a collection of models. In the latter case, models can be compared against each other. Such comparison helps to find strengths and weaknesses of different approaches and gives additional possibilities for model validation. The presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. The current implementation supports the most popular frameworks for classification and regression.

Multilevel Wavelet Decomposition Network for Interpretable Time Series Analysis

Recent years have witnessed an unprecedented rise of time series data from almost all kinds of academic and industrial fields. Various types of deep neural network models have been introduced to time series analysis, but the important frequency information still lacks effective modeling. In light of this, in this paper we propose a wavelet-based neural network structure called multilevel Wavelet Decomposition Network (mWDN) for building frequency-aware deep learning models for time series analysis. mWDN preserves the advantage of multilevel discrete wavelet decomposition in frequency learning while enabling the fine-tuning of all parameters under a deep neural network framework. Based on mWDN, we further propose two deep learning models called Residual Classification Flow (RCF) and multi-frequency Long Short-Term Memory (mLSTM) for time series classification and forecasting, respectively. The two models take all or part of the mWDN decomposed sub-series in different frequencies as input, and resort to the back propagation algorithm to learn all the parameters globally, which enables seamless embedding of wavelet-based frequency analysis into deep learning frameworks. Extensive experiments on 40 UCR datasets and a real-world user volume dataset demonstrate the excellent performance of our time series models based on mWDN. In particular, we propose an importance analysis method for mWDN based models, which successfully identifies those time-series elements and mWDN layers that are crucially important to time series analysis. This indicates the interpretability advantage of mWDN, and can be viewed as an in-depth exploration of interpretable deep learning.
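
For readers unfamiliar with multilevel discrete wavelet decomposition, the sketch below shows the fixed (non-learned) version of the operation. It uses the Haar filter pair for simplicity, which is an assumption of this example. In mWDN, as described above, the corresponding filter parameters would be fine-tuned inside the network instead of staying fixed.

```python
import math

def haar_step(x):
    """Split a series into low- and high-frequency halves (downsampled by 2)."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high

def multilevel_decompose(x, levels):
    """Repeatedly decompose the low-frequency part, keeping each level's high part."""
    highs = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs

series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, highs = multilevel_decompose(series, levels=2)
```

Each level halves the length of the series, so `highs[0]` captures the fastest variations and `low` holds the coarse trend. These sub-series play the role of the frequency-specific inputs mentioned in the abstract.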

Optimizing the Wisdom of the Crowd: Inference, Learning, and Teaching

The unprecedented demand for large amounts of data has catalyzed the trend of combining human insights with machine learning techniques, which facilitates the use of crowdsourcing to collect label information both effectively and efficiently. The classic work on crowdsourcing mainly focuses on the label inference problem under the categorization setting. However, inferring the true label requires sophisticated aggregation models that usually can only perform well under certain assumptions. Meanwhile, no matter how complicated the aggregation model is, the true model that generated the crowd labels remains unknown. Therefore, the label inference problem can never infer the ground truth perfectly. Because crowdsourcing labels are abundant and aggregation loses this kind of rich annotation information (e.g., which worker provided which labels), we believe that it is critical to take the diverse labeling abilities of the crowdsourcing workers as well as their correlations into consideration. To address the above challenge, we propose to tackle three research problems, namely inference, learning, and teaching.

Deductron – A Recurrent Neural Network

The current paper is a study in Recurrent Neural Networks (RNN), motivated by the lack of examples simple enough so that they can be thoroughly understood theoretically, but complex enough to be realistic. We constructed an example of structured data, motivated by problems from image-to-text conversion (OCR), which requires long-term memory to decode. Our data is a simple writing system, encoding characters ‘X’ and ‘O’ as their upper halves, which is possible due to symmetry of the two characters. The characters can be connected, as in some languages using cursive, such as Arabic (abjad). The string ‘XOOXXO’ may be encoded as $\vee\wedge\kern-1.5pt\wedge\vee\kern-1.5pt\vee\wedge$. It follows that we may need to know an arbitrarily long past to decode a current character, thus requiring long-term memory. Subsequently we constructed an RNN capable of decoding sequences encoded in this manner. Rather than by training, we constructed our RNN ‘by inspection’, i.e. we guessed its weights. This involved a sequence of steps. We wrote a conventional program which decodes sequences such as the example above. Subsequently, we interpreted the program as a neural network (the only example of this kind known to us). Finally, we generalized this neural network to discover a new RNN architecture whose instance is our handcrafted RNN. It turns out to be a 3-layer network, where the middle layer is capable of performing simple logical inferences; thus the name ‘deductron’. It is demonstrated that it is possible to train our network by simulated annealing. Also, known variants of stochastic gradient descent (SGD) methods are shown to work.

A breakpoint detection in the mean model with heterogeneous variance on fixed time-intervals

This work is motivated by an application for the homogenization of GNSS-derived IWV (Integrated Water Vapour) series. Indeed, these GPS series are affected by abrupt changes due to equipment changes or environmental effects. Detecting these changes and correcting the series for them is a crucial step before any use for climate studies. In addition to these abrupt changes, non-stationarity of the variability has been observed in the series. We propose in this paper a new segmentation model that is a breakpoint detection in the mean model of a Gaussian process with heterogeneous variance on known time-intervals. In this segmentation case, the dynamic programming (DP) algorithm classically used to infer the breakpoints cannot be applied anymore. We propose a procedure in two steps: we first estimate the variances robustly and then apply the classical inference by plugging in these estimators. The performance of our proposed procedure is assessed through simulation experiments. An application to real GNSS data is presented.

Variational Wasserstein Clustering

We propose a new clustering method based on optimal transportation. We solve optimal transportation with variational principles and investigate the use of power diagrams as transportation plans for aggregating arbitrary domains into a fixed number of clusters. We iteratively drive centroids through target domains while maintaining the minimum clustering energy by adjusting the power diagrams. Thus, we simultaneously pursue clustering and the Wasserstein distances between centroids and target domains, resulting in a robust measure-preserving mapping. In general, there are two approaches for solving the optimal transportation problem: Kantorovich’s vs. Brenier’s. While most researchers focus on Kantorovich’s approach, we propose a solution to the clustering problem following Brenier’s approach and achieve a competitive result with the state-of-the-art method. We demonstrate our applications to different areas such as domain adaptation, remeshing, and representation learning on synthetic and real data.

DARTS: Differentiable Architecture Search

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques.
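
The continuous relaxation mentioned above can be illustrated with a softmax-weighted mixture of candidate operations. This sketch covers only the relaxation and the final discretization, not the gradient-based search itself. The scalar toy operations and the names `OPS`, `mixed_op` and `discretize` are hypothetical stand-ins for real layers such as convolutions or pooling.

```python
import math

def softmax(alphas):
    m = max(alphas)                       # subtract max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Candidate operations on a scalar input (toy stand-ins for network layers).
OPS = {
    "skip": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Relaxed choice: a softmax-weighted sum of all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alphas):
    """After the search, keep only the operation with the largest weight."""
    names = list(OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

y_uniform = mixed_op(3.0, [0.0, 0.0, 0.0])   # equal weights over all operations
chosen = discretize([0.1, 2.0, -1.0])
```

Because `mixed_op` is differentiable in `alphas`, the architecture parameters can be updated by gradient descent alongside the ordinary weights, which replaces a discrete search over operations with a continuous optimization.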

Efficient Graph Compression Using Huffman Coding Based Techniques
Self-Driving Vehicle Verification Towards a Benchmark
Optimal Seeding and Self-Reproduction from a Mathematical Point of View
Interpreting Embedding Models of Knowledge Bases: A Pedagogical Approach
Minimum degree and size conditions for the proper connection number of graphs
Target Contrastive Pessimistic Discriminant Analysis
Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations
A proposed solution for analysis management in high energy physics
Optimized Video Streaming over Cloud: A Stall-Quality Trade-off
Using NLP on news headlines to predict index trends
Domination and regularity
Estimating the treatment effect in a subgroup defined by an early post-baseline biomarker measurement in randomized clinical trials with time-to-event endpoint
Complexity Matching and Requisite Variety
Packing and covering directed triangles
An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics
Augmented Reality-based Feedback for Technician-in-the-loop C-arm Repositioning
Robust Resource Allocation for MISO Cognitive Radio Networks Under Two Practical Non-Linear Energy Harvesting Models
Forecasting Internally Displaced Population Migration Patterns in Syria and Yemen
Staircase-PIR: Universally Robust Private Information Retrieval
Diffusion Scattering Transforms on Graphs
Smart Inverter Grid Probing for Learning Loads: Part I – Identifiability Analysis
Domain Adaptation for Infection Prediction from Symptoms Based on Data from Different Study Designs and Contexts
Smart Inverter Grid Probing for Learning Loads: Part II – Probing Injection Design
Rearrangement and Prekopa-Leindler type inequalities
Bayesian Optimization of Combinatorial Structures
Deep SNP: An End-to-end Deep Neural Network with Attention-based Localization for Break-point Detection in SNP Array Genomic data
The Meeting Time of Multiple Random Walks
A linear state feedback switching rule for global stabilization of switched nonlinear systems about a nonequilibrium point
On the Design of Multi-Dimensional Compactly Supported Parseval Framelets with Directional Characteristics
Finding Certain Arithmetic Progressions in 2-Coloured Cyclic Groups
Multi-Task Handwritten Document Layout Analysis
RUC+CMU: System Report for Dense Captioning Events in Videos
Proof of the Erdős Matching Conjecture in a New Range
A deep learning framework for segmentation of retinal layers from OCT images
GONet++: Traversability Estimation via Dynamic Scene View Synthesis
A Nearly-Linear Bound for Chasing Nested Convex Bodies
Model-Predictive Control with Reference Input Tracking for Tensegrity Spine Robots
A Lagrange decomposition based Branch and Bound algorithm for the Optimal Mapping of Cloud Virtual Machines
Dynamical quantum phase transitions in the random field Ising model
Technical Report on Optimal Link Scheduling in Millimeter Wave Multi-hop Networks with Space Division Multiple Access and Multiplexing
On the critical region of long-range depinning transitions
A Note on Minimal Senders
Note on the multicolour size-Ramsey number for paths
The Sparse Manifold Transform
Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level
Synchronization, Consensus of Complex Networks and Lyapunov Function Approach
MRAttractor: Detecting Communities from Large-Scale Graphs
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
Temporal Activity Path Based Character Correction in Social Networks
Privacy-Protective-GAN for Face De-identification
Zeta Distribution and Transfer Learning Problem
Search Rank Fraud De-Anonymization in Online Systems
Overlapping Sliced Inverse Regression for Dimension Reduction
An Inductive Formalization of Self Reproduction in Dynamical Hierarchies
Hankel determinants and shifted periodic continued fractions
Communications, Caching and Computing for Mobile Virtual Reality: Modeling and Tradeoff
The second term for two-neighbour bootstrap percolation in two dimensions
Approximating some network problems with scenarios
A Recursive PLS (Partial Least Squares) based Approach for Enterprise Threat Management
Adjacency Matrix and Energy of the Line Graph of $Γ(\mathbb{Z}_n)$
On principal frequencies and isoperimetric ratios in convex sets
Experimental characterization of transitions between locking regimes in a laser system with weak periodic forcing
A local large deviation principle for inhomogeneous birth-death processes
Social Centrality using Network Hierarchy and Community Structure
Veldkamp Spaces of Low-Dimensional Ternary Segre Varieties
$\bf{C^{1,1}}$-smoothness of constrained solutions in the calculus of variations with application to mean field games
A Modulo-Based Architecture for Analog-to-Digital Conversion
Evaluation of Momentum Diverse Input Iterative Fast Gradient Sign Method (M-DI2-FGSM) Based Attack Method on MCS 2018 Adversarial Attacks on Black Box Face Recognition System
Almost optimal Boolean matrix multiplication [BMM]-by multi-encoding of rows and columns
Explainable Fashion Recommendation with Joint Outfit Matching and Comment Generation
Inferring Metapopulation Propagation Network for Intra-city Epidemic Control and Prevention
Retweet Us, We Will Retweet You: Spotting Collusive Retweeters Involved in Blackmarket Services
Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses
An Improved Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search
Affine stochastic equation with triangular matrices
On the support of solutions of stochastic differential equations with path-dependent coefficients
Stroke-based Character Recognition with Deep Reinforcement Learning
Leveraging Implicit Spatial Information in Global Features for Image Retrieval
List Decodability of Symbol-Pair Codes
Hiding the start of Brownian motion: towards a Bayesian analysis of privacy for GPS trajectories
Model predictive control of indoor microclimate: existing building stock comfort improvement
On Markov chain Monte Carlo for sparse and filamentary distributions
An Analysis of Uplink Asynchronous Non-Orthogonal Multiple Access Systems
The problem of optimal location of production points and distribution points in the vertices of the transportation network as an investment project
Minimax Optimum Clock Skew and Offset Estimators for IEEE 1588
Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech
Sum-Rate Performance of Millimeter Wave MIMO Shared Spectrum Systems
Assumption Lean Regression
Considerations for a PAP Smear Image Analysis System with CNN Features
Residence Time Near an Absorbing Set
Improving Text-to-SQL Evaluation Methodology
On Adversarial Examples for Character-Level Neural Machine Translation
Weak and strong well-posedness of critical and supercritical SDEs with singular coefficients
Defending Malware Classification Networks Against Adversarial Perturbations with Non-Negative Weight Restrictions
Doubly transitive lines I: Higman pairs and roux
Parallel Transport Unfolding: A Connection-based Manifold Learning Approach
Disease Classification in Metagenomics with 2D Embeddings and Deep Learning
A classification point-of-view about conditional Kendall’s tau
In-situ Stochastic Training of MTJ Crossbar based Neural Networks
Disentangled VAE Representations for Multi-Aspect and Missing Data
How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary
A multi-channel DAQ system based on FPGA for long-distance transmission in nuclear physics experiments
A simple bijection for classical and enhanced k-noncrossing partitions
Prototype of Front-end Electronics for PandaX-4ton Experiment
Generative Models for Pose Transfer
Walrasian Equilibrium and Centralized Distributed Optimization from the point of view of Modern Convex Optimization Methods on the Example of Resource Allocation Problem
CT-image Super Resolution Using 3D Convolutional Neural Network
Distributed Edge Caching in Ultra-dense Fog Radio Access Networks: A Mean Field Approach
Beyond Backprop: Alternating Minimization with co-Activation Memory
CNN-based Action Recognition and Supervised Domain Adaptation on 3D Body Skeletons via Kernel Feature Maps
Development of a 256-channel Time-of-flight Electronics System For Neutron Beam Profiling
Measuring News Similarity Across Ten U.S. News Sites
Artwork Identification from Wearable Camera Images for Enhancing Experience of Museum Audiences
Multivector variate distributions: An application in Finance
The CLT in high dimensions: quantitative bounds via martingale embedding
Character-Level Feature Extraction with Densely Connected Networks
Segmentation of Overlapped Steatosis in Whole-Slide Liver Histopathology Microscopy Images
An Input-Output Approach to Structured Stochastic Uncertainty in Continuous Time
Analysis of Cellular Feature Differences of Astrocytomas with Distinct Mutational Profiles Using Digitized Histopathology Images
Decentralized Asynchronous Coded Caching in Fog-RAN
Graph-based Cooperative Caching in Fog-RAN
Modeling Multi-turn Conversation with Deep Utterance Aggregation
Subword-augmented Embedding for Cloze Reading Comprehension
Accuracy Analysis for Distributed Weighted Least-Squares Estimation in Finite Steps and Loopy Networks
One-shot Learning for Question-Answering in Gaokao History Challenge
The Electronics Design of Error Field Feedback Control System in KTX