Decomposing an information stream into the principal components

We propose an approach to decomposing a thematic information stream into principal components. Each principal component is related to a narrow topic extracted from the information stream. The essence of the approach arises from analogy with the Fourier transform. We examine methods for analyzing the principal components and propose using multifractal analysis for identifying similar topics. The decomposition technique is applied to the information stream dedicated to Brexit. We provide a comparison between the principal components obtained by applying the decomposition to Brexit stream and the related topics extracted by Google Trends.

The Deep Kernelized Autoencoder

Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder’s ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments are performed for evaluating both reconstruction and kernel alignment performance in classification tasks and visualization of high-dimensional data. Additionally, we show that our method is capable to emulate kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.

Doubly Stochastic Adversarial Autoencoder

Any autoencoder network can be turned into a generative model by imposing an arbitrary prior distribution on its hidden code vector. Variational Autoencoder (VAE) [2] uses a KL divergence penalty to impose the prior, whereas Adversarial Autoencoder (AAE) [1] uses {\it generative adversarial networks} GAN [3]. GAN trades the complexities of {\it sampling} algorithms with the complexities of {\it searching} Nash equilibrium in minimax games. Such minimax architectures get trained with the help of data examples and gradients flowing through a generator and an adversary. A straightforward modification of AAE is to replace the adversary with the maximum mean discrepancy (MMD) test [4-5]. This replacement leads to a new type of probabilistic autoencoder, which is also discussed in our paper. We propose a novel probabilistic autoencoder in which the adversary of AAE is replaced with a space of {\it stochastic} functions. This replacement introduces a new source of randomness, which can be considered as a continuous control for encouraging {\it explorations}. This prevents the adversary from fitting too closely to the generator and therefore leads to a more diverse set of generated samples.

Sparse space-time models: Concentration Inequalities and Lasso

Inspired by Kalikow-type decompositions, we introduce a new stochastic model of infinite neuronal networks, for which we establish oracle inequalities for Lasso methods and restricted eigenvalue properties for the associated Gram matrix with high probability. These results hold even if the network is only partially observed. The main argument rely on the fact that concentration inequalities can easily be derived whenever the transition probabilities of the underlying process admit a sparse space-time representation.

Rapid Time Series Prediction with a Hardware-Based Reservoir Computer

Reservoir computing is a neural network approach for processing time-dependent signals that has seen rapid development in recent years. Physical implementations of the technique using optical reservoirs have demonstrated remarkable accuracy and processing speed at benchmark tasks. However, these approaches require an electronic output layer to maintain high performance, which limits their use in tasks such as time-series prediction, where the output is fed back into the reservoir. We present here a reservoir computing scheme that has rapid processing speed both by the reservoir and the output layer. The reservoir is realized by an autonomous, time-delay, Boolean network configured on a field-programmable gate array. We investigate the dynamical properties of the network and observe the fading memory property that is critical for successful reservoir computing. We demonstrate the utility of the technique by training a reservoir to learn the short- and long-term behavior of a chaotic system. We find accuracy comparable to state-of-the-art software approaches of similar network size, but with a superior real-time prediction rate up to 160 MHz.

gSMat: A Scalable Sparse Matrix-based Join for SPARQL Query Processing

Resource Description Framework (RDF) has been widely used to represent information on the web, while SPARQL is a standard query language to manipulate RDF data. Given a SPARQL query, there often exist many joins which are the bottlenecks of efficiency of query processing. Besides, the real RDF datasets often reveal strong data sparsity, which indicates that a resource often only relates to a few resources even the number of total resources is large. In this paper, we propose a sparse matrix-based (SM-based) SPARQL query processing approach over RDF datasets which con- siders both join optimization and data sparsity. Firstly, we present a SM-based storage for RDF datasets to lift the storage efficiency, where valid edges are stored only, and then introduce a predicate- based hash index on the storage. Secondly, we develop a scalable SM-based join algorithm for SPARQL query processing. Finally, we analyze the overall cost by accumulating all intermediate results and design a query plan generated algorithm. Besides, we extend our SM-based join algorithm on GPU for parallelizing SPARQL query processing. We have evaluated our approach compared with the state-of-the-art RDF engines over benchmark RDF datasets and the experimental results show that our proposal can significantly improve SPARQL query processing with high scalability.

Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison

Studies have demonstrated that Apache Spark, Flink and related frameworks can perform stream processing at very high frequencies, whilst tending to focus on small messages with a computationally light `map’ stage for each message; a common enterprise use case. We add to these benchmarks by broadening the domain to include loads with larger messages (leading to network-bound throughput), and that are computationally intensive (leading to CPU-bound throughput) in the map phase; in order to evaluate applicability of these frameworks to scientific computing applications. We present a performance benchmark comparison between Apache Spark Streaming (ASS) under both file and TCP streaming modes; and HarmonicIO, comparing maximum throughput over a broad domain of message sizes and CPU loads. We find that relative performance varies considerably across this domain, with the chosen means of stream source integration having a big impact. We offer recommendations for choosing and configuring the frameworks, and present a benchmarking toolset developed for this study.

Learning Representations for Soft Skill Matching

Employers actively look for talents having not only specific hard skills but also various soft skills. To analyze the soft skill demands on the job market, it is important to be able to detect soft skill phrases from job advertisements automatically. However, a naive matching of soft skill phrases can lead to false positive matches when a soft skill phrase, such as friendly, is used to describe a company, a team, or another entity, rather than a desired candidate. In this paper, we propose a phrase-matching-based approach which differentiates between soft skill phrases referring to a candidate vs. something else. The disambiguation is formulated as a binary text classification problem where the prediction is made for the potential soft skill based on the context where it occurs. To inform the model about the soft skill for which the prediction is made, we develop several approaches, including soft skill masking and soft skill tagging. We compare several neural network based approaches, including CNN, LSTM and Hierarchical Attention Model. The proposed tagging-based input representation using LSTM achieved the highest recall of 83.92% on the job dataset when fixing a precision to 95%.

Twitter Sentiment Analysis System

Social media is increasingly used by humans to express their feelings and opinions in the form of short text messages. Detecting sentiments in the text has a wide range of applications including identifying anxiety or depression of individuals and measuring well-being or mood of a community. Sentiments can be expressed in many ways that can be seen such as facial expression and gestures, speech and by written text. Sentiment Analysis in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing as well as Machine Learning. In this paper, sentiment recognition based on textual data and the techniques used in sentiment analysis are discussed.

A Generalized Vector Space Model for Ontology-Based Information Retrieval

Named entities (NE) are objects that are referred to by names such as people, organizations and locations. Named entities and keywords are important to the meaning of a document. We propose a generalized vector space model that combines named entities and keywords. In the model, we take into account different ontological features of named entities, namely, aliases, classes and identifiers. Moreover, we use entity classes to represent the latent information of interrogative words in Wh-queries, which are ignored in traditional keyword-based searching. We have implemented and tested the proposed model on a TREC dataset, as presented and discussed in the paper.

Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods.

The Sliding Window Discrete Fourier Transform

This paper introduces a new tool for time-series analysis: the Sliding Window Discrete Fourier Transform (SWDFT). The SWDFT is especially useful for time-series with local- in-time periodic components. We define a 5-parameter model for noiseless local periodic signals, then study the SWDFT of this model. Our study illustrates several key concepts crucial to analyzing time-series with the SWDFT, in particular Aliasing, Leakage, and Ringing. We also show how these ideas extend to R > 1 local periodic components, using the linearity property of the Fourier transform. Next, we propose a simple procedure for estimating the 5 parameters of our local periodic signal model using the SWDFT. Our estimation procedure speeds up computation by using a trigonometric identity that linearizes estimation of 2 of the 5 parameters. We conclude with a very small Monte Carlo simulation study of our estimation procedure under different levels of noise.

Competition vs. Concatenation in Skip Connections of Fully Convolutional Networks

Increased information sharing through short and long-range skip connections between layers in fully convolutional networks have demonstrated significant improvement in performance for semantic segmentation. In this paper, we propose Competitive Dense Fully Convolutional Networks (CDFNet) by introducing competitive maxout activations in place of naive feature concatenation for inducing competition amongst layers. Within CDFNet, we propose two architectural contributions, namely competitive dense block (CDB) and competitive unpooling block (CUB) to induce competition at local and global scales for short and long-range skip connections respectively. This extension is demonstrated to boost learning of specialized sub-networks targeted at segmenting specific anatomies, which in turn eases the training of complex tasks. We present the proof-of-concept on the challenging task of whole body segmentation in the publicly available VISCERAL benchmark and demonstrate improved performance over multiple learning and registration based state-of-the-art methods.

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and has required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges – in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architecture and data analysis algorithms.

An Operational Approach to Information Leakage

Given two random variables X and Y, an operational approach is undertaken to quantify the “leakage” of information from X to Y. The resulting measure \mathcal{L}(X \!\! \to \!\! Y) is called \emph{maximal leakage}, and is defined as the multiplicative increase, upon observing Y, of the probability of correctly guessing a randomized function of X, maximized over all such randomized functions. A closed-form expression for \mathcal{L}(X \!\! \to \!\! Y) is given for discrete X and Y, and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, a variant of the Shannon cipher system is studied, in which performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived and asymptotically-optimal encryption schemes are demonstrated. Furthermore, the sample complexity of estimating maximal leakage from data is characterized up to subpolynomial factors. Finally, the \emph{guessing} framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.

Semi-Generative Modelling: Domain Adaptation with Cause and Effect Features

This paper presents a novel, causally-inspired approach to domain adaptation which aims to also include unlabelled data in the model fitting when labelled data is scarce. We consider a case of covariate-shift adaptation with cause and effect features, and–drawing from recent ideas in causal modelling and invariant prediction–show how this setting leads to, what we will refer to as, a semi-generative model: P(Y, X_eff; X_cau,\theta). Our proposed approach is robust to changes in the distribution over causal features, and naturally allows to impose model constraints by unsupervised learning of a map from causes to effects. In experiments on synthetic datasets we demonstrate a significant improvement in classification performance of our semi-generative model over purely-supervised and importance-weighting baselines when the amount of labelled data is small. Moreover, we apply our approach for regression on real-world protein-count data and compare it to feature transformation methods.

Boosting algorithms for uplift modeling

Uplift modeling is an area of machine learning which aims at predicting the causal effect of some action on a given individual. The action may be a medical procedure, marketing campaign, or any other circumstance controlled by the experimenter. Building an uplift model requires two training sets: the treatment group, where individuals have been subject to the action, and the control group, where no action has been performed. An uplift model allows then to assess the gain resulting from taking the action on a given individual, such as the increase in probability of patient recovery or of a product being purchased. This paper describes an adaptation of the well-known boosting techniques to the uplift modeling case. We formulate three desirable properties which an uplift boosting algorithm should have. Since all three properties cannot be satisfied simultaneously, we propose three uplift boosting algorithms, each satisfying two of them. Experiments demonstrate the usefulness of the proposed methods, which often dramatically improve performance of the base models and are thus new and powerful tools for uplift modeling.

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Deep convolution neural network has achieved great success in many artificial intelligence applications. However, its enormous model size and massive computation cost have become the main obstacle for deployment of such powerful algorithm in the low power and resource-limited embedded systems. As the countermeasure to this problem, in this work, we propose statistical weight scaling and residual expansion methods to reduce the bit-width of the whole network weight parameters to ternary values (i.e. -1, 0, +1), with the objectives to greatly reduce model size, computation cost and accuracy degradation caused by the model compression. With about 16x model compression rate, our ternarized ResNet-32/44/56 could outperform full-precision counterparts by 0.12%, 0.24% and 0.18% on CIFAR- 10 dataset. We also test our ternarization method with AlexNet and ResNet-18 on ImageNet dataset, which both achieve the best top-1 accuracy compared to recent similar works, with the same 16x compression rate. If further incorporating our residual expansion method, compared to the full-precision counterpart, our ternarized ResNet-18 even improves the top-5 accuracy by 0.61% and merely degrades the top-1 accuracy only by 0.42% for the ImageNet dataset, with 8x model compression rate. It outperforms the recent ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5 accuracy, with around 1.25x higher compression rate and more than 6x computation reduction due to the weight sparsity.

Explicit inverse of nonsingular Jacobi matrices
Field-Trial of Machine Learning-Assisted Quantum Key Distribution (QKD) Networking with SDN
Socio-Material Network Analysis: A Mixed Method Study of Five European Artistic Collectives
Battery Swapping Station as an Energy Storage for Capturing Distribution-Integrated Solar Variability
When Do Gomory-Hu Subtrees Exist
About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)
Output Selection and Observer Design for Boolean Control Networks: A Sub-Optimal Polynomial-Complexity Algorithm
Fully Convolutional Pixel Adaptive Image Denoiser
Thouless and Relaxation Time Scales in Many-Body Quantum Systems
Spectral characterizations of anti-regular graphs
The colored longest common prefix array computed via sequential scans
$\mathcal W^{α, p}$ and $C^{0,γ}$ regularity of solutions to $(μ- Δ+ b \cdot \nabla)u=f$ with form-bounded vector fields
Avoiding 3-Term Geometric Progressions in Non-Commutative Settings
Unrolling Swiss Cheese: Metric repair on manifolds with holes
Adaptive Variational Particle Filtering in Non-stationary Environments
Derivation degree sequences of non-free arrangements
Generalized Metric Repair on Graphs
Approximate Collapsed Gibbs Clustering with Expectation Propagation
Distributed Power Control Schemes for In-Band Full-Duplex Energy Harvesting Wireless Networks
An Optimal Algorithm for Stochastic and Adversarial Bandits
The Limiting Eigenvalue Distribution of Iterated k-Regular Graph Cylinders
Quasi-polynomial time approximation schemes for packing and covering problems in planar graphs
Multi-Resolution Hashing for Fast Pairwise Summations
Coloring in Graph Streams
Distributed approximation algorithms for maximum matching in graphs and hypergraphs
On Chebotarëv’s nonvanishing minors theorem and the Biró-Meshulam-Tao discrete uncertainty principle
Near-Optimal Distributed Estimation for a Network of Sensing Units Operating Under Communication Constraints
Deriving star cluster parameters by convolutional neural networks. I. Age, mass, and size
Automatically Designing CNN Architectures for Medical Image Segmentation
Multitask Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
Statistical generalized derivative applied to the profile likelihood estimation in a mixture of semiparametric models
Bounding Box Embedding for Single Shot Person Instance Segmentation
Arithmetic aspects of symmetric edge polytopes
Generalized Stochastic Frank-Wolfe Algorithm with Stochastic ‘Substitute” Gradient for Structured Convex Optimization
Optimal Co-design of Industrial Networked Control Systems with State-dependent Correlated Fading Channels
Exact minimum number of bits to stabilize a linear system
Toward Characteristic-Preserving Image-based Virtual Try-On Network
Automatic Semantic Content Removal by Learning to Neglect
Wild Residual Bootstrap Inference for Penalized Quantile Regression with Heteroscedastic Errors
Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously
PhaseStain: Digital staining of label-free quantitative phase microscopy images using deep learning
Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
Bi-Directional Cooperative NOMA without Full CSIT
Blind Signal Classification for Non-Orthogonal Multiple Access in Internet-of-Things Networks
Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction
Efficient Facial Representations for Age, Gender and Identity Recognition in Organizing Photo Albums using Multi-output CNN
On Euclidean Methods for Cubic and Quartic Jacobi Symbols
Hitting time, access time and optimal transport on graphs
Stability in EMU
Impurity coupled to a lattice with disorder
Simplicial complexes and complex systems
Breaking of ensemble equivalence for perturbed Erdös-Rényi random graphs
Learning the effect of latent variables in Gaussian Graphical models with unobserved variables
Stable MPC Design for Hybrid Mixed Logical Dynamical Systems: $l_{\infty}$-based Lyapunov Approach
Improving Image Clustering With Multiple Pretrained CNN Feature Extractors
Controllability of Social Networks and the Strategic Use of Random Information
Novel flow field design for vanadium redox flow batteries via topology optimization
Concentration of measure for finite spin systems
Above Modelling and Adaptive Control of a Double Winded Induction Generator
Physical Adversarial Examples for Object Detectors
Wind Energy Conversion System – a Laboratory Setup
Principal Flow Patterns across renewable electricity networks
Non universality of fluctuations of outlier eigenvectors for block diagonal deformations of Wigner matrices
Semantic Document Clustering on Named Entity Features
Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X
Lesion Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
Fast transforms over finite fields of characteristic two
Kripke Semantics of the Perfectly Transparent Equilibrium
3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image
Finding Structure in Dynamic Networks
Continuous-Time Accelerated Methods via a Hybrid Control Lens
A characterization and an application of weight-regular partitions of graphs
Semiparametric Inference and Lower Bounds for Real Elliptically Symmetric Distributions
A Theil-like Class of Inequality Measures, its Asymptotic Normality Theory and Applications
Effect of disorder and noise in shaping the dynamics of power grids
On SOCP/SDP formulation of the extended trust region subproblem
Wireless Multi-Sensor Networks for Smart Cities: A Prototype System with Statistical Data Analysis
A Norm-Bounded based MPC strategy for uncertain systems under partial state availability
Towards functorial language-games
Biclustering Using Modified Matrix Bandwidth Minimization and Biogeography-based Optimization
Rank Minimization for Snapshot Compressive Imaging
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time
Distance-based Kernels for Surrogate Model-based Neuroevolution
On Synchronization of Dynamical Systems over Directed Switching Topologies: An Algebraic and Geometric Perspective
Self-regulation promotes cooperation in social networks
Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
From Face Recognition to Models of Identity: A Bayesian Approach to Learning about Unknown Identities from Unsupervised Data
On Secure Transmission Design: An Information Leakage Perspective
On a Loss-based prior for the number of components in mixture models
Complex Economic Activities Concentrate in Large Cities
Submodular Maximization with Optimal Approximation, Adaptivity and Query Complexity
Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage
The Iterated Weighted Least-Squares Fit
Optimal Bounds on the VC-dimension
Asymptotic results under multiway clustering
Photorealistic Video Super Resolution
On discrete-time semi-Markov processes
Large scale evaluation of local image feature detectors on homography datasets
$(k,λ)$-Anti-Powers and Other Patterns in Words
Future Semantic Segmentation with Convolutional LSTM
Decentralized Task Allocation in Multi-Robot Systems via Bipartite Graph Matching Augmented with Fuzzy Clustering