Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus. The vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar. The proposed model can be viewed as a speech version of Word2Vec. Its design is based on an RNN Encoder-Decoder framework, and borrows the methodology of skipgrams or continuous bag-of-words for training. Learning word embeddings directly from speech enables Speech2Vec to make use of the semantic information carried by speech that does not exist in plain text. The learned word embeddings are evaluated and analyzed on 13 widely used word similarity benchmarks, and outperform word embeddings learned by Word2Vec from the transcriptions.

Automated Evaluation of Out-of-Context Errors

We present a new approach to evaluating computational models for the task of text understanding by means of out-of-context error detection. Through the novel design of our automated modification process, existing large-scale data sources can be adopted for a vast number of text understanding tasks. The data is thereby altered on a semantic level, allowing models to be tested against a challenging set of modified text passages that require comprehending a broader narrative discourse. Our newly introduced task targets actual real-world problems of transcription and translation systems by inserting authentic out-of-context errors. The automated modification process is applied to the 2016 TEDTalk corpus. Entirely automating the process allows the adoption of complete datasets at low cost, facilitating supervised learning procedures and allowing deeper networks to be trained and tested. To evaluate the quality of the modification algorithm, a language model and a supervised binary classification model are trained and tested on the altered dataset. A human baseline evaluation is conducted to compare the results with human performance. The outcome of the evaluation task indicates how difficult it is for both machine-learning algorithms and humans to detect semantic errors, showing that the errors cannot be identified when limited to a single sentence.

Pattern Analysis with Layered Self-Organizing Maps

This paper defines a new learning architecture, Layered Self-Organizing Maps (LSOMs), that uses the SOM and supervised-SOM learning algorithms. The architecture is validated with the MNIST database of hand-written digit images. LSOMs are similar to convolutional neural nets (convnets) in the way they sample data, but different in the way they represent features and learn. LSOMs analyze (or generate) image patches with maps of exemplars determined by the SOM learning algorithm rather than feature maps from filter-banks learned via backprop. LSOMs provide an alternative to features derived from convnets. Multi-layer LSOMs are trained bottom-up, without the use of backprop, and therefore may be of interest as a model of the visual cortex. The results show organization at multiple levels. The algorithm appears to be resource efficient in learning, classifying and generating images. Although LSOMs can be used for classification, their validation accuracy for these exploratory runs was well below the state of the art. The goal of this article is to define the architecture and display the structures resulting from its application to the MNIST images.

WikiRank: Improving Keyphrase Extraction Based on Background Knowledge

Keyphrases are an efficient representation of the main ideas of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we output the optimal keyphrase set. Our method improves on other state-of-the-art models by more than 2% in F1-score.

Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation

Here we propose using the successor representation (SR) to accelerate learning in a constructive knowledge system based on general value functions (GVFs). In real-world settings like robotics for unstructured and dynamic environments, it is infeasible to model all meaningful aspects of a system and its environment by hand due to both complexity and size. Instead, robots must be capable of learning and adapting to changes in their environment and task, incrementally constructing models from their own experience. GVFs, taken from the field of reinforcement learning (RL), are a way of modeling the world as predictive questions. One approach to such models proposes a massive network of interconnected and interdependent GVFs, which are incrementally added over time. It is reasonable to expect that new, incrementally added predictions can be learned more swiftly if the learning process leverages knowledge gained from past experience. The SR provides such a means of separating the dynamics of the world from the prediction targets and thus capturing regularities that can be reused across multiple GVFs. As a primary contribution of this work, we show that using SR-based predictions can improve sample efficiency and learning speed in a continual learning setting where new predictions are incrementally added and learned over time. We analyze our approach in a grid-world and then demonstrate its potential on data from a physical robot arm.
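
The separation described above, between the dynamics of the world and the prediction targets, can be illustrated with a minimal tabular sketch. It is not the authors' method: it assumes a known transition matrix `P` and computes the SR by fixed-point iteration, whereas the abstract describes learning from experience. The names `successor_representation`, `predict`, and the four-state ring are hypothetical, chosen only for illustration.

```python
def successor_representation(P, gamma, sweeps=500):
    """Iterate M <- I + gamma * P * M, which converges to sum_k gamma^k P^k."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        M = [
            [
                (1.0 if i == j else 0.0)
                + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


def predict(M, cumulant):
    """Value of a prediction target: v = M c, reusing the same SR for any cumulant c."""
    return [sum(m * c for m, c in zip(row, cumulant)) for row in M]


# Hypothetical world: a deterministic ring, state i always moves to (i + 1) % 4.
n_states = 4
P = [[1.0 if j == (i + 1) % n_states else 0.0 for j in range(n_states)]
     for i in range(n_states)]
gamma = 0.9
M = successor_representation(P, gamma)

# Two different prediction targets are answered from the same M.
v_last_state = predict(M, [0.0, 0.0, 0.0, 1.0])
v_constant = predict(M, [1.0, 1.0, 1.0, 1.0])
```

Once `M` is available, a newly added prediction only needs its cumulant vector, which is the kind of reuse across GVFs that the abstract argues for.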

Learning to Reweight Examples for Robust Deep Learning

Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
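
The meta gradient step on zero-initialized example weights can be sketched for a toy case. The following assumes a scalar linear model `y = theta * x` with squared loss, for which the meta-gradient with respect to each example weight reduces (up to a positive step-size factor) to the product of the validation gradient and that example's gradient. Clipping negative weights to zero and normalizing them to sum to one are assumptions of this sketch, not details stated in the abstract; all function names and data are hypothetical.

```python
def example_gradient(theta, x, y):
    """Gradient of 0.5 * (theta * x - y)**2 with respect to theta."""
    return (theta * x - y) * x


def reweight(theta, train_batch, val_batch):
    """Weight each training example by its gradient alignment with the clean validation set."""
    g_val = sum(example_gradient(theta, x, y) for x, y in val_batch) / len(val_batch)
    # One meta gradient step from zero weights, then clip at zero (assumed rectification).
    raw = [max(0.0, g_val * example_gradient(theta, x, y)) for x, y in train_batch]
    total = sum(raw)
    # Normalize so the weights sum to one (assumed); all-zero weights stay zero.
    return [r / total if total > 0 else 0.0 for r in raw]


# Hypothetical data following y = 2x; the third training label is corrupted.
train_batch = [(1.0, 2.0), (2.0, 4.0), (1.0, -2.0)]
val_batch = [(1.0, 2.0), (3.0, 6.0)]
weights = reweight(0.0, train_batch, val_batch)
```

In this toy run the corrupted example receives zero weight because its gradient points away from the direction that reduces the validation loss.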

Multi-range Reasoning for Machine Comprehension

We propose MRU (Multi-Range Reasoning Units), a new fast compositional encoder for machine comprehension (MC). Our proposed MRU encoders are characterized by multi-ranged gating, executing a series of parameterized contract-and-expand layers for learning gating vectors that benefit from long and short-term dependencies. The aims of our approach are as follows: (1) learning representations that are concurrently aware of long and short-term context, (2) modeling relationships between intra-document blocks and (3) fast and efficient sequence encoding. We show that our proposed encoder demonstrates promising results both as a standalone encoder and as a complementary building block. We conduct extensive experiments on three challenging MC datasets, namely RACE, SearchQA and NarrativeQA, achieving highly competitive performance on all. On the RACE benchmark, our model outperforms DFN (Dynamic Fusion Networks) by 1.5%-6% without using any recurrent or convolution layers. Similarly, we achieve competitive performance relative to AMANDA on the SearchQA benchmark and BiDAF on the NarrativeQA benchmark without using any LSTM/GRU layers. Finally, incorporating MRU encoders with standard BiLSTM architectures further improves performance, achieving state-of-the-art results.

AAANE: Attention-based Adversarial Autoencoder for Multi-scale Network Embedding

Network embedding represents nodes in a continuous vector space and preserves structure information from the network. Existing methods usually adopt a ‘one-size-fits-all’ approach to multi-scale structure information, such as first- and second-order proximity of nodes, ignoring the fact that different scales play different roles in embedding learning. In this paper, we propose an Attention-based Adversarial Autoencoder Network Embedding (AAANE) framework, which promotes the collaboration of different scales and lets them vote for robust representations. The proposed AAANE consists of two components: 1) An attention-based autoencoder effectively captures the highly non-linear network structure and can de-emphasize irrelevant scales during training. 2) An adversarial regularization guides the autoencoder to learn robust representations by matching the posterior distribution of the latent embeddings to a given prior distribution. This is the first attempt to introduce attention mechanisms to multi-scale network embedding. Experimental results on real-world networks show that our learned attention parameters are different for every network and that the proposed approach outperforms existing state-of-the-art approaches for network embedding.

A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

Training deep neural networks (DNNs) efficiently is a challenge due to the associated highly nonconvex optimization. The backpropagation (backprop) algorithm has long been the most widely used algorithm for gradient computation of parameters of DNNs and is used along with gradient descent-type algorithms for this optimization task. Recent work has shown empirically the efficiency of block coordinate descent (BCD) type methods for training DNNs. In view of this, we propose a novel algorithm based on the BCD method for training DNNs and provide its global convergence results built upon the powerful framework of the Kurdyka-Lojasiewicz (KL) property. Numerical experiments on standard datasets demonstrate its competitive efficiency against standard optimizers with backprop.

Machine Learning and Applied Linguistics

This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.

Goldbach’s Function Approximation Using Deep Learning

The Goldbach conjecture is one of the most famous open mathematical problems. It states that every even number greater than two can be expressed as the sum of two prime numbers. In this work we present a deep learning based model that predicts the number of Goldbach partitions for a given even number. Surprisingly, our model outperforms all state-of-the-art analytically derived estimations for the number of couples, while not requiring prime factorization of the given number. We believe that building a model that can accurately predict the number of couples brings us one step closer to solving one of the world's most famous open problems. To the best of our knowledge, this is the first attempt to consider machine learning based data-driven methods to approximate open mathematical problems in the field of number theory, and we hope that this work will encourage such attempts.
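
For reference, the quantity the model is trained to predict, the number of Goldbach partitions of an even number, can be computed exactly by brute force. The sketch below counts unordered prime pairs with a sieve; the function names are hypothetical and this is the target quantity only, not the paper's deep learning model.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: is_prime[k] is True exactly when k is prime (n >= 2)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return is_prime


def goldbach_partitions(n):
    """Count unordered pairs of primes (p, q) with p <= q and p + q == n."""
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even number greater than two")
    is_prime = primes_up_to(n)
    return sum(1 for p in range(2, n // 2 + 1) if is_prime[p] and is_prime[n - p])


counts = {n: goldbach_partitions(n) for n in (4, 10, 100)}
```

For example, 10 has the two partitions 3 + 7 and 5 + 5, so `goldbach_partitions(10)` is 2.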

Detecting Heads using Feature Refine Net and Cascaded Multi-scale Architecture

This paper presents a method that can accurately detect heads, especially small heads, in indoor scenes. To achieve this, we propose a novel Feature Refine Net (FRN) and a cascaded multi-scale architecture. FRN exploits the multi-scale hierarchical features created by deep convolutional neural networks. The proposed channel weighting method enables FRN to make use of features alternatively and effectively. To improve the performance of small head detection, we propose a cascaded multi-scale architecture which has two detectors. One, called the global detector, is responsible for detecting large objects and acquiring the global distribution information. The other, called the local detector, is specialized for small object detection and makes use of the information provided by the global detector. Due to the lack of head detection datasets, we have collected and labeled a new large dataset named SCUT-HEAD that includes 4405 images with 111251 heads annotated. Experiments show that our method has achieved state-of-the-art performance on SCUT-HEAD.

Neural Nets via Forward State Transformation and Backward Loss Transformation

This article studies (multilayer perceptron) neural networks with an emphasis on the transformations involved, both forward and backward, in order to develop a semantical/logical perspective that is in line with standard program semantics. The common two-pass neural network training algorithms make this viewpoint particularly fitting. In the forward direction, neural networks act as state transformers. In the reverse direction, however, neural networks change losses of outputs to losses of inputs, thereby acting like a (real-valued) predicate transformer. In this way, backpropagation is functorial by construction, as shown in other recent work. We illustrate this perspective by training a simple instance of a neural network.
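
One simple reading of the two directions can be sketched as follows. It assumes that a layer is a plain function on states and that its loss transformer is precomposition, so a loss on outputs becomes a loss on inputs. The names `affine_sigmoid`, `compose`, and `loss_transformer` are hypothetical, and the sketch illustrates only the functoriality of this transformation, not the article's full construction or its training procedure.

```python
import math


def affine_sigmoid(weight, bias):
    """Forward direction: a one-dimensional layer acting as a state transformer."""
    def layer(x):
        return 1.0 / (1.0 + math.exp(-(weight * x + bias)))
    return layer


def compose(*layers):
    """Sequential composition of state transformers, applied left to right."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


def loss_transformer(layer):
    """Backward direction: turn a loss on the layer's outputs into a loss on its inputs."""
    def transform(loss):
        return lambda x: loss(layer(x))
    return transform


f = affine_sigmoid(2.0, -1.0)
g = affine_sigmoid(-0.5, 0.25)


def squared_error(output):
    """Real-valued predicate on outputs: distance to the hypothetical target 0.3."""
    return (output - 0.3) ** 2


# Transforming the loss through the whole network ...
whole = loss_transformer(compose(f, g))(squared_error)
# ... agrees with transforming it layer by layer, in reverse order.
stepwise = loss_transformer(f)(loss_transformer(g)(squared_error))
```

The agreement of `whole` and `stepwise` on every input is the functoriality property in this simplified setting: losses flow backward through the layers in the opposite order to states.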

On the Performance of Preconditioned Stochastic Gradient Descent

This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, revealed its relationship to equilibrated stochastic gradient descent (ESGD) and batch normalization, and provided a software package (https://…/psgd_tf) implemented in TensorFlow to compare variations of PSGD and stochastic gradient descent (SGD) on a wide range of benchmark problems with commonly used neural network models, e.g., convolutional and recurrent neural networks. Comparison results clearly demonstrate the advantages of PSGD in terms of convergence speed and generalization performance.

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors. The main idea is to represent code as a collection of paths in its abstract syntax tree, and aggregate these paths, in a smart and scalable way, into a single fixed-length code vector, which can be used to predict semantic properties of the snippet. We demonstrate the effectiveness of our approach by using it to predict a method’s name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 14M methods. We show that code vectors trained on this dataset can predict method names from files that were completely unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies. Compared with previous techniques on the same data set, our approach obtains a relative improvement of over 75%, being the first to successfully predict method names based on a large, cross-project corpus.

PSFGAN: a generative adversarial network system for separating quasar point sources and host galaxy light
Image Inpainting using Block-wise Procedural Training with Annealed Adversarial Counterpart
Sequential Event Detection Using Multimodal Data in Nonstationary Environments
Deep Convolutional Compressed Sensing for LiDAR Depth Completion
Asynchronous Subgradient-Push
Contract theory in a VUCA world
Phase diagram of Kob-Andersen type binary Lennard-Jones mixtures
On the number of cycles in Av(312,4321) and Av(321,4123)
Introduction to Cluster Algebras
Stochastic Dynamics of Einstein Matter-Radiation Model with Spikes
Counterexamples for Robotic Planning Explained in Structured Language
The Nested Kingman Coalescent: Speed of Coming Down from Infinity
Broad Learning for Healthcare
Lyapunov Event-triggered Stabilization with a Known Convergence Rate
DeepMood: Modeling Mobile Phone Typing Dynamics for Mood Detection
Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval
Leveraging translations for speech transcription in low-resource settings
Deep Learning Phase Segregation
Iterative Low-Rank Approximation for CNN Compression
LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image
On the structure of matrices avoiding interval-minor patterns
Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs
Constructing de Bruijn sequences by concatenating smaller universal cycles
Message passing-based joint CFO and channel estimation in millimeter wave systems with one-bit ADCs
Feature Transfer Learning for Deep Face Recognition with Long-Tail Data
Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and Employment
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
The Importance of Constraint Smoothness for Parameter Estimation in Computational Cognitive Modeling
Optimal Policies for the Sequential Stochastic Threshold Assignment Problem
On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices
Convex Optimization of Nonlinear State Feedback Controllers for Discrete-time Polynomial Systems via Occupation Measures
Realtime Time Synchronized Event-based Stereo
Comparing Population Means under Local Differential Privacy: with Significance and Power
A Bounded Formulation for The School Bus Scheduling Problem
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Unconventional Scaling Theory in Disorder-Driven Quantum Phase Transition
Encoding Accelerometer Signals as Images for Activity Recognition Using Residual Neural Network
Asymptotic Representations of Statistics in the Functional Empirical process: A portal and some applications
A Note on Bootstrap Percolation Thresholds in Plane Tilings using Regular Polygons
A Single-shot Camera-Projector Calibration System For Imperfect Planar Targets
Near-lossless Binarization of Word Embeddings
Ground state properties of Ising chain with random monomer-dimer couplings
Chemical event chain model of coupled genetic oscillators
Ground-state magnetization of the Ising spin glass: A recursive numerical method and Chen-Ma scaling
Spectrum Sensing with Multiple Primary Users over Fading Channels
Physical Layer Security in the Presence of Interference
Simple Large-scale Relation Extraction from Unstructured Text
VOS-GAN: Adversarial Learning of Visual-Temporal Dynamics for Unsupervised Dense Prediction in Videos
Comparing Generative Adversarial Network Techniques for Image Creation and Modification
On a variant of Tykhonov regularization in optimal control under PDEs
Equivariant Algebraic Morse Theory
A Resourceful Reframing of Behavior Trees
On limit theorems for fields of martingale differences
Managing Large-Scale Transient Data in IoT Systems
Learning architectures based on quantum entanglement: a simple matrix product state algorithm for image recognition
Topological order generated by random field in a 2D exchange model
Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation in deep learning
Probability measure changes in Monte Carlo simulation
Equation Embeddings
Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition
Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications
Multi-Level Factorisation Net for Person Re-Identification
Social Media Analysis For Organizations: US Northeastern Public And State Libraries Case Study
Characterizing Diseases and disorders in Gay Users’ tweets
A stochastic telegraph equation from the six-vertex model
Posterior Concentration for Sparse Deep Learning
A Note on the DP-Chromatic Number of Complete Bipartite Graphs
A study on resistance matrix of graphs
Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models
Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors
An Overview of Vulnerabilities of Voice Controlled Systems
Efficient Discovery of Heterogeneous Treatment Effects in Randomized Experiments via Anomalous Pattern Detection
A Dynamic-Adversarial Mining Approach to the Security of Machine Learning
Security Theater: On the Vulnerability of Classifiers to Exploratory Attacks
Low-Resource Speech-to-Text Translation
Noise generation for compression algorithms
Fast distributed optimization using row-stochastic weights and uncoordinated step-sizes
Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks
An Introduction to Imperfect Competition via Bilateral Oligopoly
Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data
FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces
Unsupervised Domain Adaptation: from Simulation Engine to the RealWorld
Saturated Fully Leafed Tree-Like Polyforms and Polycubes
Finite-Data Performance Guarantees for the Output-Feedback Control of an Unknown System
Scene Graph Parsing as Dependency Parsing
Bayesian Optimal Data Detector for Hybrid mmWave MIMO-OFDM Systems with Low-Resolution ADCs
Network archaeology: phase transition in the recoverability of network history
Optimal Spectrum Sensing Policy in RF-Powered Cognitive Radio Networks
Learning Type-Aware Embeddings for Fashion Compatibility
Unsupervised Depth Estimation, 3D Face Rotation and Replacement
Autonomous Ramp Merge Maneuver Based on Reinforcement Learning with Continuous Action Space
Agent-Based Implementation of Particle Hopping Traffic Model With Stochastic and Queuing Elements
Revisiting Mayer: Symmetric solutions for sporadic cases of the Map Color Theorem
Unsupervised Domain Adaptation: A Multi-task Learning-based Method
Importance Weighted Adversarial Nets for Partial Domain Adaptation
Bernoulli Embeddings for Graphs
Image Recognition Using Scale Recurrent Neural Networks
Nonconventional Random Matrix Products
Edge correlations in random regular hypergraphs and applications to subgraph testing
A Dynamic Penalty Parameter Updating Strategy for Matrix-Free Sequential Quadratic Optimization
Nonorthogonal Multiple Access for Beamforming in Energy-Harvesting Enabled Networks
A General Dichotomy of Evolutionary Algorithms on Monotone Functions
Large girth graphs with bounded diameter-by-girth ratio
Pay More Attention – Neural Architectures for Question-Answering
On the a posteriori error analysis for linear Fokker-Planck models in convection-dominated diffusion problems
Martin boundary of random walks in convex cones
A theory of the phenomenology of Multipopulation Genetic Algorithm with an application to the Ising model
Evolutionary n-level Hypergraph Partitioning with Adaptive Coarsening
P2P-NET: Bidirectional Point Displacement Network for Shape Transform
New SOCP relaxation and branching rule for bipartite bilinear programs
Pathwise integration and change of variable formulas for continuous paths with arbitrary regularity
On the Schur function expansion of a symmetric quasi-symmetric function
Stability Analysis of Inexact Solves in Moment Matching based Model Reduction
Synthesizing Skeletons for Reactive Systems
The Geometry of Culture: Analyzing Meaning through Word Embeddings
Minmax Centered k-Partitioning of Trees and Applications to Sink Evacuation with Dynamic Confluent Flows
A Hopf-Lax formula for Hamilton-Jacobi equations with Caputo time derivative
Connectivity-Preserving Consensus of Multi-Agent Systems with Bounded Actuation
Optimal shapes for general integral functionals
A New Reconfigurable Antenna MIMO Architecture for mmWave Communication
$\mathbb{Z}_{q}(\mathbb{Z}_{q}+u\mathbb{Z}_{q})$-Linear Skew Constacyclic Codes
Data-driven Discovery of Closure Models
SUNLayer: Stable denoising with generative networks
Importance sampling for McKean-Vlasov SDEs
Local Quadratic Estimation of the Curvature in a Functional Single Index Model
Algebras with two multiplications and their cumulants
Deep Depth Completion of a Single RGB-D Image
Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
Quasi-Harmonic Constraints for Toric Bézier Surfaces
The top-degree part in the Matchings-Jack Conjecture
StarMap for Category-Agnostic Keypoint and Viewpoint Estimation
The subcritical phase for a homopolymer model
Text Segmentation as a Supervised Learning Task
DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes
DMTCP Checkpoint/Restart of MPI Programs via Proxies
Large Deviations from the Hydrodynamic Limit for a System with Nearest Neighbor Interactions
Mechanism Deduction from Noisy Chemical Reaction Networks
Logistic Regression: The Importance of Being Improper
Stochastic bandits robust to adversarial corruptions
Learning-Based Quality Control for Cardiac MR Images
Minimizing Nonconvex Population Risk from Rough Empirical Risk
A Face Recognition Signature Combining Patch-based Features with Soft Facial Attributes
PI Consensus Error Transformation for Adaptive Cooperative Control of Nonlinear Multi-Agent Systems
Finite Sample Complexity of Sequential Monte Carlo Estimators
Opposition diagrams for automorphisms of small spherical buildings
Variations on the $S_n$-module $Lie_n$
Popular Matching in Roommates Setting is NP-hard
StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow
Generalized Hadamard-Product Fusion Operators for Visual Question Answering
Removing scanner biases using Generative Adversarial Networks
A Systematic Comparison of Deep Learning Architectures in an Autonomous Vehicle
A vertex reconstruction algorithm in the central detector of JUNO
Bounds on the cardinality of restricted sumsets in $\mathbb{Z}_{p}$
HomeGuard: A Smart System to Deal with the Emergency Response of Domestic Violence Victims
Aggression-annotated Corpus of Hindi-English Code-mixed Data
A Hierarchy of Empirical Models of Plasma Profiles and Transport
Automatic Identification of Closely-related Indian Languages: Resources and Experiments
Efficient File Delivery for Coded Prefetching in Shared Cache Networks with Multiple Requests Per User
Optimal Design and Control of 4-IWD Electric Vehicles based on a 14-DOF Vehicle Model
Consensus Control of Multi-agent Systems with Optimal Performance
Precision Sugarcane Monitoring Using SVM Classifier
Induced nets and Hamiltonicity of claw-free graphs
Deep Faster Detection of Faint Edges in Noisy Images
Scalable photonic reinforcement learning by time-division multiplexing of laser chaos
Large Deviation Principle for arithmetic functions in continued fraction expansion
Unpopularity Factor in the Marriage and Roommates Problems
Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching
The principle of maximum in the imitative control tasks
Cliquet option pricing with Meixner processes
Convergent collocation methods for parabolic equations
REST: Real-to-Synthetic Transform for Illumination Invariant Camera Localization
CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
Fast and Accurate Single Image Super-Resolution via Information Distillation Network
Scalable inference for crossed random effects models
A clustering approach to infer Wikipedia contributors’ profile
Regularizing Deep Hashing Networks Using GAN Generated Fake Images
A Switch to the Concern of User: Importance Coefficient in Utility Distribution and Message Importance Measure
Image Set Classification for Low Resolution Surveillance
Semantic See-Through Rendering on Light Fields
User Positioning in mmW 5G Networks using Beam-RSRP Measurements and Kalman Filtering
Interpolation error of misspecified Gaussian process regression
Clustering to Given Connectivities
Lower bounds on the maximum delay margin by analytic interpolation
Solving linear parabolic rough partial differential equations
Unsupervised Learning and Segmentation of Complex Activities from Video
Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision
On the loss of Fisher information in some multi-object tracking observation models
Long time behavior of the volume of the Wiener sausage on Dirichlet spaces
A general white noise test based on kernel lag-window estimates of the spectral density operator
Long-term Tracking in the Wild: A Benchmark
Efficient space virtualisation for Hoshen–Kopelman algorithm