Rethinking Numerical Representations for Deep Neural Networks

With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations as they relate to inference accuracy and efficiency to steer the improved design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1% degradation in inference accuracy relative to a state-of-the-art baseline platform representing the most sophisticated hardware using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration.
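
To make the idea of a narrow-precision floating-point format concrete, here is a minimal sketch that rounds a Python float to a toy format with a configurable number of mantissa and exponent bits. The function name `quantize_float`, the saturating overflow behaviour, and the choice not to reserve exponent codes for infinities or NaNs are assumptions made for illustration; the abstract does not describe the authors' formats at this level of detail.

```python
import math

def quantize_float(x, mantissa_bits, exponent_bits):
    """Round x to a toy float format (hypothetical helper, not the paper's code).

    The format has one implied sign bit, `exponent_bits` exponent bits with an
    IEEE-style bias, and `mantissa_bits` explicit fraction bits.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = 2 ** (exponent_bits - 1) - 1
    max_exp = bias       # assumption: the top exponent code is not reserved
    min_exp = 1 - bias   # smallest exponent of a normal number
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)        # values below min_exp use subnormal spacing
    step = 2.0 ** (exp - mantissa_bits)   # gap between neighbouring values
    q = round(abs(x) / step) * step       # round to nearest, ties to even
    largest = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** max_exp
    q = min(q, largest)                   # saturate instead of overflowing
    return math.copysign(q, x)

if __name__ == "__main__":
    # 3 fraction bits and 4 exponent bits: 1.1 rounds to 1.125
    print(quantize_float(1.1, mantissa_bits=3, exponent_bits=4))
    # Relative error shrinks as fraction bits are added.
    for bits in (2, 4, 8, 16):
        approx = quantize_float(0.1, bits, 5)
        print(bits, approx, abs(approx - 0.1) / 0.1)
```

Applying such a function to weights and activations before each operation is one simple way to estimate in software how a given bit allocation would affect accuracy, at the cost of not modelling any hardware speedup.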


Efficient and Effective $L_0$ Feature Selection

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive vis-a-vis popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden when tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity, without resorting to relaxations or heuristics in the optimization, or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and whitening, a data pre-processing step.
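
For readers unfamiliar with the setting, the following is the standard big-M mixed integer formulation of $L_0$-constrained least squares, given here as background and not as the authors' exact model. The symbols are assumptions introduced for illustration: $y$ is the response vector, $X$ the design matrix with $p$ features, $\beta$ the coefficient vector, $z_j$ a binary indicator that feature $j$ is selected, $k$ the sparsity bound, and $M$ a sufficiently large constant bounding $|\beta_j|$.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \quad
  & \lVert y - X\beta \rVert_2^2 \\
\text{subject to} \quad
  & -M z_j \le \beta_j \le M z_j, \qquad j = 1, \dots, p, \\
  & \sum_{j=1}^{p} z_j \le k .
\end{aligned}
```

When $z_j = 0$ the first constraint forces $\beta_j = 0$, so the last constraint caps the number of nonzero coefficients at $k$, which is the sparsity bound the abstract refers to tuning.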


Image Anomalies: a Review and Synthesis of Detection Methods

We review the broad variety of methods that have been proposed for anomaly detection in images. Most methods found in the literature have in mind a particular application. Yet we show that the methods can be classified mainly by the structural assumption they make on the ‘normal’ image. Five different structural assumptions emerge. Our analysis leads us to reformulate the best representative algorithms by attaching to them an a contrario detection that controls the number of false positives and thus derive universal detection thresholds. By combining the most general structural assumptions expressing the background’s normality with the best proposed statistical detection tools, we end up proposing generic algorithms that seem to generalize or reconcile most methods. We compare the six best representatives of our proposed classes of algorithms on anomalous images taken from classic papers on the subject, and on a synthetic database. Our conclusion is that it is possible to perform automatic anomaly detection on a single image.
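
The a contrario principle mentioned above can be sketched in a few lines: each test receives a Number of False Alarms (NFA), the number of tests times the probability of a value at least as extreme under the background model, and a detection is declared when the NFA falls below a threshold. The sketch below assumes residuals that follow a standard normal distribution under the background model; that assumption, and the names `nfa` and `detect`, are illustrative only and are not taken from the paper.

```python
import math

def nfa(residual, num_tests):
    """Number of False Alarms for one residual (hypothetical helper).

    Assumes the residual is N(0, 1) under the background model, so the
    two-sided tail probability is erfc(|r| / sqrt(2)).
    """
    tail = math.erfc(abs(residual) / math.sqrt(2.0))
    return num_tests * tail

def detect(residuals, epsilon=1.0):
    """Return indices whose NFA is at most epsilon."""
    n = len(residuals)
    return [i for i, r in enumerate(residuals) if nfa(r, n) <= epsilon]

if __name__ == "__main__":
    print(detect([0.1, 6.0, -0.5]))  # prints [1]
```

Because the NFA already accounts for the number of tests, the same threshold `epsilon` can be reused across images of different sizes, which is what makes the resulting thresholds universal in the sense used above.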


Parallax: Automatic Data-Parallel Training of Deep Neural Networks

The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in machine learning (ML). ML frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to help ML researchers train their models in a distributed fashion. However, correctly and efficiently utilizing multiple machines and GPUs is still not a straightforward task for framework users due to the non-trivial correctness and performance challenges that arise in the distribution process. This paper introduces Parallax, a tool for automatic parallelization of deep learning training in distributed environments. Parallax not only handles the subtle correctness issues, but also leverages various optimizations to minimize the communication overhead caused by scaling out. Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on multiple CNN and RNN models, while requiring little effort from its users.
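
As background on what data-parallel training means, the sketch below simulates one synchronous step on a one-parameter linear model: the batch is split across workers, each worker computes a gradient on its shard, and the gradients are averaged (weighted by shard size) before a single update. This is a generic illustration in plain Python, not Parallax's API or its communication strategy; all function names are hypothetical.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error of y ~ w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, data, num_workers, lr):
    """One synchronous data-parallel SGD step (simulated sequentially)."""
    # Round-robin split of the batch; drop empty shards if workers > samples.
    shards = [data[i::num_workers] for i in range(num_workers)]
    shards = [s for s in shards if s]
    total = sum(len(s) for s in shards)
    # Weighting by shard size makes the result equal the full-batch gradient.
    grad = sum(shard_gradient(w, s) * len(s) for s in shards) / total
    return w - lr * grad

if __name__ == "__main__":
    data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # samples of y = 2x
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, data, num_workers=2, lr=0.05)
    print(w)  # approaches 2.0
```

In a real deployment the averaging step is where communication cost arises, since every worker must exchange gradients on each step; that is the overhead the abstract says Parallax aims to minimize.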


Debugging Neural Machine Translations

In this paper, we describe a tool for debugging the output and attention weights of neural machine translation (NMT) systems and for improved estimations of confidence about the output based on the attention. The purpose of the tool is to help researchers and developers find weak and faulty example translations that their NMT systems produce without the need for reference translations. Our tool also includes an option to directly compare translation outputs from two different NMT engines or experiments. In addition, we present a demo website of our tool with examples of good and bad translations: http://attention.lielakeda.lv


Can Network Analysis Techniques help to Predict Design Dependencies? An Initial Study

The degree of dependencies among the modules of a software system is a key attribute to characterize its design structure and its ability to evolve over time. Several design problems are often correlated with undesired dependencies among modules. Being able to anticipate those problems is important for developers, so they can plan early for maintenance and refactoring efforts. However, existing tools are limited to detecting undesired dependencies once they have appeared in the system. In this work, we investigate whether module dependencies can be predicted (before they actually appear). Since the module structure can be regarded as a network, i.e., a dependency graph, we leverage network features to analyze the dynamics of such a structure. In particular, we apply link prediction techniques for this task. We conducted an evaluation on two Java projects across several versions, using link prediction and machine learning techniques, and assessed their performance for identifying new dependencies from one project version to the next. The results, although preliminary, show that the link prediction approach is feasible for package dependencies. Also, this work opens opportunities for further development of software-specific strategies for dependency prediction.
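
To illustrate what link prediction on a dependency graph can look like, the sketch below scores every pair of modules that is not yet connected by the Jaccard similarity of their neighbor sets, a classic topological link prediction measure. The abstract does not say which measures the study used, so this choice, the treatment of dependencies as undirected edges, and the module names are assumptions made for illustration.

```python
from itertools import combinations

def jaccard_link_scores(edges):
    """Score each unlinked node pair by Jaccard similarity of neighbor sets.

    `edges` is an iterable of (module_a, module_b) dependencies, treated as
    undirected for simplicity (an assumption of this sketch).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, nothing to predict
        common = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores

if __name__ == "__main__":
    # Hypothetical package dependencies in one project version.
    edges = [("ui", "core"), ("ui", "util"), ("db", "core"),
             ("db", "util"), ("core", "util"), ("net", "util")]
    ranked = sorted(jaccard_link_scores(edges).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, score in ranked:
        print(pair, round(score, 3))
```

The highest-scoring pairs are the candidate dependencies for the next version; comparing them with the edges that actually appear later is one way to evaluate such a predictor.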


A simple analysis of flying capacitor converter
BayesReef: A Bayesian inference framework for modelling reef growth in response to environmental change and biological dynamics
Improved survival of cancer patients admitted to the ICU between 2002 and 2011 at a U.S. teaching hospital
Morphology of renormalization-group flow for the de Almeida-Thouless-Gardner universality class
A generalized scheme for BSDEs based on derivative approximation and its error estimates
A Class of Multirate Infinitesimal GARK Methods
Effective Character-augmented Word Embedding for Machine Reading Comprehension
eQASM: An Executable Quantum Instruction Set Architecture
Device-directed Utterance Detection
Message Passing Graph Kernels
Bipartite induced density in triangle-free graphs
Quantum Lyapunov control with machine learning
Width-Independence Beyond Linear Objectives: Distributed Fair Packing and Covering Algorithms
Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning
Spectral Efficiency Analysis of the Decoupled Access for Downlink and Uplink in Two Tier Network
Student Log-Data from a Randomized Evaluation of Educational Technology: A Causal Case Study
Circular critical exponents for Thue-Morse factors
Randomized sketch descent methods for non-separable linearly constrained optimization
SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis
Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
Light-stimulable molecules/nanoparticles networks for switchable logical functions and reservoir computing
Efficient Multi-Robot Coverage of a Known Environment
Free energy of bipartite Sherrington-Kirkpatrick model
A Randomized Block Proximal Variable Sample-size Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization
Persistent Monitoring of Dynamically Changing Environments Using an Unmanned Vehicle
Parallel and Streaming Algorithms for K-Core Decomposition
Collaborative Planning for Mixed-Autonomy Lane Merging
On the Dimension of Unimodular Discrete Spaces, Part II: Relations with Growth Rate
Multi-robot Dubins Coverage with Autonomous Surface Vehicles
Deep context: end-to-end contextual speech recognition
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Belief likelihood function for generalised logistic regression
Description of closure operators in convex geometries of segments on a line
Design Challenges in Named Entity Transliteration
Good $r$-divisions Imply Optimal Amortised Decremental Biconnectivity
Machine Learning for Dynamic Models of Imperfect Information and Semiparametric Moment Inequalities
Outage Probability of the EH-based Full-Duplex AF and DF Relaying Systems in α-μ Environment
Vertex-isoperimetric stability in the hypercube
The commuting complex of the symmetric group with bounded number of $p$-cycles
A Centralized Metropolitan-Scale Radio Resource Management Scheme
Reachability Analysis Using Dissipation Inequalities For Nonlinear Dynamical Systems
Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems
The existence of square non-integer Heffter arrays
A Tutorial on Network Embeddings
A practical Single Source Shortest Path algorithm for random directed graphs with arbitrary weight in expecting linear time
The dynamical sine-Gordon model in the full subcritical regime
A Semi-Supervised Data Augmentation Approach using 3D Graphical Engines
PIVETed-Granite: Computational Phenotypes through Constrained Tensor Factorization
Unsupervised/Semi-supervised Deep Learning for Low-dose CT Enhancement
End-to-end Speech Recognition with Word-based RNN Language Models
L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data
Power domination in regular claw-free graphs
Learning to Write Notes in Electronic Health Records
Randomized box-ball systems, limit shape of rigged configurations and Thermodynamic Bethe ansatz
Training Compact Neural Networks with Binary Weights and Low Precision Activations
Question-Guided Hybrid Convolution for Visual Question Answering
Courteous Autonomous Cars
Reconciliation of probabilistic forecasts with an application to wind power
Cognitive system to achieve human-level accuracy in automated assignment of helpdesk email tickets
Social Community-Aware Content Placement in Wireless Device-to-Device Communication Networks
Accelerating wave-propagation algorithms with adaptive mesh refinement using the Graphics Processing Unit (GPU)
A Unified Framework for Testing High Dimensional Parameters: A Data-Adaptive Approach
Adversarial Geometry and Lighting using a Differentiable Renderer
Permutation patterns in genome rearrangement problems
Connected $k$-factors in bipartite graphs
Analogies of the Qi formula for some Dowling type numbers
An Occam’s Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets
Testing heteroscedasticity for regression models based on projections
Modified box dimension of trees and hierarchical scale-free graphs
The scaling limit of the $(\nabla+\Delta)$-model
Age of Information Upon Decisions
On a mixture of Brenier and Strassen theorems
An Improved Bound for Weak Epsilon-Nets in the Plane
On the Monitoring of Decentralized Specifications: Semantics, Properties, Analysis, and Simulation
The roll call interpretation of the Shapley value
New lower bounds on the size of arcs and new optimal projective linear codes
A Method for Estimating the Probability of Extremely Rare Accidents in Complex Systems
Memetic Algorithm-Based Path Generation for Multiple Dubins Vehicles Performing Remote Tasks
Asymptotics of maximum likelihood estimators based on Markov chain Monte Carlo methods
Learning to Focus when Ranking Answers
Limiting properties of random graph models with vertex and edge weights
On the convergence of closed-loop Nash equilibria to the mean field game limit
Natural Language Generation by Hierarchical Decoding with Linguistic Patterns
On lexicographic representatives in braid monoids
Omnidirectional DSO: Direct Sparse Odometry with Fisheye Cameras
Cache Aided Communications with Multiple Antennas at Finite SNR
Schools are segregated by educational outcomes in the digital space
On the Number of Acyclic Orientations of Complete $k$-Partite Graphs
Multiband SAS Imagery
Weighted models for level statistics across the many-body localization transition
Steiner Point Removal with distortion $O(\log k)$, using the Noisy-Voronoi algorithm
Extremal Norms for Fiber Bunched Cocycles
Highly Accelerated Multishot EPI through Synergistic Combination of Machine Learning and Joint Reconstruction
Separators for Planar Graphs that are Almost Trees
A Kernel Method for Positive 1-in-3-SAT
Backprop Evolution
Joint Frequency Reuse and Cache Optimization in Backhaul-Limited Small-Cell Wireless Networks
FLUX: Progressive State Estimation Based on Zakai-type Distributed Ordinary Differential Equations
Debunking Fake News One Feature at a Time
A Novel Disparity Transformation Algorithm for Road Segmentation
On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers
Pattern Recognition Approach to Violin Shapes of MIMO database
Relaxing and Restraining Queries for OBDA
Exotic matrix models: the Albert algebra and the spin factor
On the Solvability of Viewing Graphs
Hard to Solve Instances of the Euclidean Traveling Salesman Problem
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
Additional Representations for Improving Synthetic Aperture Sonar Classification Using Convolutional Neural Networks
Parkinson’s Disease Assessment from a Wrist-Worn Wearable Sensor in Free-Living Conditions: Deep Ensemble Learning and Visualization
Random directions stochastic approximation with deterministic perturbations
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease