mQAPViz: A divide-and-conquer multi-objective optimization algorithm to compute large data visualizations

Modern digital products and services are instrumental in understanding users activities and behaviors. In doing so, we have to extract relevant relationships and patterns from extensive data collections efficiently. Data visualization algorithms are essential tools in transforming data into narratives. Unfortunately, very few visualization algorithms can handle a significant amount of data. In this study, we address the visualization of large-scale datasets as a multi-objective optimization problem. We propose mQAPViz, a divide-and-conquer multi-objective optimization algorithm to compute large-scale data visualizations. Our method employs the Multi-Objective Quadratic Assignment Problem (mQAP) as the mathematical foundation to solve the visualization task at hand. The algorithm applies advanced machine learning sampling techniques and efficient data structures to scale to millions of data objects. The divide-and-conquer strategy can efficiently handle millions of objects which the algorithm allocates onto a layout that allows the visualization of a whole dataset. Experimental results on real-world and large datasets demonstrate that mQAPViz is a competitive alternative to compute large-scale visualizations that we can employ to inform the development and improvement of digital applications.

Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio

Ability for accurate hospital case cost modelling and prediction is critical for efficient health care financial management and budgetary planning. A variety of regression machine learning algorithms are known to be effective for health care cost predictions. The purpose of this experiment was to build an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. The tool offers environment for comparing 14 types of regression models in a unified experiment: linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, neural network regression, Poisson regression, Gaussian processes for regression, gradient boosted machine, nonlinear least squares regression, projection pursuit regression, random forest regression, robust regression, robust regression with mm-type estimators, support vector regression. The tool presents assessment results arranged by model accuracy in a single table using five performance metrics. Evaluation of regression machine learning models for performing hospital case cost prediction demonstrated advantage of robust regression model, boosted decision tree regression and decision forest regression. The operational tool has been published to the web and openly available for experiments and extensions.

End-to-End DNN Training with Block Floating Point Arithmetic

DNNs are ubiquitous datacenter workloads, requiring orders of magnitude more computing power from servers than traditional workloads. As such, datacenter operators are forced to adopt domain-specific accelerators that employ half-precision floating-point (FP) numeric representations to improve arithmetic density. Unfortunately, even these representations are not dense enough, and are, therefore, sub-optimal for DNNs. We propose a hybrid approach that employs dense block floating-point (BFP) arithmetic on dot product computations and FP arithmetic elsewhere. While using BFP improves the performance of dot product operations, that compose most of DNN computations, allowing values to freely float between dot product operations leads to a better choice of tensor exponents when converting values to back BFP. We show that models trained with hybrid BFP-FP arithmetic either match or outperform their FP32 counterparts, leading to more compact models and denser arithmetic in computing platforms.

Identification of Shallow Neural Networks by Fewest Samples

We address the uniform approximation of sums of ridge functions \sum_{i=1}^m g_i(a_i\cdot x) on {\mathbb R}^d, representing the shallowest form of feed-forward neural network, from a small number of query samples, under mild smoothness assumptions on the functions g_i‘s and near-orthogonality of the ridge directions a_i‘s. The sample points are randomly generated and are universal, in the sense that the sampled queries on those points will allow the proposed recovery algorithms to perform a uniform approximation of any sum of ridge functions with high-probability. Our general approximation strategy is developed as a sequence of algorithms to perform individual sub-tasks. We first approximate the span of the ridge directions. Then we use a straightforward substitution, which reduces the dimensionality of the problem from d to m. The core of the construction is then the approximation of ridge directions expressed in terms of rank-1 matrices a_i \otimes a_i, realized by formulating their individual identification as a suitable nonlinear program, maximizing the spectral norm of certain competitors constrained over the unit Frobenius sphere. The final step is then to approximate the functions g_1,\dots,g_m by \hat g_1,\dots,\hat g_m. Higher order differentiation, as used in our construction, of sums of ridge functions or of their compositions, as in deeper neural network, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, we show that second order differentiation and tensors of order two (i.e., matrices) suffice.

Pigeonring: A Principle for Faster Thresholded Similarity Search

The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no fewer than n/m items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we observe that the number of items in multiple boxes are also constrained. We propose a new principle called the pigeonring principle which formally captures such constraints and yields stronger conditions. To utilize the pigeonring principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query is constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the pigeonring principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the pigeonring principle. This suggests that all the solutions based on the pigeonhole principle are possible to be accelerated by the pigeonring principle. A universal filtering framework is introduced to encompass the solutions to these problems based on the pigeonring principle. Besides, we discuss how to quickly find candidates specified by the pigeonring principle with minor modifications on top of existing algorithms. Experimental results on real datasets demonstrate the applicability of the pigeonring principle as well as the superior performance of the algorithms based on the principle.

Stability and Convergence Trade-off of Iterative Optimization Algorithms

The overall performance or expected excess risk of an iterative machine learning algorithm can be decomposed into training error and generalization error. While the former is controlled by its convergence analysis, the latter can be tightly handled by algorithmic stability. The machine learning community has a rich history investigating convergence and stability separately. However, the question about the trade-off between these two quantities remains open. In this paper, we show that for any iterative algorithm at any iteration, the overall performance is lower bounded by the minimax statistical error over an appropriately chosen loss function class. This implies an important trade-off between convergence and stability of the algorithm — a faster converging algorithm has to be less stable, and vice versa. As a direct consequence of this fundamental tradeoff, new convergence lower bounds can be derived for classes of algorithms constrained with different stability bounds. In particular, when the loss function is convex (or strongly convex) and smooth, we discuss the stability upper bounds of gradient descent (GD) and stochastic gradient descent and their variants with decreasing step sizes. For Nesterov’s accelerated gradient descent (NAG) and heavy ball method (HB), we provide stability upper bounds for the quadratic loss function. Applying existing stability upper bounds for the gradient methods in our trade-off framework, we obtain lower bounds matching the well-established convergence upper bounds up to constants for these algorithms and conjecture similar lower bounds for NAG and HB. Finally, we numerically demonstrate the tightness of our stability bounds in terms of exponents in the rate and also illustrate via a simulated logistic regression problem that our stability bounds reflect the generalization errors better than the simple uniform convergence bounds for GD and NAG.

Simultaneous Mean-Variance Regression

We propose simultaneous mean-variance regression for the linear estimation and approximation of conditional mean functions. In the presence of heteroskedasticity of unknown form, our method accounts for varying dispersion in the regression outcome across the support of conditioning variables by using weights that are jointly determined with mean regression parameters. Simultaneity generates outcome predictions that are guaranteed to improve over ordinary least-squares prediction error, with corresponding parameter standard errors that are automatically valid. Under shape misspecification of the conditional mean and variance functions, we establish existence and uniqueness of the resulting approximations and characterize their formal interpretation. We illustrate our method with numerical simulations and two empirical applications to the estimation of the relationship between economic prosperity in 1500 and today, and demand for gasoline in the United States.

Review of Deep Learning

In recent years, China, the United States and other countries, Google and other high-tech companies have increased investment in artificial intelligence. Deep learning is one of the current artificial intelligence research’s key areas. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolution neural networks and recurrent neural networks. This paper then summarizes deep learning’s applications in many areas of artificial intelligence, including voice, computer vision, natural language processing and so on. Finally, this paper discusses the existing problems of deep learning and gives the corresponding possible solutions.

The structure of evolved representations across different substrates for artificial intelligence

Artificial neural networks (ANNs), while exceptionally useful for classification, are vulnerable to misdirection. Small amounts of noise can significantly affect their ability to correctly complete a task. Instead of generalizing concepts, ANNs seem to focus on surface statistical regularities in a given task. Here we compare how recurrent artificial neural networks, long short-term memory units, and Markov Brains sense and remember their environments. We show that information in Markov Brains is localized and sparsely distributed, while the other neural network substrates ‘smear’ information about the environment across all nodes, which makes them vulnerable to noise.

Learning Strict Identity Mappings in Deep Residual Networks

A family of super deep networks, referred to as residual networks or ResNet, achieved record-beating performance in various visual tasks such as image recognition, object detection, and semantic segmentation. The ability to train very deep networks naturally pushed the researchers to use enormous resources to achieve the best performance. Consequently, in many applications super deep residual networks were employed for just a marginal improvement in performance. In this paper, we propose \epsilon-ResNet that allows us to automatically discard redundant layers, which produces responses that are smaller than a threshold \epsilon, without any loss in performance. The \epsilon-ResNet architecture can be achieved using a few additional rectified linear units in the original ResNet. Our method does not use any additional variables nor numerous trials like other hyper-parameter optimization techniques. The layer selection is achieved using a single training process and the evaluation is performed on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. In some instances, we achieve about 80\% reduction in the number of parameters.

The Kanerva Machine: A Generative Distributed Memory

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva’s sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.

Time Series Analysis of the Southern Oscillation Index using Bayesian Additive Regression Trees

Bayesian additive regression trees (BART) is a regression technique developed by Chipman et al. (2008). Its usefulness in standard regression settings has been clearly demonstrated, but it has not been applied to time series analysis as yet. We discuss the difficulties in applying this technique to time series analysis and demonstrate its superior predictive capabilities in the case of a well know time series: the Southern Oscillation Index.

Robust Fusion Methods for Structured Big Data

We address one of the important problems in Big Data, namely how to combine estimators from different subsamples by robust fusion procedures, when we are unable to deal with the whole sample. We propose a general framework based on the classic idea of `divide and conquer’. In particular we address in some detail the case of a multivariate location and scatter matrix, the covariance operator for functional data, and clustering problems.

A Human Mixed Strategy Approach to Deep Reinforcement Learning

In 2015, Google’s DeepMind announced an advancement in creating an autonomous agent based on deep reinforcement learning (DRL) that could beat a professional player in a series of 49 Atari games. However, the current manifestation of DRL is still immature, and has significant drawbacks. One of DRL’s imperfections is its lack of ‘exploration’ during the training process, especially when working with high-dimensional problems. In this paper, we propose a mixed strategy approach that mimics behaviors of human when interacting with environment, and create a ‘thinking’ agent that allows for more efficient exploration in the DRL training process. The simulation results based on the Breakout game show that our scheme achieves a higher probability of obtaining a maximum score than does the baseline DRL algorithm, i.e., the asynchronous advantage actor-critic method. The proposed scheme therefore can be applied effectively to solving a complicated task in a real-world application.

Sliced-Wasserstein Autoencoder: An Embarrassingly Simple Generative Model

In this paper we study generative modeling via autoencoders while using the elegant geometric properties of the optimal transport (OT) problem and the Wasserstein distances. We introduce Sliced-Wasserstein Autoencoders (SWAE), which are generative models that enable one to shape the distribution of the latent space into any samplable probability distribution without the need for training an adversarial network or defining a closed-form for the distribution. In short, we regularize the autoencoder loss with the sliced-Wasserstein distance between the distribution of the encoded training samples and a predefined samplable distribution. We show that the proposed formulation has an efficient numerical solution that provides similar capabilities to Wasserstein Autoencoders (WAE) and Variational Autoencoders (VAE), while benefiting from an embarrassingly simple implementation.

Automated Classification of Text Sentiment

The ability to identify sentiment in text, referred to as sentiment analysis, is one which is natural to adult humans. This task is, however, not one which a computer can perform by default. Identifying sentiments in an automated, algorithmic manner will be a useful capability for business and research in their search to understand what consumers think about their products or services and to understand human sociology. Here we propose two new Genetic Algorithms (GAs) for the task of automated text sentiment analysis. The GAs learn whether words occurring in a text corpus are either sentiment or amplifier words, and their corresponding magnitude. Sentiment words, such as ‘horrible’, add linearly to the final sentiment. Amplifier words in contrast, which are typically adjectives/adverbs like ‘very’, multiply the sentiment of the following word. This increases, decreases or negates the sentiment of the following word. The sentiment of the full text is then the sum of these terms. This approach grows both a sentiment and amplifier dictionary which can be reused for other purposes and fed into other machine learning algorithms. We report the results of multiple experiments conducted on large Amazon data sets. The results reveal that our proposed approach was able to outperform several public and/or commercial sentiment analysis algorithms.

Probabilistic Formulations of Regression with Mixed Guidance
Contrastive Learning of Emoji-based Representations for Resource-Poor Languages
A Priori Tests for the MIXMAX Random Number Generator
On influence and compromise in two-tier voting systems
Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
Boosting Handwriting Text Recognition in Small Databases with Transfer Learning
GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange
Non-parametric cure rate estimation under insufficient follow-up using extremes
SOPR for sparse phase retrieval
Probabilistic representation for mild solution of Navier-Stokes equations
Small Deviations of Sums of Independent Random Variables
Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection
Qualitätsmaße binärer Klassifikationen im Bereich kriminalprognostischer Instrumente der vierten Generation
Semi-Supervised Deep Metrics for Image Registration
Distributed optimal transport for the deployment of swarms
Numerical Verification of Affine Systems with up to a Billion Dimensions
A PTAS for subset TSP in minor-free graphs
Matching fields and lattice points of simplices
Counting with Borel’s Triangle
StainGAN: Stain Style Transfer for Digital Histological Images
On the number of containments in $P$-free families
Mesh-free Semi-Lagrangian Methods for Transport on a Sphere Using Radial Basis Functions
Learnable Exposure Fusion for Dynamic Scenes
Prime Parking Functions on Rooted Trees
Functional Summaries of Persistence Diagrams
Active covariance estimation by random sub-sampling of variables
Image Generation from Scene Graphs
Evaluation of Object Trackers in Distorted Surveillance Videos
SBFT: a Scalable Decentralized Trust Infrastructure for Blockchains
Phase diagram and metastability of the Ising model on interdependent networks
Metacirculants and split weak metacirculants
Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Hypertree Decompositions Revisited for PGMs
Optimal streaming and tracking distinct elements with high probability
Pretty good quantum state transfer in asymmetric graphs via potential
A Pyramid CNN for Dense-Leaves Segmentation
Mathematical Properties of Polynomial Dimensional Decomposition
Jointly Detecting and Separating Singing Voice: A Multi-Task Approach
Multi-dimensional $q$-summations and multi-colored partitions
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
Learning to Separate Object Sounds by Watching Unlabeled Video
Cancelable Indexing Based on Low-rank Approximation of Correlation-invariant Random Filtering for Fast and Secure Biometric Identification
Confidence regions in Cox proportional hazards model with measurement errors and unbounded parameter set
Semi-Supervised Classification for oil reservoir
Fractional Cox–Ingersoll–Ross process with non-zero <>
Computing Stieltjes constants using complex integration
Hallucinated-IQA: No-Reference Image Quality Assessment via Adversarial Learning
Using a Classifier Ensemble for Proactive Quality Monitoring and Control: the impact of the choice of classifiers types, selection criterion, and fusion process
Channel Hardening in Massive MIMO – A Measurement Based Analysis
On backward Kolmogorov equation related to CIR process
Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Towards Massive Connectivity Support for Scalable mMTC Communications in 5G networks
Spatially coupled turbo-like codes: a new trade-off between waterfall and error floor
Markerless Inside-Out Tracking for Interventional Applications
Submodular Functions and Valued Constraint Satisfaction Problems over Infinite Domains
Time Blocks Decomposition of Multistage Stochastic Optimization Problems
Variational Rejection Sampling
Application of Symmetry Groups to the Observability Analysis of Partial Differential Equations
Finding beans in burgers: Deep semantic-visual embedding with localization
Identifying Cross-Depicted Historical Motifs
ERA: Towards Privacy Preservation and Verifiability for Online Ad Exchanges
Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space
On large primitive subsets of $\{1,2,\ldots,2n\}$
Ergodic Control of Infinite Dimensional SDEs with Degenerate Noise
Towards Improved Cartoon Face Detection and Recognition Systems
Domain Adaptation for Statistical Machine Translation
A noncommutative cycle index and new bases of quasi-symmetric functions and noncommutative symmetric functions
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Learning a Robust Society of Tracking Parts using Co-occurrence Constraints
Not just about size – A Study on the Role of Distributed Word Representations in the Analysis of Scientific Publications
Distributed Data Compression in Sensor Clusters: A Maximum Independent Flow Approach
Future Energy Consumption Prediction Based on Grey Forecast Model
Word Segmentation as Graph Partition
Fairness in Multiterminal Data Compression: Decomposition of Shapley Value
End-to-End Saliency Mapping via Probability Distribution Prediction
On random shifted standard Young tableaux and 132-avoiding sorting networks
The Geometry of SDP-Exactness in Quadratic Optimization
Composable, Unconditionally Secure Message Authentication without any Secret Key
Structural cost-optimal design of sensor networks for distributed estimation
Bayesian Extreme Value Analysis of Stock Exchange Data
Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo
Dirichlet boundary value problems for elliptic operators with measure data: a probabilistic approach
The polytopal structure of the tight-span of a totally split-decomposable metric
Simple dynamic algorithms for Maximal Independent Set and other problems
Guess Where? Actor-Supervision for Spatiotemporal Action Localization
Golden ratio algorithms for solving equilibrium problems in Hilbert spaces
Phylogenetic networks that are their own fold-ups
A Large-Scale Study of Language Models for Chord Prediction
Energy estimates and model order reduction for stochastic bilinear systems
Adaptive test for ergodic diffusions plus noise
Linear systems with persistent memory: An overview of the biblography on controllability
Poly-Bernoulli Numbers and Eulerian Numbers
Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive
Towards radiologist-level cancer risk assessment in CT lung screening using deep learning
Mobility-Aware Coded Storage and Delivery
Multi-level Activation for Segmentation of Hierarchically-nested Classes
Energy-efficiency evaluation of Intel KNL for HPC workloads
Early Experience on Using Knights Landing Processors for Lattice Boltzmann Applications
Profinite separation systems
Approaching Waterfilling Capacity of Parallel Channels by Higher Order Modulation and Probabilistic Amplitude Shaping
Scalable Magnetic Field SLAM in 3D Using Gaussian Process Maps
On fixable families of Boolean networks
Density estimation on small datasets
On Undetected Redundancy in the Burrows-Wheeler Transform
Scaling Out Acid Applications with Operation Partitioning
Two-time height distribution for 1D KPZ growth: the recent exact result and its tail via replica
Paired many-to-many 2-disjoint path cover of balanced hypercubes with faulty edges
Numerical and analytical bounds on threshold error rates for hypergraph-product codes
Explanations of model predictions with live and breakDown packages
A Class of Skewed Distributions with Applications in Environmental Data
Machine learning of neuroimaging to diagnose cognitive impairment and dementia: a systematic review and comparative analysis
Iris Recognition After Death
Relating modularity maximization and stochastic block models in multilayer networks
CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation
Laminations of a graph on a pair of pants
The power of block-encoded matrix powers: improved regression techniques via faster Hamiltonian simulation
An End-to-end Argument in Mechanism Design (Prior-independent Auctions for Budgeted Agents)
Continuity of the Shafer-Vovk-Ville Operator
Inverted orbits of exclusion processes, diffuse-extensive-amenability and (non-?)amenability of the interval exchanges