Phrase-Based Attentions

Most state-of-the-art neural machine translation systems, despite differing in architectural skeleton (e.g., recurrent, convolutional), share an indispensable feature: attention. However, most existing attention methods are token-based and ignore the importance of phrasal alignments, the key ingredient for the success of phrase-based statistical machine translation. In this paper, we propose novel phrase-based attention methods to model n-grams of tokens as attention entities. We incorporate our phrase-based attentions into the recently proposed Transformer network, and demonstrate that our approach yields improvements of 1.3 BLEU for English-to-German and 0.5 BLEU for German-to-English translation tasks on WMT newstest2014 using WMT’16 training data.

Exploiting Contextual Information via Dynamic Memory Network for Event Detection

The task of event detection involves identifying and categorizing event triggers. Contextual information has been shown to be effective for the task. However, existing methods that utilize contextual information process the context only once. We argue that the context can be better exploited by processing it multiple times, allowing the model to perform complex reasoning and to generate a better context representation, thus improving the overall performance. Meanwhile, the dynamic memory network (DMN) has demonstrated promising capability in capturing contextual information and has been applied successfully to various tasks. In light of the multi-hop mechanism of the DMN to model the context, we propose the trigger detection dynamic memory network (TD-DMN) to tackle the event detection problem. We performed a five-fold cross-validation on the ACE-2005 dataset, and experimental results show that the multi-hop mechanism does improve the performance and that the proposed model achieves the best F_1 score compared to the state-of-the-art methods.

Active Learning for New Domains in Natural Language Understanding

We explore active learning (AL) utterance selection for improving the accuracy of new underrepresented domains in a natural language understanding (NLU) system. Moreover, we propose an AL algorithm called Majority-CRF that uses an ensemble of classification and sequence labeling models to guide utterance selection for annotation. Experiments with three domains show that Majority-CRF achieves 6.6%-9% relative error rate reduction compared to random sampling with the same annotation budget, and statistically significant improvements compared to other AL approaches. Additionally, case studies with human-in-the-loop AL on six new domains show 4.6%-9% improvement on an existing NLU system.

Clustering-based Anomaly Detection for microservices

Anomaly detection is an important step in the management and monitoring of data centers and cloud computing platforms. The ability to detect anomalous virtual machines before real failures occur results in reduced downtime while operations engineers urgently recover malfunctioning virtual machines, efficient root cause analysis, and improved customer optics in the event said malfunction leads to an outage. Virtual machines could fail at any time, whether in a lab or production system. If there is no anomaly detection system, and a virtual machine in a lab environment fails, the QA and DEV teams will have to switch to another environment while the OPS team fixes the failure. Failing to detect anomalous virtual machines can have financial ramifications, both when developing new features and when servicing existing ones. This paper presents a model that can efficiently detect anomalous virtual machines both in production and testing environments.

Generalizing the theory of cooperative inference

Cooperative information sharing is important to theories of human learning and has potential implications for machine learning. Prior work derived conditions for achieving optimal Cooperative Inference given strong, relatively restrictive assumptions. We relax these assumptions by demonstrating convergence for any discrete joint distribution, robustness through equivalence classes and stability under perturbation, and effectiveness by deriving bounds from structural properties of the original joint distribution. We provide geometric interpretations, connections to and implications for optimal transport, and connections to importance sampling, and conclude by outlining open questions and challenges to realizing the promise of Cooperative Inference.

AutoLoss: Learning Discrete Schedules for Alternate Optimization

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters. Appropriately scheduling the optimization of a task objective or a set of parameters is usually crucial to the quality of convergence. In this paper, we present AutoLoss, a meta-learning framework that automatically learns and determines the optimization schedule. AutoLoss provides a generic way to represent and learn the discrete optimization schedule from metadata, and allows for a dynamic and data-driven schedule in ML problems that involve alternating updates of different parameters or from different loss objectives. We apply AutoLoss on four ML tasks: d-ary quadratic regression, classification using a multi-layer perceptron (MLP), image generation using GANs, and multi-task neural machine translation (NMT). We show that the AutoLoss controller is able to capture the distribution of better optimization schedules that result in higher quality of convergence on all four tasks. The trained AutoLoss controller is generalizable: it can guide and improve the learning of a new task model with different specifications, or on different datasets.

Statistical Optimality of Interpolated Nearest Neighbor Algorithms

In the era of deep learning, understanding the over-fitting phenomenon becomes increasingly important. It is observed that carefully designed deep neural networks achieve small testing error even when the training error is close to zero. One possible explanation is that for many modern machine learning algorithms, over-fitting can greatly reduce the estimation bias while not increasing the estimation variance too much. To illustrate the above idea, we prove that our interpolated nearest neighbors algorithm achieves the minimax optimal rate in both regression and classification regimes, and observe that it is empirically better than the traditional k nearest neighbor method in some cases.
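
The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of what an interpolating nearest neighbor regressor can look like. It assumes one-dimensional inputs and inverse-distance weights d^(-gamma); the function name and the parameters `k` and `gamma` are hypothetical and are not taken from the paper.

```python
def interpolated_knn_predict(x, train_x, train_y, k=3, gamma=2.0):
    """Weighted k-NN regression whose weights blow up as the distance goes to 0.

    Assumption (not from the abstract): weights are d ** -gamma, so the
    prediction at a training point equals that point's label (interpolation).
    """
    # The k nearest training points as (distance, label) pairs.
    nearest = sorted((abs(x - xi), yi) for xi, yi in zip(train_x, train_y))[:k]
    # Exact hit on a training point: return its label, so training error is zero.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [d ** -gamma for d, _ in nearest]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 4.0, 9.0]
at_training_point = interpolated_knn_predict(1.0, train_x, train_y)
between_points = interpolated_knn_predict(1.5, train_x, train_y, k=2)
```

Here `at_training_point` reproduces the stored label exactly, while `between_points` is a weighted average of the two equidistant neighbors.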

Demonstration Abstract: A Toolkit for Specifying Service Level Agreements for IoT applications

Today we see the use of the Internet of Things (IoT) in various application domains such as healthcare, smart homes, smart cars, and smart-x applications in smart cities. The number of applications based on IoT and cloud computing is projected to increase rapidly over the next few years. IoT-based services must meet the guaranteed levels of quality of service (QoS) to match users’ expectations. Ensuring QoS through specifying the QoS constraints using Service Level Agreements (SLAs) is crucial. Therefore, as a first step toward SLA management, it is essential to provide an SLA specification in a machine-readable format. In this paper, we demonstrate a toolkit for creating SLA specifications for IoT applications. The toolkit is used to simplify the process of capturing the requirements of IoT applications. We present a demonstration of the toolkit using a Remote Health Monitoring Service (RHMS) use case. The toolkit supports the following: (1) specifying the Service-Level Objectives (SLOs) of an IoT application at the application level; (2) specifying the workflow activities of the IoT application; (3) mapping each activity to the required software and hardware resources and specifying the constraints of SLOs and other configuration-related metrics of the required hardware and software; and (4) creating the composed SLA in JSON format.

Sliced Average Variance Estimation for Multivariate Time Series

Supervised dimension reduction for time series is challenging as there may be temporal dependence between the response y and the predictors \boldsymbol x. Recently a time series version of sliced inverse regression, TSIR, was suggested, which applies approximate joint diagonalization of several supervised lagged covariance matrices to account for the temporal nature of the data. In this paper we develop this concept further and propose a time series version of sliced average variance estimation, TSAVE. As both TSIR and TSAVE have their own advantages and disadvantages, we furthermore consider a hybrid version of TSIR and TSAVE. Based on examples and simulations we demonstrate and evaluate the differences between the three methods and also show that they are superior to applying their iid counterparts, even when lagged values of the explanatory variables are also used as predictors.

ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment

Recruitment of appropriate people for certain positions is critical for any company or organization. Manually screening large numbers of resumes to select appropriate candidates can be exhausting and time-consuming. However, there is no public tool that can be directly used for automatic resume quality assessment (RQA). This motivates us to develop a method for automatic RQA. Since there is also no public dataset for model training and evaluation, we build a dataset for RQA by collecting around 10K resumes, which are provided by a private resume management company. By investigating the dataset, we identify some factors or features that could be useful to discriminate good resumes from bad ones, e.g., the consistency between different parts of a resume. Then a neural-network model is designed to predict the quality of each resume, where some text processing techniques are incorporated. To deal with the label deficiency issue in the dataset, we propose several variants of the model by either utilizing a pair/triplet-based loss or introducing a semi-supervised learning technique to make use of the abundant unlabeled data. Both the presented baseline model and its variants are general and easy to implement. Various popular criteria including the receiver operating characteristic (ROC) curve, F-measure and ranking-based average precision (AP) are adopted for model evaluation. We compare the different variants with our baseline model. Since there is no public algorithm for RQA, we further compare our results with those obtained from a website that can score a resume. Experimental results in terms of different criteria demonstrate the effectiveness of the proposed method. We foresee that our approach would transform the way of future human resources management.

CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas

We propose a new recurrent generative model for generating images from text captions while attending on specific parts of the captions. Our model creates images by incrementally adding patches on a ‘canvas’ while attending on words from the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model’s generated images with those generated by Reed et al.’s model and show that our model is a stronger baseline for text to image generation tasks.

A New Method for the Semantic Integration of Multiple OWL Ontologies using Alignments

This work is done as part of a master’s thesis project. The goal is to integrate two or more ontologies (of the same or close domains) in a new consistent and coherent OWL ontology to ensure semantic interoperability between them. To do this, we have chosen to create a bridge ontology that includes all source ontologies and their bridging axioms in a customized way. In addition, we introduced a new criterion for obtaining an ontology of better quality (having the minimum of semantic/logical conflicts). We have also proposed new terminology and definitions that clarify the unclear and misplaced ‘integration’ and ‘merging’ notions that are randomly used in state-of-the-art works. Finally, we tested and evaluated our OIA2R tool using ontologies and reference alignments of the OAEI campaign. It turned out that it is generic, efficient and powerful enough.

Visualizing probabilistic models: Intensive Principal Component Analysis

Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the ‘curse of dimensionality’ in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter (ΛCDM) model as applied to the Cosmic Microwave Background.

Scalable Micro-planned Generation of Discourse from Structured Data

We present a framework for generating natural language descriptions from structured data such as tables. Motivated by the need to approach this problem in a manner that is scalable and easily adaptable to newer domains, unlike existing related systems, our system does not require parallel data; it rather relies on monolingual corpora and basic NLP tools which are easily accessible. The system employs a 3-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other data types such as Knowledge-Graphs and Key-Value dictionaries.

Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding

We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders. The CATE function maps baseline covariates to individual causal effect predictions and is key for personalized assessments. Recent work has focused on how to learn CATE under unconfoundedness, i.e., when there are no unobserved confounders. Since CATE may not be identified when unconfoundedness is violated, we develop a functional interval estimator that predicts bounds on the individual causal effects under realistic violations of unconfoundedness. Our estimator takes the form of a weighted kernel estimator with weights that vary adversarially. We prove that our estimator is sharp in that it converges exactly to the tightest bounds possible on CATE when there may be unobserved confounders. Further, we study personalized decision rules derived from our estimator and prove that they achieve optimal minimax regret asymptotically. We assess our approach in a simulation study as well as demonstrate its application in the case of hormone replacement therapy by comparing conclusions from a real observational study and clinical trial.

Sifaka: Text Mining Above a Search API

Text mining and analytics software has become popular, but little attention has been paid to the software architectures of such systems. Often they are built from scratch using special-purpose software and data structures, which increases their cost and complexity. This demo paper describes Sifaka, a new open-source text mining application constructed above a standard search engine index using existing application programmer interface (API) calls. Indexing integrates popular annotation software libraries to augment the full-text index with noun phrases and named entities; n-grams are also provided. Sifaka enables a person to quickly explore and analyze large text collections using search, frequency analysis, and co-occurrence analysis; and to import existing document labels or interactively construct document sets that are positive or negative examples of new concepts, perform feature selection, and export feature vectors compatible with popular machine learning software. Sifaka demonstrates that search engines are good platforms for text mining applications while also making common IR text mining capabilities accessible to researchers in disciplines where programming skills are less common.

On the Art and Science of Machine Learning Explanations

This text discusses several explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the methods are tools of the trade while others are rigorously derived and backed by long-standing theory. The methods, decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), partial dependence plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and in-depth software examples.

Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks

In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights.
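
The idea of looking for cliques of negative correlations can be illustrated with a small brute-force sketch. This is not the paper's clique-enumeration algorithm, and it does not check the additional requirement that every series contributes significantly to the linear dependence; the series, the names and the threshold below are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def negative_cliques(series, size, threshold):
    """Return name tuples whose pairwise correlations are all <= threshold."""
    found = []
    for group in combinations(sorted(series), size):
        if all(pearson(series[u], series[v]) <= threshold
               for u, v in combinations(group, 2)):
            found.append(group)
    return found


# Toy data: a + b + c = 0 at every time step, so the three series are
# linearly dependent and every pair has correlation -0.5; d equals -c.
series = {
    "a": [2, -1, -1, 2, -1, -1],
    "b": [-1, 2, -1, -1, 2, -1],
    "c": [-1, -1, 2, -1, -1, 2],
    "d": [1, 1, -2, 1, 1, -2],
}
cliques = negative_cliques(series, size=3, threshold=-0.4)
```

In this toy example the only size-3 group that qualifies is (a, b, c), which is also the only triple whose members sum to zero.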

Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets

At present, the state-of-the-art computational models across a range of sequential data processing tasks, including language modeling, are based on recurrent neural network architectures. This paper begins with the observation that most research on developing computational models capable of processing sequential data fails to explicitly analyze the long distance dependencies (LDDs) within the datasets the models process. In this context, we make five research contributions. First, we argue that a key step in modeling sequential data is to understand the characteristics of the LDDs within the data. Second, we present a method to compute and analyze the LDD characteristics of any sequential dataset, and demonstrate this method on a number of sequential datasets that are frequently used for model benchmarking. Third, based on the analysis of the LDD characteristics within the benchmarking datasets, we observe that LDDs are far more complex than previously assumed, and depend on at least four factors: (i) the number of unique symbols in a dataset, (ii) the size of the dataset, (iii) the number of interacting symbols within an LDD, and (iv) the distance between the interacting symbols. Fourth, we verify these factors by using synthetic datasets generated using Strictly k-Piecewise (SPk) languages. We then demonstrate how SPk languages can be used to generate benchmarking datasets with varying degrees of LDDs. The advantage of these synthesized datasets is that they enable the targeted testing of recurrent neural architectures. Finally, we demonstrate how understanding the characteristics of the LDDs in a dataset can inform better hyper-parameter selection for current state-of-the-art recurrent neural architectures and also aid in understanding them…

Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers

In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) wherein data is farmed out to a set of workers, the results of which, after a number of updates, are then combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings. One cause is slow workers, termed stragglers, who can cause the fusion step at the master node to stall, greatly slowing convergence. In many straggler mitigation approaches, work completed by these nodes, while only partial, is discarded completely. In this paper, we propose an approach to parallelizing synchronous SGD that exploits the work completed by all workers. The central idea is to fix the computation time of each worker and then to combine distinct contributions of all workers. We provide a convergence analysis and optimize the combination function. Our numerical results demonstrate an improvement of several factors of magnitude in comparison to existing methods.
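
The central idea (a fixed time budget per worker, then a combination that keeps every worker's partial progress) can be sketched on a toy one-dimensional problem. The abstract does not state the optimized combination function, so weighting each worker by the number of steps it completed is only an assumption here; the shards, step counts and learning rate are likewise hypothetical.

```python
def run_worker(x0, shard, n_steps, lr=0.1):
    """Run n_steps SGD updates on the loss 0.5 * (x - y)**2, cycling through the shard."""
    x = x0
    for step in range(n_steps):
        y = shard[step % len(shard)]
        x -= lr * (x - y)  # gradient of 0.5 * (x - y)**2 with respect to x
    return x


def combine(estimates, steps_done):
    """Convex combination of worker estimates, weighted by completed steps (assumed rule)."""
    total = sum(steps_done)
    return sum(q / total * x for x, q in zip(estimates, steps_done))


shards = [[2.9, 3.1], [3.0, 3.2], [2.8, 3.0]]
# Steps each worker finished within the same time budget: the last one is a straggler.
steps_done = [50, 20, 5]
estimates = [run_worker(0.0, s, q) for s, q in zip(shards, steps_done)]
x_master = combine(estimates, steps_done)
```

The straggler's estimate is still far from the optimum after five steps, but it is down-weighted rather than discarded, so the master neither waits for it nor throws its work away.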

Discretizing Logged Interaction Data Biases Learning for Decision-Making

Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead.
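
The hourly-count preprocessing described above can be sketched as follows. The timestamps are hypothetical and the helper name is ours; the point is only to show what information the binning discards.

```python
from collections import Counter


def hourly_counts(arrival_times_h, n_hours):
    """Count arrivals falling in each hourly bin [h, h + 1)."""
    counts = Counter(int(t) for t in arrival_times_h if 0 <= t < n_hours)
    return [counts.get(h, 0) for h in range(n_hours)]


# Hypothetical arrival times, measured in hours since opening.
arrivals = [0.10, 0.95, 1.05, 3.50, 3.55, 3.60]
counts = hourly_counts(arrivals, n_hours=4)
print(counts)  # [2, 1, 0, 3]
```

The arrivals at 0.95 and 1.05 are only six minutes apart yet land in different bins, while the three arrivals clustered around 3.5 become indistinguishable from three arrivals spread evenly over the hour. A continuous-time model would work with `arrivals` directly instead of `counts`.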

Robust variance estimation and inference for causal effect estimation

We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time-dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting results in anti-conservative estimates with increasing magnitudes of positivity violations, leading to poor coverage and uncontrolled Type I errors. In this paper, we present two alternative approaches to estimating the variance of these estimators: (i) a robust approach which directly targets the variance of the influence function as a counterfactual mean outcome, and (ii) a non-parametric bootstrap-based approach that is theoretically valid and lowers the computational cost, thereby increasing the feasibility in non-parametric settings using complex machine learning algorithms. The performance of these approaches is compared to that of the empirical influence function in simulations across different levels of positivity violations and treatment effect sizes.

Characterizing Deep-Learning I/O Workloads in TensorFlow

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.

Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm’s efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.

Deep convolutional Gaussian processes

We propose deep convolutional Gaussian processes, a deep Gaussian process architecture with convolutional structure. The model is a principled Bayesian framework for detecting hierarchical combinations of local features for image classification. We demonstrate greatly improved image classification performance compared to current Gaussian process approaches on the MNIST and CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10 percentage points.

Graph Classification with Geometric Scattering

One of the most notable contributions of deep learning is the application of convolutional neural networks (ConvNets) to structured signal classification, and in particular image classification. Beyond their impressive performances in supervised learning, the structure of such networks inspired the development of deep filter banks referred to as scattering transforms. These transforms apply a cascade of wavelet transforms and complex modulus operators to extract features that are invariant to group operations and stable to deformations. Furthermore, ConvNets inspired recent advances in geometric deep learning, which aim to generalize these networks to graph data by applying notions from graph signal processing to learn deep graph filter cascades. We further advance these lines of research by proposing a geometric scattering transform using graph wavelets defined in terms of random walks on the graph. We demonstrate the utility of features extracted with this designed deep filter bank in graph classification, and show its competitive performance relative to other methods, including graph kernel methods and geometric deep learning ones, on both social and biochemistry data.

A Survey of Neighbourhood Construction Models for Categorizing Data Points

Finding neighbourhood structures is very useful in extracting valuable relationships among data samples. This paper presents a survey of recent neighbourhood construction algorithms for pattern clustering and classifying data points. Extracting neighbourhoods and connections among the points is extremely useful for clustering and classifying the data. Many applications such as detecting social network communities, bundling related edges, and solving location and routing problems all indicate the usefulness of this problem. Finding data point neighbourhood in data mining and pattern recognition should generally improve knowledge extraction from databases. Several algorithms of data point neighbourhood construction have been proposed to analyse the data in this sense. They will be described and discussed from different aspects in this paper. Finally, the future challenges concerning the title of the present paper will be outlined.

Multi-reference Cosine: A New Approach to Text Similarity Measurement in Large Collections

The importance of an efficient and scalable document similarity detection system is undeniable nowadays. Search engines need batch text similarity measures to detect duplicated and near-duplicated web pages in their indexes in order to prevent indexing a web page multiple times. Furthermore, in the scoring phase, search engines need similarity measures to detect duplicated contents on web pages so as to increase the quality of their results. In this paper, a new approach to batch text similarity detection is proposed by combining some ideas from dimensionality reduction techniques and information gain theory. The new approach is focused on search engines’ need to detect duplicated and near-duplicated web pages. The new approach is evaluated on the NEWS20 dataset, and the results show that it outperforms the cosine text similarity algorithm in terms of speed and performance. On top of that, it is faster and more accurate than the other rival method, the Simhash similarity algorithm.

A Fast Text Similarity Measure for Large Document Collections using Multi-reference Cosine and Genetic Algorithm

One of the important factors that make a search engine fast and accurate is a concise and duplicate-free index. In order to remove duplicate and near-duplicate documents from the index, a search engine needs a swift and reliable duplicate and near-duplicate text document detection system. Traditional approaches to this problem, such as brute force comparisons or simple hash-based algorithms, are not suitable as they are not scalable and are not capable of detecting near-duplicate documents effectively. In this paper, a new signature-based approach to text similarity detection is introduced which is fast, scalable, reliable and needs less storage space. The proposed method is examined on popular text document datasets such as CiteseerX, Enron, and the Gold Set of Near-duplicate News Articles. The results are promising and comparable with the best cutting-edge algorithms in terms of accuracy and performance. The proposed method is based on the idea of using reference texts to generate signatures for text documents. The novelty of this paper is the use of genetic algorithms to generate better reference texts.
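
The idea of generating a signature from reference texts can be sketched as below. The abstract does not give the exact construction, so this sketch assumes whitespace tokenization, raw term-frequency vectors, and a signature made of one cosine score per reference text; the reference texts and documents are hypothetical, and the genetic-algorithm search for better references is not shown.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def signature(text, references):
    """One cosine score per reference text (assumed signature layout)."""
    tf = Counter(text.lower().split())
    return [cosine(tf, Counter(r.lower().split())) for r in references]


references = ["stock market prices fall", "football team wins match"]
doc_a = "market prices fall sharply"
doc_b = "Market prices fall sharply"  # same text, different casing
doc_c = "team wins football cup"

sig_a = signature(doc_a, references)
sig_b = signature(doc_b, references)
sig_c = signature(doc_c, references)
```

Each document is reduced to a vector with as many entries as there are reference texts, so candidate duplicates can be found by comparing these short signatures rather than the full documents. Here `sig_a` and `sig_b` coincide, while `sig_c` differs from both.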

ASVRG: Accelerated Proximal SVRG

This paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick. Unlike most existing accelerated stochastic variance reduction methods such as Katyusha, ASVRG has only one additional variable and one momentum parameter. Thus, ASVRG is much simpler than those methods, and has much lower per-iteration complexity. We prove that ASVRG achieves the best known oracle complexities for both strongly convex and non-strongly convex objectives. In addition, we extend ASVRG to mini-batch and non-smooth settings. We also empirically verify our theoretical results and show that the performance of ASVRG is comparable with, and sometimes even better than that of the state-of-the-art stochastic methods.

Sparse Regression with Multi-type Regularized Feature Modeling

Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are of the same type, such as Lasso regression for continuous predictors. However, many predictive problems involve different predictor types. We propose a multi-type Lasso penalty that acts on the objective function as a sum of subpenalties, one for each predictor type. As such, we perform predictor selection and level fusion within a predictor in a data-driven way, simultaneously with the parameter estimation process. We develop a new estimation strategy for convex predictive models with this multi-type penalty. Using the theory of proximal operators, our estimation procedure is computationally efficient, partitioning the overall optimization problem into easier-to-solve subproblems, specific to each predictor type and its associated penalty. The proposed SMuRF algorithm improves on existing solvers in both accuracy and computational efficiency. This is demonstrated with an extensive simulation study and the analysis of a case study on insurance pricing analytics.
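The partitioning into per-type subproblems follows from a standard property: when a penalty is a sum of subpenalties over disjoint coefficient blocks, its proximal operator can be applied block by block. The sketch below illustrates this with two well-known penalties, Lasso and Group Lasso. The choice of penalties, the block layout, and the function names are assumptions for illustration and are not taken from the SMuRF implementation.

```python
import math

def prox_lasso(v, t):
    """Soft-thresholding: proximal operator of t * ||v||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def prox_group_lasso(v, t):
    """Block soft-thresholding: proximal operator of t * ||v||_2."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0 for _ in v]
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def prox_multi_type(beta, blocks, step):
    """Apply each block's own proximal operator to its slice of beta.

    blocks is a list of (start, stop, prox_fn, lam) tuples describing
    disjoint coefficient ranges, one per predictor (hypothetical layout).
    """
    out = list(beta)
    for start, stop, prox_fn, lam in blocks:
        out[start:stop] = prox_fn(beta[start:stop], step * lam)
    return out
```

For example, `prox_multi_type(beta, [(0, 2, prox_lasso, 1.0), (2, 4, prox_group_lasso, 1.0)], step=1.0)` shrinks the first two coefficients individually and the last two as a group, each with the operator suited to its penalty.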

On LASSO for Predictive Regression

A typical predictive regression employs a multitude of potential regressors with various degrees of persistence while their signal strength in explaining the dependent variable is often low. Variable selection in such a context is of great importance. In this paper, we explore the pitfalls and possibilities of the LASSO methods in this predictive regression framework with mixed degrees of persistence. With the presence of stationary, unit root and cointegrated predictors, we show that the adaptive LASSO maintains the consistent variable selection and the oracle property due to its penalty scheme that accommodates the system of regressors. In contrast, conventional LASSO does not have this desirable feature, as the penalty is imposed according to the marginal behavior of each individual regressor. We demonstrate this theoretical property via extensive Monte Carlo simulations, and evaluate its empirical performance for short- and long-horizon stock return predictability.

Real-Time Workload Classification during Driving using HyperNetworks

Classifying human cognitive states from behavioral and physiological signals is a challenging problem with important applications in robotics. The problem is challenging due to the data variability among individual users, and sensor artefacts. In this work, we propose an end-to-end framework for real-time cognitive workload classification with mixture Hyper Long Short Term Memory Networks, a novel variant of HyperNetworks. Evaluating the proposed approach on an eye-gaze pattern dataset collected from simulated driving scenarios of different cognitive demands, we show that the proposed framework outperforms previous baseline methods and achieves 83.9% precision and 87.8% recall during test. We also demonstrate the merit of our proposed architecture by showing improved performance over other LSTM-based methods.

Competitive Online Virtual Cluster Embedding Algorithms

In the conventional cloud service model, computing resources are allocated for tenants on a pay-per-use basis. However, the performance of applications that communicate inside this network is unpredictable because network resources are not guaranteed. To mitigate this issue, the virtual cluster (VC) model has been developed in which network and compute units are guaranteed. Many algorithms have since been developed, based on novel extensions of the VC model, to solve the online virtual cluster embedding problem (VCE) with additional parameters. In the online VCE, the resource footprint is greedily minimized per request, which corresponds to maximizing the profit for the provider per request. However, this does not imply that a global maximization of the profit over the whole sequence of requests is guaranteed. In fact, these algorithms do not even provide a worst-case guarantee on a fraction of the maximum achievable profit of a certain sequence of requests. Thus, these online algorithms do not provide a competitive ratio on the profit. In this thesis, two competitive online VCE algorithms and two heuristic algorithms are presented. The competitive online VCE algorithms have different competitive ratios on the objective function and the capacity constraints, whereas the heuristic algorithms do not violate the capacity constraints. The worst-case competitive ratios are analyzed. After that, the evaluation shows the advantages and disadvantages of these algorithms in several scenarios with different request patterns and profit metrics on the fat-tree and MDCube datacenter topologies. The results show that for different scenarios, different algorithms have the best performance with respect to certain metrics.

Efficient Crowd Exploration of Large Networks: The Case of Causal Attribution

Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive ‘microtasks’. We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships and introduce a method for independent crowd workers to explore large networks. The core of the method, Iterative Pathway Refinement, is a theoretically-principled mechanism for efficient exploration via microtasks. We evaluate the method using synthetic networks and apply it on AMT to extract a large-scale causal attribution network, then investigate the structure of this network as well as the activity patterns and efficiency of the workers who constructed this network. Worker interactions reveal important characteristics of causal perception and the network data they generate can improve our understanding of causality and causal inference.

k-price auctions and Combination auctions
Srishti Dhar Chatterji (1935-2017): In Memoriam
CINIC-10 is not ImageNet or CIFAR-10
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling
Building a language evolution tree based on word vector combination model
Graph Embedding with Shifted Inner Product Similarity and its Improved Approximation Capability
Regression Analyses of Distributions using Quantile Functional Regression
Classifying Multi-channel UWB SAR Imagery via Tensor Sparsity Learning Techniques
Wide and Deep Learning for Peer-to-Peer Lending
Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding
Text Classification of the Precursory Accelerating Seismicity Corpus: Inference on some Theoretical Trends in Earthquake Predictability Research from 1988 to 2018
A Comparison between Background Modelling Methods for Vehicle Segmentation in Highway Traffic Videos
The KPZ Equation, Non-Equilibrium Energy Solutions, and Weak Universality for Long-Range Interactions
Scaling Submodular Optimization Approaches for Control Applications in Networked Systems
Transition Operations over Plane Trees
Training Complex Models with Multi-Task Weak Supervision
Cross-Subject Transfer Learning on High-Speed Steady-State Visual Evoked Potential-Based Brain-Computer Interface
Deep Probabilistic Video Compression
Fault-Tolerant Consensus with an Abstract MAC Layer
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
Efficient ZF-WF Strategy for Sum-Rate Maximization of MU-MISO Cognitive Radio Networks
Generic Model-Agnostic Convolutional Neural Network for Single Image Dehazing
Artificial Intelligence Assisted Power Grid Hardening in Response to Extreme Weather Events
Simultaneous Combinatorial Game Theory
Massive MIMO Pilot Assignment Optimization based on Total Capacity
Sequential Patient Recruitment and Allocation for Adaptive Clinical Trials
Physics Guided Recurrent Neural Networks For Modeling Dynamical Systems: Application to Monitoring Water Temperature And Quality In Lakes
Random orthogonal matrices and the Cayley transform
The Fractional Local Metric Dimension of Graphs
Temporal pattern recognition through analog molecular computation
Entity Tracking Improves Cloze-style Reading Comprehension
Optimization on Spheres: Models and Proximal Algorithms with Computational Performance Comparisons
CDF Transform-Shift: An effective way to deal with inhomogeneous density datasets
Stability analysis of networked control systems with not necessarily UGES protocols
Memento: Making Sliding Windows Efficient for Heavy Hitters
Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability
Network Distance Based on Laplacian Flows on Graphs
Tuning for Tissue Image Segmentation Workflows for Accuracy and Performance
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Askey–Wilson polynomials and a double $q$-series transformation formula with twelve parameters
A Hybrid Optimal Control Approach to LQG Mean Field Games with Switching and Stopping Strategies
Adaptive Independence Tests with Geo-Topological Transformation
Stochastic Chemical Reaction Networks for Robustly Approximating Arbitrary Probability Distributions
Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning
Indirect Mechanism Design for Efficient and Stable Renewable Energy Aggregation
Towards Self-Tuning Parameter Servers
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences
Adapting to Unknown Noise Distribution in Matrix Denoising
Consistency and Computation of Regularized MLEs for Multivariate Hawkes Processes
Low rank spatial econometric models
Higher-order Spectral Clustering for Heterogeneous Graphs
Jacobi Fields in Optimal Control I: Morse and Maslov Indices
The universal Poisson deformation of hypertoric varieties and some classification results
Cross validating extensions of kernel, sparse or regular partial least squares regression models to censored data
Local Boxicity, Local Dimension, and Maximum Degree
Possible origin of $β$-relaxation in amorphous metal alloys from atomic-mass differences of the constituents
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Personality facets recognition from text
Camera Model Identification Using Convolutional Neural Networks
A complete solution to the infinite Oberwolfach problem
Eigenvector convergence for minors of unitarily invariant infinite random matrices
Distributed Learning Algorithms for Opportunistic Spectrum Access in Infrastructure-less Networks
Robust and Efficient Estimation in the Parametric Cox Regression Model under Random Censoring
Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images
Total variation distance for discretely observed Lévy processes: a Gaussian approximation of the small jumps
When logic lays down the law
Super-resolution radar imaging via convex optimization
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Learning to Optimize under Non-Stationarity
Text-based Sentiment Analysis and Music Emotion Recognition
Constructing Graph Node Embeddings via Discrimination of Similarity Distributions
Efficient Detection in Uniform Linear and Planar Arrays MIMO Systems under Spatial Correlated Channels
Over-parameterization Improves Generalization in the XOR Detection Problem
Scott functions, their representations on domains, and applications to random sets
Tight-and-cheap conic relaxation for the optimal reactive power dispatch problem
Sequential likelihood ascent search detector for massive MIMO systems
Quantum transport through the edge states of Zigzag phosphorene nanoribbons in presence of a single point defect: analytic Green’s function method
Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning
Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health
MeetupNet Dublin: Discovering Communities in Dublin’s Meetup Network
Subspace Tracking from Missing and Outlier Corrupted Data
Supporting High-Performance and High-Throughput Computing for Experimental Science
Solving Large Sequential Games with the Excessive Gap Technique
CSI-Net: Unified Human Body Characterization and Action Recognition
Deep Model-Based 6D Pose Refinement in RGB
Geocoding Without Geotags: A Text-based Approach for reddit
Spatio-temporal Edge Service Placement: A Bandit Learning Approach
Hierarchical Optimization for Whole-Body Control of Wheeled Inverted Pendulum Humanoids
Training Convolutional Neural Networks and Compressed Sensing End-to-End for Microscopy Cell Detection
Online Center of Mass Estimation for a Humanoid Wheeled Inverted Pendulum Robot
DeepGeo: Photo Localization with Deep Neural Network
Graphlet Count Estimation via Convolutional Neural Networks
Error bounds for sparse classifiers in high-dimensions
Analysis of a longitudinal multilevel experiment using GAMLSSs
Counting homomorphisms in plain exponential time
A General Sensitivity Analysis Approach for Demand Response Optimizations
Outlier Detection and Optimal Anchor Placement for 3D Underwater Optical Wireless Sensor Networks Localization
European Court of Human Right Open Data project
Underwater Anchor-AUV Localization Geometries with an Isogradient Sound Speed Profile: A CRLB-Based Optimality Analysis
Regularity of binomial edge ideals of chordal graphs
Using Time to Break Symmetry: Universal Deterministic Anonymous Rendezvous
The graph grabbing game on $\{0,1\}$-weighted graphs
Accelerating Stochastic Gradient Descent Using Antithetic Sampling
On the decomposition of the supersymmetric state
Nonlinear Stochastic Attitude Filters on the Special Orthogonal Group 3: Ito and Stratonovich
Coronary Artery Centerline Extraction in Cardiac CT Angiography Using a CNN-Based Orientation Classifier
Assessing Crosslingual Discourse Relations in Machine Translation
A Minesweeper Solver Using Logic Inference, CSP and Sampling
Finding Correspondences for Optical Flow and Disparity Estimations using a Sub-pixel Convolution-based Encoder-Decoder Network
A Framework for One-Bit and Constant-Envelope Precoding over Multiuser Massive MISO Channels
q-Analogues of several $π$-formulas
Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling
Small Infrared Target Detection Using Absolute Directional Mean Difference Algorithm