Boost Picking: A Universal Method on Converting Supervised Classification to Semi-supervised Classification

This paper proposes a universal method, Boost Picking, to train supervised classification models mainly on unlabeled data. Boost Picking uses only two weak classifiers to estimate and correct the error. It is theoretically proved that Boost Picking can train a supervised model mainly on unlabeled data as effectively as the same model trained on 100% labeled data, provided that the recalls of the two weak classifiers are both greater than zero and the sum of their precisions is greater than one. Based on Boost Picking, we present ‘Test along with Training (TawT)’ to improve the generalization of supervised models. Both Boost Picking and TawT are successfully tested on various small data sets.
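The condition stated in the abstract above (both recalls greater than zero, precisions summing to more than one) can be checked with a few lines of arithmetic. The sketch below is only an illustration of that inequality, not the paper's procedure: the function names and the confusion-count inputs are hypothetical, and how the two weak classifiers are built is not specified here.

```python
def precision(true_pos, false_pos):
    """Fraction of predicted positives that are correct (0.0 if nothing was predicted)."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos, false_neg):
    """Fraction of actual positives that were found (0.0 if there are none)."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def satisfies_stated_condition(counts_a, counts_b):
    """Check the condition from the abstract for two weak classifiers.

    Each argument is a hypothetical (true_pos, false_pos, false_neg) tuple.
    Returns True when both recalls are > 0 and the precisions sum to > 1.
    """
    tp_a, fp_a, fn_a = counts_a
    tp_b, fp_b, fn_b = counts_b
    recalls_positive = recall(tp_a, fn_a) > 0 and recall(tp_b, fn_b) > 0
    precision_sum = precision(tp_a, fp_a) + precision(tp_b, fp_b)
    return recalls_positive and precision_sum > 1


# Made-up counts: precisions 0.6 and 0.7, both recalls positive.
ok = satisfies_stated_condition((6, 4, 10), (7, 3, 20))
# Made-up counts: precisions 0.4 and 0.5, so the sum is below one.
not_ok = satisfies_stated_condition((4, 6, 10), (5, 5, 20))
```

Note that the condition allows each classifier to be weak on its own: low recall is acceptable as long as it is nonzero and the two precisions together exceed one.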


Overview of Annotation Creation: Processes & Tools

Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high quality, reusable annotations with low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems these tools present that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work on the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that are often useful but found in only a few tools; and those that have as yet little or no available tool support.


Spatial Heterogeneity in House Price Models: An Iterative Locally Weighted Regression Approach

Empirical analysis in economics often faces the difficulty that the data is correlated and heterogeneous in some unknown form. Spatial parametric approaches have been widely used to account for dependence structures, but the problem of directly dealing with spatially varying parameters has been largely unexplored. The problem can be serious in all those cases in which we have no prior information justified by economic theory. In this paper we propose an algorithm-based procedure which is able to endogenously identify structural breaks in space. The proposed algorithm is illustrated by using two well-known house price data sets.


TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes

Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). Results: In this paper, we present TwoPaCo, a simple and scalable low memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real primates in less than two hours, on a typical shared-memory machine. We believe that this progress will enable novel biological analyses of hundreds of mammalian-sized genomes. Availability: Our code and data are available for download from github.com/medvedevgroup/TwoPaCo Contact: ium125@psu.edu


ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

We present ABRA, a suite of algorithms that compute and maintain probabilistically-guaranteed, high-quality approximations of the betweenness centrality of all nodes (or edges) on both static and fully dynamic graphs. Our algorithms rely on random sampling, and their analysis leverages Rademacher averages and pseudodimension, fundamental concepts from statistical learning theory. To our knowledge, this is the first application of these concepts to the field of graph analysis. The results of our experimental evaluation show that our approach is much faster than exact methods, and vastly outperforms, in both speed and number of samples, current state-of-the-art algorithms with the same quality guarantees.


Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

Traditional convolutional layers extract features from patches of data by applying a non-linearity to an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data, by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself and this additional structure might encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial, as an affine function followed by a non-linearity can yield features that are too simple. Using our convolutional recurrent layers we obtain an improvement in performance on two audio classification tasks, compared to traditional convolutional layers.
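The idea in the abstract above, sliding a window over the sequence and running a recurrent network inside each window, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: it uses a simple tanh (Elman) recurrence, takes the final hidden state as the window's feature vector, and all names, sizes, and random weights are hypothetical.

```python
import math
import random


def rnn_final_state(window, w_in, w_rec, bias):
    """Run a tanh RNN over the frames of one window; return the last hidden state."""
    hidden = [0.0] * len(bias)
    for frame in window:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], frame))       # input contribution
                + sum(w * h for w, h in zip(w_rec[j], hidden))   # recurrent contribution
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def conv_rnn_layer(sequence, window_size, stride, w_in, w_rec, bias):
    """Slide a window over the sequence and extract one RNN feature vector per window.

    The same weights are shared across all windows, as with a convolution kernel.
    """
    features = []
    for start in range(0, len(sequence) - window_size + 1, stride):
        window = sequence[start:start + window_size]
        features.append(rnn_final_state(window, w_in, w_rec, bias))
    return features


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights, for illustration only (no training is performed)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_hidden = 3, 4                      # assumed frame size and feature size
w_in = random_matrix(n_hidden, n_in, rng)
w_rec = random_matrix(n_hidden, n_hidden, rng)
bias = [0.0] * n_hidden

# A toy sequence of 10 frames, each with n_in values.
sequence = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(10)]
features = conv_rnn_layer(sequence, window_size=4, stride=2,
                          w_in=w_in, w_rec=w_rec, bias=bias)
```

With 10 frames, a window of 4, and a stride of 2, the layer produces one feature vector for each of the windows starting at frames 0, 2, 4, and 6. A traditional convolutional layer would correspond to replacing the recurrence with a single affine map over the flattened window followed by the non-linearity.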


Isoperimetry in supercritical bond percolation in dimensions three and higher

On the fine-grained complexity of rainbow coloring

Intermittency for the stochastic heat equation driven by a rough time fractional Gaussian noise

Compact Flow Diagrams for State Sequences

Federated Learning of Deep Networks using Model Averaging

Exponents for the number of pairs of nearly favorite points of simple random walk in ${\mathbb Z}^2$

When the Filter Bubble Bursts: Collective Evaluation Dynamics in Online Communities

Averaging principle for non autonomous slow-fast systems of stochastic RDEs: the almost periodic case

Incorporating Hierarchical Structure Into Dynamic Systems: An Application Of Estimating HIV Epidemics At Sub-National And Sub-Population Level

Attraction properties for general urn processes and applications to a class of interacting reinforced particle systems

Identification of Audio Recording Devices From Background Noise

Query Answering with Inconsistent Existential Rules under Stable Model Semantics

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

Least Mean Squares Estimation of Graph Signals

Segre classes and Kempf-Laksov formula in algebraic cobordism

Applying Boolean discrete methods in the production of a real-valued probabilistic programming model

Central Limit Theorems of a Recursive Stochastic Algorithm with Applications to Adaptive Designs

An improved analysis of the ER-SpUD dictionary learning algorithm

The effects of marine protected areas over time and species dispersal potential: A quantitative conservation conflict attempt

Dynamic Modelling of Health and its application to the large scale analysis of Body Mass Index, using data from consecutive set of surveys

Periodicity and decidability of tilings of $\mathbb{Z}^{2}$

Channel Covariance Estimation in Massive MIMO Frequency Division Duplex Systems

On cyclability of digraphs

Consistency of direct integral estimator for partially observed systems of ordinary differential equations linear in the parameters

Spatial Heterogeneity in Production Functions Models

Entity Embeddings with Conceptual Subspaces as a Basis for Plausible Reasoning

The coalescing-branching random walk on expanders and the dual epidemic process

Corpus analysis without prior linguistic knowledge – unsupervised mining of phrases and subphrase structure

On Subpackings of Polyomino Packings

A stochastic approach to path–dependent nonlinear Kolmogorov equations via BSDEs with time-delayed generators and applications to finance

Examination and visualization of the simplifying assumption for vine copulas in three dimensions

Betweenness and Nonbetweenness

Leave-one-out prediction intervals in linear regression models with many variables

Distributions with fixed marginals maximizing the mass of the endograph of a function

Controllability of stochastic impulsive neutral functional differential equations driven by fractional Brownian motion with infinite delay

Particle-like wave packets in complex scattering systems

What is the distribution of the number of unique original items in a bootstrap sample?

Scaling limits of disordered systems and disorder relevance

A General Modifier-based Framework for Inconsistency-Tolerant Query Answering

Unfolding problem clarification and solution validation

Tight Hardness Results for Maximum Weight Rectangles

Asymptotic growth of trajectories of multifractional Brownian motion, with statistical applications to drift parameter estimation

Consensus in Directed Dynamic Networks with Short-Lived Stability

Applications of graph containers in the Boolean lattice

Modelling modal gating of ion channels with hierarchical Markov models

Goodness-of-fit test: Khmaladze Transformation vs Empirical Likelihood

Character Values of the Sidelnikov-Lempel-Cohn-Eastman Sequences

Distortion-Resistant Hashing for rapid search of similar DNA subsequence

Nonparametric Estimation of the Proportion of Treatment Effect Explained by a Surrogate Marker using Censored Data

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

An algorithm for the weighted metric dimension of two-dimensional grids

Efficient approaches for escaping higher order saddle points in non-convex optimization

Breaking the Logarithmic Barrier for Truthful Combinatorial Auctions with Submodular Bidders

Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

Schubert polynomials and degeneracy locus formulas

Connectivity, toughness, spanning trees of bounded degree, and the spectrum of regular graphs

Encoding Data for HTM Systems

Colouring and Covering Nowhere Dense Graphs

Convergence of graphs with intermediate density

The Interaction of Memory and Attention in Novel Word Generalization: A Computational Investigation

An Efficient, Sparsity-Preserving, Online Algorithm for Data Approximation