• Optimally Pruning Decision Tree Ensembles With Feature Cost
• On the number of nonisomorphic subtrees of a tree
• Representation of large matchings in bipartite graphs
• Confidence Intervals for Projections of Partially Identified Parameters
• A greedy algorithm for $B_h[g]$ sequences
• The part-frequency matrices of a partition
• Complex Decomposition of the Negative Distance kernel
• Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks
• The high-conductance state enables neural sampling in networks of LIF neurons
• Too good to be true: when overwhelming evidence fails to convince
• Configurable memory systems for embedded many-core processors
• Scaling limits for sub-ballistic biased random walks in random conductances
• Partition zeta functions
• Treelike snarks
• TimeMachine: Entity-centric Search and Visualization of News Archives
• Randomly perturbed switching dynamics of a DC/DC converter
• Approximate Distance Oracles for Planar Graphs with Improved Query Time-Space Tradeoff
• Fast Power and Energy Efficiency Analysis of FPGA-based Wireless Base-band Processing
• Open challenges in understanding development and evolution of speech forms: The roles of embodied self-organization, motivation and active exploration
• Semi-parametric efficiency bounds and efficient estimation for high-dimensional models
• Optimal designs for active controlled dose finding trials with efficacy-toxicity outcomes
• Ergodic decompositions of stationary max-stable processes in terms of their spectral functions
• Comment on ‘On Nomenclature, and the Relative Merits of Two Formulations of Skew Distributions’ by A. Azzalini, R. Browne, M. Genton, and P. McNicholas
• End-to-end Relation Extraction using LSTMs on Sequences and Tree Structures
• Learning Preferences for Manipulation Tasks from Online Coactive Feedback
• Resource Sharing for Multi-Tenant NoSQL Data Store in Cloud
• Penalized Maximum Likelihood Estimation of Multi-layered Gaussian Graphical Models
• How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites
• Polynomial convergence to equilibrium for a system of interacting particles
• Systematic Measures of Biological Networks, Part II: Degeneracy, Complexity and Robustness
• Systematic Measures of Biological Networks, Part I: Invariant measures and Entropy
• Multi-Source Neural Translation
• Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
• Nonlinear Hebbian learning as a unifying principle in receptive field formation
• Modular Arithmetic and Calculus Problems in #P
• Artwork creation by a cognitive architecture integrating computational creativity and dual process approaches
Semantic parsing methods are used for capturing and representing the semantic meaning of text. A meaning representation that captures all the concepts in the text may not always be available or may not be sufficiently complete. Ontologies provide a structured and reasoning-capable way to model the content of a collection of texts. In this work, we present a novel approach to joint learning of an ontology and a semantic parser from text. The method is based on semi-automatic induction of a context-free grammar from semantically annotated text. The grammar parses the text into semantic trees. Both the grammar and the semantic trees are used to learn the ontology on several levels: classes, instances, and taxonomic and non-taxonomic relations. The approach was evaluated on the first sentences of Wikipedia pages describing people.
We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference for particular types of contexts and higher dimensionality, more careful tuning is required to find the optimal settings for most of the extrinsic tasks that we considered. Furthermore, for these extrinsic tasks, we find that once the benefit from increasing the embedding dimensionality is mostly exhausted, simple concatenation of word embeddings learned with different context types can yield further performance gains. As an additional contribution, we propose a new variant of the skip-gram model that learns word embeddings from weighted contexts of substitute words.
This paper focuses on the coordinate update method, which is useful for solving large-scale problems involving linear and nonlinear mappings, and smooth and nonsmooth functions. It decomposes a problem into simple subproblems, where each subproblem updates one variable or a small block of variables. The coordinate update method sits at a high level of abstraction and includes many special cases such as the Jacobi, Gauss-Seidel, and alternating projection methods, as well as coordinate descent methods. These methods have found many applications throughout the computational sciences. In this paper, we abstract many problems to the fixed-point problem and study the favorable structures in the operator that enable highly efficient coordinate updates. Such updates can be carried out in sequential, parallel, and async-parallel fashions. This study leads to new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. The obtained algorithms are scalable to very large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
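As a rough illustration of the idea in the abstract above, the sketch below applies cyclic (Gauss-Seidel-style) coordinate updates to a small fixed-point problem. The affine operator T(x) = Mx + c, the function name `coordinate_update`, and the numbers are assumptions chosen for illustration; they are not taken from the paper.

```python
def coordinate_update(M, c, x0, sweeps=100):
    """Cyclic coordinate updates for the fixed-point problem x = M x + c.

    Assumes the affine map is a contraction, so the iteration converges.
    """
    x = list(x0)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            # Refreshing coordinate i needs only row i of M, which is the
            # kind of structure that makes a single update cheap.
            x[i] = sum(M[i][j] * x[j] for j in range(n)) + c[i]
    return x


# Hypothetical 2x2 example; its exact fixed point is (16/7, 18/7).
M = [[0.0, 0.5], [0.25, 0.0]]
c = [1.0, 2.0]
x = coordinate_update(M, c, [0.0, 0.0])
```

Each inner step touches one coordinate and immediately reuses the refreshed value, which is the sequential variant; a Jacobi-style variant would instead compute all coordinates from the previous iterate.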
Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers if they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers beforehand can alert drivers before they perform the maneuver and also give ADAS more time to avoid or prepare for the danger. In this work we propose a vehicular sensor-rich platform and learning algorithms for maneuver anticipation. For this purpose we equip a car with cameras, Global Positioning System (GPS), and a computing device to capture the driving context from both inside and outside of the car. In order to anticipate maneuvers, we propose a sensory-fusion deep learning architecture which jointly learns to anticipate and fuse multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We propose a novel training procedure which allows the network to predict the future given only a partial temporal context. We introduce a diverse data set with 1180 miles of natural freeway and city driving, and show that we can anticipate maneuvers 3.5 seconds before they occur in real time, with a precision and recall of 90.5\% and 87.4\%, respectively.
One of the core problems of modern statistics is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about the posterior. In this paper, we review variational inference (VI), a method from machine learning that approximates probability distributions through optimization. VI has been used in myriad applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of distributions and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this widely-used class of algorithms.
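The recipe described above (posit a family, then find the member closest to the target in Kullback-Leibler divergence) can be sketched in a deliberately simple setting. Here both the target p and the family q are univariate Gaussians, so KL(q || p) has a closed form and plain gradient descent suffices. The target parameters, step size, and function names are assumptions for illustration only; real applications have intractable targets and optimize the evidence lower bound instead.

```python
import math


def kl_gauss(m, s, mu_p, s_p):
    """Closed-form KL(q || p) for q = N(m, s^2) and p = N(mu_p, s_p^2)."""
    return math.log(s_p / s) + (s * s + (m - mu_p) ** 2) / (2 * s_p * s_p) - 0.5


def fit_gaussian(mu_p, s_p, steps=2000, lr=0.1):
    """Minimize KL(q || p) over (m, rho), with s = exp(rho), by gradient descent."""
    m, rho = 0.0, 0.0
    for _ in range(steps):
        s = math.exp(rho)
        grad_m = (m - mu_p) / (s_p * s_p)      # d KL / d m
        grad_rho = -1.0 + (s * s) / (s_p * s_p)  # d KL / d rho
        m -= lr * grad_m
        rho -= lr * grad_rho
    return m, math.exp(rho)


# Hypothetical target: N(3, 2^2). The optimizer should recover it,
# because the target lies inside the variational family.
m, s = fit_gaussian(3.0, 2.0)
kl = kl_gauss(m, s, 3.0, 2.0)
```

Parameterizing the standard deviation as exp(rho) keeps it positive without a constrained optimizer, a common choice in this kind of sketch.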
We propose a model of network formation based on reinforcement learning, which can be seen as a generalization of the one proposed by Skyrms for signaling games. On a discrete graph whose vertices represent individuals, at each time step every individual picks one of its neighbors with probability proportional to their past number of communications; independently, Nature chooses, independently and identically distributed in time, which individuals are allowed to communicate. A communication occurs when two neighbors mutually pick each other and are both allowed by Nature to communicate. Our results generalize those obtained by Hu, Skyrms and Tarrès. We prove that, up to an error term, the expected rate of communications increases on average, and thus converges a.s. If we define the limit graph as the non-oriented subgraph whose edges are the pairs of vertices communicating infinitely often, then, for stable configurations of the dynamics outside the boundary, the connected components of this limit graph are star-shaped. Conversely, any graph correspondence satisfying that property and a certain balance condition, and within which every vertex is connected to at least one other, is a limit configuration with positive probability.
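The dynamics described above can be simulated in a few lines. The sketch below is one possible reading of the model, not the authors' construction: it assumes every edge starts with weight 1, and that Nature allows each vertex independently with a fixed probability `allow_prob` (the abstract only requires Nature's choice to be i.i.d. in time). The function name and the example graph are hypothetical.

```python
import random


def simulate(neighbors, steps, allow_prob=0.8, seed=0):
    """Simulate reinforced neighbor picking on an undirected graph.

    neighbors maps each vertex to a non-empty list of its neighbors.
    Returns edge weights: assumed initial weight 1 plus past communications.
    """
    rng = random.Random(seed)
    key = lambda i, j: (min(i, j), max(i, j))
    w = {key(i, j): 1 for i, nbrs in neighbors.items() for j in nbrs}
    for _ in range(steps):
        # Each vertex picks a neighbor with probability proportional
        # to the current weight of the connecting edge.
        picks = {
            i: rng.choices(nbrs, weights=[w[key(i, j)] for j in nbrs])[0]
            for i, nbrs in neighbors.items()
        }
        # Assumption: Nature allows each vertex independently.
        allowed = {i for i in neighbors if rng.random() < allow_prob}
        for i, j in picks.items():
            # Count each mutual, allowed pick exactly once (i < j).
            if i < j and picks[j] == i and i in allowed and j in allowed:
                w[(i, j)] += 1
    return w


# Hypothetical example: a triangle on vertices 0, 1, 2.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
weights = simulate(triangle, steps=1000)
total_communications = sum(weights.values()) - len(weights)
```

Inspecting `weights` after a long run shows which edges were reinforced; on larger graphs this gives an informal view of the limit structure the abstract discusses.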