Consistent Estimation for Partition-wise Regression and Classification Models
Partition-wise models offer a flexible approach to modeling complex, multidimensional data and can produce interpretable results. They partition the observed data into regions, each of which is modeled with a simple submodel. The success of this approach depends heavily on the quality of the partition: too large a region could lead to a submodel that is no longer simple, while too small a region could inflate estimation variance. This paper proposes an automatic procedure for choosing the partition (i.e., the number of regions and the boundaries between them) as well as the submodels for the regions. It is shown that, assuming a true partition exists, the proposed partition estimator is statistically consistent. The methodology is demonstrated for both regression and classification problems.
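The abstract does not spell out the estimator, but the trade-off it describes (large regions give poor submodels, small regions inflate variance) can be illustrated with a generic sketch. The code below assumes the simplest possible case: one-dimensional data, constant submodels, and a penalized least-squares criterion solved exactly by dynamic programming (optimal partitioning). The function `best_partition` and its `penalty` parameter are hypothetical illustrations, not the paper's procedure.

```python
import math


def best_partition(y, penalty):
    """Fit a piecewise-constant model to the sequence y.

    Minimises (sum of within-region squared errors) + penalty * (number of regions)
    exactly by dynamic programming over all contiguous partitions, in O(n^2) time.
    Returns (optimal cost, sorted list of breakpoint indices).
    """
    n = len(y)
    # Prefix sums of y and y^2 give each region's squared error in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(y):
        s[idx + 1] = s[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def sse(i, j):
        """Squared error of y[i:j] around its own mean (the constant submodel)."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] = optimal cost for the prefix y[:j]; last[j] = start of its final region.
    best = [0.0] + [math.inf] * n
    last = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], last[j] = cost, i

    # Walk back through the region starts; index 0 is not a breakpoint.
    starts, j = [], n
    while j > 0:
        starts.append(last[j])
        j = last[j]
    return best[n], sorted(starts)[1:]
```

For example, `best_partition([0, 0, 0, 0, 5, 5, 5, 5], penalty=1.0)` returns `(2.0, [4])`, a single breakpoint between the two levels; with `penalty=100.0` the fit collapses to one region. The penalty plays the role of the size/variance trade-off; any consistency guarantee belongs to the paper's own estimator, not to this sketch.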
Trans-gram, Fast Cross-lingual Word-embeddings
We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages, using English as a pivot language. We show that some linguistic features are aligned across languages for which we do not have aligned data, even though those properties do not exist in the pivot language. We also achieve state-of-the-art results on standard cross-lingual text classification and word translation tasks.
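Once words from several languages live in one aligned space, word translation is commonly evaluated by nearest-neighbour retrieval under cosine similarity. The sketch below shows only that retrieval step; it does not reproduce how Trans-gram learns the embeddings, and the three-dimensional vectors `EN` and `FR` are made-up toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def translate(word, src_emb, tgt_emb):
    """Return the target-language word whose vector is most cosine-similar to `word`."""
    query = src_emb[word]
    return max(tgt_emb, key=lambda w: cosine(query, tgt_emb[w]))


# Hypothetical toy vectors standing in for embeddings in one shared space.
EN = {"dog": [0.9, 0.1, 0.0], "house": [0.0, 0.2, 0.95]}
FR = {"chien": [0.85, 0.15, 0.05], "maison": [0.05, 0.1, 0.9]}

if __name__ == "__main__":
    print(translate("dog", EN, FR), translate("house", EN, FR))
```

Run as a script, this prints `chien maison`. With real embeddings the same loop would be scored by how often the retrieved word matches a bilingual dictionary entry.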
Git4Voc: Git-based Versioning for Collaborative Vocabulary Development
Collaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated, and the dynamics of their domains. In this paper we argue that the realization of a powerful version control system is the heart of the problem. Driven by this idea and by the success of Git in software development, we investigate the applicability of Git to collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, there are still important differences. These need to be considered when developing a successful versioning and collaboration system for vocabulary development. Therefore, this paper starts by presenting the challenges we faced while creating vocabularies collaboratively and discusses how this process differs from software development. Based on these insights we propose Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features such as syntactic validation and semantic diffs.
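One way to read "semantic diff" for a vocabulary is to compare the sets of statements rather than the text lines, so that reordering triples is not reported as a change. The sketch below assumes the vocabulary is serialized as N-Triples (one statement per line) and uses only the Python standard library; blank-node relabelling, trailing comments, and literal normalization are not handled. How Git4Voc itself computes semantic diffs and wires them into Git hooks is not shown here, and the function names are hypothetical.

```python
def parse_ntriples(text):
    """Return the set of statements in an N-Triples document.

    Assumes one triple per line; blank lines and '#' comment lines are skipped.
    Blank-node labels and literal forms are compared verbatim (no normalisation).
    """
    statements = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            statements.add(line)
    return statements


def semantic_diff(old_text, new_text):
    """Return (added, removed) statements, ignoring line order and duplicates."""
    old, new = parse_ntriples(old_text), parse_ntriples(new_text)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    old = (
        '<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#label> "car" .\n'
        "<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/Vehicle> .\n"
    )
    # Same subclass statement in a different position, plus a changed label.
    new = (
        "<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/Vehicle> .\n"
        '<http://example.org/Car> <http://www.w3.org/2000/01/rdf-schema#label> "automobile" .\n'
    )
    added, removed = semantic_diff(old, new)
    print("added:", added)
    print("removed:", removed)
```

A line-based `git diff` would flag both lines of this example as changed; the set-based comparison reports only the label edit.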
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people’s argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.
A Synthetic Approach for Recommendation: Combining Ratings, Social Relations, and Reviews
Recommender systems (RSs) provide an effective way of alleviating the information overload problem by selecting personalized choices. Online social networks and user-generated content provide diverse sources for recommendation beyond ratings, which present opportunities as well as challenges for traditional RSs. Although social matrix factorization (Social MF) can integrate ratings with social relations, and topic matrix factorization can integrate ratings with item reviews, both ignore some useful information. In this paper, we investigate effective data fusion by combining the two approaches in two steps. First, we extend Social MF to exploit the graph structure of neighbors. Second, we propose a novel framework, MR3, to jointly model these three types of information for rating prediction by aligning latent factors and hidden topics. We achieve more accurate rating prediction on two real-life datasets. Furthermore, we measure the contribution of each data source to the proposed framework.
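Social MF regularizes each user's latent vector towards the average of their neighbours' vectors. The sketch below adds that term to a plain stochastic-gradient matrix-factorization update. It is a simplification under stated assumptions: the gradient contribution from users who have `u` as a neighbour is omitted, the hyperparameters (`lr`, `lam`, `beta`, and the latent dimension `k`) are arbitrary illustrative values, the toy ratings are made up, and neither the graph-structure extension nor the MR3 review-topic alignment from the abstract is modeled.

```python
import math
import random


def init_factors(ids, k, rng):
    """Small random latent vectors (length k) for each user or item id."""
    return {i: [rng.gauss(0.0, 0.1) for _ in range(k)] for i in ids}


def rmse(ratings, U, V):
    """Root-mean-square error of the dot-product predictions."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def sgd_epoch(ratings, friends, U, V, lr=0.01, lam=0.05, beta=0.1):
    """One SGD pass of matrix factorization with a SocialMF-style social term.

    ratings: list of (user, item, rating); friends: user -> list of neighbours.
    The social term pulls U[u] towards the mean of its neighbours' vectors.
    Simplification: the gradient from users who have u as a neighbour is omitted.
    """
    for u, i, r in ratings:
        pu, qi = U[u], V[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        nbrs = friends.get(u, [])
        k = len(pu)
        # Mean neighbour vector; with no neighbours the social term is zero.
        avg = [sum(U[v][f] for v in nbrs) / len(nbrs) for f in range(k)] if nbrs else list(pu)
        for f in range(k):
            p, q = pu[f], qi[f]
            pu[f] += lr * (err * q - lam * p - beta * (p - avg[f]))
            qi[f] += lr * (err * p - lam * q)


if __name__ == "__main__":
    # Hypothetical toy data: three users, two items.
    ratings = [("a", "x", 5), ("a", "y", 1), ("b", "x", 4), ("b", "y", 2), ("c", "x", 5)]
    friends = {"a": ["b"], "b": ["a"], "c": ["a", "b"]}
    rng = random.Random(0)
    U, V = init_factors("abc", 2, rng), init_factors("xy", 2, rng)
    before = rmse(ratings, U, V)
    for _ in range(500):
        sgd_epoch(ratings, friends, U, V)
    print(f"RMSE before: {before:.3f}, after: {rmse(ratings, U, V):.3f}")
```

Setting `beta=0` recovers ordinary regularized matrix factorization, which makes it easy to see how much the social term changes the learned user vectors.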
Temporal Multinomial Mixture for Instance-Oriented Evolutionary Clustering
Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data, which are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of the classical mixture model that optimizes feature co-occurrences in a trade-off with temporal smoothness. Our model is evaluated on two recent case studies of opinion aggregation over time. We compare four different probabilistic clustering models and show the superiority of our proposal in the task of instance-oriented clustering.
Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general non-informative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Web Material.
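A standard member of the skew-symmetric family is the skew-normal, with density $2\phi(z)\Phi(\lambda z)/\sigma$ for $z = (x-\mu)/\sigma$, where $\phi$ and $\Phi$ are the standard normal density and distribution function and $\lambda$ controls asymmetry ($\lambda = 0$ recovers the normal). The sketch below evaluates the log-likelihood of a linear regression with skew-normal errors. It assumes this particular error law purely for illustration; the paper's broader scale-mixture family, its non-informative priors, and the censored-data likelihood are not reproduced, and the function names are hypothetical.

```python
import math


def skew_normal_logpdf(x, loc=0.0, scale=1.0, shape=0.0):
    """Log density of the skew-normal: (2 / scale) * phi(z) * Phi(shape * z).

    z = (x - loc) / scale; shape = 0 gives the normal distribution.
    Phi is computed via erfc, which stays accurate in the lower tail; for
    extremely negative shape * z it can still underflow to 0 and log() fails.
    """
    z = (x - loc) / scale
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-shape * z / math.sqrt(2.0))
    return math.log(2.0) - math.log(scale) + log_phi + math.log(cdf)


def regression_loglik(beta, X, y, scale, shape):
    """Log-likelihood of y_i = x_i . beta + e_i with i.i.d. skew-normal errors e_i.

    The error location is fixed at 0, so E[e_i] is nonzero unless shape = 0.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        mu = sum(b * v for b, v in zip(beta, xi))
        total += skew_normal_logpdf(yi, mu, scale, shape)
    return total
```

In a Bayesian analysis this log-likelihood would be combined with a prior on `beta`, `scale`, and `shape`; whether the resulting posterior is proper is exactly the kind of question the paper addresses.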
A FIRM Approach to Software-Defined Service Composition
A service composition is an aggregate of services, often leveraged to automate enterprise business processes. While Service Oriented Architecture (SOA) has been at the forefront of service composition, services can also be realized as efficient distributed and parallel constructs such as MapReduce, which are not typically exploited in service composition. With the advent of Software-Defined Networking (SDN), a global view and control of the entire network are made available to the network controller, which can further be leveraged at the application level. This paper presents FIRM, an approach for Software-Defined Service Composition that leverages SDN and MapReduce. FIRM comprises Find, Invoke, Return, and Manage as the core procedures for achieving QoS-aware service composition.
SENDIM for Incremental Development of Cloud Networks
Due to the limited and varying availability of cheap infrastructure and resources, cloud network systems and applications are tested in simulation and emulation environments at different stages of development, prior to physical deployment. Configuration management tools manage deployments and migrations across different cloud platforms, reducing tedious system administration effort. However, a cloud networking simulation currently cannot be migrated to an emulation, or vice versa, without rewriting and manually redeploying the simulated application. This paper presents SENDIM (Sendim is a northeastern Portuguese town close to the Spanish border, where the rare Mirandese language is spoken), a Simulation, Emulation, aNd Deployment Integration Middleware for cloud networks. As an orchestration platform for incrementally building Software-Defined Cloud Networks (SDCN), SENDIM manages the development and deployment of algorithms and architectures across the entire pipeline, from visualization, simulation, and emulation to physical deployment. Hence, SENDIM optimizes the evaluation of cloud networks.
• Bayesian Inference using the Symmetric Monoidal Closed Category Structure
• Exact Relation between Singular Value and Eigenvalue Statistics
• Programming Discrete Distributions with Chemical Reaction Networks
• On the enumeration of lattice $3$-polytopes
• On the local genus distribution of graph embeddings
• On some multicolour Ramsey properties of random graphs
• Bayesian subset simulation
• Environmental Noise Embeddings for Robust Speech Recognition
• Evaluating the Performance of a Speech Recognition based System
• Investigating gated recurrent neural networks for speech synthesis
• Stationary signal processing on graphs
• The homotopy theory of equivariant posets
• On the variations of the principal eigenvalue and the probability of survival with respect to a parameter in growth-fragmentation-death models
• Multidimensional Selberg theorem and fluctuations of the zeta zeros via Malliavin calculus
• How to learn a graph from smooth signals
• Numerical analysis of lognormal diffusions on the sphere
• An inequality for moments of log-concave functions on Gaussian random vectors
• Approximation algorithms for node-weighted prize-collecting Steiner tree problems on planar graphs
• Approximating the degree sequence of two random graphs
• An Application-Level Dependable Technique for Farmer-Worker Parallel Programs
• Modeling Multivariate Mixed-Response Functional Data
• Aging in the three-dimensional Random Field Ising Model
• The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks
• Autonomous Crowds Tracking with Box Particle Filtering and Convolution Particle Filtering
• Extension complexity and realization spaces of hypersimplices
• Subexponential time algorithms for finding small tree and path decompositions
• A novel approach for Markov Random Field with intractable normalising constant on large lattices
• New Integrality Gap Results for the Firefighters Problem on Trees
• Bounding errors of Expectation-Propagation
• Implicit Look-alike Modelling in Display Ads: Transfer Collaborative Filtering to CTR Estimation
• Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
• Asymptotic results for exponential functionals of Levy processes
• Cospectral lifts of graphs
• Linear and Optimization Hamiltonians in Clustered Exponential Random Graph Modeling
• Optimal Power Flow with Inelastic Demands for Demand Response in Radial Distribution Networks
• Localisation of a source of biochemical agent dispersion using binary measurements
• Eventual return probability in multidimensional random walks
• On the geometry of random lemniscates
• Bismut’s gradient formula for vector bundles
• On the geometric properties of the semi-Lagrangian discontinuous Galerkin scheme for the Vlasov-Poisson equation
• Bounded colorings of multipartite graphs and hypergraphs
• Predicting the large-scale evolution of tag systems
• Involution words II: braid relations and atomic structures
• Improper Twin Edge Coloring of Graphs
• A Sufficient Statistics Construction of Bayesian Nonparametric Exponential Family Conjugate Models
• Negative interest rates: why and how?
• On parallel solution of ordinary differential equations
• Multivariate Regular Variation of Discrete Mass Functions with Applications to Preferential Attachment Networks
• Hypo-efficient domination and hypo-unique domination
• Constructions for the optimal pebbling of grids
• Parallel Stroked Multi Line: a model-based method for compressing large fingerprint databases
• On the lock-in probability estimate of stochastic approximation with controlled Markov noise
• On Clustering Time Series Using Euclidean Distance and Pearson Correlation
• Stammering tableaux – Tableaux bégayants
• Heat transport in low-dimensional random harmonic networks
• Random Continued fractions: Lévy constant and Chernoff-type estimate
• Limit theorems related to beta-expansion and continued fraction expansion
• Identifying Stable Patterns over Time for Emotion Recognition from EEG
• Limit Laws for Random Matrices from Traffic-Free Probability
• Optimal-order bounds on the rate of convergence to normality for maximum likelihood estimators
• Empirical Gaussian priors for cross-lingual transfer learning
• Coexistence of shocks and rarefaction fans: complex phase diagram of a simple hyperbolic particle system
• Computing semiparametric bounds on the expected payments of insurance instruments via column generation
• Fluctuations in the heterogeneous multiscale methods for fast-slow systems
• Discrepancy of line segments for general lattice checkerboards
• Spectra of general hypergraphs
• A note on the Sobol’ indices and interactive criteria
• Diffusive Propagation of Energy in a Non-Acoustic Chain
• Sklar’s Theorem in an Imprecise Setting
• On totally antimagic total labeling of complete bipartite graphs
• Invertible binary matrix with maximum number of $2$-by-$2$ invertible submatrices
• Dynamic Monopolies for Degree Proportional Thresholds in Connected Graphs of Girth at least Five and Trees
• Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences
• Group Invariant Deep Representations for Image Instance Retrieval
• On the Very-well-poised Bilateral Basic Hypergeometric $_5\psi_5$ Series
• Minimax Subsampling for Estimation and Prediction in Low-Dimensional Linear Regression
• Maxima of Two Random Walks: Universal Statistics of Lead Changes
• A note on the sample complexity of the Er-SpUD algorithm by Spielman, Wang and Wright for exact recovery of sparsely used dictionaries
• Autocorrelated errors in experimental data in the language sciences: Some solutions offered by Generalized Additive Mixed Models
• It’s just a matter of perspective(s): Crowd-Powered Consensus Organization of Corpora