A Comparison Between Decision Trees and Decision Tree Forest Models for Software Development Effort Estimation

Accurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation disrupts a project's estimated cost and delivery. Overestimation, on the other hand, leads to being outbid and to financial losses in business. Many software estimation models exist; however, none has been proven to be the best in all situations. In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression (MLR) model. The evaluation was conducted using the ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.

Competitive and Penalized Clustering Auto-encoder

Auto-encoders (AEs) have been widely applied in different fields of machine learning. However, as a deep model, an AE has a large number of learnable parameters, which can cause over-fitting and slow learning in practice. Many researchers have studied the intrinsic structure of AEs and proposed different useful methods to regularize those parameters. In this paper, we present a novel regularization method based on a clustering algorithm which is able to classify the parameters into different groups. With this regularization, parameters in a given group have approximately equal values, and the over-fitting problem can be alleviated. Moreover, due to the competitive behavior of the clustering algorithm, this model also overcomes some intrinsic problems of clustering algorithms, such as determining the number of clusters. Experiments on handwritten digit recognition verify the effectiveness of our novel model.

Gaussian Approximation for High Dimensional Time Series

We consider the problem of approximating sums of high-dimensional stationary time series by Gaussian vectors, using the framework of functional dependence measure. The validity of the Gaussian approximation depends on the sample size n, the dimension p, the moment condition and the dependence of the underlying processes. We also consider an estimator for long-run covariance matrices and study its convergence properties. Our results allow constructing simultaneous confidence intervals for mean vectors of high-dimensional time series with asymptotically correct coverage probabilities. A Gaussian multiplier bootstrap method is proposed. A simulation study indicates the quality of the Gaussian approximation for different n and p under different moment and dependence conditions.

Mining Combined Causes

In recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e. a multi-factor cause consisting of two or more component variables which individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in the causal discovery using existing methods, but this scheme is computationally infeasible due to the huge number of combined variables. In this paper, we propose a novel approach to address this practical causal discovery problem, i.e. mining combined causes in large data sets. Experiments with both synthetic and real-world data sets show that the proposed method can obtain high-quality causal discoveries with high computational efficiency.
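
The notion of a combined cause described above can be illustrated with a toy example. The sketch below is not the method proposed in the paper; it only assumes a hypothetical outcome Y that equals the exclusive-or of two binary variables A and B, and shows that neither variable alone changes the rate of Y while the pair together determines it. The helper name conditional_rate is made up for this illustration.

```python
from itertools import product

def conditional_rate(rows, condition):
    """Fraction of rows with y == 1 among rows whose (a, b) satisfy condition."""
    matching = [y for (a, b, y) in rows if condition(a, b)]
    return sum(matching) / len(matching)

# Hypothetical toy data: every (a, b) combination once, with y = a XOR b.
rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

# Taken alone, A carries no information about Y: both rates are equal.
rate_a1 = conditional_rate(rows, lambda a, b: a == 1)
rate_a0 = conditional_rate(rows, lambda a, b: a == 0)

# Taken together, the pair (A=1, B=0) fixes the outcome.
rate_joint = conditional_rate(rows, lambda a, b: a == 1 and b == 0)

print(rate_a1, rate_a0, rate_joint)
```

With d binary variables there are on the order of d^2 such pairs and far more higher-order combinations, which is the blow-up the abstract refers to when it calls the straightforward approach computationally infeasible.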

Multi-armed Bandit Problem with Known Trend

We consider a variant of the multi-armed bandit model, which we call the multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems such as active learning and music and interface recommendation applications, where, when an arm is sampled by the model, the received reward changes according to a known trend. By adapting the standard multi-armed bandit algorithm UCB1 to take advantage of this setting, we propose a new algorithm named A-UCB that assumes a stochastic model. We provide upper bounds on the regret which compare favourably with those of UCB1. We also confirm this experimentally with different simulations.
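
For orientation, the standard UCB1 baseline mentioned above can be sketched as follows. This is only the textbook UCB1 index policy on Bernoulli arms, not the A-UCB algorithm of the paper, whose trend adjustment is not specified in the abstract. The function name ucb1, the arm probabilities, and the horizon are assumptions chosen for illustration.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms; return the pull count per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of times each arm was pulled
    sums = [0.0] * k    # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Hypothetical two-armed instance with a large gap between the arms.
pulls = ucb1([0.1, 0.9], horizon=1000)
print(pulls)
```

In the known-trend setting, the empirical mean in the index would presumably have to account for how the reward of an arm evolves with its pull count; how A-UCB does this is left to the paper.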

Transparent hardware synthesis of Java for predictable large-scale distributed systems

The JUNIPER project is developing a framework for the construction of large-scale distributed systems in which execution time bounds can be guaranteed. Part of this work involves the automatic implementation of input Java code on FPGAs, both for speed and predictability. An important focus of this work is to make the use of FPGAs transparent through runtime co-design and partial reconfiguration. Initial results show that the use of Java does not hamper hardware generation, and provides tight execution time estimates. This paper gives an overview of the approach taken, and presents some preliminary results that demonstrate the promise of the technique.

Understanding Editing Behaviors in Multilingual Wikipedia

Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.

A concave pairwise fusion approach to subgroup analysis

A dual descent algorithm for node-capacitated multiflow problems and its applications

A note-question on partitions of semigroups

A problem on track runners

A proof of the SXP rule by bijections and involutions

A Strong Limit Theorem for Two-Time-Scale Functional Stochastic Differential Equations

A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties

Bi-Cohen-Macaulay graphs

Consensus Convergence with Stochastic Effects

Constructing Internally Disjoint Pendant Steiner Trees in Cartesian Product Networks

Continuum Space Limit of the Genealogies of Interacting Fleming-Viot Processes on $\Z$

Corrigendum to ‘Weak Approximations for Wiener Functionals’ [Ann. Appl. Probab. (2013), 23, 4, 1660-1691]

Covering the large spectrum and generalized Riesz products

Dense binary $PG(t-1,2)$-free matroids have critical number $t-1$ or $t$

Detecting Abrupt Changes in the Spectra of High-Energy Astrophysical Sources

Exact solution for low energy quantum anharmonic vibrations in a long periodic chain

Exact triangles for SO(3) instanton homology of webs

From symmetric fundamental expansions to Schur positivity

Functional BKR inequalities, and their duals, with applications

Incoherent ensemble dynamics in disordered systems

Many-body localization in a quantum simulator with programmable random disorder

Many-body localization protected quantum state transfer

Modified Linear Programming and Class 0 Bounds for Graph Pebbling

Note on abstract stochastic semilinear evolution equations

On linear equations arising in Combinatorics (Part II)

On the pendant tree-connectivity of graphs

On the tails of the limiting Quicksort distribution

On ultralimits of sparse graph classes

Parallel Dither and Dropout for Regularising Deep Neural Networks

Partitioning Large Scale Deep Belief Networks Using Dropout

Persistence exponent for random walk on directed versions of $Z^2$

Polarized CMB recovery with sparse component separation

P-positions in Modular Extensions to Nim

Proposal of ROS-compliant FPGA Component for Low-Power Robotic Systems

Quantum Control of Many-body Localized States

Rational Chebyshev of Second Kind Collocation Method for Solving a Class of Astrophysics Problems

Regularity of solutions of abstract linear evolution equations

Regularized Kernel Recursive Least Square Algorithm

RIPL: An Efficient Image Processing DSL for FPGAs

Small ball properties and representation results

Spectral statistics across the many-body localization transition

Tait colorings, and an instanton homology for webs and foams

The cardinality of sets in Tverberg partitions

The Ramsey number of mixed-parity cycles I

The Ramsey number of mixed-parity cycles II

The Ramsey number of mixed-parity cycles III

Topological Control on Silicates Dissolution Kinetics

Unified System for Processing Real and Simulated Data in the ATLAS Experiment

Unitals in shift planes of odd order

Varying-coefficient models with isotropic Gaussian process priors