• A Comparison Between Decision Trees and Decision Tree Forest Models for Software Development Effort Estimation
Accurate software effort estimation has been a challenge for many software practitioners and project managers. Underestimation leads to disruption in the project's estimated cost and delivery. On the other hand, overestimation causes outbidding and financial losses in business. Many software estimation models exist; however, none has been proven to be the best in all situations. In this paper, a decision tree forest (DTF) model is compared to a traditional decision tree (DT) model, as well as a multiple linear regression (MLR) model. The evaluation was conducted using the ISBSG and Desharnais industrial datasets. Results show that the DTF model is competitive and can be used as an alternative in software effort prediction.
• Competitive and Penalized Clustering Auto-encoder
Auto-encoders (AE) have been widely applied in different fields of machine learning. However, as a deep model, the AE has a large number of learnable parameters, which can cause over-fitting and slow learning in practice. Many researchers have studied the intrinsic structure of the AE and proposed different useful methods to regularize those parameters. In this paper, we present a novel regularization method based on a clustering algorithm which is able to classify the parameters into different groups. With this regularization, parameters in a given group have approximately equivalent values and the over-fitting problem can be alleviated. Moreover, due to the competitive behavior of the clustering algorithm, this model also overcomes some intrinsic problems of clustering algorithms, such as determining the number of clusters. Experiments on handwritten digit recognition verify the effectiveness of our novel model.
• Gaussian Approximation for High Dimensional Time Series
• Mining Combined Causes
In recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e. a multi-factor cause consisting of two or more component variables which individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in the causal discovery using existing methods, but this scheme is computationally infeasible due to the huge number of combined variables. In this paper, we propose a novel approach to address this practical causal discovery problem, i.e. mining combined causes in large data sets. The experiments with both synthetic and real world data sets show that the proposed method can obtain high-quality causal discoveries with a high computational efficiency.
• Multi-armed Bandit Problem with Known Trend
We consider a variant of the multi-armed bandit model, which we call the multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems like active learning, music and interface recommendation applications, where, when an arm is sampled by the model, the received reward changes according to a known trend. By adapting the standard multi-armed bandit algorithm UCB1 to take advantage of this setting, we propose a new algorithm named A-UCB that assumes a stochastic model. We provide upper bounds on the regret which compare favourably with those of UCB1. We also confirm this experimentally with different simulations.
• Transparent hardware synthesis of Java for predictable large-scale distributed systems
The JUNIPER project is developing a framework for the construction of large-scale distributed systems in which execution time bounds can be guaranteed. Part of this work involves the automatic implementation of input Java code on FPGAs, both for speed and predictability. An important focus of this work is to make the use of FPGAs transparent through runtime co-design and partial reconfiguration. Initial results show that the use of Java does not hamper hardware generation, and provides tight execution time estimates. This paper gives an overview of the approach taken, and presents some preliminary results that demonstrate the promise of the technique.
• Understanding Editing Behaviors in Multilingual Wikipedia
Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.
• A concave pairwise fusion approach to subgroup analysis
• A dual descent algorithm for node-capacitated multiflow problems and its applications
• A note-question on partitions of semigroups
• A problem on track runners
• A proof of the SXP rule by bijections and involutions
• A Strong Limit Theorem for Two-Time-Scale Functional Stochastic Differential Equations
• A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties
• Bi-Cohen-Macaulay graphs
• Consensus Convergence with Stochastic Effects
• Constructing Internally Disjoint Pendant Steiner Trees in Cartesian Product Networks
• Continuum Space Limit of the Genealogies of Interacting Fleming-Viot Processes on $\Z$
• Corrigendum to ‘Weak Approximations for Wiener Functionals’ [Ann. Appl. Probab. (2013), 23, 4, 1660-1691]
• Covering the large spectrum and generalized Riesz products
• Dense binary $PG(t-1,2)$-free matroids have critical number $t-1$ or $t$
• Detecting Abrupt Changes in the Spectra of High-Energy Astrophysical Sources
• Exact solution for low energy quantum anharmonic vibrations in a long periodic chain
• Exact triangles for SO(3) instanton homology of webs
• From symmetric fundamental expansions to Schur positivity
• Functional BKR inequalities, and their duals, with applications
• Incoherent ensemble dynamics in disordered systems
• Many-body localization in a quantum simulator with programmable random disorder
• Many-body localization protected quantum state transfer
• Modified Linear Programming and Class 0 Bounds for Graph Pebbling
• Note on abstract stochastic semilinear evolution equations
• On linear equations arising in Combinatorics (Part II)
• On the pendant tree-connectivity of graphs
• On the tails of the limiting Quicksort distribution
• On ultralimits of sparse graph classes
• Parallel Dither and Dropout for Regularising Deep Neural Networks
• Partitioning Large Scale Deep Belief Networks Using Dropout
• Persistence exponent for random walk on directed versions of $Z^2$
• Polarized CMB recovery with sparse component separation
• P-positions in Modular Extensions to Nim
• Proposal of ROS-compliant FPGA Component for Low-Power Robotic Systems
• Quantum Control of Many-body Localized States
• Rational Chebyshev of Second Kind Collocation Method for Solving a Class of Astrophysics Problems
• Regularity of solutions of abstract linear evolution equations
• Regularized Kernel Recursive Least Square Algorithm
• RIPL: An Efficient Image Processing DSL for FPGAs
• Small ball properties and representation results
• Spectral statistics across the many-body localization transition
• Tait colorings, and an instanton homology for webs and foams
• The cardinality of sets in Tverberg partitions
• The Ramsey number of mixed-parity cycles I
• The Ramsey number of mixed-parity cycles II
• The Ramsey number of mixed-parity cycles III
• Topological Control on Silicates Dissolution Kinetics
• Unified System for Processing Real and Simulated Data in the ATLAS Experiment
• Unitals in shift planes of odd order
• Varying-coefficient models with isotropic Gaussian process priors