SpCoSLAM 2.0
In this paper, we propose a novel online learning algorithm, SpCoSLAM 2.0, for spatial concepts and lexical acquisition with higher accuracy and scalability. In previous work, we proposed SpCoSLAM as an online learning algorithm based on the Rao–Blackwellized particle filter. However, this conventional algorithm had problems such as a decrease in estimation accuracy due to the influence of the early stages of learning and an increase in computational complexity as the amount of training data grows. Therefore, we first develop an improved algorithm by introducing new techniques such as rejuvenation. Next, we develop a scalable algorithm to reduce the calculation time while maintaining a higher accuracy than the conventional algorithm. In the experiment, we evaluate and compare the estimation accuracy and calculation time of the proposed algorithm, the conventional online algorithm, and batch learning. The experimental results demonstrate that the proposed algorithm not only exceeds the accuracy of the conventional algorithm but is also capable of achieving an accuracy comparable to that of batch learning. In addition, with the scalable algorithm, the calculation time of the proposed algorithm does not depend on the amount of training data and becomes constant for each step. …

Balanced Random Forest Approach for WEKA
Data analysis and machine learning have become an integral part of modern scientific methodology, providing automated techniques to predict further information based on observations. One of these classification and regression techniques is the random forest approach. These decision-tree-based predictors are best known for their good computational performance and scalability. However, in the case of severely imbalanced training data, as often seen in data from medical studies with large control groups, the training algorithm or the sampling process has to be altered in order to improve the prediction quality for minority classes. In this work, a balanced random forest approach for WEKA is proposed. Furthermore, the prediction quality of the unmodified random forest implementation and the new balanced random forest version for WEKA is evaluated against reference implementations in R. Two-class problems on balanced data sets and imbalanced medical study data are investigated. The proposed method is shown to achieve superior prediction quality on imbalanced data compared to the other three techniques. …
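
One common way to alter the sampling process for imbalanced data is a balanced bootstrap, in which every tree is trained on a sample that draws the same number of cases from each class. The Python sketch below illustrates only that sampling step. It is not the WEKA implementation described in the abstract, and the function name `balanced_bootstrap` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return row indices for one balanced bootstrap sample.

    Hypothetical helper: every class contributes as many draws (with
    replacement) as the smallest class has members, so a tree trained
    on the sample sees each class equally often.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the rarest class decides how many draws each class gets.
    n_min = min(len(members) for members in by_class.values())

    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=n_min))
    rng.shuffle(sample)
    return sample


# Toy data: 95 controls (class 0) and 5 cases (class 1).
labels = [0] * 95 + [1] * 5
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
counts = {c: sum(1 for i in sample if labels[i] == c) for c in (0, 1)}
print(counts)  # {0: 5, 1: 5}
```

In a full forest, one such sample would be drawn per tree, so different trees see different subsets of the large control group while the minority class is always fully represented in the sampling pool.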

Gradient Projection Iterative Sketch (GPIS)
We propose a randomized first-order optimization algorithm, Gradient Projection Iterative Sketch (GPIS), and an accelerated variant for efficiently solving large-scale constrained Least Squares (LS) problems. We provide theoretical convergence analysis for both proposed algorithms and demonstrate our methods’ computational efficiency compared to the classical accelerated gradient method and state-of-the-art variance-reduced stochastic gradient methods through numerical experiments on various large synthetic and real data sets. …
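
The two ingredients named in the title, a sketched least-squares problem and a projected gradient step, can be illustrated with a much simpler procedure: projected gradient descent on a Gaussian-sketched problem. The Python sketch below is not the GPIS algorithm or its accelerated variant. The Gaussian sketching matrix, the non-negativity constraint, the step-size rule, and all function names are assumptions chosen to keep the example small.

```python
import random


def matmul(P, Q):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def sketched_projected_gradient(A, b, m, iters, rng):
    """Minimise 0.5 * ||S A x - S b||^2 subject to x >= 0 (assumed constraint)."""
    n, d = len(A), len(A[0])
    # Gaussian sketching matrix S (m x n) with N(0, 1/m) entries: an assumed choice.
    S = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
    SA = matmul(S, A)
    Sb = [row[0] for row in matmul(S, [[v] for v in b])]

    # Conservative step size: the squared Frobenius norm of SA upper-bounds
    # the largest eigenvalue of SA^T SA, so 1 / ||SA||_F^2 is a safe step.
    step = 1.0 / sum(v * v for row in SA for v in row)

    x = [0.0] * d
    for _ in range(iters):
        residual = [sum(a * xi for a, xi in zip(row, x)) - s
                    for row, s in zip(SA, Sb)]
        grad = [sum(SA[i][j] * residual[i] for i in range(m)) for j in range(d)]
        # Gradient step followed by projection onto the set x >= 0.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    return x


# Toy problem: 200 equations, 3 unknowns, exact non-negative solution x_true.
rng = random.Random(0)
x_true = [1.0, 0.0, 2.0]
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
x_hat = sketched_projected_gradient(A, b, m=40, iters=2000, rng=rng)
print([round(v, 4) for v in x_hat])
```

Each iteration here works with the 40-row sketched matrix rather than the 200-row original, which is the source of the savings that sketching methods aim for on large problems.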

Evolution Gene
The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how these behaviors lead to the generation of the time series. In particular, we propose a uniform framework that recognizes the different evolution genes of segments by learning a classifier, and we adopt an adversarial generator to implement the evolution gene by estimating the segments’ distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach not only achieves good prediction results (e.g., +10.56% on average in terms of F1) but is also able to provide explanations of the results. …