Mobile offloading is an effective way that migrates computation-intensive parts of applications from resource-constrained mobile devices onto remote resource-rich servers. Application partitioning plays a critical role in high-performance offloading systems, which involves splitting the execution of applications between the mobile side and cloud side so that the total execution cost is minimized. Through partitioning, the mobile device can have the most benefit from offloading the application to a remote cloud. In this paper, we study how to effectively and dynamically partition a given application into local and remote parts while keeping the total cost as small as possible. For general tasks (i.e., arbitrary topological consumption graphs), we propose a new min-cost offloading partitioning (MCOP) algorithm that aims at finding the optimal application partitioning (determining which portions of the application to run on mobile devices and which portions on cloud servers) under different partitioning cost models and mobile environments. The simulation results show that the proposed algorithm provides a stably low time complexity method and can significantly reduce execution time and energy consumption by optimally distributing tasks between mobile devices and cloud servers, and in the meantime, it can well adapt to environment changes.
Procedural content generation (PCG) is of great interest to game design and development as it generates game content automatically. Motivated by the recent learning-based PCG framework and other existing PCG works, we propose an alternative approach to online content generation and adaptation in Super Mario Bros (SMB). Unlike most of existing works in SMB, our approach exploits the synergy between rule-based and learning-based methods to produce constructive primitives, quality yet controllable game segments in SMB. As a result, a complete quality game level can be generated online by integrating relevant constructive primitives via controllable parameters regarding geometrical features and procedure-level properties. Also the adaptive content can be generated in real time by dynamically selecting proper constructive primitives via an adaptation criterion, e.g., dynamic difficulty adjustment (DDA). Our approach is of several favorable properties in terms of content quality assurance, generation efficiency and controllability. Extensive simulation results demonstrate that the proposed approach can generate controllable yet quality game levels online and adaptable content for DDA in real time.
The aggregation of microarray datasets originating from different studies is still a difficult open problem. Currently, best results are generally obtained by the so-called meta-analysis approach, which aggregates results from individual datasets, instead of analyzing aggre-gated datasets. In order to tackle such aggregation problems, it is necessary to correct for interstudy variability prior to aggregation. The goal of this paper is to present a new approach for microarray datasets merging, based upon explicit modeling of interstudy variability and gene variability. We develop and demonstrate a new algorithm for microarray datasets merging. The underlying model assumes normally distributed intrinsic gene expressions, distorted by a study-dependent nonlinear transformation, and study dependent (normally distributed) observation noise. The algorithm addresses both parameter estimation (the parameters being gene expression means and variances, observation noise variances and the nonlinear transformations) and data adjustment, and yields as a result adjusted datasets suitable for aggregation. The method is validated on two case studies. The first one concerns E. Coli expression data, artificially distorted by given nonlinear transformations and additive observation noise. The proposed method is able to correct for the distortion, and yields adjusted datasets from which the relevant biological effects can be recovered, as shown by a standard differential analysis. The second case study concerns the aggregation of two real prostate cancer datasets. After adjustment using the proposed algorithm, a differential analysis performed on adjusted datasets yields a larger number of differentially expressed genes (between control and tumor data). The proposed method has been implemented using the statistical software R 1, and Bioconductor packages 2. The source code (valid for merging two datasets), as well as the datasets used for the validation, and some complementary results, are made available on the web site
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed CAR (Cannistrai-Alanis-Ravai) index shows the power of local link/triangle information in improving link-prediction accuracy. With the information of level-2 links, which are links between common-neighbors, most classical similarity indices can be improved. Nevertheless, calculating the number of level-2 links makes CAR index not efficient enough. Inspired by the idea of employing local link/triangle information, we propose a new similarity index with more local structure information. In our method, local link/triangle structure information can be conveyed by clustering coefficient of common neighbors directly. The reason why clustering coefficient has good effectiveness in estimating the contribution of a common-neighbor is because that it employs links existing between neighbors of the common-neighbor and these links have the same structural position with the candidate link to this common-neighbor. Ten real-world networks drawn from five various fields are used to test the performance of our method against to classical similarity indices and recently proposed CAR index. Two estimators: precision and AUP, are used to evaluate the accuracy of link prediction algorithms. Generally speaking, our new index only performs competitively with CAR, but it is a good complement to CAR for networks with not very high LCP-corr, which is a measure to estimate the correlation between number of common-neighbors and number of links between common-neighbors. Besides, the proposed index is also more efficient than CAR index.
The cluster analysis of very large objects is an important problem, which spans several theoretical as well as applied branches of mathematics and computer science. Here we suggest a novel approach: under assumption of local convergence of a sequence of finite structures we derive an asymptotic clustering. This is achieved by a blend of analytic and geometric techniques, and particularly by a new interpretation of the authors’ representation theorem for limits of local convergent sequences, which serves as a guidance for the whole process. Our study may be seen as an effort to describe connectivity structure at the limit (without having a defined explicit limit structure) and to pull this connectivity structure back to the finite structures in the sequence in a continuous way.
Upcoming many core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to the architecture are not comprehensively studied. We present a novel closed pattern mining algorithm with a well-engineered communication protocol, and generalize it to find statistically significant patterns from personal genome data. For distributing communication evenly, it employs global load balancing with multiple stacks distributed on a set of cores organized as a hypercube with random edges. Our algorithm achieved up to 1175-fold speedup by using 1200 cores for solving a problem with 11,914 items and 697 transactions, while the naive approach of separating the search space failed completely.
Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example: non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC); and categorical variables that are dependent to the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, the interpretability of their quantification and the accuracy when ranking dependencies become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample size, and they are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple and applicable to any dependency measure, are helpful in improving interpretability when quantifying dependency and in improving accuracy on the task of ranking dependencies. In particular, we demonstrate that our approach enhances the interpretability of MIC when used as a proxy for the amount of noise between variables, and to gain accuracy when ranking variables during the splitting procedure in random forests.
We study the combinatorial description of the LDU decomposition of totally positive matrices. We give a description of the lower triangular L, the diagonal D, and the upper triangular U matrices of the LDU decomposition of totally positive matrices in terms of the combinatorial structure of essential planar networks described by Zelvinsky and Fomin. Similarly, we find a combinatorial description of the inverses of these matrices. In addition, we provide recursive formulae for computing the L, D, and U matrices of a totally positive matrix.