Distributed gRaph cOmputiNg Engine (DRONE) google
In the big-data era, social networks, graph databases, knowledge graphs, e-commerce, and similar applications demand efficient and scalable processing of ever-increasing volumes of graph-structured data. To meet this challenge, two mainstream distributed programming models, vertex-centric (VC) and subgraph-centric (SC), have been proposed. Compared to the VC model, the SC model converges faster with less communication overhead on well-partitioned graphs, and it is easy to program with thanks to its ‘think like a graph’ philosophy. However, although edge-cut partitioning is considered the natural choice for the subgraph-centric model and is adopted by Giraph++, Blogel, and GRAPE, it creates a significant preprocessing bottleneck for large graphs, especially power-law graphs. As a result, the SC model is less competitive in practice. In this paper, we present an innovative distributed graph computing framework, DRONE (Distributed gRaph cOmputiNg Engine), which combines the subgraph-centric model with a vertex-cut graph partitioning strategy. Experiments show that DRONE outperforms state-of-the-art distributed graph computing engines on real-world graphs and synthetic power-law graphs. DRONE is able to scale to synthetic power-law graphs with one trillion edges, which is orders of magnitude larger than previously reported for existing SC-based frameworks. …
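To make the vertex-cut idea concrete, here is a minimal Python sketch (my own illustration, not DRONE's actual partitioner): edges are assigned to partitions, so a vertex may be replicated across several partitions, in contrast to edge-cut partitioning, which assigns vertices and cuts the edges between them.

# Hypothetical vertex-cut partitioner: hash each edge to a partition and
# track which partitions hold a replica ("mirror") of each vertex.
from collections import defaultdict

def vertex_cut_partition(edges, num_parts):
    parts = defaultdict(list)       # partition id -> edges placed there
    replicas = defaultdict(set)     # vertex -> partitions holding a copy
    for u, v in edges:
        p = hash((u, v)) % num_parts    # simple placement; real systems use smarter heuristics
        parts[p].append((u, v))
        replicas[u].add(p)
        replicas[v].add(p)
    # Replication factor: average number of copies per vertex (lower is better).
    rep_factor = sum(len(s) for s in replicas.values()) / len(replicas)
    return parts, rep_factor

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
parts, rf = vertex_cut_partition(edges, num_parts=2)

A lower replication factor means less synchronization of mirrored vertex state; this is where vertex-cut partitioning tends to help on power-law graphs, whose high-degree vertices would otherwise produce very heavy edge cuts.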

Nucleus Neural Network google
Artificial neural networks, which model the neurons and connection architectures of the brain, have achieved great success on many problems, especially with deep architectures. In this paper, we propose a nucleus neural network (NNN) together with corresponding architecture and parameter learning methods. A nucleus has no regular layers: a neuron may connect to any other neuron in the nucleus. This architecture removes the layer limitation and may lead to more powerful learning capability. Determining the connections among numerous neurons is therefore crucial. Based on the principle that a more relevant input-output neuron pair deserves a higher connection density, we propose an architecture learning model for the nucleus. Moreover, we propose an improved learning method for the connection weights and biases given the optimized architecture. We find that this novel architecture is robust to irrelevant components in test data. We therefore define a super-robust learning problem and test the proposed network on a case where the types of image backgrounds in the training and test sets differ. Experiments demonstrate that the proposed learner achieves a significant improvement over traditional learners on the reconstructed data set. …
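As an illustration of the layer-free idea, here is a minimal Python sketch (my own simplification, not the paper's exact model): all neurons of a nucleus share one masked weight matrix, state is propagated for a few synchronous steps, and a binary mask stands in for the learned architecture, which the paper instead derives from input-output relevance.

# Hypothetical layer-free "nucleus": N neurons, one N-by-N masked weight matrix.
import numpy as np

class Nucleus:
    def __init__(self, n_neurons, density=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (n_neurons, n_neurons))
        self.b = np.zeros(n_neurons)
        # Architecture = binary connection mask; here sampled at a fixed density
        # as a placeholder for the paper's relevance-based architecture learning.
        self.mask = (rng.random((n_neurons, n_neurons)) < density).astype(float)

    def forward(self, x_idx, x_val, out_idx, steps=3):
        h = np.zeros(self.b.shape)
        h[x_idx] = x_val                              # clamp input neurons
        for _ in range(steps):                        # propagate through the nucleus
            h = np.tanh((self.W * self.mask) @ h + self.b)
            h[x_idx] = x_val                          # keep inputs clamped
        return h[out_idx]                             # read output neurons

nuc = Nucleus(n_neurons=16)
out = nuc.forward(x_idx=[0, 1, 2], x_val=np.array([1.0, 0.5, -0.3]), out_idx=[14, 15])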

Bayesian Minimum Expected Loss google
Central to many inferential situations is the estimation of rational functions of parameters. The mainstream approach in statistics and econometrics estimates these quantities with the plug-in method, without regard to the main objective of the inferential situation. We propose the Bayesian Minimum Expected Loss (MELO) approach, which focuses explicitly on the function of interest and calculates its frequentist variability. The asymptotic properties of the MELO estimator are similar to those of the plug-in approach. Nevertheless, simulation exercises show that our proposal performs better in situations characterized by small sample sizes and noisy models. In addition, we observe in the applications that our approach gives lower standard errors than frequently used alternatives when the datasets are not very informative. …
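For context, a generic decision-theoretic sketch of the idea (the paper's specific loss function and weighting may differ): for a function of interest $g(\theta)$, a posterior $p(\theta \mid y)$, and a loss $L$, the minimum-expected-loss estimate is

\[
\hat{g}_{\mathrm{MELO}} = \arg\min_{a}\; \mathbb{E}\!\left[ L\big(a, g(\theta)\big) \mid y \right],
\qquad
L\big(a, g(\theta)\big) = \big(a - g(\theta)\big)^2 \;\Rightarrow\; \hat{g}_{\mathrm{MELO}} = \mathbb{E}\!\left[ g(\theta) \mid y \right],
\]

whereas the plug-in approach reports $g(\hat{\theta})$ for a point estimate $\hat{\theta}$; for a nonlinear $g$, such as a ratio of parameters, the two generally differ.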

Ramer-Douglas-Peucker Algorithm (RDP) google
The Ramer-Douglas-Peucker algorithm (RDP) is an algorithm for reducing the number of points in a curve that is approximated by a series of points. The initial form of the algorithm was suggested independently in 1972 by Urs Ramer and in 1973 by David Douglas and Thomas Peucker, and by several others in the following decade. The algorithm is also known as the Douglas-Peucker algorithm, the iterative end-point fit algorithm, and the split-and-merge algorithm. Its purpose is, given a curve composed of line segments, to find a similar curve with fewer points. The algorithm defines ‘dissimilarity’ as the maximum distance between the original curve and the simplified curve, and the simplified curve consists of a subset of the points that defined the original curve.
http://…/rdp
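A minimal Python sketch of the recursive procedure (illustrative; the function names and the tolerance parameter epsilon are mine):

import math

def _perp_dist(pt, a, b):
    # Perpendicular distance from pt to the line through a and b.
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def rdp(points, epsilon):
    # Find the point farthest from the segment joining the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    # If it lies farther than epsilon, keep it and recurse on both halves;
    # otherwise the whole run is replaced by its two endpoints.
    if dmax > epsilon:
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

# Example: small wiggles within the tolerance are dropped.
print(rdp([(0, 0), (1, 0.1), (2, -0.1), (3, 0)], epsilon=0.5))   # -> [(0, 0), (3, 0)]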