Available recommender systems mostly provide recommendations based on users' preferences using traditional methods such as collaborative filtering, which relies only on similarities between users and items. However, collaborative filtering can yield poor recommendations because it ignores other useful available data, such as users' locations, and hence the accuracy of its recommendations can be low. This is especially evident in systems where location strongly affects user preferences, such as movie recommender systems. In this paper, a new location-based movie recommender system built on collaborative filtering is introduced to enhance the accuracy and quality of recommendations. In this approach, users' locations are taken into account throughout the recommendation process, including peer selection. The potential of the proposed approach to provide novel, higher-quality recommendations is examined through experiments on real datasets.
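Here is a minimal Python sketch of location-aware peer selection. It rests on an assumption of ours, not the paper's formula: rating similarity is the cosine over co-rated movies, damped by an exponential decay in the great-circle distance between users. The function names, the `scale_km` parameter, and any data fed to it are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def cosine(ra, rb):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0

def location_aware_similarity(u, v, ratings, locations, scale_km=50.0):
    """Hypothetical combination: rating similarity damped by geographic distance."""
    d = haversine_km(*locations[u], *locations[v])
    return cosine(ratings[u], ratings[v]) * math.exp(-d / scale_km)

def predict(user, item, ratings, locations, k=2, scale_km=50.0):
    """Similarity-weighted average of the item's ratings by the k most similar peers."""
    peers = [v for v in ratings if v != user and item in ratings[v]]
    scored = sorted(
        ((location_aware_similarity(user, v, ratings, locations, scale_km), v) for v in peers),
        reverse=True,
    )[:k]
    den = sum(abs(s) for s, _ in scored)
    if den == 0:
        return None  # no usable peers rated this item
    return sum(s * ratings[v][item] for s, v in scored) / den
```

Two users with identical tastes no longer count equally. The one living nearby dominates the peer set, and `scale_km` controls how strongly location reshapes the neighborhood.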
Security in computer networks is one of the most interesting aspects of computer systems. It is typically summarized by the initials CIA: confidentiality, integrity, and availability. Although many levels of access for data protection have been defined in computer networks, intruders still find many ways to harm sites and systems. Deploying and supervising security in network systems, especially wireless sensor networks, has therefore become a challenge. One of the newest security algorithms for wireless sensor networks is the Artificial Immune System (AIS) algorithm. In the human body, lymphocytes play the main role in recognizing and destroying unknown elements. In this article, we draw inspiration from these defensive systems to secure the network using two algorithms: the first distinguishes self nodes from non-self nodes based on the related factors, and the second eliminates the danger posed by enemy nodes. The results show a high success rate and a good detection rate for unknown objects; the approach can identify the nodes with the highest affinity and fitness to be selected to confront unknown agents.
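Self/non-self discrimination in artificial immune systems is often illustrated with negative selection. The sketch below shows that classic technique, not the authors' two algorithms. Each node is described by a feature vector normalized to [0, 1), for example packet rate and residual energy (hypothetical features). Random candidate detectors are kept only if they match no self sample. A node whose features fall within `radius` of any surviving detector is flagged as non-self.

```python
import math
import random

def generate_detectors(self_set, n_detectors, radius, dim, seed=0, max_tries=100_000):
    """Negative selection: keep random detectors that match no self sample."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = [rng.random() for _ in range(dim)]  # random point in [0, 1)^dim
        if all(math.dist(candidate, s) > radius for s in self_set):
            detectors.append(candidate)  # survives: covers only non-self space
    return detectors

def is_non_self(x, detectors, radius):
    """Flag x as non-self if any detector lies within `radius` of it."""
    return any(math.dist(x, d) <= radius for d in detectors)
```

Detection and generation here share one radius. As a result, a sample from the self set can never be flagged, and the false alarms come only from normal behavior missing from the self set.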
Security issues are among the most challenging problems in cloud computing, an emerging technology. Given their importance, this paper presents an efficient and reliable user authentication and data protection model to increase the reliability of cloud-based environments. Two encryption procedures are established in an independent middleware (Agent) to perform user authentication, access control, and data protection in cloud servers. AES is used as a symmetric cryptography algorithm in the cloud servers, and RSA is used as an asymmetric cryptography algorithm in the Agent servers. A theoretical evaluation of the proposed model shows that its resistance to possible attacks and unpredictable events is considerably enhanced compared with similar models, owing to the use of dual encryption and an independent middleware during user authentication and data protection.
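A common way to combine a symmetric and an asymmetric cipher is hybrid (envelope) encryption. The data is encrypted with a fresh AES key, and that key is in turn encrypted with the RSA public key of a separate party. The sketch below assumes this arrangement, with the Agent holding the RSA key pair. It uses the third-party `cryptography` package (AES-GCM and RSA-OAEP). It does not reproduce the paper's authentication or access-control protocol, and the function names are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP padding used by the (assumed) Agent to wrap data keys.
OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()), algorithm=hashes.SHA256(), label=None)

def agent_keypair():
    """RSA key pair held by the independent Agent (hypothetical role assignment)."""
    return rsa.generate_private_key(public_exponent=65537, key_size=2048)

def protect(data: bytes, agent_public_key):
    """Encrypt data with a fresh AES-256-GCM key, then wrap that key with RSA."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, never reused with the same key
    ciphertext = AESGCM(data_key).encrypt(nonce, data, None)
    wrapped_key = agent_public_key.encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext

def recover(wrapped_key, nonce, ciphertext, agent_private_key):
    """Unwrap the AES key with the Agent's private key and decrypt the data."""
    data_key = agent_private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)
```

Under this assumed arrangement, the stored ciphertext is useless without the Agent's private key. That is the intuition behind keeping the two layers on independent servers.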
We apply a general deep learning framework to the non-factoid question answering task. Our approach does not rely on any linguistic tools and can be applied to different languages or domains. Various architectures are presented and compared. We create and release a QA corpus and set up a new QA task in the insurance domain. Experimental results demonstrate superior performance compared with the baseline methods, and several additional techniques yield further improvements. On this highly challenging task, top-1 accuracy reaches 65.3% on a test set, which indicates great potential for practical use.
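Top-1 accuracy counts a question as answered correctly only when the highest-scoring candidate answer is the gold one. The minimal sketch below assumes that questions and candidate answers have already been encoded as vectors by whatever network is used. Scoring by cosine similarity is our assumption for illustration, not a detail stated above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top1_accuracy(questions):
    """questions: list of (question_vec, candidate_vecs, gold_index) triples."""
    correct = 0
    for q, candidates, gold in questions:
        scores = [cosine(q, c) for c in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        correct += best == gold  # only the single top-ranked answer counts
    return correct / len(questions)
```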
We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, the variation in group covariances. The method reduces dimensionality by identifying a set of linear combinations of the original features, ordered by importance as quantified by the associated eigenvalues, that capture most of the cluster structure contained in the data. Observations can then be projected onto this reduced subspace, providing summary plots that help visualize the clustering structure. These plots are particularly appealing for high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.
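The sketch below covers only the means part of the method and ignores the covariance-variation term. It assumes hard cluster labels (for example, MAP assignments from a fitted Gaussian mixture) stand in for the estimated mixture probabilities. It forms the between-group matrix $M = \sum_k \pi_k (\mu_k - \mu)(\mu_k - \mu)^\top$ and solves $M v = \lambda \Sigma v$, where $\Sigma$ is the marginal covariance. The resulting directions are sorted by eigenvalue.

```python
import numpy as np

def mean_variation_directions(X, labels):
    """Directions from variation in group means, ordered by decreasing eigenvalue."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False, bias=True)  # marginal covariance of the data
    M = np.zeros_like(sigma)
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = Xg.mean(axis=0) - mu
        M += (len(Xg) / len(X)) * np.outer(d, d)  # pi_k (mu_k - mu)(mu_k - mu)^T
    # Solve M v = l Sigma v by whitening with Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    l, u = np.linalg.eigh(inv_sqrt @ M @ inv_sqrt)
    order = np.argsort(l)[::-1]
    return l[order], inv_sqrt @ u[:, order]  # columns of V are the directions
```

Projecting with `X @ V[:, :2]` gives a two-dimensional summary plot. With $G$ groups, the means-only matrix has rank at most $G-1$. That is one reason the covariance term matters when groups differ mainly in spread.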
This paper introduces a methodology for visualizing, on a reduced-dimension subspace, the classification structure and the geometric characteristics induced by a Gaussian mixture model estimated for discriminant analysis. In particular, we consider the case of mixtures of mixture models with varying parametrizations, which allow for parsimonious models. The approach extends existing work on dimension reduction for model-based clustering based on Gaussian mixtures. Information on the dimension reduction subspace is provided by the variation in class locations and, depending on the estimated mixture model, the variation in class dispersions. Projections along the estimated directions provide summary plots that help visualize the structure of the classes and their characteristics. A suitable modification of the method recovers the most discriminant directions, i.e., those that show maximal separation among classes. The approach is illustrated using simulated and real data.
Recommender systems influence our decisions on the Internet every day. While considerable attention has been given to issues such as recommendation accuracy and user privacy, the long-term mutual feedback between a recommender system and the decisions of its users has so far been neglected. Here we propose a model of network evolution that allows us to study the complex dynamics induced by this feedback, including the hysteresis effect typical of systems with non-linear dynamics. Despite the popular belief that recommendation helps users discover new things, we find that long-term use of recommendation can contribute to the rise of extremely popular items and thus ultimately narrow user choice. These results are supported by measurements of the time evolution of item popularity inequality in real systems. We show that this adverse effect of recommendation can be tamed by sacrificing part of the short-term recommendation accuracy.
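Popularity inequality can be quantified in several ways. One common choice, assumed here, is the Gini coefficient of item degrees, i.e., the number of users who collected each item. A minimal sketch using the sorted-rank formula:

```python
def gini(popularity):
    """Gini coefficient of non-negative item popularities (0 = perfectly even)."""
    x = sorted(popularity)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i i * x_(i) / (n * sum x) - (n + 1) / n, with ranks i starting at 1
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2 * weighted / (n * total) - (n + 1) / n
```

Computing it on successive snapshots of item degrees gives the kind of time evolution described above. Values near 0 mean popularity is spread evenly, and larger values mean a few items dominate.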
In recent years, research efforts to extend linear metric learning models to handle nonlinear structures have attracted great interest. In this paper, we propose a novel nonlinear solution that uses deformable geometric models to learn spatially varying metrics, and we apply this strategy to boost the performance of both kNN and SVM classifiers. Thin-plate splines (TPS) are chosen as the geometric model because of their remarkable versatility and representational power in accounting for high-order deformations. By transforming the input space through TPS, we can pull same-class neighbors closer while pushing different-class points farther away in kNN, and make the input data points more linearly separable in SVMs. Improvements in kNN classification performance are demonstrated through experiments on synthetic and real-world datasets, with comparisons against several state-of-the-art metric learning solutions. Our SVM-based models also achieve significant improvements over traditional linear and kernel SVMs on the same datasets.
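A thin-plate spline maps a row vector $x$ to $[1, x^\top] A + \sum_i U(\lVert x - c_i \rVert) W_i$. Here $c_i$ are control points, and the classic 2-D kernel is $U(r) = r^2 \log r$ with $U(0) = 0$. The sketch below only applies such a map. In the paper, the coefficients would be learned so that same-class neighbors are pulled together. Here, `A`, `W`, and the control points are hypothetical inputs, and the usual side constraints on `W` are not enforced.

```python
import numpy as np

def tps_kernel(r):
    """Thin-plate kernel U(r) = r^2 log r, defined as 0 at r = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def tps_transform(X, controls, W, A):
    """Map points X (n, d) through a TPS with controls (k, d), W (k, d), A (d + 1, d)."""
    r = np.linalg.norm(X[:, None, :] - controls[None, :, :], axis=2)  # (n, k) distances
    Xh = np.hstack([np.ones((X.shape[0], 1)), X])  # homogeneous coordinates [1, x]
    return Xh @ A + tps_kernel(r) @ W  # affine part plus non-affine warp
```

With `W` set to zero and `A` equal to the identity (plus a zero offset row), the map is the identity. The non-affine weights add the smooth, spatially varying deformation that a single linear metric cannot express.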
The minimal sets within a collection of sets are those that have no proper subset within the collection, and the maximal sets are those that have no proper superset within the collection. Identifying extremal sets is a fundamental problem with a wide range of applications in SAT solvers, data mining, and social network analysis. In this paper, we present two novel improvements to the high-quality extremal set identification algorithm \textit{AMS-Lex}, described by Bayardo and Panda. The first technique uses memoization to improve the execution time of the single-threaded variant of AMS-Lex, while the second uses parallel programming methods. In some of the presented experiments, our memoized algorithm executes more than $400$ times faster than the highly efficient publicly available implementation of AMS-Lex. Moreover, we show that our modified algorithm’s speedup is not bounded above by a constant and that it increases as the length of the common prefixes in successive input \textit{itemsets} increases. We provide experimental results using both real-world and synthetic data sets and show that our multi-threaded variant outperforms AMS-Lex by a factor of $3$ to $6$. We find that on synthetic input datasets, when executed using $16$ CPU cores of a $32$-core machine, our multi-threaded program executes about as fast as the state-of-the-art parallel GPU-based program running on an NVIDIA GTX 580 graphics processing unit.
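For reference, the definitions above translate directly into a quadratic-time baseline using all-pairs comparisons. This is not AMS-Lex. It only pins down what the minimal and maximal sets of a collection are.

```python
def extremal_sets(collection):
    """Return (minimal, maximal) sets of a collection by all-pairs comparison."""
    sets = list({frozenset(s) for s in collection})  # drop duplicate sets
    # t < s is the proper-subset test for frozensets
    minimal = [s for s in sets if not any(t < s for t in sets)]
    maximal = [s for s in sets if not any(s < t for t in sets)]
    return minimal, maximal
```

The quadratic number of subset tests is exactly what specialized algorithms such as AMS-Lex, and the memoized and parallel variants described above, are designed to avoid on large inputs.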
In this paper, we introduce a new framework for robust multiple signal classification (MUSIC). The proposed framework, called robust measure-transformed (MT) MUSIC, is based on applying a transform to the probability distribution of the received signals, i.e., transforming the probability measure defined on the observation space. In robust MT-MUSIC, the sample covariance is replaced by the empirical MT-covariance. By judicious choice of the transform, we show that: 1) the resulting empirical MT-covariance is B-robust, with a bounded influence function that takes negligible values for large-norm outliers, and 2) under the assumption of a spherically contoured noise distribution, the noise subspace can be determined from the eigendecomposition of the MT-covariance. Furthermore, we derive a new robust measure-transformed minimum description length (MDL) criterion for estimating the number of signals, and extend the MT-MUSIC framework to the case of coherent signals. The proposed approach is illustrated in simulation examples that show its advantages compared with other robust MUSIC and MDL generalizations.
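The sketch below shows the two ingredients under several assumptions. It uses a Gaussian-shaped weighting $\varphi(x) = \exp(-\lVert x \rVert^2/\omega^2)$, one transform whose weights vanish for large-norm snapshots; the paper's exact choice and its tuning of $\omega$ may differ. It also assumes zero-mean snapshots and a half-wavelength uniform linear array. The empirical MT-covariance replaces the sample covariance, and the MUSIC pseudo-spectrum is then computed from its noise subspace as usual.

```python
import numpy as np

def mt_covariance(X, omega):
    """Empirical MT-covariance with (assumed) weights exp(-||x||^2 / omega^2).

    X: (M, N) complex array of N zero-mean snapshots from M sensors.
    """
    weights = np.exp(-np.sum(np.abs(X) ** 2, axis=0) / omega ** 2)
    weights = weights / weights.sum()  # normalized transformed empirical measure
    return (X * weights) @ X.conj().T  # sum_n w_n x_n x_n^H

def music_spectrum(R, n_sources, angles_deg):
    """MUSIC pseudo-spectrum for a half-wavelength uniform linear array."""
    M = R.shape[0]
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : M - n_sources]
    m = np.arange(M)
    spectrum = np.empty(len(angles_deg))
    for idx, theta in enumerate(np.deg2rad(angles_deg)):
        a = np.exp(-1j * np.pi * m * np.sin(theta))  # steering vector
        proj = noise_subspace.conj().T @ a
        spectrum[idx] = 1.0 / np.real(np.vdot(proj, proj))
    return spectrum
```

Snapshots whose norm greatly exceeds `omega` receive weights near zero. This is how this particular transform bounds the influence of outliers. In this sketch, `omega` must be set by hand.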
Structure learning is a central objective of statistical causal inference. While quite a few methods exist for directed acyclic graphs (DAGs), the case of more general model classes remains challenging. In this paper, we present a greedy algorithm for structure learning with bow-free acyclic path diagrams (BAPs) with a Gaussian linear parametrization, which can be viewed as a generalization of Gaussian linear DAG models to the setting with hidden variables. In contrast to maximal ancestral graph (MAG) models, BAPs incorporate constraints beyond conditional independencies, and consequently more structure can be learned. We also investigate some distributional equivalence properties of BAPs, which are used in an algorithmic approach to compute (nearly) equivalent model structures, allowing us to infer lower bounds on causal effects. Of independent interest may be our very general proof of Wright’s path tracing formula, as well as sufficient conditions for distributional equivalence in acyclic path diagrams. Applying our method to some datasets reveals that BAP models can represent the data much better than DAG models.
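Write a Gaussian linear path diagram as $X = BX + \varepsilon$ with $\operatorname{Cov}(\varepsilon) = \Omega$. Directed edges are entries of $B$, and bidirected edges are off-diagonal entries of $\Omega$. The implied covariance is then $\Sigma = (I - B)^{-1} \Omega (I - B)^{-\top}$, and for acyclic $B$ its entries agree with Wright's path tracing rules. The sketch below computes $\Sigma$ and checks the bow-free condition: no pair of nodes may be joined by both a directed and a bidirected edge. The function names are illustrative.

```python
import numpy as np

def is_bow_free(B, Omega, tol=0.0):
    """True if no pair has both a directed edge (in B) and a bidirected edge (in Omega)."""
    p = B.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            directed = abs(B[i, j]) > tol or abs(B[j, i]) > tol  # B[i, j]: edge j -> i
            bidirected = abs(Omega[i, j]) > tol
            if directed and bidirected:
                return False
    return True

def implied_covariance(B, Omega):
    """Covariance implied by X = B X + eps with Cov(eps) = Omega."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)
    return inv @ Omega @ inv.T
```

Take $B_{21} = 0.5$, $B_{32} = 0.8$, and $\Omega = I$ except $\Omega_{13} = \Omega_{31} = 0.3$. For this diagram, `implied_covariance` gives $\Sigma_{13} = 0.5 \cdot 0.8 + 0.3 = 0.7$. This matches path tracing along $X_1 \to X_2 \to X_3$ plus the bidirected edge $X_1 \leftrightarrow X_3$.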
The output scores of a neural network classifier are converted to probabilities by normalizing over the scores of all competing categories. Computing this partition function, $Z$, therefore takes time linear in the number of categories, which is problematic as real-world problems continue to grow in the number of categories, as in visual object recognition or discriminative language modeling. We propose three approaches for sublinear estimation of the partition function, based on approximate nearest neighbor search and kernel feature maps, and compare their performance empirically.
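The sketch below shows one simple estimator in this spirit, built on our own assumptions. It sums $e^{s}$ exactly over the $k$ highest-scoring categories, which are the ones a nearest neighbor index over category weight vectors would return. The remaining tail is estimated from a uniform sample of $m$ categories, scaled up by the tail size. For brevity, the top-$k$ retrieval uses exact `np.argpartition`, which is itself linear; an approximate nearest neighbor index would replace it to make the estimate sublinear. This is not necessarily one of the paper's three approaches.

```python
import numpy as np

def estimate_partition(scores, k, m, rng):
    """Estimate Z = sum(exp(scores)) from the top-k scores plus a sampled tail."""
    n = scores.shape[0]
    top = np.argpartition(scores, n - k)[n - k:]  # stand-in for approximate NN retrieval
    mask = np.ones(n, dtype=bool)
    mask[top] = False
    tail = np.flatnonzero(mask)
    sample = rng.choice(tail, size=m, replace=False)  # uniform sample of the tail
    head_sum = np.exp(scores[top]).sum()
    tail_estimate = (tail.size / m) * np.exp(scores[sample]).sum()
    return head_sum + tail_estimate
```

The head is computed exactly because softmax mass concentrates in the largest scores. The uniform tail sample gives an unbiased estimate of the remaining sum, so all of the estimator's variance comes from the tail.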
Many real-world problems involve massive amounts of data. In these circumstances, learning algorithms often become prohibitively expensive, making scalability a pressing issue. A common approach is to sample the data to reduce the size of the dataset and enable efficient learning. Alternatively, learning algorithms can be customized to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this paper, we discuss a meta-learning algorithm (PSBML) that combines features of parallel algorithms with concepts from ensemble and boosting methodologies to achieve the desired scalability. We present both theoretical and empirical analyses showing that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments on both synthetic and real-world data to investigate the tradeoff between scalability and accuracy, as well as robustness to noise. These empirical results corroborate our theoretical analysis and demonstrate the potential of PSBML to achieve scalability without sacrificing accuracy.
The use of weights provides an effective strategy to incorporate prior domain knowledge in large-scale inference. This paper studies weighted multiple testing in a decision-theoretic framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and the precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level and that the gain in power over existing methods is substantial in many settings. An application to a genome-wide association study is discussed.
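As a point of reference for weighted testing, here is the classic p-value-weighted Benjamini-Hochberg procedure. It rescales positive weights to average 1, divides each p-value by its weight, and runs the usual step-up rule. This is a baseline, not the decision-theoretic procedure described above. In particular, it uses weights to encode prior knowledge about which hypotheses are likely non-null, rather than to weight errors in a weighted false discovery rate.

```python
def weighted_bh(pvalues, weights, alpha):
    """Indices rejected by Benjamini-Hochberg applied to weighted p-values p_i / w_i."""
    n = len(pvalues)
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    scale = n / sum(weights)  # rescale so the weights average to 1
    q = [p / (w * scale) for p, w in zip(pvalues, weights)]
    order = sorted(range(n), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / n:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])
```

Up-weighting a hypothesis (weight above 1) makes it easier to reject at the expense of the others. Informative weights can therefore increase power, while uninformative ones can reduce it.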