Pabon Lasso Pabon Lasso is a graphical method for monitoring the efficiency of different wards of a hospital or different hospitals.Pabon Lasso graph is divided into 4 parts which are created after drawing the average of BTR and BOR. The part in the left-down side is Zone I, left-up side is Zone II, Right-up side part is Zone III and the last part is Zone IV. PabonLasso Pachinko Allocation Machine(PAM) ➘ “Pachinko Allocation Model” Variational Inference In Pachinko Allocation Machines Pachinko Allocation Model(PAM) In machine learning and natural language processing, the pachinko allocation model (PAM) is a topic model. Topic models are a suite of algorithms to uncover the hidden thematic structure of a collection of documents. The algorithm improves upon earlier topic models such as latent Dirichlet allocation (LDA) by modeling correlations between topics in addition to the word correlations which constitute topics. PAM provides more flexibility and greater expressive power than latent Dirichlet allocation. While first described and implemented in the context of natural language processing, the algorithm may have applications in other fields such as bioinformatics. The model is named for pachinko machines – a game popular in Japan, in which metal balls bounce down around a complex collection of pins until they land in various bins at the bottom. http://…/pam-icml06.pdf Pachinkogram Conditional Probabilities Visualisation Pachyderm MapReduce without Hadoop Analyze massive datasets with Docker: Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations. Pachyderm is a completely new MapReduce engine built on top of modern tools. The biggest benefit of starting from scratch is that we get to leverage amazing advances in open source infrastructure, such as Docker and CoreOS. https://…/pfs Replacing Hadoop Packet Capture(pcap) In the field of computer network administration, pcap (packet capture) consists of an application programming interface (API) for capturing network traffic. Unix-like systems implement pcap in the libpcap library; Windows uses a port of libpcap known as WinPcap. Monitoring software may use libpcap and/or WinPcap to capture packets travelling over a network and, in newer versions, to transmit packets on a network at the link layer, as well as to get a list of network interfaces for possible use with libpcap or WinPcap. The pcap API is written in C, so other languages such as Java, .NET languages, and scripting languages generally use a wrapper; no such wrappers are provided by libpcap or WinPcap itself. C++ programs may link directly to the C API or use an object-oriented wrapper. Packing(PacGAN) Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples. Despite remarkable recent improvements in generating realistic images, one of their major shortcomings is the fact that in practice, they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the main focus of several recent advances in GANs. Yet there is little understanding of why mode collapse happens and why existing approaches are able to mitigate mode collapse. We propose a principled approach to handling mode collapse, which we call packing. The main idea is to modify the discriminator to make decisions based on multiple samples from the same class, either real or artificially generated. We borrow analysis tools from binary hypothesis testing—in particular the seminal result of Blackwell [Bla53]—to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggests that packing provides significant improvements in practice as well. Padé Approximant In mathematics a Padé approximant is the ‘best’ approximation of a function by a rational function of given order – under this technique, the approximant’s power series agrees with the power series of the function it is approximating. The technique was developed around 1890 by Henri Padé, but goes back to Georg Frobenius who introduced the idea and investigated the features of rational approximations of power series. The Padé approximant often gives better approximation of the function than truncating its Taylor series, and it may still work where the Taylor series does not converge. For these reasons Padé approximants are used extensively in computer calculations. They have also been used as auxiliary functions in Diophantine approximation and transcendental number theory, though for sharp results ad hoc methods in some sense inspired by the Padé theory typically replace them. http://…ing Padé Approximant Coefficients Using R Pade PageRank PageRank is an algorithm used by Google Search to rank websites in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google: PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. PAI Data The Project PAI Data Protocol (‘PAI Data’) is a specification that extends the Project PAI Blockchain Protocol to include a method of securing and provisioning access to arbitrary data. In the context of PAI Coin Development Proposal (PDP) 2, this paper defines two important transaction types that PAI Data supports: Storage Transactions, which facilitate storage of data and proof of ownership, and Sharing Transactions, designed to enable granting and revocation of data access to designated recipients. A comparative analysis of PAI Data against similar blockchain-based file storage systems is also presented. PaintBot We propose a new automated digital painting framework, based on a painting agent trained through reinforcement learning. To synthesize an image, the agent selects a sequence of continuous-valued actions representing primitive painting strokes, which are accumulated on a digital canvas. Action selection is guided by a given reference image, which the agent attempts to replicate subject to the limitations of the action space and the agent’s learned policy. The painting agent policy is determined using a variant of proximal policy optimization reinforcement learning. During training, our agent is presented with patches sampled from an ensemble of reference images. To accelerate training convergence, we adopt a curriculum learning strategy, whereby reference patches are sampled according to how challenging they are using the current policy. We experiment with differing loss functions, including pixel-wise and perceptual loss, which have consequent differing effects on the learned policy. We demonstrate that our painting agent can learn an effective policy with a high dimensional continuous action space comprising pen pressure, width, tilt, and color, for a variety of painting styles. Through a coarse-to-fine refinement process our agent can paint arbitrarily complex images in the desired style. pair2vec Reasoning about implied relationships (e.g. paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems. This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. Our pairwise embeddings are computed as a compositional function of each word’s representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the the two words co-occur. We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings. Experiments show a gain of 2.72% on the recently released SQuAD 2.0 and 1.3% on MultiNLI. Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. Paired Lasso Regression palasso Paired Leave-one-out Potential Outcome(P-LOOP) In paired experiments, participants are grouped into pairs with similar characteristics, and one observation from each pair is randomly assigned to treatment. Because of both the pairing and the randomization, the treatment and control groups should be well balanced; however, there may still be small chance imbalances. It may be possible to improve the precision of the treatment effect estimate by adjusting for these imbalances. Building on related work for completely randomized experiments, we propose the P-LOOP (paired leave-one-out potential outcomes) estimator for paired experiments. We leave out each pair and then impute its potential outcomes using any prediction algorithm. The imputation method is flexible; for example, we could use lasso or random forests. While similar methods exist for completely randomized experiments, covariate adjustment methods in paired experiments are relatively understudied. A unique trade-off exists for paired experiments, where it can be unclear whether to factor in pair assignments when making adjustments. We address this issue in the P-LOOP estimator by automatically deciding whether to account for the pairing when imputing the potential outcomes. By addressing this trade-off, the method has the potential to improve precision over existing methods. Pairwise Augmented GAN We propose a novel autoencoding model called Pairwise Augmented GANs. We train a generator and an encoder jointly and in an adversarial manner. The generator network learns to sample realistic objects. In turn, the encoder network at the same time is trained to map the true data distribution to the prior in latent space. To ensure good reconstructions, we introduce an augmented adversarial reconstruction loss. Here we train a discriminator to distinguish two types of pairs: an object with its augmentation and the one with its reconstruction. We show that such adversarial loss compares objects based on the content rather than on the exact match. We experimentally demonstrate that our model generates samples and reconstructions of quality competitive with state-of-the-art on datasets MNIST, CIFAR10, CelebA and achieves good quantitative results on CIFAR10. Pairwise Differentiable Gradient Descent(PDGD) Online Learning to Rank (OLTR) methods optimize rankers based on user interactions. State-of-the-art OLTR methods are built specifically for linear models. Their approaches do not extend well to non-linear models such as neural networks. We introduce an entirely novel approach to OLTR that constructs a weighted differentiable pairwise loss after each interaction: Pairwise Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional approach that relies on interleaving or multileaving and extensive sampling of models to estimate gradients. Instead, its gradient is based on inferring preferences between document pairs from user clicks and can optimize any differentiable model. We prove that the gradient of PDGD is unbiased w.r.t. user document pair preferences. Our experiments on the largest publicly available Learning to Rank (LTR) datasets show considerable and significant improvements under all levels of interaction noise. PDGD outperforms existing OLTR methods both in terms of learning speed as well as final convergence. Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear models to be optimized effectively. Our results show that using a neural network leads to even better performance at convergence than a linear model. In summary, PDGD is an efficient and unbiased OLTR approach that provides a better user experience than previously possible. Pairwise Inner Product(PIP) In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding. Pairwise Issue Expansion A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called ‘pairwise issue expansion’, which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a ‘pairwise pricing equilibrium’ in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decisions markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions. Pairwise Median Scaled Difference(MSD) A new robust pairwise statistic, the pairwise median scaled difference (MSD), is proposed for the detection of anomalous location/uncertainty pairs in heteroscedastic interlaboratory study data with associated uncertainties. The distribution for the IID case is presented and approximate critical values for routine use are provided. The determination of observation-specific quantiles and p-values for heteroscedastic data, using parametric bootstrapping, is demonstrated by example. It is shown that the statistic has good power for detecting anomalies compared to a previous pairwise statistic, and offers much greater resistance to multiple outlying values. PaloBoost Stochastic Gradient TreeBoost is often found in many winning solutions in public data science challenges. Unfortunately, the best performance requires extensive parameter tuning and can be prone to overfitting. We propose PaloBoost, a Stochastic Gradient TreeBoost model that uses novel regularization techniques to guard against overfitting and is robust to parameter settings. PaloBoost uses the under-utilized out-of-bag samples to perform gradient-aware pruning and estimate adaptive learning rates. Unlike other Stochastic Gradient TreeBoost models that use the out-of-bag samples to estimate test errors, PaloBoost treats the samples as a second batch of training samples to prune the trees and adjust the learning rates. As a result, PaloBoost can dynamically adjust tree depths and learning rates to achieve faster learning at the start and slower learning as the algorithm converges. We illustrate how these regularization techniques can be efficiently implemented and propose a new formula for calculating feature importance to reflect the node coverages and learning rates. Extensive experimental results on seven datasets demonstrate that PaloBoost is robust to overfitting, is less sensitivity to the parameters, and can also effectively identify meaningful features. PANDA In this paper we consider a distributed convex optimization problem over time-varying networks. We propose a dual method that converges R-linearly to the optimal point given that the agents’ objective functions are strongly convex and have Lipschitz continuous gradients. The proposed method requires half the amount of variable exchanges per iterate than methods based on DIGing, and yields improved practical performance as empirically demonstrated. Pan-Density Network(PaDNet) Crowd counting in varying density scenes is a challenging problem in artificial intelligence (AI) and pattern recognition. Recently, deep convolutional neural networks (CNNs) are used to tackle this problem. However, the single-column CNN cannot achieve high accuracy and robustness in diverse density scenes. Meanwhile, multi-column CNNs lack effective way to accurately learn the features of different scales for estimating crowd density. To address these issues, we propose a novel pan-density level deep learning model, named as Pan-Density Network (PaDNet). Specifically, the PaDNet learns multi-scale features by three steps. First, several sub-networks are pre-trained on crowd images with different density-levels. Then, a Scale Reinforcement Net (SRN) is utilized to reinforce the scale features. Finally, a Fusion Net fuses all of the scale features to generate the final density map. Experiments on four crowd counting benchmark datasets, the ShanghaiTech, the UCF\_CC\_50, the UCSD, and the UCF-QRNF, indicate that the PaDNet achieves the best performance and has high robustness in pan-density crowd counting compared with other state-of-the-art algorithms. Pando Volunteer computing is currently successfully used to make hundreds of thousands of machines available free-of-charge to projects of general interest. However the effort and cost involved in participating in and launching such projects may explain why only a few high-profile projects use it and why only 0.1% of Internet users participate in them. In this paper we present Pando, a new web-based volunteer computing system designed to be easy to deploy and which does not require dedicated servers. The tool uses new demand-driven stream abstractions and a WebRTC overlay based on a fat tree for connecting volunteers. Together the stream abstractions and the fat-tree overlay enable a thousand browser tabs running on multiple machines to be used for computation, enough to tap into all machines bought as part of previous hardware investments made by a small- or medium-company or a university department. Moreover the approach is based on a simple programming model that should be both easy to use by itself by JavaScript programmers and as a compilation target by compiler writers. We provide a command-line version of the tool and all scripts and procedures necessary to replicate the experiments we made on the Grid5000 testbed. Panel Analysis of Nonstationarity in Idiosyncratic and Common Components(PANIC) Decomposing a mutivariate time series into common factors and idiosyncratic components, a method called PANIC (Panel Analysis of Non-stationary in Idiosyncratic and Common components) is suggested by Bai and Ng (2004). PANICr Panel Data In statistics and econometrics, panel data or longitudinal data are multi-dimensional data involving measurements over time. Panel data contain observations of multiple phenomena obtained over multiple time periods for the same firms or individuals. Time series and cross-sectional data can be thought of as special cases of panel data that are in one dimension only (one panel member or individual for the former, one time point for the latter). A study that uses panel data is called a longitudinal study or panel study. ivpanel Panel Vector Autoregression(PVAR) We extend two general methods of moment estimators to panel vector autoregression models (PVAR) with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first difference GMM estimator by Holtz-Eakin et al. (1988) , Arellano and Bond (1991) and the system GMM estimator by Blundell and Bond (1998) . We also provide specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions. panelvar PANFIS++ The concept of evolving intelligent system (EIS) provides an effective avenue for data stream mining because it is capable of coping with two prominent issues: online learning and rapidly changing environments. We note at least three uncharted territories of existing EISs: data uncertainty, temporal system dynamic, redundant data streams. This book chapter aims at delivering a concrete solution of this problem with the algorithmic development of a novel learning algorithm, namely PANFIS++. PANFIS++ is a generalized version of the PANFIS by putting forward three important components: 1) An online active learning scenario is developed to overcome redundant data streams. This module allows to actively select data streams for the training process, thereby expediting execution time and enhancing generalization performance, 2) PANFIS++ is built upon an interval type-2 fuzzy system environment, which incorporates the so-called footprint of uncertainty. This component provides a degree of tolerance for data uncertainty. 3) PANFIS++ is structured under a recurrent network architecture with a self-feedback loop. This is meant to tackle the temporal system dynamic. The efficacy of the PANFIS++ has been numerically validated through numerous real-world and synthetic case studies, where it delivers the highest predictive accuracy while retaining the lowest complexity. Pangea Storage and memory systems for modern data analytics are heavily layered, managing shared persistent data, cached data, and non- shared execution data in separate systems such as distributed file system like HDFS, in-memory file system like Alluxio and computation framework like Spark. Such layering introduces significant performance and management costs for copying data across layers redundantly and deciding proper resource allocation for all layers. In this paper we propose a single system called Pangea that can manage all data—both intermediate and long-lived data, and their buffer/caching, data placement optimization, and failure recovery—all in one monolithic storage system, without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark. Pangloss Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in machine learning relevance applications like semantic search, knowledge graph construction, and question answering. Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured text adds a new dimension to this problem. This paper presents Pangloss, a production system for entity disambiguation on noisy text. Pangloss combines a probabilistic linear-time key phrase identification algorithm with a semantic similarity engine based on context-dependent document embeddings to achieve better than state-of-the-art results (>5% in F1) compared to other research or commercially available systems. In addition, Pangloss leverages a local embedded database with a tiered architecture to house its statistics and metadata, which allows rapid disambiguation in streaming contexts and on-device disambiguation in low-memory environments such as mobile phones. PanJoin In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to process all the requests. In this paper, we present a new algorithm called PanJoin which has high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computations during the probing phase of stream join. We also implement the most hardware-friendly data structure, called BI-Sort, on FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and it also adapts well to highly skewed data. Panoptic Feature Pyramid Network(Panoptic FPN) The recently introduced panoptic segmentation task has renewed our community’s interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic Feature Pyramid Network, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation. PanopticFusion We propose PanopticFusion, a novel online volumetric semantic mapping system at the level of stuff and things. In contrast to previous semantic mapping systems, PanopticFusion is able to densely predict class labels of a background region (stuff) and individually segment arbitrary foreground objects (things). In addition, our system has the capability to reconstruct a large-scale scene and extract a labeled mesh thanks to its use of a spatially hashed volumetric map representation. Our system first predicts pixel-wise panoptic labels (class labels for stuff regions and instance IDs for thing regions) for incoming RGB frames by fusing 2D semantic and instance segmentation outputs. The predicted panoptic labels are integrated into the volumetric map together with depth measurements while keeping the consistency of the instance IDs, which could vary frame to frame, by referring to the 3D map at that moment. In addition, we construct a fully connected conditional random field (CRF) model with respect to panoptic labels for map regularization. For online CRF inference, we propose a novel unary potential approximation and a map division strategy. We evaluated the performance of our system on the ScanNet (v2) dataset. PanopticFusion outperformed or compared with state-of-the-art offline 3D DNN methods in both semantic and instance segmentation benchmarks. Also, we demonstrate a promising augmented reality application using a 3D panoptic map generated by the proposed system. Pan-Sharpening Generative Adversarial Network(PSGAN) Remote sensing image fusion (also known as pan-sharpening) aims to generate a high resolution multi-spectral image from inputs of a high spatial resolution single band panchromatic (PAN) image and a low spatial resolution multi-spectral (MS) image. In this paper, we propose PSGAN, a generative adversarial network (GAN) for remote sensing image pan-sharpening. To the best of our knowledge, this is the first attempt at producing high quality pan-sharpened images with GANs. The PSGAN consists of two parts. Firstly, a two-stream fusion architecture is designed to generate the desired high resolution multi-spectral images, then a fully convolutional network serving as a discriminator is applied to distinct ‘real’ or ‘pan-sharpened’ MS images. Experiments on images acquired by Quickbird and GaoFen-1 satellites demonstrate that the proposed PSGAN can fuse PAN and MS images effectively and significantly improve the results over the state of the art traditional and CNN based pan-sharpening methods. PARAFAC Tensor Decomposition ➘ “Tensor Rank Decomposition” Paragraph Vector Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, ‘powerful,’ ‘strong’ and ‘Paris’ are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks. GitXiv Paragraph Vector-based Matrix Factorization Recommender System(ParVecMF) Review-based recommender systems have gained noticeable ground in recent years. In addition to the rating scores, those systems are enriched with textual evaluations of items by the users. Neural language processing models, on the other hand, have already found application in recommender systems, mainly as a means of encoding user preference data, with the actual textual description of items serving only as side information. In this paper, a novel approach to incorporating the aforementioned models into the recommendation process is presented. Initially, a neural language processing model and more specifically the paragraph vector model is used to encode textual user reviews of variable length into feature vectors of fixed length. Subsequently this information is fused along with the rating scores in a probabilistic matrix factorization algorithm, based on maximum a-posteriori estimation. The resulting system, ParVecMF, is compared to a ratings’ matrix factorization approach on a reference dataset. The obtained preliminary results on a set of two metrics are encouraging and may stimulate further research in this area. ParaGraphE Knowledge graph embedding aims at translating the knowledge graph into numerical representations by transforming the entities and relations into con- tinuous low-dimensional vectors. Recently, many methods [1, 5, 3, 2, 6] have been proposed to deal with this problem, but existing single-thread implemen- tations of them are time-consuming for large-scale knowledge graphs. Here, we design a unified parallel framework to parallelize these methods, which achieves a significant time reduction without in uencing the accuracy. We name our framework as ParaGraphE, which provides a library for parallel knowledge graph embedding. The source code can be downloaded from https: //github.com/LIBBLE/LIBBLE-MultiThread/tree/master/ParaGraphE. Parallax The employment of high-performance servers and GPU accelerators for training deep neural network models have greatly accelerated recent advances in machine learning (ML). ML frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to assist ML researchers to train their models in a distributed fashion. However, correctly and efficiently utilizing multiple machines and GPUs is still not a straightforward task for framework users due to the non-trivial correctness and performance challenges that arise in the distribution process. This paper introduces Parallax, a tool for automatic parallelization of deep learning training in distributed environments. Parallax not only handles the subtle correctness issues, but also leverages various optimizations to minimize the communication overhead caused by scaling out. Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on multiple CNN and RNN models, while requiring little effort from its users. Parallel Amplitudes Distribution Matcher(PADM) A distribution matcher (DM) maps a binary input sequence into a block of nonuniformly distributed symbols. To facilitate the implementation of shaped signaling, fast DM solutions with high throughput and low serialism are required. We propose a novel DM architecture with parallel amplitudes (PADM) for which m component DMs, each with a different binary output alphabet, are operated in parallel in order to generate a shaped sequence with m amplitudes. With negligible rate loss compared to a single nonbinary DM, PA-DM has a parallelization factor that grows linearly with m, and the component DMs have reduced output lengths. For such binary-output DMs, a novel constant-composition DM (CCDM) algorithm based on subset ranking (SR) is proposed. We present SR-CCDM algorithms that are serial in the minimum number of occurrences of either binary symbol for mapping and fully parallel in demapping. For distributions that are optimized for the additive white Gaussian noise (AWGN) channel, we numerically show that PA-DM combined with SR-CCDM can reduce the number of sequential processing steps by more than an order of magnitude, while having a rate loss that is comparable to conventional nonbinary CCDM with arithmetic coding. Parallel and Interacting Stochastic Approximation Annealing(PISAA) We present the parallel and interacting stochastic approximation annealing (PISAA) algorithm, a stochastic simulation procedure for global optimisation, that extends and improves the stochastic approximation annealing (SAA) by using population Monte Carlo ideas. The standard SAA algorithm guarantees convergence to the global minimum when a square-root cooling schedule is used; however the efficiency of its performance depends crucially on its self-adjusting mechanism. Because its mechanism is based on information obtained from only a single chain, SAA may present slow convergence in complex optimisation problems. The proposed algorithm involves simulating a population of SAA chains that interact each other in a manner that ensures significant improvement of the self-adjusting mechanism and better exploration of the sampling space. Central to the proposed algorithm are the ideas of (i) recycling information from the whole population of Markov chains to design a more accurate/stable self-adjusting mechanism and (ii) incorporating more advanced proposals, such as crossover operations, for the exploration of the sampling space. PISAA presents a significantly improved performance in terms of convergence. PISAA can be implemented in parallel computing environments if available. We demonstrate the good performance of the proposed algorithm on challenging applications including Bayesian network learning and protein folding. Our numerical comparisons suggest that PISAA outperforms the simulated annealing, stochastic approximation annealing, and annealing evolutionary stochastic approximation Monte Carlo especially in high dimensional or rugged scenarios. Parallel Augmented Maps(PAM) In this paper we introduce an interface for supporting ordered maps that are augmented to support quick ‘sums’ of values over ranges of the keys. We have implemented this interface as part of a C++ library called PAM (Parallel and Persistent Augmented Map library). This library supports a wide variety of functions on maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, difference, filtering, extracting ranges, splitting, and range-sums. The functions in the library are parallel, persistent (meaning that functions do not affect their inputs), and work-efficient. The underlying data structure is the augmented balanced binary search tree, which is a binary search tree in which each node is augmented with a value keeping the ‘sum’ of its subtree with respect to some user supplied function. With this augmentation the library can be directly applied to many settings such as to 2D range trees, interval trees, word index searching, and segment trees. The interface greatly simplifies the implementation of such data structures while it achieves efficiency that is significantly better than previous libraries. We tested our library and its corresponding applications. Experiments show that our implementation of set functions can get up to 50+ speedup on 72 cores. As for our range tree implementation, the sequential running time is more efficient than existing libraries such as CGAL, and can get up to 42+ speedup on 72 cores. Parallel Coordinates Parallel coordinates is a common way of visualizing high-dimensional geometry and analyzing multivariate data. To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of the vertex on the ith axis corresponds to the ith coordinate of the point. This visualization is closely related to time series visualization, except that it is applied to data where the axes do not correspond to points in time, and therefore do not have a natural order. Therefore, different axis arrangements may be of interest. visova,MASS,Acinonyx Parallel Data Assimilation Framework(PDAF) The Parallel Data Assimilation Framework – PDAF – is a software environment for ensemble data assimilation. PDAF simplifies the implementation of the data assimilation system with existing numerical models. With this, users can obtain a data assimilation system with less work and can focus on applying data assimilation. PDAF provides fully implemented and optimized data assimilation algorithms, in particular ensemble-based Kalman filters like LETKF and LSEIK. It allows users to easily test different assimilation algorithms and observations. PDAF is optimized for the application with large-scale models that usually run on big parallel computers and is applicable for operational applications. However, it is also well suited for smaller models and even toy models. PDAF provides a standardized interface that separates the numerical model from the assimilation routines. This allows to perform the further development of the assimilation methods and the model independently. New algorithmic developments can be readily made available through the interface such that they can be immediately applied with existing implementations. The test suite of PDAF provides small models for easy testing of algorithmic developments and for teaching data assimilation. PDAF is an open-source project. Its functionality will be further extended by input from research projects. In addition, users are welcome to contribute to the further enhancement of PDAF, e.g. by contributing additional assimilation methods or interface routines for different numerical models. Parallel External Memory(PEM) In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that can scale to hundreds or even thousands of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way that cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. All algorithms in the paper are asymptotically optimal in terms of the parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache chip multiprocessors. Parallel External Memory Algorithm(PEMA) ➘ “Parallel External Memory” Parallel Grid Pooling(PGP) Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate feature. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://…/akitotakeki Parallel Iterative Nonnegative Matrix Factorization(PARINOM) Matrix decomposition is ubiquitous and has applications in various fields like speech processing, data mining and image processing to name a few. Under matrix decomposition, nonnegative matrix factorization is used to decompose a nonnegative matrix into a product of two nonnegative matrices which gives some meaningful interpretation of the data. Thus, nonnegative matrix factorization has an edge over the other decomposition techniques. In this paper, we propose two novel iterative algorithms based on Majorization Minimization (MM)-in which we formulate a novel upper bound and minimize it to get a closed form solution at every iteration. Since the algorithms are based on MM, it is ensured that the proposed methods will be monotonic. The proposed algorithms differ in the updating approach of the two nonnegative matrices. The first algorithm-Iterative Nonnegative Matrix Factorization (INOM) sequentially updates the two nonnegative matrices while the second algorithm-Parallel Iterative Nonnegative Matrix Factorization (PARINOM) parallely updates them. We also prove that the proposed algorithms converge to the stationary point of the problem. Simulations were conducted to compare the proposed methods with the existing ones and was found that the proposed algorithms performs better than the existing ones in terms of computational speed and convergence. KeyWords: Nonnegative matrix factorization, Majorization Minimization, Big Data, Parallel, Multiplicative Update Parallel Locality-Optimized Non-negative Matrix Factorization(PL-NMF) Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed in the past. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms. Parallel Matrix Condensation Calculating the log-determinant of a matrix is useful for statistical computations used in machine learning, such as generative learning which uses the log-determinant of the covariance matrix to calculate the log-likelihood of model mixtures. The log-determinant calculation becomes challenging as the number of variables becomes large. Therefore, finding a practical speedup for this computation can be useful. In this study, we present a parallel matrix condensation algorithm for calculating the log-determinant of a large matrix. We demonstrate that in a distributed environment, Parallel Matrix Condensation has several advantages over the well-known Parallel Gaussian Elimination. The advantages include high data distribution efficiency and less data communication operations. We test our Parallel Matrix Condensation against self-implemented Parallel Gaussian Elimination as well as ScaLAPACK (Scalable Linear Algebra Package) on 1000 x1000 to 8000×8000 for 1,2,4,8,16,32,64 and 128 processors. The results show that Matrix Condensation yields the best speed-up among all other tested algorithms. The code is available on https://…/MatrixCondensation Parallel Monte Carlo Graph Search(P-MCGS) Recently, there have been great interests in Monte Carlo Tree Search (MCTS) in AI research. Although the sequential version of MCTS has been studied widely, its parallel counterpart still lacks systematic study. This leads us to the following questions: \emph{how to design efficient parallel MCTS (or more general cases) algorithms with rigorous theoretical guarantee? Is it possible to achieve linear speedup?} In this paper, we consider the search problem on a more general acyclic one-root graph (namely, Monte Carlo Graph Search (MCGS)), which generalizes MCTS. We develop a parallel algorithm (P-MCGS) to assign multiple workers to investigate appropriate leaf nodes simultaneously. Our analysis shows that P-MCGS algorithm achieves linear speedup and that the sample complexity is comparable to its sequential counterpart. Parallel Pareto Local Search based on Decomposition(PPLS/D) Pareto Local Search (PLS) is a basic building block in many multiobjective metaheuristics. In this paper, Parallel Pareto Local Search based on Decomposition (PPLS/D) is proposed. PPLS/D decomposes the original search space into L subregions and executes L parallel search processes in these subregions simultaneously. Inside each subregion, the PPLS/D process is first guided by a scalar objective function to approach the Pareto set quickly, then it finds non-dominated solutions in this subregion. Our experimental studies on the multiobjective Unconstrained Binary Quadratic Programming problems (mUBQPs) with two to four objectives demonstrate the efficiency of PPLS/D. We investigate the behavior of PPLS/D to understand its working mechanism. Moreover, we propose a variant of PPLS/D called PPLS/D with Adaptive Expansion (PPLS/D-AE), in which each process can search other subregions after it converges in its own subregion. Its advantages and disadvantages have been studied. Parallel Predictive Entropy Search(PPES) We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics. Parallel Random Forest(PRF) With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasingly attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) algorithm for big data on the Apache Spark platform. The PRF algorithm is optimized based on a hybrid approach combining data-parallel and task-parallel optimization. From the perspective of data-parallel optimization, a vertical data-partitioning method is performed to reduce the data communication cost effectively, and a data-multiplexing method is performed is performed to allow the training dataset to be reused and diminish the volume of data. From the perspective of task-parallel optimization, a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process of PRF and the dependence of the Resilient Distributed Datasets (RDD) objects. Then, different task schedulers are invoked for the tasks in the DAG. Moreover, to improve the algorithm’s accuracy for large, high-dimensional, and noisy data, we perform a dimension-reduction approach in the training process and a weighted voting approach in the prediction process prior to parallelization. Extensive experimental results indicate the superiority and notable advantages of the PRF algorithm over the relevant algorithms implemented by Spark MLlib and other studies in terms of the classification accuracy, performance, and scalability. Parallel Sets(ParSets) Parallel Sets (ParSets) is a visualization application for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. ParSets provide a simple, interactive way to explore and analyze such data. Parallel Sparse Tensor Algorithm Benchmark Suite(PASTA) Tensor methods have gained increasingly attention from various applications, including machine learning, quantum chemistry, healthcare analytics, social network analysis, data mining, and signal processing, to name a few. Sparse tensors and their algorithms become critical to further improve the performance of these methods and enhance the interpretability of their output. This work presents a sparse tensor algorithm benchmark suite (PASTA) for single- and multi-core CPUs. To the best of our knowledge, this is the first benchmark suite for sparse tensor world. PASTA targets on: 1) helping application users to evaluate different computer systems using its representative computational workloads; 2) providing insights to better utilize existed computer architecture and systems and inspiration for the future design. This benchmark suite is publicly released https://…/pasta. Parallel Temporal Neural Coding Network Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, to train these models, one relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process very difficult. In this work, we propose the Parallel Temporal Neural Coding Network, a biologically inspired model trained by the local learning algorithm known as Local Representation Alignment, that aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling nor the derivatives of its internal activation functions. We compare our model and learning procedure to other online back-propagation-through-time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, even outperform full back-propagation through time itself as well as variants such as sparse attentive back-tracking. Furthermore, we present promising experimental results that demonstrate our model’s ability to conduct zero-shot adaptation. Parallelizable Stack Long Short-Term Memory Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation. Parameter Hub(PHub) Most work in the deep learning systems community has focused on faster inference, but arriving at a trained model requires lengthy experiments. Accelerating training lets developers iterate faster and come up with better models. DNN training is often seen as a compute-bound problem, best done in a single large compute node with many GPUs. As DNNs get bigger, training requires going distributed. Distributed deep neural network (DDNN) training constitutes an important workload on the cloud. Larger DNN models and faster compute engines shift training performance bottleneck from computation to communication. Our experiments show existing DNN training frameworks do not scale in a typical cloud environment due to insufficient bandwidth and inefficient parameter server software stacks. We propose PHub, a high performance parameter server (PS) software design that provides an optimized network stack and a streamlined gradient processing pipeline to benefit common PS setups, and PBox, a balanced, scalable central PS hardware that fully utilizes PHub capabilities. We show that in a typical cloud environment, PHub can achieve up to 3.8x speedup over state-of-theart designs when training ImageNet. We discuss future directions of integrating PHub with programmable switches for in-network aggregation during training, leveraging the datacenter network topology to reduce bandwidth usage and localize data movement. Parameter Selection and Model Evaluation(PSME) Parameter Transfer Unit(PTU) Parameters in deep neural networks which are trained on large-scale databases can generalize across multiple domains, which is referred as ‘transferability’. Unfortunately, the transferability is usually defined as discrete states and it differs with domains and network architectures. Existing works usually heuristically apply parameter-sharing or fine-tuning, and there is no principled approach to learn a parameter transfer strategy. To address the gap, a parameter transfer unit (PTU) is proposed in this paper. The PTU learns a fine-grained nonlinear combination of activations from both the source and the target domain networks, and subsumes hand-crafted discrete transfer states. In the PTU, the transferability is controlled by two gates which are artificial neurons and can be learned from data. The PTU is a general and flexible module which can be used in both CNNs and RNNs. Experiments are conducted with various network architectures and multiple transfer domain pairs. Results demonstrate the effectiveness of the PTU as it outperforms heuristic parameter-sharing and fine-tuning in most settings. Parameter-Free Online Learning We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel ‘multi-scale’ algorithm for prediction with expert advice based on random playout, which may be of independent interest. PArameterized Clipping acTivation(PACT) Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed – but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training – that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories. Parameters Read-Write Network(PRaWN) In this paper, we describe a general framework: Parameters Read-Write Networks (PRaWNs) to systematically analyze current neural models for multi-task learning, in which we find that existing models expect to disentangle features into different spaces while features learned in practice are still entangled in shared space, leaving potential hazards for other training or unseen tasks. We propose to alleviate this problem by incorporating an inductive bias into the process of multi-task learning, that each task can keep informed of not only the knowledge stored in other tasks but the way how other tasks maintain their knowledge. In practice, we achieve above inductive bias by allowing different tasks to communicate by passing both hidden variables and gradients explicitly. Experimentally, we evaluate proposed methods on three groups of tasks and two types of settings (\textsc{in-task} and \textsc{out-of-task}). Quantitative and qualitative results show their effectiveness. Parametric Gaussian Processes(PGP) This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in ‘big data’ regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately $6$ million records. Parametric Model In statistics, a parametric model or parametric family or finite-dimensional model is a family of distributions that can be described using a finite number of parameters. These parameters are usually collected together to form a single k-dimensional parameter vector θ = (θ1, θ2, …, θk). Parametric models are contrasted with the semi-parametric, semi-nonparametric, and non-parametric models, all of which consist of an infinite set of ‘parameters’ for description. The distinction between these four classes is as follows: · in a ‘parametric’ model all the parameters are in finite-dimensional parameter spaces; · a model is ‘non-parametric’ if all the parameters are in infinite-dimensional parameter spaces; · a ‘semi-parametric’ model contains finite-dimensional parameters of interest and infinite-dimensional nuisance parameters; · a ‘semi-nonparametric’ model has both finite-dimensional and infinite-dimensional unknown parameters of interest. Some statisticians believe that the concepts ‘parametric’, ‘non-parametric’, and ‘semi-parametric’ are ambiguous. It can also be noted that the set of all probability measures has cardinality of continuum, and therefore it is possible to parametrize any model at all by a single number in (0,1) interval. This difficulty can be avoided by considering only ‘smooth’ parametric models. Parametric Portfolio Policies(PPP) We propose a novel approach to optimizing portfolios with large numbers of assets. We model directly the portfolio weight in each asset as a function of the asset’s characteristics. The coefficients of this function are found by optimizing the investor’s average utility of the portfolio’s return over the sample period. Our approach is computationally simple and easily modified and extended to capture the effect of transaction costs, for example, produces sensible portfolio weights, and offers robust performance in and out of sample. In contrast, the traditional approach of first modeling the joint distribution of returns and then solving for the corresponding optimal portfolio weights is not only difficult to implement for a large number of assets but also yields notoriously noisy and unstable results. We present an empirical implementation for the universe of all stocks in the CRSP-Compustat data set, exploiting the size, value, and momentum anomalies. Parametric Rectified Linear Unit(PReLU) Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%,) on this visual recognition challenge. Parametrized Deep Q-Network(P-DQN) Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely. Motivated by applications in computer games, we consider the scenario with discrete-continuous hybrid action space. To handle hybrid action space, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P- DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirits of both DQN (dealing with discrete action space) and DDPG (dealing with continuous action space) by seamlessly integrating them. Empirical results on a simulation example, scoring a goal in simulated RoboCup soccer and the solo mode in game King of Glory (KOG) validate the efficiency and effectiveness of our method. ParamRLS On the Impact of the Cutoff Time on the Performance of Algorithm Configurators ParaNet DenseNets have been shown to be a competitive model among recent convolutional network architectures. These networks utilize Dense Blocks, which are groups of densely connected layers where the output of a hidden layer is fed in as the input of every other layer following it. In this paper, we aim to improve certain aspects of DenseNet, especially when it comes to practicality. We introduce ParaNet, a new architecture that constructs three pipelines which allow for early inference. We additionally introduce a cascading mechanism such that different pipelines are able to share parameters, as well as logit matching between the outputs of the pipelines. We separately evaluate each of the newly introduced mechanisms of ParaNet, then evaluate our proposed architecture on CIFAR-100. Paranom In this paper, we present Paranom, a parallel anomaly dataset generator. We discuss its design and provide brief experimental results demonstrating its usefulness in improving the classification correctness of LSTM-AD, a state-of-the-art anomaly detection model. Paraphrase Adversaries from Word Scrambling(PAWS) Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like flights from New York to Florida and flights from Florida to New York. This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons. Parenting Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our Parenting algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it ‘matures’, and that policies learnt through Parenting are generalisable to new environments. Pareto Depth Analysis(PDA) We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g.~as measured by nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing using a non-negative linear combination of them. If the relative importance of the different dissimilarity measures are not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets. Pareto-Smoothed Importance Sampling(PSIS) While it’s always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation’. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness of fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. Parikh Matrix Parikh Matrices are a newly developed tool for studying numerical properties of words in terms of their (scattered) subwords. They were introduced by Mateescu et al. in 2000 and continuously received attention from the research community ever since. Mateescu et al (2000) introduced an interesting new tool, called Parikh matrix, to study in terms of subwords, the numerical properties of words over an alphabet. The Parikh matrix gives more information than the well-known Parikh vector of a word which counts only occurrences of symbols in a word. Parity Model(ParM) Machine learning models are becoming the primary workhorses for many applications. Production services deploy models through prediction serving systems that take in queries and return predictions by performing inference on machine learning models. In order to scale to high query rates, prediction serving systems are run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency and cause violations of strict latency targets. Current approaches to reducing tail latency are inadequate for the latency targets of prediction serving, incur high resource overhead, or are inapplicable to the computations performed during inference. We present ParM, a novel, general framework for making use of ideas from erasure coding and machine learning to achieve low-latency, resource-efficient resilience to slowdowns and failures in prediction serving systems. ParM encodes multiple queries together into a single parity query and performs inference on the parity query using a parity model. A decoder uses the output of a parity model to reconstruct approximations of unavailable predictions. ParM uses neural networks to learn parity models that enable simple, fast encoders and decoders to reconstruct unavailable predictions for a variety of inference tasks such as image classification, speech recognition, and object localization. We build ParM atop an open-source prediction serving system and through extensive evaluation show that ParM improves overall accuracy in the face of unavailability with low latency while using 2-4$\times$ less additional resources than replication-based approaches. ParM reduces the gap between 99.9th percentile and median latency by up to $3.5\times$ compared to approaches that use an equal amount of resources, while maintaining the same median. ParlAI We introduce ParlAI (pronounced ‘par-lay’), an open-source software platform for dialog research implemented in Python, available at http://parl.ai. Its goal is to provide a unified framework for training and testing of dialog models, including multitask training, and integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Included are examples of training neural models with PyTorch and Lua Torch, including both batch and hogwild training of memory networks and attentive LSTMs. Parle We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been shown to lead to improved generalization error for deep networks. Parle requires very infrequent communication with the parameter server and instead performs more computation on each client, which makes it well-suited to both single-machine, multi-GPU settings and distributed implementations. Parrondo’s Paradox Parrondo’s paradox, a paradox in game theory, has been described as: A combination of losing strategies becomes a winning strategy. It is named after its creator, Juan Parrondo, who discovered the paradox in 1996. A more explanatory description is: ‘There exist pairs of games, each with a higher probability of losing than winning, for which it is possible to construct a winning strategy by playing the games alternately.’ Parrondo devised the paradox in connection with his analysis of the Brownian ratchet, a thought experiment about a machine that can purportedly extract energy from random heat motions popularized by physicist Richard Feynman. However, the paradox disappears when rigorously analyzed. Parseval Networks We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain weight matrices of linear and convolutional layers to be (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN) while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make a better usage of the full capacity of the networks. Parsimonious Adaptive Rejection Sampling Monte Carlo (MC) methods have become very popular in signal processing during the past decades. The adaptive rejection sampling (ARS) algorithms are well-known MC technique which draw efficiently independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where an efficient trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach. Parsimonious Bayesian Deep Network Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinite-wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction. Parsimonious Gaussian Mixture Models McNicholas and Murphy (2008) , McNicholas (2010) , McNicholas and Murphy (2010) . pgmm Parsimonious Learning Machine(PALM) Data stream has been the underlying challenge in the age of big data because it calls for real-time data processing with the absence of a retraining process and/or an iterative learning approach. In realm of fuzzy system community, data stream is handled by algorithmic development of self-adaptive neurofuzzy systems (SANFS) characterized by the single-pass learning mode and the open structure property which enables effective handling of fast and rapidly changing natures of data streams. The underlying bottleneck of SANFSs lies in its design principle which involves a high number of free parameters (rule premise and rule consequent) to be adapted in the training process. This figure can even double in the case of type-2 fuzzy system. In this work, a novel SANFS, namely parsimonious learning machine (PALM), is proposed. PALM features utilization of a new type of fuzzy rule based on the concept of hyperplane clustering which significantly reduces the number of network parameters because it has no rule premise parameters. PALM is proposed in both type-1 and type-2 fuzzy systems where all of which characterize a fully dynamic rule-based system. That is, it is capable of automatically generating, merging and tuning the hyperplane based fuzzy rule in the single pass manner. The efficacy of PALM has been evaluated through numerical study with six real-world and synthetic data streams from public database and our own real-world project of autonomous vehicles. The proposed model showcases significant improvements in terms of computational complexity and number of required parameters against several renowned SANFSs, while attaining comparable and often better predictive accuracy. Parsl High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore’s law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs allow Parsl to construct a dynamic dependency graph of components that it can then execute efficiently on one or many processors. Parsl is designed for scalability, with an extensible set of executors tailored to different use cases, such as low-latency, high-throughput, or extreme-scale execution. We show, via experiments on the Blue Waters supercomputer, that Parsl executors can allow Python scripts to execute components with as little as 5 ms of overhead, scale to more than 250 000 workers across more than 8000 nodes, and process upward of 1200 tasks per second. Other Parsl features simplify the construction and execution of composite programs by supporting elastic provisioning and scaling of infrastructure, fault-tolerant execution, and integrated wide-area data management. We show that these capabilities satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science. ParsRec Bibliographic reference parsers extract metadata (e.g. author names, title, year) from bibliographic reference strings. No reference parser consistently gives the best results in every scenario. For instance, one tool may be best in extracting titles, and another tool in extracting author names. In this paper, we address the problem of reference parsing from a recommender-systems perspective. We propose ParsRec, a meta-learning approach that recommends the potentially best parser(s) for a given reference string. We evaluate ParsRec on 105k references from chemistry. We propose two approaches to meta-learning recommendations. The first approach learns the best parser for an entire reference string. The second approach learns the best parser for each field of a reference string. The second approach achieved a 2.6% increase in F1 (0.909 vs. 0.886, p < 0.001) over the best single parser (GROBID), reducing the false positive rate by 20.2% (0.075 vs. 0.094), and the false negative rate by 18.9% (0.107 vs. 0.132). Part of Speech(POS) A part of speech is a category of words (or, more generally, of lexical items) which have similar grammatical properties. Words that are assigned to the same part of speech generally display similar behavior in terms of syntax – they play similar roles within the grammatical structure of sentences – and sometimes in terms of morphology, in that they undergo inflection for similar properties. Commonly listed English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes article or determiner. A part of speech – particularly in more modern classifications, which often make more precise distinctions than the traditional scheme does – may also be called a word class, lexical class, or lexical category, although the term lexical category refers in some contexts to a particular type of syntactic category, and may thus exclude parts of speech that are considered to be functional, such as pronouns. The term form class is also used, although this has various conflicting definitions. Word classes may be classified as open or closed: open classes (like nouns, verbs and adjectives) acquire new members constantly, while closed classes (such as pronouns and conjunctions) acquire new members infrequently, if at all. Almost all languages have the word classes noun and verb, but beyond these there are significant variations in different languages. For example, Japanese has as many as three classes of adjectives where English has one; Chinese, Korean and Japanese have a class of nominal classifiers; many languages lack a distinction between adjectives and adverbs, or between adjectives and verbs. This variation in the number of categories and their identifying properties means that analysis needs to be done for each individual language. Nevertheless, the labels for each category are assigned on the basis of universal criteria. http://…/Stative_verb Part of Speech Tagging(POST) In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition, as well as its context – i.e. relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill’s tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. Partial Adversarial Domain Adaptation(PADA) Domain adversarial learning aligns the feature distributions across the source and target domains in a two-player minimax game. Existing domain adversarial networks generally assume identical label space across different domains. In the presence of big data, there is strong motivation of transferring deep models from existing big domains to unknown small domains. This paper introduces partial domain adaptation as a new domain adaptation scenario, which relaxes the fully shared label space assumption to that the source label space subsumes the target label space. Previous methods typically match the whole source domain to the target domain, which are vulnerable to negative transfer for the partial domain adaptation problem due to the large mismatch between label spaces. We present Partial Adversarial Domain Adaptation (PADA), which simultaneously alleviates negative transfer by down-weighing the data of outlier source classes for training both source classifier and domain adversary, and promotes positive transfer by matching the feature distributions in the shared label space. Experiments show that PADA exceeds state-of-the-art results for partial domain adaptation tasks on several datasets. Partial Area Under a Receiver Operating Characteristic(pAUC) We propose a method for maximizing a partial area under a receiver operating characteristic (ROC) curve (pAUC) for binary classification tasks. In binary classification tasks, accuracy is the most commonly used as a measure of classifier performance. In some applications such as anomaly detection and diagnostic testing, accuracy is not an appropriate measure since prior probabilties are often greatly biased. Although in such cases the pAUC has been utilized as a performance measure, few methods have been proposed for directly maximizing the pAUC. This optimization is achieved by using a scoring function. The conventional approach utilizes a linear function as the scoring function. In contrast we newly introduce nonlinear scoring functions for this purpose. Specifically, we present two types of nonlinear scoring functions based on generative models and deep neural networks. We show experimentally that nonlinear scoring fucntions improve the conventional methods through the application of a binary classification of real and bogus objects obtained with the Hyper Suprime-Cam on the Subaru telescope. Partial AutoCorrelation Function(PACF) In time series analysis, the partial autocorrelation function (PACF) plays an important role in data analyses aimed at identifying the extent of the lag in an autoregressive model. The use of this function was introduced as part of the Box-Jenkins approach to time series modelling, where by plotting the partial autocorrelative functions one could determine the appropriate lags p in an AR (p) model or in an extended ARIMA (p,d,q) model. Partial Decision-DNNF Model counting is the problem of computing the number of satisfying assignments of a given propositional formula. Although exact model counters can be naturally furnished by most of the knowledge compilation (KC) methods, in practice, they fail to generate the compiled results for the exact counting of models for certain formulas due to the explosion in sizes. Decision-DNNF is an important KC language that captures most of the practical compilers. We propose a generalized Decision-DNNF (referred to as partial Decision-DNNF) via introducing a class of new leaf vertices (called unknown vertices), and then propose an algorithm called PartialKC to generate randomly partial Decision-DNNF formulas from the given formulas. An unbiased estimate of the model number can be computed via a randomly partial Decision-DNNF formula. Each calling of PartialKC consists of multiple callings of MicroKC, while each of the latter callings is a process of importance sampling equipped with KC technologies. The experimental results show that PartialKC is more accurate than both SampleSearch and SearchTreeSampler, PartialKC scales better than SearchTreeSampler, and the KC technologies can obviously accelerate sampling. Partial Dependency Plots Partial dependence plots show the dependence between the target function and a set of ‘target’ features, marginalizing over the values of all other features (the complement features). Due to the limits of human perception the size of the target feature set must be small (usually, one or two) thus the target features are usually chosen among the most important features. randomForest Partial Domain Adaptation(PDA) Learning to Transfer Examples for Partial Domain Adaptation Partial Least Squares Regression(PLS) Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of minimum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods are known as bilinear factor models. Partial least squares Discriminant Analysis (PLS-DA) is a variant used when the Y is categorical. PLS think twice about partial least squares Partial Membership Latent Dirichlet Allocation(PM-LDA) For many years, topic models (e.g., pLSA, LDA, SLDA) have been widely used for segmenting and recognizing objects in imagery simultaneously. However, these models are confined to the analysis of categorical data, forcing a visual word to belong to one and only one topic. There are many images in which some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and associated parameter estimation algorithms. PM-LDA defines a novel partial membership model for word and document generation. We employ Gibbs sampling for parameter estimation. Experimental results on two natural image datasets and one SONAR image dataset show that PM-LDA can produce both crisp and soft semantic image segmentations; a capability existing methods do not have. Partial Order Pruning Achieving good speed and accuracy trade-off on target platform is very important in deploying deep neural networks. Most existing automatic architecture search approaches only pursue high performance but ignores such an important factor. In this work, we propose an algorithm ‘Partial Order Pruning’ to prune architecture search space with partial order assumption, quickly lift the boundary of speed/accuracy trade-off on target platform, and automatically search the architecture with the best speed and accuracy trade-off. Our algorithm explicitly take profile information about the inference speed on target platform into consideration. With the proposed algorithm, we present several ‘Dongfeng’ networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching decoder architecture, our DF-Seg real-time segmentation models yields state-of-the-art speed/accuracy trade-off on both embedded device and high-end GPU. Partial Robust M Regression If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. sprm Partial Transfer Learning Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce distribution discrepancy between the source and target domains. Existing domain adversarial networks assume fully shared label space across domains. In the presence of big data, there is strong motivation of transferring both classification and representation models from existing big domains to unknown small domains. This paper introduces partial transfer learning, which relaxes the shared label space assumption to that the target label space is only a subspace of the source label space. Previous methods typically match the whole source domain to the target domain, which are prone to negative transfer for the partial transfer problem. We present Selective Adversarial Network (SAN), which simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Experiments demonstrate that our models exceed state-of-the-art results for partial transfer learning tasks on several benchmark datasets. Partially Adaptive Momentum Estimation Method(Padam) Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, have been observed to generalize worse than stochastic gradient descent (SGD) with momentum in training deep neural networks. This leaves how to close the generalization gap of adaptive gradient methods an open problem. In this work, we show that adaptive gradient methods such as Adam, Amsgrad, are sometimes ‘over adapted’. We design a new algorithm, called Partially adaptive momentum estimation method (Padam), which unifies the Adam/Amsgrad with SGD to achieve the best from both worlds. Experiments on standard benchmarks show that Padam can maintain fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks. These results would suggest practitioners pick up adaptive gradient methods once again for faster training of deep neural networks. Partially Exchangeable Network(PEN) We present a novel family of deep neural architectures, named partially exchangeable networks (PENs) that leverage probabilistic symmetries. By design, PENs are invariant to block-switch transformations, which characterize the partial exchangeability properties of conditionally Markovian processes. Moreover, we show that any block-switch invariant function has a PEN-like representation. The DeepSets architecture is a special case of PEN and we can therefore also target fully exchangeable data. We employ PENs to learn summary statistics in approximate Bayesian computation (ABC). When comparing PENs to previous deep learning methods for learning summary statistics, our results are highly competitive, both considering time series and static models. Indeed, PENs provide more reliable posterior samples even when using less training data. Partially Labeled Degree-Corrected Block Model(pDCBM) Community detection was a hot topic on network analysis, where the main aim is to perform unsupervised learning or clustering in networks. Recently, semi-supervised learning has received increasing attention among researchers. In this paper, we propose a new algorithm, called weighted inverse Laplacian (WIL), for predicting labels in partially labeled networks. The idea comes from the first hitting time in random walk, and it also has nice explanations both in information propagation and the regularization framework. We propose a partially labeled degree-corrected block model (pDCBM) to describe the generation of partially labeled networks. We show that WIL ensures the misclassification rate is of order $O(\frac{1}{d})$ for the pDCBM with average degree $d=\Omega(\log n),$ and that it can handle situations with greater unbalanced than traditional Laplacian methods. WIL outperforms other state-of-the-art methods in most of our simulations and real datasets, especially in unbalanced networks and heterogeneous networks. Partially Linear Additive Quantile Regression plaqr Partially Linear Spatial Probit Model A partially linear probit model for spatially dependent data is considered. A triangular array setting is used to cover various patterns of spatial data. Conditional spatial heteroscedasticity and non-identically distributed observations and a linear process for disturbances are assumed, allowing various spatial dependencies. The estimation procedure is a combination of a weighted likelihood and a generalized method of moments. The procedure first fixes the parametric components of the model and then estimates the non-parametric part using weighted likelihood; the obtained estimate is then used to construct a GMM parametric component estimate. The consistency and asymptotic distribution of the estimators are established under sufficient conditions. Some simulation experiments are provided to investigate the finite sample performance of the estimators. Partially Observable Markov Decision Process A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a probability distribution over the set of possible states, based on a set of observations and observation probabilities, and the underlying MDP. The POMDP framework is general enough to model a variety of real-world sequential decision processes. Applications include robot navigation problems, machine maintenance, and planning under uncertainty in general. The framework originated in the operations research community, and was later adapted by the artificial intelligence and automated planning communities. An exact solution to a POMDP yields the optimal action for each possible belief over the world states. The optimal action maximizes (or minimizes) the expected reward (or cost) of the agent over a possibly infinite horizon. The sequence of optimal actions is known as the optimal policy of the agent for interacting with its environment. Partially Observable Stacked Thompson Sampling(POSTS) State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources. Partially Observed Markov Decision Process(POMDP) ➘ “Partially Observable Markov Decision Process” Partially Observed Markov Process(POMP) ➚ “Hidden Markov Model” pomp Participatory Sensing Participatory sensing is the concept of communities (or other groups of people) contributing sensory information to form a body of knowledge. Participatory Sensing Location-Sensitive Data Simulation(PS-Sim) Emergence of smartphone and the participatory sensing (PS) paradigm have paved the way for a new variant of pervasive computing. In PS, human user performs sensing tasks and generates notifications, typically in lieu of incentives. These notifications are real-time, large-volume, and multi-modal, which are eventually fused by the PS platform to generate a summary. One major limitation with PS is the sparsity of notifications owing to lack of active participation, thus inhibiting large scale real-life experiments for the research community. On the flip side, research community always needs ground truth to validate the efficacy of the proposed models and algorithms. Most of the PS applications involve human mobility and report generation following sensing of any event of interest in the adjacent environment. This work is an attempt to study and empirically model human participation behavior and event occurrence distributions through development of a location-sensitive data simulation framework, called PS-Sim. From extensive experiments it has been observed that the synthetic data generated by PS-Sim replicates real participation and event occurrence behaviors in PS applications, which may be considered for validation purpose in absence of the groundtruth. As a proof-of-concept, we have used real-life dataset from a vehicular traffic management application to train the models in PS-Sim and cross-validated the simulated data with other parts of the same dataset. Particle Filter Particle filters or Sequential Monte Carlo (SMC) methods are a set of on-line posterior density estimation algorithms that estimate the posterior density of the state-space by directly implementing the Bayesian recursion equations. The term ‘sequential Monte Carlo’ was first coined in Liu and Chen (1998). SMC methods use a sampling approach, with a set of particles to represent the posterior density. The state-space model can be non-linear and the initial state and noise distributions can take any form required. SMC methods provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to high-dimensional systems. SMC methods implement the Bayesian recursion equations directly by using an ensemble based approach. The samples from the distribution are represented by a set of particles; each particle has a weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms; however it can be mitigated by including a resampling step before the weights become too uneven. In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights. http://…/particle-filtering.pdf ➘ “Sequential Monte Carlo” Particle Swarm Optimization(PSO) In computer science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle’s position and velocity. Each particle’s movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions. Particle Swarm Optimization Hidden Markov Model(PSOHMM) As one of Bayesian analysis tools, Hidden Markov Model (HMM) has been used to in extensive applications. Most HMMs are solved by Baum-Welch algorithm (BWHMM) to predict the model parameters, which is difficult to find global optimal solutions. This paper proposes an optimized Hidden Markov Model with Particle Swarm Optimization (PSO) algorithm and so is called PSOHMM (Particle Swarm Optimization Hidden Markov Model). In order to overcome the statistical constraints in HMM, the paper develops re-normalization and re-mapping mechanisms to ensure the constraints in HMM. The experiments have shown that PSOHMM can search better solution than BWHMM, and has faster convergence speed. Partition Set Cover Problem Various $O(\log n)$ approximations are known for the Set Cover problem, where $n$ is the number of elements. We study a generalization of the Set Cover problem, called the Partition Set Cover problem. Here, the elements are partitioned into $r$ color classes, and we are required to cover at least $k_t$ elements from each color class $\mathcal{C}_t$, using the minimum number of sets. We give a randomized LP rounding algorithm that is an $O(\beta + \log r)$ approximation for the Partition Set Cover problem. Here $\beta$ denotes the approximation guarantee for a related Set Cover instance obtained by rounding the standard LP. As a corollary, we obtain improved approximation guarantees for various set systems, for which $\beta$ is known to be sublogarithmic in $n$. Partitional Clustering Partitional clustering decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K (N ≥ K) partitions of the data, with each partition representing a cluster. That is, it classifies the data into K groups by satisfying the following requirements: (1) each group contains at least one point, and (2) each point belongs to exactly one group. Notice that for fuzzy partitioning, a point can belong to more than one group. Many partitional clustering algorithms try to minimize an objective function. Partition-Centric Programming Model(PPM) The past decade has seen development of many shared-memory graph processing frameworks intended to reduce the effort for creating high performance parallel applications. However, their programming models, based on Vertex-centric or Edge-centric paradigms suffer from several issues, such as poor cache utilization, irregular memory accesses, heavy use of synchronization primitives or theoretical inefficiency, that deteriorate the performance and scalability of applications. Recently, a cache-efficient partition-centric paradigm was proposed for computing PageRank. In this paper, we generalize this approach to develop a novel Partition-centric Programming Model(PPM) that is cache-efficient, scalable and work-efficient. We implement PPM as part of Graph Processing over Partitions(GPOP) framework that can efficiently execute a variety of algorithms. GPOP dramatically improves the cache performance by exploiting locality of partitioning. It achieves high scalability by enabling completely lock and atomic free computation. Its built-in analytical performance models enable it to use a hybrid of source and destination centric communication modes in a way that ensures work-efficiency of each iteration and simultaneously boosts high bandwidth sequential memory accesses. GPOP framework completely abstracts away underlying parallelism and programming model details from the user. It provides an easy to program set of APIs with the ability to selectively continue the active vertex set across iterations, which is not intrinsically supported by the current frameworks. We extensively evaluate the performance of GPOP for a variety of graph algorithms, using several large datasets. We observe that GPOP incurs upto 8.6x and 5.2x less L2 cache misses compared to Ligra and GraphMat, respectively. In terms of execution time, GPOP is upto 19x and 6.1x faster than Ligra and GraphMat, respectively. Partitioned In-memory Merge-Tree There is increasing interest in using multicore processors to accelerate stream processing. For example, indexing sliding window content to enhance the performance of streaming queries is greatly improved by utilizing the computational capabilities of a multicore processor. However, designing an effective concurrency control mechanism that addresses the problem of concurrent indexing in highly dynamic settings remains a challenge. In this paper, we introduce an index data structure, called the Partitioned In-memory Merge-Tree, to address the challenges that arise when indexing highly dynamic data, which are common in streaming settings. To complement the index, we design an algorithm to realize a parallel index-based stream join that exploits the computational power of multicore processors. Our experiments using an octa-core processor show that our parallel stream join achieves up to 5.5 times higher throughput than a single-threaded approach. Partitioned Variational Inference(PVI) Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. However, practitioners are faced with a fragmented literature that offers a bewildering array of algorithmic options. First, the variational family. Second, the granularity of the updates e.g. whether the updates are local to each data point and employ message passing or global. Third, the method of optimization (bespoke or blackbox, closed-form or stochastic updates, etc.). This paper presents a new framework, termed Partitioned Variational Inference (PVI), that explicitly acknowledges these algorithmic dimensions of VI, unifies disparate literature, and provides guidance on usage. Crucially, the proposed PVI framework allows us to identify new ways of performing VI that are ideally suited to challenging learning scenarios including federated learning (where distributed computing is leveraged to process non-centralized data) and continual learning (where new data and tasks arrive over time and must be accommodated quickly). We showcase these new capabilities by developing communication-efficient federated training of Bayesian neural networks and continual learning for Gaussian process models with private pseudo-points. The new methods significantly outperform the state-of-the-art, whilst being almost as straightforward to implement as standard VI. PArtitioning and query processing of Spatio-Temporal graphs(PAST) Real-world graphs often contain spatio-temporal information and evolve over time. Compared with static graphs, spatio-temporal graphs have very different characteristics, presenting more significant challenges in data volume, data velocity, and query processing. In this paper, we describe three representative applications to understand the features of spatio-temporal graphs. Based on the commonalities of the applications, we define a formal spatio-temporal graph model, where a graph consists of location vertices, object vertices, and event edges. Then we discuss a set of design goals to meet the requirements of the applications: (i) supporting up to 10 billion object vertices, 10 million location vertices, and 100 trillion edges in the graph, (ii) supporting up to 1 trillion new edges that are streamed in daily, and (iii) minimizing cross-machine communication for query processing. We propose and evaluate PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs. Experimental results show that PAST successfully achieves the above goals. It improves query performance by orders of magnitude compared with state-of-the-art solutions, including JanusGraph, Greenplum, Spark and ST-Hadoop. Partitioning Around Medoids(PAM) The PAM algorithm was developed by Leonard Kaufman and Peter J. Rousseeuw, and this algorithm is very similar to K-means, mostly because both are partitional algorithms, in other words, both break the datasets into groups, and both works trying to minimize the error, but PAM works with Medoids, that are an entity of the dataset that represent the group in which it is inserted, and K-means works with Centroids, that are artificially created entity that represent its cluster. The PAM algorithm partitionates a dataset of n objects into a number k of clusters, where both the dataset and the number k is an input of the algorithm. This algorithm works with a matrix of dissimilarity, where its goal is to minimize the overall dissimilarity between the representants of each cluster and its members. PARyOpt PARyOpt is a python based implementation of the Bayesian optimization routine designed for remote and asynchronous function evaluations. Bayesian optimization is especially attractive for computational optimization due to its low cost function footprint as well as the ability to account for uncertainties in data. A key challenge to efficiently deploy any optimization strategy on distributed computing systems is the synchronization step, where data from multiple function calls is assimilated to identify the next campaign of function calls. Bayesian optimization provides an elegant approach to overcome this issue via asynchronous updates. We formulate, develop and implement a parallel, asynchronous variant of Bayesian optimization. The framework is robust and resilient to external failures. We show how such asynchronous evaluations help reduce the total optimization wall clock time for a suite of test problems. Additionally, we show how the software design of the framework allows easy extension to response surface reconstruction (Kriging), providing a high performance software for autonomous exploration. The software is available on PyPI, with examples and documentation. Parzen-Rosenblatt Kernel Density Estimation The Parzen-window method (also known as Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate a probability density function p(x) for a specific point p(x) from a sample p(xn) that doesn’t require any knowledge or assumption about the underlying distribution. Parzen-Rosenblatt Window Technique A non-parametric kernel density estimation technique for probability densities of random variables if the underlying distribution/model is unknown. A so-called window function is used to count samples within hypercubes or Gaussian kernels of a specified volume to estimate the probability density. Passing Bablok Regression The comparison of methods experiment is important part in process of analytical methods and instruments validation. Passing and Bablok regression analysis is a statistical procedure that allows valuable estimation of analytical methods agreement and possible systematic bias between them. It is robust, non-parametric, non sensitive to distribution of errors and data outliers. Assumptions for proper application of Passing and Bablok regression are continuously distributed data and linear relationship between data measured by two analytical methods. Results are presented with scatter diagram and regression line, and regression equation where intercept represents constant and slope proportional measurement error. Confidence intervals of 95% of intercept and slope explain if their value differ from value zero (intercept) and value one (slope) only by chance, allowing conclusion of method agreement and correction action if necessary. Residual plot revealed outliers and identify possible non-linearity. Furthermore, cumulative sum linearity test is performed to investigate possible significant deviation from linearity between two sets of data. Non linear samples are not suitable for concluding on method agreement. deming Passive and Partially Active(PPA) Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task’s runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach. Passive-Aggressive Learning(PA) We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge-loss. PatchNet The ability to visually understand and interpret learned features from complex predictive models is crucial for their acceptance in sensitive areas such as health care. To move closer to this goal of truly interpretable complex models, we present PatchNet, a network that restricts global context for image classification tasks in order to easily provide visual representations of learned texture features on a predetermined local scale. We demonstrate how PatchNet provides visual heatmap representations of the learned features, and we mathematically analyze the behavior of the network during convergence. We also present a version of PatchNet that is particularly well suited for lowering false positive rates in image classification tasks. We apply PatchNet to the classification of textures from the Describable Textures Dataset and to the ISBI-ISIC 2016 melanoma classification challenge. PatchShuffle Regularization This paper focuses on regularizing the training of the convolutional neural network (CNN). We propose a new regularization approach named “PatchShuffle“ that can be adopted in any classification-oriented CNN models. It is easy to implement: in each mini-batch, images or feature maps are randomly chosen to undergo a transformation such that pixels within each local patch are shuffled. Through generating images and feature maps with interior orderless patches, PatchShuffle creates rich local variations, reduces the risk of network overfitting, and can be viewed as a beneficial supplement to various kinds of training regularization techniques, such as weight decay, model ensemble and dropout. Experiments on four representative classification datasets show that PatchShuffle improves the generalization ability of CNN especially when the data is scarce. Moreover, we empirically illustrate that CNN models trained with PatchShuffle are more robust to noise and local changes in an image. Path Analysis In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA). In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models. Path Capsule Network(PathCapsNet) Capsule network (CapsNet) was introduced as an enhancement over convolutional neural networks, supplementing the latter’s invariance properties with equivariance through pose estimation. CapsNet achieved a very decent performance with a shallow architecture and a significant reduction in parameters count. However, the width of the first layer in CapsNet is still contributing to a significant number of its parameters and the shallowness may be limiting the representational power of the capsules. To address these limitations, we introduce Path Capsule Network (PathCapsNet), a deep parallel multi-path version of CapsNet. We show that a judicious coordination of depth, max-pooling, regularization by DropCircuit and a new fan-in routing by agreement technique can achieve better or comparable results to CapsNet, while further reducing the parameter count significantly. Path Correlation Data A communication network can be modeled as a directed connected graph with edge weights that characterize performance metrics such as loss and delay. Network tomography aims to infer these edge weights from their pathwise versions measured on a set of intersecting paths between a subset of boundary vertices, and even the underlying graph when this is not known. In particular, temporal correlations between path metrics have been used infer composite weights on the subpath formed by the path intersection. We call these subpath weights the Path Correlation Data. In this paper we ask the following question: when can the underlying weighted graph be recovered knowing only the boundary vertices and the Path Correlation Data? We establish necessary and sufficient conditions for a graph to be reconstructible from this information, and describe an algorithm to perform the reconstruction. Subject to our conditions, the result applies to directed graphs with asymmetric edge weights, and accommodates paths arising from asymmetric routing in the underlying communication network. We also describe the relationship between the graph produced by our algorithm and the true graph in the case that our conditions are not satisfied. Path Integral Based Convolution(PAN) Convolution operations designed for graph-structured data usually utilize the graph Laplacian, which can be seen as message passing between the adjacent neighbors through a generic random walk. In this paper, we propose PAN, a new graph convolution framework that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. PAN generalizes the graph Laplacian to a new transition matrix we call \emph{maximal entropy transition} (MET) matrix derived from a path integral formalism. Most previous graph convolutional network architectures can be adapted to our framework, and many variations and derivatives based on the path integral idea can be developed. Experimental results show that the path integral based graph neural networks have great learnability and fast convergence rate, and achieve state-of-the-art performance on benchmark tasks. Path Modeling Segmentation Tree(PATHMOX) One of the main issues within path modeling techniques, especially in business and marketing applications, concerns the identification of different segments in the model population. The approach proposed by the authors consists of building a path models tree having a decision tree-like structure by means of the PATHMOX (Path Modeling Segmentation Tree) algorithm. This algorithm is specifically designed when prior information in form of external variables (such as socio-demographic variables) is available. Inner models are compared using an extension for testing the equality of two regression models; and outer models are compared by means of a Ryan-Joiner correlation test. genpathmox Path-by-Path The doctrinal paradox is analysed from a probabilistic point of view assuming a simple parametric model for the committee’s behaviour. The well known issue-by-issue and case-by-case majority rules are compared in this model, by means of the concepts of false positive rate (FPR), false negative rate (FNR) and Receiver Operating Characteristics (ROC) space. We introduce also a new rule that we call path-by-path, which is somehow halfway between the other two. Under our model assumptions, the issue-by-issue rule is shown to be the best of the three according to an optimality criterion based in ROC maps, for all values of the model parameters (committee size and competence of its members), when equal weight is given to FPR an FNR. For unequal weights, the relative goodness of the rules depends on the values of the competence and the weights, in a way which is precisely described. The results are illustrated with some numerical examples. PathGAN We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with its gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, but we adopt adversarial training as a suitable alternative. Our experiments prove how PathGAN improves the state of the art of visual scanpath prediction on the iSUN and Salient360! datasets. Source code and models are available at https://…/pathgan Path-Invariance Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem across computer vision and many other relevant fields. Compared to optimizing pairwise maps in isolation, the benefit of map synchronization is that there are natural constraints among a map network that can improve the quality of individual maps. While such self-supervision constraints are well-understood for undirected map networks (e.g., the cycle-consistency constraint), they are under-explored for directed map networks, which naturally arise when maps are given by parametric maps (e.g., a feed-forward neural network). In this paper, we study a natural self-supervision constraint for directed map networks called path-invariance, which enforces that composite maps along different paths between a fixed pair of source and target domains are identical. We introduce path-invariance bases for efficient encoding of the path-invariance constraint and present an algorithm that outputs a path-variance basis with polynomial time and space complexities. We demonstrate the effectiveness of our formulation on optimizing object correspondences, estimating dense image maps via neural networks, and 3D scene segmentation via map networks of diverse 3D representations. In particular, our approach only requires 8% labeled data from ScanNet to achieve the same performance as training a single 3D segmentation network with 30% to 100% labeled data. PathNet For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropogation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C). Pathway Induced Multiple Kernel Learning(PIMKL) Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, therefore we have very little understanding about the mechanisms that lead to the prediction provided. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to classify samples reliably that can, at the same time, provide a pathway-based molecular fingerprint of the signature that underlies the classification. PIMKL exploits prior knowledge in the form of molecular interaction networks and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning algorithm (MKL), an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks. Pathwise Calibrated Sparse Shooting Algorithm(PICASSO) A family of efficient algorithms, called PathwIse CalibrAted Sparse Shooting AlgOrithm, for a variety of sparse learning problems, including Sparse Linear Regression, Sparse Logistic Regression, Sparse Column Inverse Operator and Sparse Multivariate Regression. Different types of active set identification schemes are implemented, such as cyclic search, greedy search, stochastic search and proximal gradient search. Besides, the package provides the choices between convex (L1 norm) and non-convex (MCP and SCAD) regularizations. Moreover, group regularization for Sparse Linear Regression and Sparse Logistic Regression are also implemented. picasso Patient Event Graph(PatientEG) Medical activities, such as diagnoses, medicine treatments, and laboratory tests, as well as temporal relations between these activities are the basic concepts in clinical research. However, existing relational data model on electronic medical records (EMRs) lacks explicit and accurate semantic definitions of these concepts. It leads to the inconvenience of query construction and the inefficiency of query execution where multi-table join queries are frequently required. In this paper, we propose a patient event graph (PatientEG) model to capture the characteristics of EMRs. We respectively define five types of medical entities, five types of medical events and five types of temporal relations. Based on the proposed model, we also construct a PatientEG dataset with 191,294 events, 3,429 distinct entities, and 545,993 temporal relations using EMRs from Shanghai Shuguang hospital. To help to normalize entity values which contain synonyms, hyponymies, and abbreviations, we link them with the Chinese biomedical knowledge graph. With the help of PatientEG dataset, we are able to conveniently perform complex queries for clinical research such as auxiliary diagnosis and therapeutic effectiveness analysis. In addition, we provide a SPARQL endpoint to access PatientEG dataset and the dataset is also publicly available online. Also, we list several illustrative SPARQL queries on our website. Patient Rule Induction Method(PRIM) PRIM (Patient Rule Induction Method) is a data mining technique introduced by Friedman and Fisher (1999). Its objective is to find subregions in the input space with relatively high (low) values for the target variable. By construction, PRIM directly targets these regions rather than indirectly through the estimation of a regression function. The method is such that these subregions can be described by simple rules, as the subregions are (unions of) rectangles in the input space. There are many practical problems where finding such rectangular subregions with relatively high (low) values of the target variable is of considerable interest. Often these are problems where a decision maker wants to choose the values or ranges of the input variables so as to optimize the value of the target variable. Such types of applications can be found in the fields of medical research, financial risk analysis, and social sciences, and PRIM has been applied to these fields. While PRIM enjoys some popularity, and even several modifications have been proposed (see Becker and Fahrmeier, 2001, Cole, Galic and Zack, 2003, Leblanc et al, 2003, Nannings et al. (2008), Wu and Chipman, 2003, and Wang et al, 2004), there is according to our knowledge no thorough study of its basic statistical properties. PRIMsrc,prim PaToPaEM Grid topology and line parameters are essential for grid operation and planning, which may be missing or inaccurate in distribution grids. Existing data-driven approaches for recovering such information usually suffer from ignoring 1) input measurement errors and 2) possible state changes among historical measurements. While using the errors-in-variables (EIV) model and letting the parameter and topology estimation interact with each other (PaToPa) can address input and output measurement error modeling, it only works when all measurements are from a single system state. To solve the two challenges simultaneously, we propose the PaToPaEM framework for joint line parameter and topology estimation with historical measurements from different unknown states. We improve the static framework that only works when measurements are from one single state, by further treating state changes in historical measurements as an unobserved latent variable. We then systematically analyze the new mathematical modeling, decouple the optimization problem, and incorporate the expectation-maximization (EM) algorithm to recover different hidden states in measurements. Combining these, PaToPaEM framework enables joint topology and line parameter estimation using noisy measurements from multiple system states. It lays a solid foundation for data-driven system identification in distribution grids. Superior numerical results validate the practicability of the PaToPaEM framework. Pattern Classification The usage of patterns in datasets to discriminate between classes, i.e., to assign a class label to a new observation based on inference. Pattern Sequence based Forecasting(PSF) This paper discusses about PSF, an R package for Pattern Sequence based Forecasting (PSF) algorithm used for univariate time series future prediction. The PSF algorithm consists of two major parts: clustering and prediction techniques. Clustering part includes selection of cluster size and then labeling of time series data with reference to various clusters. Whereas, the prediction part include functions like optimum window size selection for specific patterns and prediction of future values with reference to past pattern sequences. The PSF package consists of various functions to implement PSF algorithm. It also contains a function, which automates all other functions to obtain optimum prediction results. The aim of this package is to promote PSF algorithm and to ease its implementation with minimum efforts. This paper describe all the functions in PSF package with their syntax and simple examples. Finally, the usefulness of this package is discussed by comparing it with auto.arima, a well known time series forecasting function available on CRAN repository. PSF Pattern Theory Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to: · Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time. · Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. · Study the randomness and variability of these graphs. · Create the basic classes of stochastic models applied by listing the deformations of the patterns. · Synthesize (sample) from the models, not just analyze signals with it. Broad in its mathematical coverage, Pattern Theory spans algebra and statistics, as well as local topological and global entropic properties. PatternNet Visual patterns represent the discernible regularity in the visual world. They capture the essential nature of visual objects or scenes. Understanding and modeling visual patterns is a fundamental problem in visual recognition that has wide ranging applications. In this paper, we study the problem of visual pattern mining and propose a novel deep neural network architecture called PatternNet for discovering these patterns that are both discriminative and representative. The proposed PatternNet leverages the filters in the last convolution layer of a convolutional neural network to find locally consistent visual patches, and by combining these filters we can effectively discover unique visual patterns. In addition, PatternNet can discover visual patterns efficiently without performing expensive image patch sampling, and this advantage provides an order of magnitude speedup compared to most other approaches. We evaluate the proposed PatternNet subjectively by showing randomly selected visual patterns which are discovered by our method and quantitatively by performing image classification with the identified visual patterns and comparing our performance with the current state-of-the-art. We also directly evaluate the quality of the discovered visual patterns by leveraging the identified patterns as proposed objects in an image and compare with other relevant methods. Our proposed network and procedure, PatterNet, is able to outperform competing methods for the tasks described. Paxos Algorithm A fault-tolerant file system called Echo was built at SRC in the late 80s. The builders claimed that it would maintain consistency despite any number of non-Byzantine faults, and would make progress if any majority of the processors were working. As with most such systems, it was quite simple when nothing went wrong, but had a complicated algorithm for handling failures based on taking care of all the cases that the implementers could think of. I decided that what they were trying to do was impossible, and set out to prove it. Instead, I discovered the Paxos algorithm, described in this paper. At the heart of the algorithm is a three-phase consensus protocol. Dale Skeen seems to have been the first to have recognized the need for a three-phase protocol to avoid blocking in the presence of an arbitrary single failure. However, to my knowledge, Paxos contains the first three-phase commit algorithm that is a real algorithm, with a clearly stated correctness condition and a proof of correctness. Payoff Dynamical Model(PDM) We consider that at every instant each member of a population, which we refer to as an agent, selects one strategy out of a finite set. The agents are nondescript, and their strategy choices are described by the so-called population state vector, whose entries are the portions of the population selecting each strategy. Likewise, each entry constituting the so-called payoff vector is the reward attributed to a strategy. We consider that a general finite-dimensional nonlinear dynamical system, denoted as payoff dynamical model (PDM), describes a mechanism that determines the payoff as a causal map of the population state. A bounded-rationality protocol, inspired primarily on evolutionary biology principles, governs how each agent revises its strategy repeatedly based on complete or partial knowledge of the population state and payoff. The population is protocol-homogeneous but is otherwise strategy-heterogeneous considering that the agents are allowed to select distinct strategies concurrently. A stochastic mechanism determines the instants when agents revise their strategies, but we consider that the population is large enough that, with high probability, the population state can be approximated with arbitrary accuracy uniformly over any finite horizon by a so-called (deterministic) mean population state. We propose an approach that takes advantage of passivity principles to obtain sufficient conditions determining, for a given protocol and PDM, when the mean population state is guaranteed to converge to a meaningful set of equilibria, which could be either an appropriately defined extension of Nash’s for the PDM or a perturbed version of it. By generalizing and unifying previous work, our framework also provides a foundation for future work. PC Algorithm An algorithm that has the same input/output relations as the SGS procedure for faithful distributions but which for sparse graphs does not require the testing of higher order independence relations in the discrete case, and in any case requires testing as few d-separation relations as possible. The procedure starts by forming the complete undirected graph, then ‘thins’ that graph by removing edges with zero order conditional independence relations, thins again with first order conditional independence relations, and so on. The set of variables conditioned on need only be a subset of the set of variables adjacent to one or the other of the variables conditioned. http://…/41_paper.pdf CausalFX PC-LPGM Biological processes underlying the basic functions of a cell involve complex interactions between genes. From a technical point of view, these interactions can be represented through a graph where genes and their connections are, respectively, nodes and edges. The main objective of this paper is to develop a statistical framework for modelling the interactions between genes when the activity of genes is measured on a discrete scale. In detail, we define a new algorithm for learning the structure of undirected graphs, PC-LPGM, proving its theoretical consistence in the limit of infinite observations. The proposed algorithm shows promising results when applied to simulated data as well as to real data. PCNet A deep neural network (DNN) based power control method is proposed, which aims at solving the non-convex optimization problem of maximizing the sum rate of a multi-user interference channel. Towards this end, we first present PCNet, which is a multi-layer fully connected neural network that is specifically designed for the power control problem. PCNet takes the channel coefficients as input and outputs the transmit power of all users. A key challenge in training a DNN for the power control problem is the lack of ground truth, i.e., the optimal power allocation is unknown. To address this issue, PCNet leverages the unsupervised learning strategy and directly maximizes the sum rate in the training phase. Observing that a single PCNet does not globally outperform the existing solutions, we further propose ePCNet, a network ensemble with multiple PCNets trained independently. Simulation results show that for the standard symmetric multi-user Gaussian interference channel, ePCNet can outperform all state-of-the-art power control methods by 1.2%-4.6% under a variety of system configurations. Furthermore, the performance improvement of ePCNet comes with a reduced computational complexity. Pedometrics Pedometrics is a branch of soil science that applies mathematical and statistical methods for the study of the distribution and genesis of soils. The goal of pedometrics is to achieve a better understanding of the soil as a phenomenon that varies over different scales in space and time. This understanding is important, both for improved soil management and for our scientific appreciation of the soil and the systems (agronomic, ecological and hydrological) of which it is a part. For this reason much of pedometrics is concerned with predicting the properties of the soil in space and time, with sampling and monitoring the soil and with modelling the soil’s behaviour. Pedometricians are typically engaged in developing and applying quantitative methods to apply to these problems. These include geostatistical methods for spatial prediction, sampling designs and strategies, linear modelling methods and novel mathematical and computational techniques such as wavelet transforms, data mining and fuzzy logic. Peek Search We resolve the fundamental problem of online decoding with ergodic Markov models. Specifically, we provide deterministic and randomized algorithms that are provably near-optimal under latency constraints with respect to the unconstrained offline optimal algorithm. Our algorithms admit efficient implementation via dynamic programs, and extend to (possibly adversarial) non-stationary or time-varying Markov settings as well. Moreover, we establish lower bounds in both deterministic and randomized settings subject to latency requirements, and prove that no online algorithm can perform significantly better than our algorithms. Peer Group Analysis Peer group analysis is a new tool for monitoring behavior over time in data mining situations. In particular, the tool detects individual objects that begin to behave in a way distinct from objects to which they had previously been similar. Each object is selected as a target object and is compared with all other objects in the database, using either external comparison criteria or internal criteria summarizing earlier behavior patterns of each object. Based on this comparison, a peer group of objects most similar to the target object is chosen. The behavior of the peer group is then summarized at each subsequent time point, and the behavior of the target object compared with the summary of its peer group. Those target objects exhibiting behavior most different from their peer group summary behavior are flagged as meriting closer investigation. The tool is intended to be part of the data mining process, involving cycling between the detection of objects that behave in anomalous ways and the detailed examination of those objects. Several aspects of peer group analysis can be tuned to the particular application, including the size of the peer group, the width of the moving behavior window being used, the way the peer group is summarized, and the measures of difference between the target object and its peer group summary. PeerNet Deep learning systems have become ubiquitous in many aspects of our lives. Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. This results in a form of non-local forward propagation in the model, where latent features are conditioned on the global structure induced by the graph, that is up to 3 times more robust to a variety of white- and black-box adversarial attacks compared to conventional architectures with almost no drop in accuracy. Penalized Adaptive Weighted Least Squares Regression(PWLS) To conduct regression analysis for data contaminated with outliers, many approaches have been proposed for simultaneous outlier detection and robust regression, so is the approach proposed in this manuscript. This new approach is called ‘penalized weighted least squares’ (PWLS). By assigning each observation an individual weight and incorporating a lasso-type penalty on the log-transformation of the weight vector, the PWLS is able to perform outlier detection and robust regression simultaneously. A Bayesian point-of-view of the PWLS is provided, and it is showed that the PWLS can be seen as an example of Mestimation. Two methods are developed for selecting the tuning parameter in the PWLS. The performance of the PWLS is demonstrated via simulations and real applications. pawls Penalized Component Hub Model Social network analysis presupposes that observed social behavior is influenced by an unobserved network. Traditional approaches to inferring the latent network use pairwise descriptive statistics that rely on a variety of measures of co-occurrence. While these techniques have proven useful in a wide range of applications, the literature does not describe the generating mechanism of the observed data from the network. In a previous article, the authors presented a technique which used a finite mixture model as the connection between the unobserved network and the observed social behavior. This model assumed that each group was the result of a star graph on a subset of the population. Thus, each group was the result of a leader who selected members of the population to be in the group. They called these hub models. This approach treats the network values as parameters of a model. However, this leads to a general challenge in estimating parameters which must be addressed. For small datasets there can be far more parameters to estimate than there are observations. Under these conditions, the estimated network can be unstable. In this article, we propose a solution which penalizes the number of nodes which can exert a leadership role. We implement this as a pseudo-Expectation Maximization algorithm. We demonstrate this technique through a series of simulations which show that when the number of leaders is sparse, parameter estimation is improved. Further, we apply this technique to a dataset of animal behavior and an example of recommender systems. Penalized Maximum Tangent Likelihood Estimation We introduce a new class of mean regression estimators — penalized maximum tangent likelihood estimation — for high-dimensional regression estimation and variable selection. We first explain the motivations for the key ingredient, maximum tangent likelihood estimation (MTE), and establish its asymptotic properties. We further propose a penalized MTE for variable selection and show that it is $\sqrt{n}$-consistent, enjoys the oracle property. The proposed class of estimators consists penalized $\ell_2$ distance, penalized exponential squared loss, penalized least trimmed square and penalized least square as special cases and can be regarded as a mixture of minimum Kullback-Leibler distance estimation and minimum $\ell_2$ distance estimation. Furthermore, we consider the proposed class of estimators under the high-dimensional setting when the number of variables $d$ can grow exponentially with the sample size $n$, and show that the entire class of estimators (including the aforementioned special cases) can achieve the optimal rate of convergence in the order of $\sqrt{\ln(d)/n}$. Finally, simulation studies and real data analysis demonstrate the advantages of the penalized MTE. Penalized Orthogonal-Components Regression(POCRE) Penalized orthogonal-components regression (POCRE) is a supervised dimension reduction method for high-dimensional data. It sequentially constructs orthogonal components (with selected features) which are maximally correlated to the response residuals. POCRE can also construct common components for multiple responses and thus build up latent-variable models. POCRE Penalized Splines of Propensity Prediction(PSPP) Little and An (2004, Statistica Sinica 14, 949-968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente.. ‘Robust-squared’ Imputation Models Using BART Pena-Yohai Initial Estimator Pena, D., & Yohai, V. (1999) . pyinit Penguin Search Optimisation Algorithm(PeSOA) This paper develops Penguin search Optimisation Algorithm (PeSOA), a new metaheuristic algorithm which is inspired by the foraging behaviours of penguins. A population of penguins located in the solution space of the given search and optimisation problem is divided into groups and tasked with finding optimal solutions. The penguins of a group perform simultaneous dives and work as a team to collaboratively feed on fish the energy content of which corresponds to the fitness of candidate solutions. Fish stocks have higher fitness and concentration near areas of solution optima and thus drive the search. Penguins can migrate to other places if their original habitat lacks food. We identify two forms of penguin communication both intra-group and inter-group which are useful in designing intensification and diversification strategies. An efficient intensification strategy allows fast convergence to a local optimum, whereas an effective diversification strategy avoids cyclic behaviour around local optima and explores more effectively the space of potential solutions. The proposed PeSOA algorithm has been validated on a well-known set of benchmark functions. Comparative performances with six other nature-inspired metaheuristics show that the PeSOA performs favourably in these tests. A run-time analysis shows that the performance obtained by the PeSOA is very stable at any time of the evolution horizon, making the PeSOA a viable approach for real world applications. People + AI Research(PAIR) The past few years have seen rapid advances in machine learning, with new technologies achieving dramatic improvements in technical performance. But we can go beyond optimizing objective functions. By building AI systems with users in mind from the ground up, we open up entire new areas of design and interaction. PAIR is devoted to advancing the research and design of people-centric AI systems. We’re interested in the full spectrum of human interaction with machine intelligence, from supporting engineers to understanding everyday experiences with AI. Our goal is to do fundamental research, invent new technology, and create frameworks for design in order to drive a human-centered approach to artificial intelligence. And we want to be as open as possible: we’re building open source tools that everyone can use, hosting public events, and supporting academics in advancing the state of the art. PEORL Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework {\em PEORL} that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent’s task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL. Percentages of Maximum Deviation from Independence(PEM) GDAtools Perception Perception (from the Latin perceptio, percipio) is the organization, identification, and interpretation of sensory information in order to represent and understand the environment. PerceptionNet Human Activity Recognition (HAR) based on motion sensors has drawn a lot of attention over the last few years, since perceiving the human status enables context-aware applications to adapt their services on users’ needs. However, motion sensor fusion and feature extraction have not reached their full potentials, remaining still an open issue. In this paper, we introduce PerceptionNet, a deep Convolutional Neural Network (CNN) that applies a late 2D convolution to multimodal time-series sensor data, in order to extract automatically efficient features for HAR. We evaluate our approach on two public available HAR datasets to demonstrate that the proposed model fuses effectively multimodal sensors and improves the performance of HAR. In particular, PerceptionNet surpasses the performance of state-of-the-art HAR methods based on: (i) features extracted from humans, (ii) deep CNNs exploiting early fusion approaches, and (iii) Long Short-Term Memory (LSTM), by an average accuracy of more than 3%. PerceptNet In order to design haptic icons or build a haptic vocabulary, we require a set of easily distinguishable haptic signals to avoid perceptual ambiguity, which in turn requires a way to accurately estimate the perceptual (dis)similarity of such signals. In this work, we present a novel method to learn such a perceptual metric based on data from human studies. Our method is based on a deep neural network that projects signals to an embedding space where the natural Euclidean distance accurately models the degree of dissimilarity between two signals. The network is trained only on non-numerical comparisons of triplets of signals, using a novel triplet loss that considers both types of triplets that are easy to order (inequality constraints), as well as those that are unorderable/ambiguous (equality constraints). Unlike prior MDS-based non-parametric approaches, our method can be trained on a partial set of comparisons and can embed new haptic signals without retraining the model from scratch. Extensive experimental evaluations show that our method is significantly more effective at modeling perceptual dissimilarity than alternatives. Perceptor Gradients Algorithm We present the perceptor gradients algorithm — a novel approach to learning symbolic representations based on the idea of decomposing an agent’s policy into i) a perceptor network extracting symbols from raw observation data and ii) a task encoding program which maps the input symbols to output actions. We show that the proposed algorithm is able to learn representations that can be directly fed into a Linear-Quadratic Regulator (LQR) or a general purpose A* planner. Our experimental results confirm that the perceptor gradients algorithm is able to efficiently learn transferable symbolic representations as well as generate new observations according to a semantically meaningful specification. Perceptron Ranking Using Interval Labeled Data(PRIL) In this paper, we propose an online learning algorithm PRIL for learning ranking classifiers using interval labeled data and show its correctness. We show its convergence in finite number of steps if there exists an ideal classifier such that the rank given by it for an example always lies in its label interval. We then generalize this mistake bound result for the general case. We also provide regret bound for the proposed algorithm. We propose a multiplicative update algorithm for PRIL called M-PRIL. We provide its correctness and convergence results. We show the effectiveness of PRIL by showing its performance on various datasets. Perceptron Turing Machine We introduce the perceptron Turing machine and show how it can be used to create a system of neuroevolution. Advantages of this approach include automatic scaling of solutions to larger problem sizes, the ability to experiment with hand-coded solutions, and an enhanced potential for understanding evolved solutions. Hand-coded solutions may be implemented in the low-level language of Turing machines, which is the genotype used in neuroevolution, but a high-level language called Lopro is introduced to make the job easier. Perceptual Feature(PF) Learning Implicit Generative Models by Matching Perceptual Features Perceptual Visual Interactive Learning(PVIL) Supervised learning methods are widely used in machine learning. However, the lack of labels in existing data limits the application of these technologies. Visual interactive learning (VIL) compared with computers can avoid semantic gap, and solve the labeling problem of small label quantity (SLQ) samples in a groundbreaking way. In order to fully understand the importance of VIL to the interaction process, we re-summarize the interactive learning related algorithms (e.g. clustering, classification, retrieval etc.) from the perspective of VIL. Note that, perception and cognition are two main visual processes of VIL. On this basis, we propose a perceptual visual interactive learning (PVIL) framework, which adopts gestalt principle to design interaction strategy and multi-dimensionality reduction (MDR) to optimize the process of visualization. The advantage of PVIL framework is that it combines computer’s sensitivity of detailed features and human’s overall understanding of global tasks. Experimental results validate that the framework is superior to traditional computer labeling methods (such as label propagation) in both accuracy and efficiency, which achieves significant classification results on dense distribution and sparse classes dataset. Percival Online advertising has been a long-standing concern for user privacy and overall web experience. Several techniques have been proposed to block ads, mostly based on filter-lists and manually-written rules. While a typical ad blocker relies on manually-curated block lists, these inevitably get out-of-date, thus compromising the ultimate utility of this ad blocking approach. In this paper we present Percival, a browser-embedded, lightweight, deep learning-powered ad blocker. Percival embeds itself within the browser’s image rendering pipeline, which makes it possible to intercept every image obtained during page execution and to perform blocking based on applying machine learning for image classification to flag potential ads. Our implementation inside both Chromium and Brave browsers shows only a minor rendering performance overhead of 4.55%, demonstrating the feasibility of deploying traditionally heavy models (i.e. deep neural networks) inside the critical path of the rendering engine of a browser. We show that our image-based ad blocker can replicate EasyList rules with an accuracy of 96.76%. To show the versatility of the Percival’s approach we present case studies that demonstrate that Percival 1) does surprisingly well on ads in languages other than English; 2) Percival also performs well on blocking first-party Facebook ads, which have presented issues for other ad blockers. Percival proves that image-based perceptual ad blocking is an attractive complement to today’s dominant approach of block lists Perfect Match Learning representations for counterfactual inference from observational data is of high practical relevance for many domains, such as healthcare, public policy and economics. Counterfactual inference enables one to answer ‘What if…?’ questions, such as ‘What would be the outcome if we gave this patient treatment $t_1$?’. However, current methods for training neural networks for counterfactual inference on observational data are either overly complex, limited to settings with only two available treatment options, or both. Here, we present Perfect Match (PM), a method for training neural networks for counterfactual inference that is easy to implement, compatible with any architecture, does not add computational complexity or hyperparameters, and extends to any number of treatments. PM is based on the idea of augmenting samples within a minibatch with their propensity-matched nearest neighbours. Our experiments demonstrate that PM outperforms a number of more complex state-of-the-art methods in inferring counterfactual outcomes across several real-world and semi-synthetic datasets. Perfect Privacy The problem of private data disclosure is studied from an information theoretic perspective. Considering a pair of correlated random variables $(X,Y)$, where $Y$ denotes the observed data while $X$ denotes the private latent variables, the following problem is addressed: What is the maximum information that can be revealed about $Y$, while disclosing no information about $X$? Assuming that a Markov kernel maps $Y$ to the revealed information $U$, it is shown that the maximum mutual information between $Y$ and $U$, i.e., $I(Y;U)$, can be obtained as the solution of a standard linear program, when $X$ and $U$ are required to be independent, called \textit{perfect privacy}. This solution is shown to be greater than or equal to the \textit{non-private information about $X$ carried by $Y$.} Maximal information disclosure under perfect privacy is is shown to be the solution of a linear program also when the utility is measured by the reduction in the mean square error, $\mathbb{E}[(Y-U)^2]$, or the probability of error, $\mbox{Pr}$. For jointly Gaussian $(X,Y)$, it is shown that perfect privacy is not possible if the kernel is applied to only $Y$; whereas perfect privacy can be achieved if the mapping is from both $X$ and $Y$; that is, if the private latent variables can also be observed at the encoder. Next, measuring the utility and privacy by $I(Y;U)$ and $I(X;U)$, respectively, the slope of the optimal utility-privacy trade-off curve is studied when $I(X;U)=0$. Finally, through a similar but independent analysis, an alternative characterization of the maximal correlation between two random variables is provided. Performance Analytics Decision Support Framework(PADS) The PADS (Performance Analytics Decision Support) Framework represents a more strategic approach to linking next-generation performance management and big data analytics technologies. The twin missions of the PADS Framework are to: 1. facilitate communication and collaboration among IT and business teams to proactively anticipate, identify and resolve application performance problems by focusing on user experience across the entire application delivery chain; and, 2. enable IT to orchestrate and manage internally and externally sourced services efficiently to improve decision-making and business outcomes. The PADS Framework can help companies ensure employee engagement and increase customer satisfaction and loyalty to drive higher operating results and market valuation. Performance Envelope One way to speed up the algorithm configuration task is to use short runs instead of long runs as much as possible, but without discarding the configurations that eventually do well on the long runs. We consider the problem of selecting the top performing configurations of the Conditional Markov Chain Search (CMCS), a general algorithm schema that includes, for examples, VNS. We investigate how the structure of performance on short tests links with those on long tests, showing that significant differences arise between test domains. We propose a ‘performance envelope’ method to exploit the links; that learns when runs should be terminated, but that automatically adapts to the domain. Performance-Feedback Autoscaler(PFA) The growing popularity of workflows in the cloud domain promoted the development of sophisticated autoscaling policies that allow automatic allocation and deallocation of resources. However, many state-of-the-art autoscaling policies for workflows are mostly plan-based or designed for batches (ensembles) of workflows. This reduces their flexibility when dealing with workloads of workflows, as the workloads are often subject to unpredictable resource demand fluctuations. Moreover, autoscaling in clouds almost always imposes budget constraints that should be satisfied. The budget-aware autoscalers for workflows usually require task runtime estimates to be provided beforehand, which is not always possible when dealing with workloads due to their dynamic nature. To address these issues, we propose a novel Performance-Feedback Autoscaler (PFA) that is budget-aware and does not require task runtime estimates for its operation. Instead, it uses the performance-feedback loop that monitors the average throughput on each resource type. We implement PFA in the popular Apache Airflow workflow management system, and compare the performance of our autoscaler with other two state-of-the-art autoscalers, and with the optimal solution obtained with the Mixed Integer Programming approach. Our results show that PFA outperforms other considered online autoscalers, as it effectively minimizes the average job slowdown by up to 47% while still satisfying the budget constraints. Moreover, PFA shows by up to 76% lower average runtime than the competitors. Periodicity-based Time Series Prediction ➘ “Time Series Data Compression and Abstraction” Permutation Distribution Clustering pdc Permutation Invariant Gaussian Matrix Model Permutation invariant Gaussian matrix models were recently developed for applications in computational linguistics. A 5-parameter family of models was solved. In this paper, we use a representation theoretic approach to solve the general 13-parameter Gaussian model, which can be viewed as a zero-dimensional quantum field theory. We express the two linear and eleven quadratic terms in the action in terms of representation theoretic parameters. These parameters are coefficients of simple quadratic expressions in terms of appropriate linear combinations of the matrix variables transforming in specific irreducible representations of the symmetric group $S_D$ where $D$ is the size of the matrices. They allow the identification of constraints which ensure a convergent Gaussian measure and well-defined expectation values for polynomial functions of the random matrix at all orders. A graph-theoretic interpretation is known to allow the enumeration of permutation invariants of matrices at linear, quadratic and higher orders. We express the expectation values of all the quadratic graph-basis invariants and a selection of cubic and quartic invariants in terms of the representation theoretic parameters of the model. Permutation Invariant Multi-Modal Segmentation(PIMMS) In a research context, image acquisition will often involve a pre-defined static protocol and the data will be of high quality. If we are to build applications that work in hospitals without significant operational changes in care delivery, algorithms should be designed to cope with the available data in the best possible way. In a clinical environment, imaging protocols are highly flexible, with MRI sequences commonly missing appropriate sequence labeling (e.g. T1, T2, FLAIR). To this end we introduce PIMMS, a Permutation Invariant Multi-Modal Segmentation technique that is able to perform inference over sets of MRI scans without using modality labels. We present results which show that our convolutional neural network can, in some settings, outperform a baseline model which utilizes modality labels, and achieve comparable performance otherwise. Permutation Phase Defense(PPD) Deep neural networks have demonstrated cutting edge performance on various tasks including classification. However, it is well known that adversarially designed imperceptible perturbation of the input can mislead advanced classifiers. In this paper, Permutation Phase Defense (PPD), is proposed as a novel method to resist adversarial attacks. PPD combines random permutation of the image with phase component of its Fourier transform. The basic idea behind this approach is to turn adversarial defense problems analogously into symmetric cryptography, which relies solely on safekeeping of the keys for security. In PPD, safe keeping of the selected permutation ensures effectiveness against adversarial attacks. Testing PPD on MNIST and CIFAR-10 datasets yielded state-of-the-art robustness against the most powerful adversarial attacks currently available. Permutation Tests A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. In other words, the method by which treatments are allocated to subjects in an experimental design is mirrored in the analysis of that design. If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels; see also exchangeability. Confidence intervals can then be derived from the tests. Perpetual Learning Machine Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine; a new type of DNN that is capable of brain-like dynamic ‘on the fly’ learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework. Perplexity In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. Persistence bag-of-Words Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs are compact 2D representations formed by multisets of points. Their variable size makes them, however, difficult to combine with typical machine learning workflows. In this paper, we introduce persistence bag-of-words, which is a novel, expressive and discriminative vectorized representation of PDs for topological data analysis. It represents PDs in a convenient way for machine learning and statistical analysis and has a number of favorable practical and theoretical properties like 1-Wasserstein stability. We evaluate our representation on several heterogeneous datasets and show its high discriminative power. Our approach achieves state-of-the-art performance and even beyond in much less time than alternative approaches. Thereby, it facilitates the topological analysis of large-scale data sets in future. Persistence Curve Persistence diagrams are a main tool in the field of Topological Data Analysis (TDA). They contain fruitful information about the shape of data. The use of machine learning algorithms on the space of persistence diagrams proves to be challenging as the space is complicated. For that reason, summarizing and vectorizing these diagrams is an important topic currently researched in TDA. In this work, we provide a general framework of summarizing diagrams that we call Persistence Curves (PC). The main idea is so-called Fundamental Lemma of Persistent Homology, which is derived from the classic elder rule. Under this framework, certain well-known summaries, such as persistent Betti numbers, and persistence landscape, are special cases of the PC. Moreover, we prove a rigorous bound for a general families of PCs. In particular, certain family of PCs admit the stability property under an additional assumption. Finally, we apply PCs to textures classification on four well-know texture datasets. The result outperforms several existing TDA methods. Persistence Diagrams Persistence diagrams have been widely recognized as a compact descriptor for characterizing multiscale topological features in data. When many datasets are available, statistical features embedded in those persistence diagrams can be extracted by applying machine learnings. In particular, the ability for explicitly analyzing the inverse in the original data space from those statistical features of persistence diagrams is significantly important for practical applications. Persistent Homology for Generative Model(PHom-GeM) Generative neural network models, including Generative Adversarial Network (GAN) and Auto-Encoders (AE), are among the most popular neural network models to generate adversarial data. The GAN model is composed of a generator that produces synthetic data and of a discriminator that discriminates between the generator’s output and the true data. AE consist of an encoder which maps the model distribution to a latent manifold and of a decoder which maps the latent manifold to a reconstructed distribution. However, generative models are known to provoke chaotically scattered reconstructed distribution during their training, and consequently, incomplete generated adversarial distributions. Current distance measures fail to address this problem because they are not able to acknowledge the shape of the data manifold, i.e. its topological features, and the scale at which the manifold should be analyzed. We propose Persistent Homology for Generative Models, PHom-GeM, a new methodology to assess and measure the distribution of a generative model. PHom-GeM minimizes an objective function between the true and the reconstructed distributions and uses persistent homology, the study of the topological features of a space at different spatial resolutions, to compare the nature of the true and the generated distributions. Our experiments underline the potential of persistent homology for Wasserstein GAN in comparison to Wasserstein AE and Variational AE. The experiments are conducted on a real-world data set particularly challenging for traditional distance measures and generative neural network models. PHom-GeM is the first methodology to propose a topological distance measure, the bottleneck distance, for generative models used to compare adversarial samples in the context of credit card transactions. Persistent Identifier Kernel Information Persistent Identifier (PID) is a widely used long-term unique reference to digital objects. Meanwhile, Handle, one of the main persistent identifier schemes in use, implements a central global registry to resolve PIDs. The value of Handle varies in sizes and types without any restrictions from user side. However, widely using the Handel raises challenges on managing and correlating different PIDs for users and curators. In this research paper, we raise an idea about the value of Handle, called Persistent Identifier Kernel Information, which is the critical metadata describing the minimal information for identifying the PID object. Simultaneously, an API service called Collection API, is collaborating with PID Kernel Information to manage the Backbone Provenance relationships among different PIDs. This paper is an early research exploration describing the strength and weakness of Collection API and PID Kernel Information. Person re-identification(PReID) Intelligent video-surveillance is currently an active research field in computer vision and machine learning techniques. It provides useful tools for surveillance operators and forensic video investigators. Person re-identification (PReID) is one among these tools. It consists of recognizing whether an individual has already been observed over a camera in a network or not. This tool can also be employed in various possible applications such as off-line retrieval of all the video-sequences showing an individual of interest whose image is given a query, and online pedestrian tracking over multiple camera views. To this aim, many techniques have been proposed to increase the performance of PReID. Among the systems, many researchers utilized deep neural networks (DNNs) because of their better performance and fast execution at test time. Our objective is to provide for future researchers the work being done on PReID to date. Therefore, we summarized state-of-the-art DNN models being used for this task. A brief description of each model along with their evaluation on a set of benchmark datasets is given. Finally, a detailed comparison is provided among these models followed by some limitations that can work as guidelines for future research. Personalized Attention Network(PANet) Human visual attention is subjective and biased according to the personal preference of the viewer, however, current works of saliency detection are general and objective, without counting the factor of the observer. This will make the attention prediction for a particular person not accurate enough. In this work, we present the novel idea of personalized attention prediction and develop Personalized Attention Network (PANet), a convolutional network that predicts saliency in images with personal preference. The model consists of two streams which share common feature extraction layers, and one stream is responsible for saliency prediction, while the other is adapted from the detection model and used to fit user preference. We automatically collect user preference from their albums and leaves them freedom to define what and how many categories their preferences are divided into. To train PANet, we dynamically generate ground truth saliency maps upon existing detection labels and saliency labels, and the generation parameters are based upon our collected datasets consists of 1k images. We evaluate the model with saliency prediction metrics and test the trained model on different preference vectors. The results have shown that our system is much better than general models in personalized saliency prediction and is efficient to use for different preferences. Personalized Embedding Propagation(PEP) Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, these methods only consider nodes that are a few propagation steps away and the size of this utilized neighborhood cannot be easily extended. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct personalized embedding propagation (PEP) and its approximation, PEP$_\text{A}$. Our model’s training time is on par or faster and its number of parameters on par or lower than previous models. It leverages a large, adjustable neighborhood for classification and can be combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification on multiple graphs in the most thorough study done so far for GCN-like models. Personalized Evolving Model for Social Network and Opinion(PENO) Network dynamics has always been a meaningful topic deserving exploration in the realm of academy. previous network models contain two parts: (1) generating structure as per user property; (2) changing property as per network structure. Properties in these models, however, cannot be interpreted to concept in prevalent social theories or empirical truth. Also, they usually treat everyone in an uniform fashion. While such assumption is quite misguiding, and thus saliently limits their performance. To overcome these flaws, we devise a personalized evolving model for social network and opinion (PENO), where citizens’ ideology is revealed by variable opinions and four dimensions of personality are considered for each entity – leadership, openness, agreeableness, and neuroticism. Opinion propagates via social tie, tie generates from opinion affinity, and personalities integrally work with opinion and tie across evolution. To our best knowledge, PENO is the first attempt to introduce personality impact in network dynamics and verify social science with reasonable visualization during simulation. We also present its probabilistic graph and conceive iterative learning algorithm. Experiments show PENO outperforms several state-of-the-art baselines over two typical prediction tasks – congress voting prediction for legislative bills and friendship prediction on a book-commenting website. Finally, we discuss its scalability to do multi-task learning and transfer learning in daily scenarios. Personalized Neural Embedding(PNE) Collaborative filtering (CF) is a core technique for recommender systems. Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and views) only and hence they suffer from the data sparsity issue. Items are usually associated with unstructured text such as article abstracts and product reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. We learn such embeddings of users, items, and words jointly, and predict user preferences on items based on these learned representations. PNE estimates the probability that a user will like an item by two terms—behavior factors and semantic factors. On two real-world datasets, PNE shows better performance than four state-of-the-art baselines in terms of three metrics. We also show that PNE learns meaningful word embeddings by visualization. Personalized PageRank(PPP) Personally Identifiable Information(PII) Personally identifiable information (PII), or Sensitive Personal Information (SPI), as used in US privacy law and information security, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. The abbreviation PII is widely accepted in the US context, but the phrase it abbreviates has four common variants based on personal / personally, and identifiable / identifying. Not all are equivalent, and for legal purposes the effective definitions vary depending on the jurisdiction and the purposes for which the term is being used. (In other countries with privacy protection laws derived from the OECD privacy principles, the term used is more often ‘personal information’, which may be somewhat broader: in Australia’s Privacy Act 1988 (Cth) ‘personal information’ also includes information from which the person’s identity is ‘reasonably ascertainable’, potentially covering some information not covered by PII.) NIST Special Publication 800-122 defines PII as ‘any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual‘s identity, such as name, social security number, date and place of birth, mother‘s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.’ So, for example, a user’s IP address as used in a communication exchange is classed as PII regardless of whether it may or may not on its own be able to uniquely identify a person. Although the concept of PII is old, it has become much more important as information technology and the Internet have made it easier to collect PII through breaches of Internet security, network security and web browser security, leading to a profitable market in collecting and reselling PII. PII can also be exploited by criminals to stalk or steal the identity of a person, or to aid in the planning of criminal acts. As a response to these threats, many website privacy policies specifically address the gathering of PII, and lawmakers have enacted a series of legislations to limit the distribution and accessibility of PII. However, PII is a legal concept, not a technical concept. Because of the versatility and power of modern re-identification algorithms, the absence of PII data does not mean that the remaining data does not identify individuals. While some attributes may be uniquely identifying on their own, any attribute can be identifying in combination with others. These attributes have been referred to as quasi-identifiers or pseudo-identifiers. anonymizer,generator,detector Perturbative GAN Perturbative GAN, which replaces convolution layers of existing convolutional GANs (DCGAN, WGAN-GP, BIGGAN, etc.) with perturbation layers that adds a fixed noise mask, is proposed. Compared with the convolu-tional GANs, the number of parameters to be trained is smaller, the convergence of training is faster, the incep-tion score of generated images is higher, and the overall training cost is reduced. Algorithmic generation of the noise masks is also proposed, with which the training, as well as the generation, can be boosted with hardware acceleration. Perturbative GAN is evaluated using con-ventional datasets (CIFAR10, LSUN, ImageNet), both in the cases when a perturbation layer is adopted only for Generators and when it is introduced to both Generator and Discriminator. Perturbative Neural Network(PNN) Convolutional neural networks are witnessing wide adoption in computer vision systems with numerous applications across a range of visual recognition tasks. Much of this progress is fueled through advances in convolutional neural network architectures and learning algorithms even as the basic premise of a convolutional layer has remained unchanged. In this paper, we seek to revisit the convolutional layer that has been the workhorse of state-of-the-art visual recognition models. We introduce a very simple, yet effective, module called a perturbation layer as an alternative to a convolutional layer. The perturbation layer does away with convolution in the traditional sense and instead computes its response as a weighted linear combination of non-linearly activated additive noise perturbed inputs. We demonstrate both analytically and empirically that this perturbation layer can be an effective replacement for a standard convolutional layer. Empirically, deep neural networks with perturbation layers, called Perturbative Neural Networks (PNNs), in lieu of convolutional layers perform comparably with standard CNNs on a range of visual datasets (MNIST, CIFAR-10, PASCAL VOC, and ImageNet) with fewer parameters. Perturbed Model Validation(PMV) This paper introduces PMV (Perturbed Model Validation), a new technique to validate model relevance and detect overfitting or underfitting. PMV operates by injecting noise to the training data, re-training the model against the perturbed data, then using the training accuracy decrease rate to assess model relevance. A larger decrease rate indicates better concept-hypothesis fit. We realise PMV by using label flipping to inject noise, and evaluate it on four real-world datasets (breast cancer, adult, connect-4, and MNIST) and three synthetic datasets in the binary classification setting. The results reveal that PMV selects models more precisely and in a more stable way than cross-validation, and effectively detects both overfitting and underfitting. Pervasive Analytics During eras of global economic shifts, there was always a key resource discovered that became the spark of transformation for groups of individuals that could effectively harness it. Today, that resource is data. In no uncertain terms, we are witnessing a global data rush and leading companies realize that data will grow enterprise over the next several decades as much as any capital asset. These forward-looking companies realize that to be successful, enterprises must leverage analytics in order to create a more predictable and valuable organization. In some cases they must package data in a way that adds value and informs employees, or their customers, by deploying analytics into decisions making processes everywhere. This idea is referred to as pervasive analytics. But to drive a pervasive analytics strategy and win the data rush, successful companies also recognize the need to transform the way they think about data management and processes in order to unlock the true value of data. Pervasive Computing The idea that technology is moving beyond the personal computer to everyday devices with embedded technology and connectivity as computing devices become progressively smaller and more powerful. Also called ubiquitous computing, pervasive computing is the result of computer technology advancing at exponential speeds — a trend toward all man-made and some natural products having hardware and software. Pervasive computing goes beyond the realm of personal computers: it is the idea that almost any device, from clothing to tools to appliances to cars to homes to the human body to your coffee mug, can be imbedded with chips to connect the device to an infinite network of other devices. The goal of pervasive computing, which combines current network technologies with wireless computing, voice recognition, Internet capability and artificial intelligence, is to create an environment where the connectivity of devices is embedded in such a way that the connectivity is unobtrusive and always available. PeyeDF PeyeDF is a Portable Document Format (PDF) reader with eye tracking support, available as free and open source software. It is especially useful to researchers investigating reading and learning phenomena, as it integrates PDF reading-related behavioural data with gaze-related data. It is suitable for short and long-term research and supports multiple eye tracking systems. We utilised it to conduct an experiment which demonstrated that features obtained from both gaze and reading data collected in the past can predict reading comprehension which takes place in the future. PeyeDF also provides an integrated means for data collection and indexing using the DiMe personal data storage system. It is designed to collect data in the background without interfering with the reading experience, behaving like a modern lightweight PDF reader. Moreover, it supports annotations, tagging and collaborative work. A modular design allows the application to be easily modified in order to support additional eye tracking protocols and run controlled experiments. We discuss the implementation of the software and report on the results of the experiment which we conducted with it. Phaedra Phaedra is an open source platform for data capture and analysis of high-content screening data. It offers functionality to · import image data from any source · assess your data with industry’s richest toolbox · improve data quality using intelligent validation methods · use built-in statistics and machine learning workflows · generate QC and analysis reports using templates Phased LSTM Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for extracting patterns from temporal sequences. However, current RNN models are ill-suited to process irregularly sampled data triggered by events generated in continuous time by sensors or other neurons. Such data can occur, for example, when the input comes from novel event-driven artificial sensors that generate sparse, asynchronous streams of events or from multiple conventional sensors with different update intervals. In this work, we introduce the Phased LSTM model, which extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that produces updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks which require learning of long sequences. The model naturally integrates inputs from sensors of arbitrary sampling rates, thereby opening new areas of investigation for processing asynchronous sensory events that carry timing information. It also greatly improves the performance of LSTMs in standard RNN applications, and does so with an order-of-magnitude fewer computes at runtime. PhaseLin Phase retrieval deals with the recovery of complex- or real-valued signals from magnitude measurements. As shown recently, the method PhaseMax enables phase retrieval via convex optimization and without lifting the problem to a higher dimension. To succeed, PhaseMax requires an initial guess of the solution, which can be calculated via spectral initializers. In this paper, we show that with the availability of an initial guess, phase retrieval can be carried out with an ever simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. The linear nature of PhaseLin enables an exact and nonasymptotic MSE analysis for arbitrary measurement matrices. We furthermore demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and nonconvex methods on synthetic and real-world data. PHI Scrubber Confidentiality of patient information is an essential part of Electronic Health Record System. Patient information, if exposed, can cause a serious damage to the privacy of individuals receiving healthcare. Hence it is important to remove such details from physician notes. A system is proposed which consists of a deep learning model where a de-convolutional neural network and bi-directional LSTM-CNN is used along with regular expressions to recognize and eliminate the individually identifiable information. This information is then removed from a medical practitioner’s data which further allows the fair usage of such information among researchers and in clinical trials. PHOENICS In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case-study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce qualitatively and quantitatively the target dynamic behavior of this nonlinear reaction dynamics. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box unknown objective functions. PhonSenticNet With the current upsurge in the usage of social media platforms, the trend of using short text (microtext) in place of standard words has seen a significant rise. The usage of microtext poses a considerable performance issue in concept-level sentiment analysis, since models are trained on standard words. This paper discusses the impact of coupling sub-symbolic (phonetics) with symbolic (machine learning) Artificial Intelligence to transform the out-of-vocabulary concepts into their standard in-vocabulary form. The phonetic distance is calculated using the Sorensen similarity algorithm. The phonetically similar invocabulary concepts thus obtained are then used to compute the correct polarity value, which was previously being miscalculated because of the presence of microtext. Our proposed framework increases the accuracy of polarity detection by 6% as compared to the earlier model. This also validates the fact that microtext normalization is a necessary pre-requisite for the sentiment analysis task. Photon Machine Learning(Photon ML) Photon ML is a machine learning library based on Apache Spark. It was originally developed by the LinkedIn Machine Learning Algorithms Team. Currently, Photon ML supports training different types of Generalized Linear Models(GLMs) and Generalized Linear Mixed Models(GLMMs/GLMix model): logistic, linear, and Poisson. Physical Hybrid System(PHS) Some hybrid systems models are unsafe for mathematically correct but physically unrealistic reasons. For example, mathematical models can classify a system as being unsafe on a set that is too small to have physical importance. In particular, differences in measure zero sets in models of cyber-physical systems (CPS) have significant mathematical impact on the mathematical safety of these models even though differences on measure zero sets have no tangible physical effect in a real system. We develop the concept of ‘physical hybrid systems’ (PHS) to help reunite mathematical models with physical reality. We modify a hybrid systems logic (differential temporal dynamic logic) by adding a first-class operator to elide distinctions on measure zero sets of time within CPS models. This approach facilitates modeling since it admits the verification of a wider class of models, including some physically realistic models that would otherwise be classified as mathematically unsafe. We also develop a proof calculus to help with the verification of PHS. Physics Enhanced Artificial Intelligence(PEAI) We propose that intelligently combining models from the domains of Artificial Intelligence or Machine Learning with Physical and Expert models will yield a more ‘trustworthy’ model than any one model from a single domain, given a complex and narrow enough problem. Based on mean-variance portfolio theory and bias-variance trade-off analysis, we prove combining models from various domains produces a model that has lower risk, increasing user trust. We call such combined models – physics enhanced artificial intelligence (PEAI), and suggest use cases for PEAI. Physics-Guided Neural Network(PGNN) This paper introduces a novel framework for combining scientific knowledge of physics-based models with neural networks to advance scientific discovery. This framework, termed as physics-guided neural network (PGNN), leverages the output of physics-based model simulations along with observational features to generate predictions using a neural network architecture. Further, this paper presents a novel framework for using physics-based loss functions in the learning objective of neural networks, to ensure that the model predictions not only show lower errors on the training set but are also scientifically consistent with the known physics on the unlabeled set. We illustrate the effectiveness of PGNN for the problem of lake temperature modeling, where physical relationships between the temperature, density, and depth of water are used to design a physics-based loss function. By using scientific knowledge to guide the construction and learning of neural networks, we are able to show that the proposed framework ensures better generalizability as well as scientific consistency of results. Physics-Informed Gaussian Process Regression(GPR) We present a physics-informed Gaussian Process Regression (GPR) model to predict the phase angle, angular speed, and wind mechanical power from a limited number of measurements. In the traditional data-driven GPR method, the form of the Gaussian Process covariance matrix is assumed and its parameters are found from measurements. In the physics-informed GPR, we treat unknown variables (including wind speed and mechanical power) as a random process and compute the covariance matrix from the resulting stochastic power grid equations. We demonstrate that the physics-informed GPR method is significantly more accurate than the standard data-driven one for immediate forecasting of generators’ angular velocity and phase angle. We also show that the physics-informed GPR provides accurate predictions of the unobserved wind mechanical power, phase angle, or angular velocity when measurements from only one of these variables are available. The immediate forecast of observed variables and predictions of unobserved variables can be used for effectively managing power grids (electricity market clearing, regulation actions) and early detection of abnormal behavior and faults. The physics-based GPR forecast time horizon depends on the combination of input (wind power, load, etc.) correlation time and characteristic (relaxation) time of the power grid and can be extended to short and medium-range times. Physics-Informed Kriging(PhIK) In this work, we propose a new Gaussian process regression (GPR) method: physics-informed Kriging (PhIK). In the standard data-driven Kriging, the unknown function of interest is usually treated as a Gaussian process with assumed stationary covariance with hyperparameters estimated from data. In PhIK, we compute the mean and covariance function from realizations of available stochastic models, e.g., from realizations of governing stochastic partial differential equations solutions. Such a constructed Gaussian process generally is non-stationary, and does not assume a specific form of the covariance function. Our approach avoids the costly optimization step in data-driven GPR methods to identify the hyperparameters. More importantly, we prove that the physical constraints in the form of a deterministic linear operator are guaranteed in the resulting prediction. We also provide an error estimate in preserving the physical constraints when errors are included in the stochastic model realizations. To reduce the computational cost of obtaining stochastic model realizations, we propose a multilevel Monte Carlo estimate of the mean and covariance functions. Further, we present an active learning algorithm that guides the selection of additional observation locations. The efficiency and accuracy of PhIK are demonstrated for reconstructing a partially known modified Branin function and learning a conservative tracer distribution from sparse concentration measurements. Picasso Picasso is a free open-source (Eclipse Public License) web application written in Python for rendering standard visualizations useful for training convolutional neural networks. Picasso ships with occlusion maps and saliency maps, two visualizations which help reveal issues that evaluation metrics like loss and accuracy might hide: for example, learning a proxy classification task. Picasso works with the Keras and Tensorflow deep learning frameworks. Picasso can be used with minimal configuration by deep learning researchers and engineers alike across various neural network architectures. Adding new visualizations is simple: the user can specify their visualization code and HTML template separately from the application code. Piecewise Linear(PWL) In this paper, we study the representational power of deep neural networks (DNN) that belong to the family of piecewise-linear (PWL) functions, based on PWL activation units such as rectifier or maxout. We investigate the complexity of such networks by studying the number of linear regions of the PWL function. Typically, a PWL function from a DNN can be seen as a large family of linear functions acting on millions of such regions. We directly build upon the work of Montufar et al. (2014) and Raghu et al. (2017) by refining the upper and lower bounds on the number of linear regions for rectified and maxout networks. In addition to achieving tighter bounds, we also develop a novel method to perform exact enumeration or counting of the number of linear regions with a mixed-integer linear formulation that maps the input space to output. We use this new capability to visualize how the number of linear regions change while training DNNs. Piecewise-Deterministic Markov Processes(PDMP) In probability theory, a piecewise-deterministic Markov process (PDMP) is a process whose behaviour is governed by random jumps at points in time, but whose evolution is deterministically governed by an ordinary differential equation between those times. The class of models is “wide enough to include as special cases virtually all the non-diffusion models of applied probability.” The process is defined by three quantities: the flow, the jump rate, and the transition measure. The model was first introduced in a paper by Mark H. A. Davis in 1984. EstSimPDMP Pierre’s Correlogram Rcriticor Pigeonring The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no fewer than n/m items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we observe that the number of items in multiple boxes are also constrained. We propose a new principle called the pigeonring principle which formally captures such constraints and yields stronger conditions. To utilize the pigeonring principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query is constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the pigeonring principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the pigeonring principle. This suggests that all the solutions based on the pigeonhole principle are possible to be accelerated by the pigeonring principle. A universal filtering framework is introduced to encompass the solutions to these problems based on the pigeonring principle. Besides, we discuss how to quickly find candidates specified by the pigeonring principle with minor modifications on top of existing algorithms. Experimental results on real datasets demonstrate the applicability of the pigeonring principle as well as the superior performance of the algorithms based on the principle. Pilot-Streaming An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Thing (IoT) sensors. The development of streaming applications is a complex task and requires the integration of heterogeneous, distributed infrastructure, frameworks, middleware and application components. Different application components are often written in different languages using different abstractions and frameworks. Often, additional components, such as a message broker (e.g. Kafka), are required to decouple data production and consumptions and avoiding issues, such as back-pressure. Streaming applications may be extremely dynamic due to factors, such as variable data rates caused by the data source, adaptive sampling techniques or network congestions, variable processing loads caused by usage of different machine learning algorithms. As a result application-level resource management that can respond to changes in one of these factors is critical. We propose Pilot-Streaming, a framework for supporting streaming frameworks, applications and their resource management needs on HPC infrastructure. Pilot-Streaming is based on the Pilot-Job concept and enables developers to manage distributed computing and data resources for complex streaming applications. It enables applications to dynamically respond to resource requirements by adding/removing resources at runtime. This capability is critical for balancing complex streaming pipelines. To address the complexity in developing and characterization of streaming applications, we present the Streaming Mini- App framework, which supports different plug-able algorithms for data generation and processing, e.g., for reconstructing light source images using different techniques. We utilize the Mini-App framework to conduct an evaluation of Pilot-Streaming. PingAn Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. Out of saving time in wide area data transfer, some tasks are dispersed to the edge clusters satisfied data locality. However, execution in the edge clusters is less well, due to limited resource, overload interference and cluster-level unreachable troubles, which obstructs the guarantee on the speed and completion of jobs. Synthesizing the impact of cluster heterogeneity and costly inter-cluster data fetch, we expect to make effective copies across clusters for tasks to provide both success and efficiency of the arriving jobs. To this end, we design PingAn, an online insurance algorithm making redundance across-cluster copies for tasks, promising $(1+\varepsilon)-speed \, o(\frac{1}{\varepsilon^2+\varepsilon})-competitive$ in sum of the job flowtimes. PingAn shares resource among a part of jobs with an adjustable $\varepsilon$ fraction to fit the system load condition and insures for tasks following efficiency-first reliability-aware principle to optimize the effect of copies on jobs’ performance. Trace-driven simulations demonstrate that PingAn can reduce the average job flowtimes by at least $14\%$ more than the state-of-the-art speculation mechanisms. We also build PingAn in Spark on Yarn System to verify its practicality and generality. Experiments show that PingAn can reduce the average job completion time by up to $40\%$ comparing to the default Spark execution. Pinned AUC This report examines the Pinned AUC metric introduced and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we highlight in this report, there are ways that the metric can obscure different kinds of unintended biases when the underlying class distributions on which bias is being measured are not carefully controlled. PinSage Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures. Pipeline Pilot Bayesian Classifiers(PNPBC) The commercial product “Pipeline Pilot” uses a Naive Bayes statistics based approach, which essentially contrasts the active samples of a target with the whole (background) compound database. It does not explicitly consider the samples labelled as incative. Laplacian-adjusted probability estimates for the features lead to individual feature weights which are finally summed up to give the prediction. We re-implemented the “Pipeline Pilot” Naive Bayes statistics in order to use it on a multi-core supercomputer, which allowed us to compare this method on our benchmark dataset. Pipelined SGD(Pipe-SGD) Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based on the parameter server architecture, i.e., worker nodes compute gradients which are communicated to the parameter server while updated parameters are returned. Recently, distributed training with AllReduce operations gained popularity as well. While many of those operations seem appealing, little is reported about wall-clock training time improvements. In this paper, we carefully analyze the AllReduce based setup, propose timing models which include network latency, bandwidth, cluster size and compute time, and demonstrate that a pipelined training with a width of two combines the best of both synchronous and asynchronous training. Specifically, for a setup consisting of a four-node GPU cluster we show wall-clock time training improvements of up to 5.4x compared to conventional approaches. Pix2Vex We present a novel approach to 3D object reconstruction from its 2D projections. Our unique, GAN-inspired system employs a novel $C^\infty$ smooth differentiable renderer. Unlike the state-of-the-art, our renderer does not display any discontinuities at occlusions and dis-occlusions, facilitating training without 3D supervision and only minimal 2D supervision. Through domain adaptation and a novel training scheme, our network, the Reconstructive Adversarial Network (RAN), is able to train on different types of images. In contrast, previous work can only train on images of a similar appearance to those rendered by a differentiable renderer. We validate our reconstruction method through three shape classes from ShapeNet, and demonstrate that our method is robust to perturbations in view directions, different lighting conditions, and levels of texture details. Pixel-Adaptive Convolution(PAC) Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements. Pixel-Adaptive Convolutional Neural Network Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements. Pixel-Anchor Recently, semantic segmentation and general object detection frameworks have been widely adopted by scene text detecting tasks. However, both of them alone have obvious shortcomings in practice. In this paper, we propose a novel end-to-end trainable deep neural network framework, named Pixel-Anchor, which combines semantic segmentation and SSD in one network by feature sharing and anchor-level attention mechanism to detect oriented scene text. To deal with scene text which has large variances in size and aspect ratio, we combine FPN and ASPP operation as our encoder-decoder structure in the semantic segmentation part, and propose a novel Adaptive Predictor Layer in the SSD. Pixel-Anchor detects scene text in a single network forward pass, no complex post-processing other than an efficient fusion Non-Maximum Suppression is involved. We have benchmarked the proposed Pixel-Anchor on the public datasets. Pixel-Anchor outperforms the competing methods in terms of text localization accuracy and run speed, more specifically, on the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.8768 at 10 FPS for 960 x 1728 resolution images. PixelSNAIL Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at https://…/pixelsnail-public. Plackett-Luce Model Plackett-Luce model is based on the concept of permutation probability. This model has been extended from Bradley Terry model, where the permutation between two objects for pairwise comparison are applied. Plackett-luce model extends the Bradley Terry in comparing multiple objects at a time by permutation probability of a list of objects to be ranked. The key idea is that for the best ranked list of objects, the permutation probability is maximum, decreases with worse ranked list and is minimum at the worst ranked list of objects. PlackettLuce PlaNet Research into how artificial agents can improve their decisions over time is progressing rapidly via reinforcement learning (RL). For this technique, an agent observes a stream of sensory inputs (e.g. camera images) while choosing actions (e.g. motor commands), and sometimes receives a reward for achieving a specified goal. Model-free approaches to RL aim to directly predict good actions from the sensory observations, enabling DeepMind’s DQN to play Atari and other agents to control robots. However, this blackbox approach often requires several weeks of simulated interaction to learn through trial and error, limiting its usefulness in practice. Model-based RL, in contrast, attempts to have agents learn how the world behaves in general. Instead of directly mapping observations to actions, this allows an agent to explicitly plan ahead, to more carefully select actions by ‘imagining’ their long-term outcomes. Model-based approaches have achieved substantial successes, including AlphaGo, which imagines taking sequences of moves on a fictitious board with the known rules of the game. However, to leverage planning in unknown environments (such as controlling a robot given only pixels as input), the agent must learn the rules or dynamics from experience. Because such dynamics models in principle allow for higher efficiency and natural multi-task learning, creating models that are accurate enough for successful planning is a long-standing goal of RL. Plan-Structured Neural Network Query performance prediction, the task of predicting the latency of a query, is one of the most challenging problem in database management systems. Existing approaches rely on features and performance models engineered by human experts, but often fail to capture the complex interactions between query operators and input relations, and generally do not adapt naturally to workload characteristics and patterns in query execution plans. In this paper, we argue that deep learning can be applied to the query performance prediction problem, and we introduce a novel neural network architecture for the task: a plan-structured neural network. Our approach eliminates the need for human-crafted feature selection and automatically discovers complex performance models both at the operator and query plan level. Our novel neural network architecture can match the structure of any optimizer-selected query execution plan and predict its latency with high accuracy. We also propose a number of optimizations that reduce training overhead without sacrificing effectiveness. We evaluated our techniques on various workloads and we demonstrate that our plan-structured neural network can outperform the state-of-the-art in query performance prediction. Plasticity Plasticity is the ability of a learning algorithm to adapt to new data. Platt Scaling In machine learning, Platt scaling or Platt calibration is a way of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines, replacing an earlier method by Vapnik, but can be applied to other classification models. Platt scaling works by fitting a logistic regression model to a classifier’s scores. Playgol Children learn though play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of tasks and initial background knowledge. Before solving the tasks, the learner enters an unsupervised playing stage where it creates its own tasks to solve, tries to solve them, and saves any solutions (programs) to the background knowledge. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the user-supplied tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own which can then help solve the user-supplied tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance. We think that the idea of playing (or, more verbosely, unsupervised bootstrapping for supervised program induction) is an important contribution to the problem of developing program induction approaches that self-discover BK. Plug and Play Generative Networks(PPGN) Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227×227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models ‘Plug and Play Generative Networks’. PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable ‘condition’ network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data. Plug-and-Play Adversarial Domain Adaptation Network(PnP-AdaNet) Deep convolutional networks have demonstrated the state-of-the-art performance on various medical image computing tasks. Leveraging images from different modalities for the same analysis task holds clinical benefits. However, the generalization capability of deep models on test data with different distributions remain as a major challenge. In this paper, we propose the PnPAdaNet (plug-and-play adversarial domain adaptation network) for adapting segmentation networks between different modalities of medical images, e.g., MRI and CT. We propose to tackle the significant domain shift by aligning the feature spaces of source and target domains in an unsupervised manner. Specifically, a domain adaptation module flexibly replaces the early encoder layers of the source network, and the higher layers are shared between domains. With adversarial learning, we build two discriminators whose inputs are respectively multi-level features and predicted segmentation masks. We have validated our domain adaptation method on cardiac structure segmentation in unpaired MRI and CT. The experimental results with comprehensive ablation studies demonstrate the excellent efficacy of our proposed PnP-AdaNet. Moreover, we introduce a novel benchmark on the cardiac dataset for the task of unsupervised cross-modality domain adaptation. We will make our code and database publicly available, aiming to promote future studies on this challenging yet important research topic in medical imaging. Plugin Network In this paper, we propose a novel method to incorporate partial evidence in the inference of deep convolutional neural networks. Contrary to the existing methods, which either iteratively modify the input of the network or exploit external label taxonomy to take partial evidence into account, we add separate network modules to the intermediate layers of a pre-trained convolutional network. The goal of those modules is to incorporate additional signal – information about known labels – into the inference procedure and adjust the predicted outputs accordingly. Since the attached ‘Plugin Networks’, have a simple structure consisting of only fully connected layers, we drastically reduce the computational cost of training and inference. At the same time, the proposed architecture allows to propagate the information about known labels directly to the intermediate layers that are trained to intrinsically model correlations between the labels. Extensive evaluation of the proposed method confirms that our Plugin Networks outperform the state-of-the-art in a variety of tasks, including scene categorization and multi-label image annotation. Plug-In Stochastic Gradient Method Plug-and-play priors (PnP) is a popular framework for regularized signal reconstruction by using advanced denoisers within an iterative algorithm. In this paper, we discuss our recent online variant of PnP that uses only a subset of measurements at every iteration, which makes it scalable to very large datasets. We additionally present novel convergence results for both batch and online PnP algorithms. Plus L take away R(+L-R) The “Plus L take away R” (+L -R) is basically a combination of SFS and SBS. It append features to the feature subset L-times, and afterwards it removes features R-times until we reach our desired size for the feature subset. Variant 1: L > R If L > R, the algorithm starts with an empty feature subset and adds L features to it from the feature space. Then it goes over to the next step 2, where it removes R features from the feature subset, after which it goes back to step 1 to add L features again. Those steps are repeated until the feature subset reaches the desired size k. Variant 2: R > L Else, if R > L, the algorithms starts with the whole feature space* as feature subset. It remove sR features from it before it adds back L features from those features that were just removed. Those steps are repeated until the feature subset reaches the desired size k*. Poincaré Wasserstein Autoencoder This work presents a reformulation of the recently proposed Wasserstein autoencoder framework on a non-Euclidean manifold, the Poincar\’e ball model of the hyperbolic space. By assuming the latent space to be hyperbolic, we can use its intrinsic hierarchy to impose structure on the learned latent space representations. We demonstrate the model in the visual domain to analyze some of its properties and show competitive results on a graph link prediction task. Point and Figure Chart(P&F) Point and figure (P&F) is a charting technique used in technical analysis. Point and figure charting is unique in that it does not plot price against time as all other techniques do. Instead it plots price against changes in direction by plotting a column of Xs as the price rises and a column of Os as the price falls. rpnf Point Attention Transformer(PAT) Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by the recent advances in NLP domain, the self-attention transformer is introduced to consume the point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Besides, prior work uses heuristics dependence on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. Thereby, we for the first time propose an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a ‘soft’ continuous subset in training phase, and a ‘hard’ discrete subset in test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets with lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, to process event camera stream as point clouds, and achieve a state-of-the-art performance on DVS128 Gesture Dataset. Point Completion Network(PCN) Shape completion, the problem of estimating the complete geometry of objects from partial observations, lies at the core of many vision and robotics applications. In this work, we propose Point Completion Network (PCN), a novel learning-based approach for shape completion. Unlike existing shape completion methods, PCN directly operates on raw point clouds without any structural assumption (e.g. symmetry) or annotation (e.g. semantic class) about the underlying shape. It features a decoder design that enables the generation of fine-grained completions while maintaining a small number of parameters. Our experiments show that PCN produces dense, complete point clouds with realistic structures in the missing regions on inputs with various levels of incompleteness and noise, including cars from LiDAR scans in the KITTI dataset. Point Convolutional Neural Network(PCNN) This paper presents Point Convolutional Neural Networks (PCNN): a novel framework for applying convolutional neural networks to point clouds. The framework consists of two operators: extension and restriction, mapping point cloud functions to volumetric functions and vise-versa. A point cloud convolution is defined by pull-back of the Euclidean volumetric convolution via an extension-restriction mechanism. The point cloud convolution is computationally efficient, invariant to the order of points in the point cloud, robust to different samplings and varying densities, and translation invariant, that is the same convolution kernel is used at all points. PCNN generalizes image CNNs and allows readily adapting their architectures to the point cloud setting. Evaluation of PCNN on three central point cloud learning benchmarks convincingly outperform competing point cloud learning methods, and the vast majority of methods working with more informative shape representations such as surfaces and/or normals. Point Linking Network(PLN) Object detection is a core problem in computer vision. With the development of deep ConvNets, the performance of object detectors has been dramatically improved. The deep ConvNets based object detectors mainly focus on regressing the coordinates of bounding box, \eg, Faster-R-CNN, YOLO and SSD. Different from these methods that considering bounding box as a whole, we propose a novel object bounding box representation using points and links and implemented using deep ConvNets, termed as Point Linking Network (PLN). Specifically, we regress the corner/center points of bounding-box and their links using a fully convolutional network; then we map the corner points and their links back to multiple bounding boxes; finally an object detection result is obtained by fusing the multiple bounding boxes. PLN is naturally robust to object occlusion and flexible to object scale variation and aspect ratio variation. In the experiments, PLN with the Inception-v2 model achieves state-of-the-art single-model and single-scale results on the PASCAL VOC 2007, the PASCAL VOC 2012 and the COCO detection benchmarks without bells and whistles. The source code will be released. Point Pattern Analysis(PPA) Point pattern analysis (PPA) is the study of the spatial arrangements of points in (usually 2-dimensional) space. A fundamental problem of PPA is inferring whether a given arrangement is merely random or the result of some process. selectspm,stpp Point Process In statistics and probability theory, a point process is a type of random process for which any one realisation consists of a set of isolated points either in time or geographical space, or in even more general spaces. For example, the occurrence of lightning strikes might be considered as a point process in both time and geographical space if each is recorded according to its location in time and space. Point processes are well studied objects in probability theory and the subject of powerful tools in statistics for modeling and analyzing spatial data, which is of interest in such diverse disciplines as forestry, plant ecology, epidemiology, geography, seismology, materials science, astronomy, telecommunications, computational neuroscience, economics and others. Point processes on the real line form an important special case that is particularly amenable to study, because the different points are ordered in a natural way, and the whole point process can be described completely by the (random) intervals between the points. These point processes are frequently used as models for random events in time, such as the arrival of customers in a queue (queueing theory), of impulses in a neuron (computational neuroscience), particles in a Geiger counter, location of radio stations in a telecommunication network or of searches on the world-wide web. mmpp Point Registration Neural Network Point set registration is defined as a process to determine the spatial transformation from the source point set to the target one. Existing methods often iteratively search for the optimal geometric transformation to register a given pair of point sets, driven by minimizing a predefined alignment loss function. In contrast, the proposed point registration neural network (PR-Net) actively learns the registration pattern as a parametric function from a training dataset, consequently predict the desired geometric transformation to align a pair of point sets. PR-Net can transfer the learned knowledge (i.e. registration pattern) from registering training pairs to testing ones without additional iterative optimization. Specifically, in this paper, we develop novel techniques to learn shape descriptors from point sets that help formulate a clear correlation between source and target point sets. With the defined correlation, PR-Net tends to predict the transformation so that the source and target point sets can be statistically aligned, which in turn leads to an optimal spatial geometric registration. PR-Net achieves robust and superior performance for non-rigid registration of point sets, even in presence of Gaussian noise, outliers, and missing points, but requires much less time for registering large number of pairs. More importantly, for a new pair of point sets, PR-Net is able to directly predict the desired transformation using the learned model without repetitive iterative optimization routine. Our code is available at https://…/PR-Net. Pointer Network(Ptr-Net) We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems — finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem — using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems. Pointer Networks Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they yield a succession of pointers to the elements of the input series. The most basic use of this is ordering the elements of a variable-length sequence. Basic seq2seq is an LSTM encoder coupled with an LSTM decoder. It’s most often heard of in the context of machine translation: given a sentence in one language, the encoder turns it into a fixed-size representation. Decoder transforms this into a sentence again, possibly of different length than the source. For example, ‘como estas?’ – two words – would be translated to ‘how are you?’ – three words. The model gives better results when augmented with attention. Practically it means that instead of processing the input from start to finish, the decoder can look back and forth over input. Specifically, it has access to encoder states from each step, not just the last one. Consider how it may help with Spanish, in which adjectives go before nouns: ‘neural network’ becomes ‘red neuronal’. In technical terms, attention (at least this particular kind, content-based attention) boils down to dot products and weighted averages. In short, a weighted average of encoder states becomes the decoder state. Attention is just the distribution of weights. Point-Wise Convolutional Neural Network Deep learning with 3D data such as reconstructed point clouds and CAD models has received great research interests recently. However, the capability of using point clouds with convolutional neural network has been so far not fully explored. In this technical report, we present a convolutional neural network for semantic segmentation and object recognition with 3D point clouds. At the core of our network is point-wise convolution, a convolution operator that can be applied at each point of a point cloud. Our fully convolutional network design, while being simple to implement, can yield competitive accuracy in both semantic segmentation and object recognition task. POIReviewQ Many services that perform information retrieval for Points of Interest (POI) utilize a Lucene-based setup with spatial filtering. While this type of system is easy to implement it does not make use of semantics but relies on direct word matches between a query and reviews leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset POIReviewQA. It consists of 20k questions (e.g.’is this restaurant dog friendly?’) for 1022 Yelp business types. For each question we sampled 10 reviews, and annotated each sentence in the reviews whether it answers the question and what the corresponding answer is. To test a system’s ability to understand the text we adopt an information retrieval evaluation by ranking all the review sentences for a question based on the likelihood that they answer this question. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The result technology can help exploit the thematic content of web documents and social media for characterisation of locations. Poisson Autoregressive Models With Exogenous Covariates(PoARX) This paper introduces multivariate Poisson autoregressive models with exogenous covariates (PoARX) for modelling multivariate time series of counts. We obtain conditions for the PoARX process to be stationary and ergodic before proposing a computationally efficient procedure for estimation of parameters by the method of inference functions (IFM) and obtaining asymptotic normality of these estimators. Lastly, we demonstrate an application to count data for the number of people entering and exiting a building, and show how the different aspects of the model combine to produce a strong predictive model. We conclude by suggesting some further areas of application and by listing directions for future work. Poisson Factorization Machine(PFM) Newsroom in online ecosystem is difficult to untangle. With prevalence of social media, interactions between journalists and individuals become visible, but lack of understanding to inner processing of information feedback loop in public sphere leave most journalists baffled. Can we provide an organized view to characterize journalist behaviors on individual level to know better of the ecosystem? To this end, I propose Poisson Factorization Machine (PFM), a Bayesian analogue to matrix factorization that assumes Poisson distribution for generative process. The model generalizes recent studies on Poisson Matrix Factorization to account temporal interaction which involves tensor-like structure, and label information. Two inference procedures are designed, one based on batch variational EM and another stochastic variational inference scheme that efficiently scales with data size. An important novelty in this note is that I show how to stack layers of PFM to introduce a deep architecture. This work discusses some potential results applying the model and explains how such latent factors may be useful for analyzing latent behaviors for data exploration. Poisson PCA In this paper, we study the problem of computing a Principal Component Analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principle components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line. For example the Poisson-log-normal (PLN) model, approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real data, we see that our method also appears to be more robust to outliers than the parametric method. Poisson Regression In statistics, Poisson regression is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables. Polar Convolution ➘ “Polar Envelope” Polar Envelope The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems. Polar Transformer Network(PTN) Convolutional neural networks (CNNs) are equivariant with respect to translation; a translation in the input causes a translation in the output. Attempts to generalize equivariance have concentrated on rotations. In this paper, we combine the idea of the spatial transformer, and the canonical coordinate representations of groups (polar transform) to realize a network that is invariant to translation, and equivariant to rotation and scale. A conventional CNN is used to predict the origin of a polar transform. The polar transform is performed in a differentiable way, similar to the Spatial Transformer Networks, and the resulting polar representation is fed into a second CNN. The model is trained end-to-end with a classification loss. We apply the method on variations of MNIST, obtained by perturbing it with clutter, translation, rotation, and scaling. We achieve state of the art performance in the rotated MNIST, with fewer parameters and faster training time than previous methods, and we outperform all tested methods in the SIM2MNIST dataset, which we introduce. Polarity Detection Policy Gradient Search(PGS) Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems, however a disadvantage to MCTS is that it estimates the values of states with Monte Carlo averages, stored in a search tree; this does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves comparable performance to MCTS, and an agent trained using Expert Iteration with PGS was able defeat MoHex 2.0, the strongest open-source Hex agent, in 9×9 Hex. Policy Learning based on Completely Behavior Cloning(PLCBC) Direct policy search is one of the most important algorithm of reinforcement learning. However, learning from scratch needs a large amount of experience data and can be easily prone to poor local optima. In addition to that, a partially trained policy tends to perform dangerous action to agent and environment. In order to overcome these challenges, this paper proposed a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. By this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help agent learn at the high reward state region, and converge faster and better. Policy Optimization with Model-based Explorations(POME) Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have successfully applied in complex decision-making problems such as Atari games. However, these methods suffer from high variances and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from the bias of the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions’ target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error of these two target estimations as the additional exploration value for each state-action pair, i.e, encourages the algorithm to explore the states with larger target errors which are hard to estimate. We compare POME with PPO on Atari 2600 games, and it shows that POME outperforms PPO on 33 games out of 49 games. Policy-Belief-Iteration(P-BIT) In situations where explicit communication is limited, a human collaborator is typically able to learn to: (i) infer the meaning behind their partner’s actions and (ii) balance between taking actions that are exploitative given their current understanding of the state vs. those that can convey private information about the state to their partner. The first component of this learning process has been well-studied in multi-agent systems, whereas the second — which is equally crucial for a successful collaboration — has not. In this work, we complete the learning process and introduce our novel algorithm, Policy-Belief-Iteration (‘P-BIT’), which mimics both components mentioned above. A belief module models the other agent’s private information by observing their actions, whilst a policy module makes use of the inferred private information to return a distribution over actions. They are mutually reinforced with an EM-like algorithm. We use a novel auxiliary reward to encourage information exchange by actions. We evaluate our approach on the non-competitive bidding problem from contract bridge and show that by self-play agents are able to effectively collaborate with implicit communication, and P-BIT outperforms several meaningful baselines that have been considered. POLO We present POLO — a C++ library for large-scale parallel optimization research that emphasizes ease-of-use, flexibility and efficiency in algorithm design. It uses multiple inheritance and template programming to decompose algorithms into essential policies and facilitate code reuse. With its clear separation between algorithm and execution policies, it provides researchers with a simple and powerful platform for prototyping ideas, evaluating them on different parallel computing architectures and hardware platforms, and generating compact and efficient production code. A C-API is included for customization and data loading in high-level languages. POLO enables users to move seamlessly from serial to multi-threaded shared-memory and multi-node distributed-memory executors. We demonstrate how POLO allows users to implement state-of-the-art asynchronous parallel optimization algorithms in just a few lines of code and report experiment results from shared and distributed-memory computing architectures. We provide both POLO and POLO.jl, a wrapper around POLO written in the Julia language, at https://…/pologrp under the permissive MIT license. Polyaxon Deep Learning library for TensorFlow for building end to end models and experiments. Polyaxon was built with the following goals: · Modularity: The creation of a computation graph based on modular and understandable modules, with the possibility to reuse and share the module in subsequent usage. · Usability: Training a model should be easy enough, and should enable quick experimentations. · Configurable: Models and experiments could be created using a YAML/Json file, but also in python files. · Extensibility: The modularity and the extensive documentation of the code makes it easy to build and extend the set of provided modules. · Performance: Polyaxon is based on internal tensorflow code base and leverage the builtin distributed learning. · Data Preprocessing: Polyaxon provides many pipelines and data processor to support different data inputs. Github ➘ “Tensorflow” Polyglot Persistence Today, most large companies are using a variety of different data storage technologies for different kinds of data. A lot of companies still use relational databases to store some data, but the persistence needs of applications are evolving from predominantly relational to a mixture of data sources. Polyglot persistence is commonly used to define this hybrid approach. Increasingly, architects are approaching the data storage problem by first figuring out how they want to manipulate the data, and then choosing the appropriate technology to fit their needs. What polyglot persistence boils down to is choice – the ability to leverage multiple data storages, depending on your use cases. http://…/polyglot-persistence Polyglot Processing … So here we are. Above observations motivate me to suggest a new term that aims to capture the shift of focus toward processing of the data: polyglot processing – which essentially means using the right processing engine for a given task. To the best of my knowledge no one has suggested or attempted to define this term yet, besides a somewhat related mentioning in the realm of the Apache Bigtop project, however in a much narrower context…. Polyglot Programming Beyond being something incredibly difficult to say many times in a row, polyglot programming is the use of different programming languages, frameworks, services and databases for developing individual applications. Polygonal Symbolic Data Analysis ➘ “Symbolic Data Analysis” psda Polygon-RNN++ Manually labeling datasets with object masks is extremely time consuming. In this work, we follow the idea of Polygon-RNN to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset shows that our model, which we refer to as Polygon-RNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset, and used out of the box on datasets from varying domains. The results show that Polygon-RNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice. PolyNeuron Automated deep neural network architecture design has received a significant amount of recent attention. However, this attention has not been equally shared by one of the fundamental building blocks of a deep neural network, the neurons. In this study, we propose PolyNeuron, a novel automatic neuron discovery approach based on learned polyharmonic spline activations. More specifically, PolyNeuron revolves around learning polyharmonic splines, characterized by a set of control points, that represent the activation functions of the neurons in a deep neural network. A relaxed variant of PolyNeuron, which we term PolyNeuron-R, loosens the constraints imposed by PolyNeuron to reduce the computational complexity for discovering the neuron activation functions in an automated manner. Experiments show both PolyNeuron and PolyNeuron-R lead to networks that have improved or comparable performance on multiple network architectures (LeNet-5 and ResNet-20) using different datasets (MNIST and CIFAR10). As such, automatic neuron discovery approaches such as PolyNeuron is a worthy direction to explore. Polypus In this paper we propose a new parallel architecture based on Big Data technologies for real-time sentiment analysis on microblogging posts. Polypus is a modular framework that provides the following functionalities: (1) massive text extraction from Twitter, (2) distributed non-relational storage optimized for time range queries, (3) memory-based intermodule buffering, (4) real-time sentiment classification, (5) near real-time keyword sentiment aggregation in time series, (6) a HTTP API to interact with the Polypus cluster and (7) a web interface to analyze results visually. The whole architecture is self-deployable and based on Docker containers. Polytomous Discrimination Index(PDI) Polytomous Discrimination Index (PDI), described in the paper: Van Calster B (2012) . Jialiang Li (2017) . mcca Pomegranate We present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models. This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with—or outperform—other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code. Pontogammarus Maeoticus Swarm Optimization(PMSO) Nowadays, metaheuristic optimization algorithms are used to find the global optima in difficult search spaces. Pontogammarus Maeoticus Swarm Optimization (PMSO) is a metaheuristic algorithm imitating aquatic nature and foraging behavior. Pontogammarus Maeoticus, also called Gammarus in short, is a tiny creature found mostly in coast of Caspian Sea in Iran. In this algorithm, global optima is modeled as sea edge (coast) to which Gammarus creatures are willing to move in order to rest from sea waves and forage in sand. Sea waves satisfy exploration and foraging models exploitation. The strength of sea wave is determined according to distance of Gammarus from sea edge. The angles of waves applied on several particles are set randomly helping algorithm not be stuck in local bests. Meanwhile, the neighborhood of particles change adaptively resulting in more efficient progress in searching. The proposed algorithm, although is applicable on any optimization problem, is experimented for partially shaded solar PV array. Experiments on CEC05 benchmarks, as well as solar PV array, show the effectiveness of this optimization algorithm. Pontryagin Maximum Principle Pontryagin’s maximum (or minimum) principle is used in optimal control theory to find the best possible control for taking a dynamical system from one state to another, especially in the presence of constraints for the state or input controls. It was formulated in 1956 by the Russian mathematician Lev Pontryagin and his students. It has as a special case the Euler-Lagrange equation of the calculus of variations. The principle states, informally, that the control Hamiltonian must take an extreme value over controls in the set of all permissible controls. Whether the extreme value is maximum or minimum depends both on the problem and on the sign convention used for defining the Hamiltonian. The normal convention, which is the one used in Hamiltonian, leads to a maximum hence maximum principle but the sign convention used in this article makes the extreme value a minimum. Pool Adjacent Violators Algorithm(PAVA) Pool Adjacent Violators Algorithm (PAVA) is a linear time (and linear memory) algorithm for linear ordering isotonic regression. ➚ “Isotonic Regression” http://…/deleeuw_hornik_mair_R_09.pdf PoPPy PoPPy is a Point Process toolbox based on PyTorch, which achieves flexible designing and efficient learning of point process models. It can be used for interpretable sequential data modeling and analysis, e.g., Granger causality analysis of multi-variate point processes, point process-based simulation and prediction of event sequences. In practice, the key points of point process-based sequential data modeling include: 1) How to design intensity functions to describe the mechanism behind observed data? 2) How to learn the proposed intensity functions from observed data? The goal of PoPPy is providing a user-friendly solution to the key points above and achieving large-scale point process-based sequential data analysis, simulation and prediction. POPQORN The vulnerability to adversarial attacks has been a critical issue for deep neural networks. Addressing this issue requires a reliable way to evaluate the robustness of a network. Recently, several methods have been developed to compute $\textit{robustness quantification}$ for neural networks, namely, certified lower bounds of the minimum adversarial perturbation. Such methods, however, were devised for feed-forward networks, e.g. multi-layer perceptron or convolutional networks. It remains an open problem to quantify robustness for recurrent networks, especially LSTM and GRU. For such networks, there exist additional challenges in computing the robustness quantification, such as handling the inputs at multiple steps and the interaction between gates and states. In this work, we propose $\textit{POPQORN}$ ($\textbf{P}$ropagated-$\textbf{o}$ut$\textbf{p}$ut $\textbf{Q}$uantified R$\textbf{o}$bustness for $\textbf{RN}$Ns), a general algorithm to quantify robustness of RNNs, including vanilla RNNs, LSTMs, and GRUs. We demonstrate its effectiveness on different network architectures and show that the robustness quantification on individual steps can lead to new insights. Population Based Augmentation(PBA) A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations. Properly chosen augmentation policies can lead to significant generalization improvements; however, state-of-the-art approaches such as AutoAugment are computationally infeasible to run for the ordinary user. In this paper, we introduce a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy. We show that PBA can match the performance of AutoAugment on CIFAR-10, CIFAR-100, and SVHN, with three orders of magnitude less overall compute. On CIFAR-10 we achieve a mean test error of 1.46%, which is a slight improvement upon the current state-of-the-art. The code for PBA is open source and is available at https://…/pba. Population Based Training(PBT) Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present \emph{Population Based Training (PBT)}, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance. Population-Attributable Fraction(PAF) The contribution of a risk factor to a disease or a death is quantified using the population attributable fraction (PAF). PAF is the proportional reduction in population disease or mortality that would occur if exposure to a risk factor were reduced to an alternative ideal exposure scenario (eg. no tobacco use). Many diseases are caused by multiple risk factors, and individual risk factors may interact in their impact on overall risk of disease. As a result, PAFs for individual risk factors often overlap and add up to more than 100 percent. Causal inference with multi-state models – estimands and estimators of the population-attributable fraction Porcellio Scaber Algorithm(PSA) Bio-inspired algorithms have received a significant amount of attention in both academic and engineering societies. In this paper, based on the observation of two major survival rules of a species of woodlice, i.e., porcellio scaber, we design and propose an algorithm called the porcellio scaber algorithm (PSA) for solving optimization problems, including differentiable and non-differential ones as well as the case with local optimums. Numerical results based on benchmark problems are presented to validate the efficacy of PSA. Porcupine Neural Network(PNN) Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights to make its optimization landscape have good theoretical properties while at the same time, be a good approximation for the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global while we have a characterization of regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN. Portmanteau Test A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified. Tests constructed in this context can have the property of being at least moderately powerful against a wide range of departures from the null hypothesis. Thus, in applied statistics, a portmanteau test provides a reasonable way of proceeding as a general check of a model’s match to a dataset where there are many different ways in which the model may depart from the underlying data generating process. Use of such tests avoids having to be very specific about the particular type of departure being tested. PoseNet The Convolution Neural Network (CNN) has demonstrated the unique advantage in audio, image and text learning; recently it has also challenged Recurrent Neural Networks (RNNs) with long short-term memory cells (LSTM) in sequence-to-sequence learning, since the computations involved in CNN are easily parallelizable whereas those involved in RNN are mostly sequential, leading to a performance bottleneck. However, unlike RNN, the native CNN lacks the history sensitivity required for sequence transformation; therefore enhancing the sequential order awareness, or position-sensitivity, becomes the key to make CNN the general deep learning model. In this work we introduce an extended CNN model with strengthen position-sensitivity, called PoseNet. A notable feature of PoseNet is the asymmetric treatment of position information in the encoder and the decoder. Experiments shows that PoseNet allows us to improve the accuracy of CNN based sequence-to-sequence learning significantly, achieving around 33-36 BLEU scores on the WMT 2014 English-to-German translation task, and around 44-46 BLEU scores on the English-to-French translation task. Position, Sequence and Set Similarity Measure In this paper the author presents a new similarity measure for strings of characters based on S3M which he expands to take into account not only the characters set and sequence but also their position. After demonstrating the superiority of this new measure and discussing the need for a self adaptive spell checker, this work is further developed into an adaptive spell checker that produces a cluster with a defined number of words for each presented misspelled word. The accuracy of this solution is measured comparing its results against the results of the most widely used spell checker. Positional Cartesian Genetic Programming(Positional CGP) Cartesian Genetic Programming (CGP) has many modifications across a variety of implementations, such as recursive connections and node weights. Alternative genetic operators have also been proposed for CGP, but have not been fully studied. In this work, we present a new form of genetic programming based on a floating point representation. In this new form of CGP, called Positional CGP, node positions are evolved. This allows for the evaluation of many different genetic operators while allowing for previous CGP improvements like recurrency. Using nine benchmark problems from three different classes, we evaluate the optimal parameters for CGP and PCGP, including novel genetic operators. Positive Unknown Learning ➘ “Positive Unlabeled Learning” Positive Unlabeled Learning(PU Learning) PU learning, in which a binary classifier is learned in a semi-supervised way from only positive and unlabeled sample points. In PU learning, two sets of examples are assumed to be available for training: the positive set P {\displaystyle P} P and a mixed set U {\displaystyle U} U, which is assumed to contain both positive and negative samples, but without these being labeled as such. This contrasts with other forms of semisupervised learning, where it is assumed that a labeled set containing examples of both classes is available in addition to unlabeled samples. A variety of techniques exist to adapt supervised classifiers to the PU learning setting, including variants of the EM algorithm. PU learning has been successfully applied to text, time series, and bioinformatics tasks. Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. Learning From Positive and Unlabeled Data: A Survey PU Learning – Positive/unknown class machine learning approaches Possibilistic C-Means(PCM) PCM partitions an m-dimensional dataset Formula into several clusters to describe an underlying structure within the data. A possibilistic partition is defined as a Formula matrix Formula, where Formula is the membership value of object Formula towards the ith cluster … The Possibilistic C-Means Algorithm: Insights and Recommendations A Possibilistic Fuzzy c-Means Clustering Algorithm PCM and APCM Revisited: An Uncertainty Perspective Posterior Linearisation Gaussian process classification using posterior linearisation Posterior Predictive Distribution In statistics, and especially Bayesian statistics, the posterior predictive distribution is the distribution of unobserved observations (prediction) conditional on the observed data. Described as the distribution that a new i.i.d. data point \tilde{x} would have, given a set of N existing i.i.d. observations \mathbf{X} = . In a frequentist context, this might be derived by computing the maximum likelihood estimate (or some other estimate) of the parameter(s) given the observed data, and then plugging them into the distribution function of the new observations. Posterior Probability In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account. Similarly, the posterior probability distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey. “Posterior”, in this context, means after taking into account the relevant evidence related to the particular case being examined. Posterior Sampling for Pure Exploration(PSPE) In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based of the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent’s goal must be to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE), which is similar to its bandit counterpart. We show that the Bayesian simple regret converges at an optimal exponential rate when using PSPE. When the agent starts being evaluated, its goal would be to minimize the cumulative regret incurred. This is akin to the problem of Reinforcement Learning (RL). The agent uses the Posterior Sampling for Reinforcement Learning algorithm (PSRL) initialized with the posteriors of the practice phase. We hypothesize that this PSPE + PSRL combination is an optimal strategy for minimizing regret in RL problems with an initial practice phase. We show empirical results which prove that having a lower simple regret at the end of the practice phase results in having lower cumulative regret during evaluation. Posterior-Based Proposal(PBP) Markov chain Monte Carlo (MCMC) is widely used for Bayesian inference in models of complex systems. Performance, however, is often unsatisfactory in models with many latent variables due to so-called poor mixing, necessitating development of application specific implementations. This limits rigorous use of real-world data to inform development and testing of models in applications ranging from statistical genetics to finance. This paper introduces ‘posterior-based proposals’ (PBPs), a new type of MCMC update applicable to a huge class of statistical models (whose conditional dependence structures are represented by directed acyclic graphs). PBPs generates large joint updates in parameter and latent variable space, whilst retaining good acceptance rates (typically 33 percent). Evaluation against standard approaches (Gibbs or Metropolis-Hastings updates) shows performance improvements by a factor of 2 to over 100 for widely varying model types: an individual-based model for disease diagnostic test data, a financial stochastic volatility model and mixed and generalised linear mixed models used in statistical genetics. PBPs are competitive with similarly targeted state-of-the-art approaches such as Hamiltonian MCMC and particle MCMC, and importantly work under scenarios where these approaches do not. PBPs therefore represent an additional general purpose technique that can be usefully applied in a wide variety of contexts. Potential Confounding Factor(PCF) Potts Model Potts model in Potts, R. B. (1952) PottsUtils Power Iteration In mathematics, power iteration (also known as the power method) is an eigenvalue algorithm: given a diagonalizable matrix A, the algorithm will produce a number lambda, which is the greatest (in absolute value) eigenvalue of A, and a nonzero vector v, the corresponding eigenvector of lambda. The algorithm is also known as the Von Mises iteration. Power iteration is a very simple algorithm, but it may converge slowly. It does not compute a matrix decomposition, and hence it can be used when A is a very large sparse matrix. Scale Invariant Power Iteration Power Linear Unit(PoLU) In this paper, we introduce ‘Power Linear Unit’ (PoLU) which increases the nonlinearity capacity of a neural network and thus helps improving its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Thirdly, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification for comparing our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include widely-used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks. Power Normal Distribution(PN) … PowerNormal PR Product In this paper, we analyze the inner product of weight vector and input vector in neural networks from the perspective of vector orthogonal decomposition and prove that the local direction gradient of weight vector decreases as the angle between them gets closer to 0 or $\pi$. We propose the PR Product, a substitute for the inner product, which makes the local direction gradient of weight vector independent of the angle and consistently larger than the one in the conventional inner product while keeping the forward propagation identical. As the basic operation in neural networks, the PR Product can be applied into many existing deep learning modules, so we develop the PR Product version of the fully connected layer, convolutional layer, and LSTM layer. In static image classification, the experiments on CIFAR10 and CIFAR100 datasets demonstrate that the PR Product can robustly enhance the ability of various state-of-the-art classification networks. On the task of image captioning, even without any bells and whistles, our PR Product version of captioning model can compete or outperform the state-of-the-art models on MS COCO dataset. Praaline This paper presents Praaline, an open-source software system for managing, annotating, analysing and visualising speech corpora. Researchers working with speech corpora are often faced with multiple tools and formats, and they need to work with ever-increasing amounts of data in a collaborative way. Praaline integrates and extends existing time-proven tools for spoken corpora analysis (Praat, Sonic Visualiser and a bridge to the R statistical package) in a modular system, facilitating automation and reuse. Users are exposed to an integrated, user-friendly interface from which to access multiple tools. Corpus metadata and annotations may be stored in a database, locally or remotely, and users can define the metadata and annotation structure. Users may run a customisable cascade of analysis steps, based on plug-ins and scripts, and update the database with the results. The corpus database may be queried, to produce aggregated data-sets. Praaline is extensible using Python or C++ plug-ins, while Praat and R scripts may be executed against the corpus data. A series of visualisations, editors and plug-ins are provided. Praaline is free software, released under the GPL license. Practical Scoring Rules In situations where forecasters are scored on the quality of their probabilistic predictions, it is standard to use proper’ scoring rules to perform such scoring. These rules are desirable because they give forecasters no incentive to lie about their probabilistic beliefs. However, in the real world context of creating a training program designed to help people improve calibration through prediction practice, there are a variety of desirable traits for scoring rules that go beyond properness. These potentially may have a substantial impact on the user experience, usability of the program, or efficiency of learning. The space of proper scoring rules is too broad, in the sense that most proper scoring rules lack these other desirable properties. On the other hand, the space of proper scoring rules is potentially also too narrow, in the sense that we may sometimes choose to give up properness when it conflicts with other properties that are even more desirable from the point of view of usability and effective training. We introduce a class of scoring rules that we call Practical’ scoring rules, designed to be intuitive to users in the context of right’ vs. wrong’ probabilistic predictions. We also introduce two specific scoring rules for prediction intervals, the Distance’ and Order of magnitude’ rules. These rules are designed to satisfy a variety of properties that, based on user testing, we believe are desirable for applied calibration training. Prais-Winsten Estimation In econometrics, Prais-Winsten estimation is a procedure meant to take care of the serial correlation of type AR(1) in a linear model. Conceived by Sigbert Prais and Christopher Winsten in 1954, it is a modification of Cochrane-Orcutt estimation in the sense that it does not lose the first observation and leads to more efficiency as a result. prais Preattentive Processing Pre-attentive processing is the unconscious accumulation of information from the environment. All available information is pre-attentively processed. Then, the brain filters and processes what is important. Information that has the highest salience (a stimulus that stands out the most) or relevance to what a person is thinking about is selected for further and more complete analysis by conscious (attentive) processing. Understanding how pre-attentive processing works is useful in advertising, in education, and for prediction of cognitive ability. Precision In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. Precision and Recall In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. http://…/recallprecision.pdf Precognitive Stream We consider interactive algorithms in the pool-based setting, and in the stream-based setting. Interactive algorithms observe suggested elements (representing actions or queries), and interactively select some of them and receive responses. Pool-based algorithms can select elements at any order, while stream-based algorithms observe elements in sequence, and can only select elements immediately after observing them. We further consider an intermediate setting, which we term precognitive stream, in which the algorithm knows in advance the identity of all the elements in the sequence, but can select them only in the order of their appearance. For all settings, we assume that the suggested elements are generated independently from some source distribution, and ask what is the stream size required for emulating a pool algorithm with a given pool size, in the stream-based setting and in the precognitive stream setting. We provide algorithms and matching lower bounds for general pool algorithms, and for utility-based pool algorithms. We further derive nearly matching upper and lower bounds on the gap between the two settings for the special case of active learning for binary classification. Preconditioned Stochastic Gradient Descent(PSGD) This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhance stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, unrevealed its relationship to equilibrated stochastic gradient descent (ESGD) and batch normalization, and provided a software package (https://…/psgd_tf ) implemented in Tensorflow to compare variations of PSGD and stochastic gradient descent (SGD) on a wide range of benchmark problems with commonly used neural network models, e.g., convolutional and recurrent neural networks. Comparison results clearly demonstrate the advantages of PSGD in terms of convergence speeds and generalization performances. Predictability, Computability, and Stability(PCS) We propose the predictability, computability, and stability (PCS) framework to extract reproducible knowledge from data that can guide scientific hypothesis generation and experimental design. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments PC with an overarching stability principle, which largely expands traditional statistical uncertainty considerations. In particular, stability assesses how results vary with respect to choices (or perturbations) made across the data science life cycle, including problem formulation, pre-processing, modeling (data and algorithm perturbations), and exploratory data analysis (EDA) before and after modeling. Furthermore, we develop PCS inference to investigate the stability of data results and identify when models are consistent with relatively simple phenomena. We compare PCS inference with existing methods, such as selective inference, in high-dimensional sparse linear model simulations to demonstrate that our methods consistently outperform others in terms of ROC curves over a wide range of simulation settings. Finally, we propose a PCS documentation based on Rmarkdown, iPython, or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo. Predicted Change of F Measure During active learning, an effective stopping method allows users to limit the number of annotations, which is cost effective. In this paper, a new stopping method called Predicted Change of F Measure will be introduced that attempts to provide the users an estimate of how much performance of the model is changing at each iteration. This stopping method can be applied with any base learner. This method is useful for reducing the data annotation bottleneck encountered when building text classification systems. Predicted Relevance Model(PRM) Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users, is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the Predicted Relevance Model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge on the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance, can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain (nDCG), which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections. Predicted Variables(PVars) We present Predicted Variables (PVars), an approach to making machine learning (ML) a first class citizen in programming languages. There is a growing divide in approaches to building systems: using human experts (e.g. programming) on the one hand, and using behavior learned from data (e.g. ML) on the other hand. PVars aim to make ML in programming as easy as `if’ statements and with that hybridize ML with programming. We leverage the existing concept of variables and create a new type, a predicted variable. PVars are akin to native variables with one important distinction: PVars determine their value using ML when evaluated. We describe PVars and their interface, how they can be used in programming, and demonstrate the feasibility of our approach on three algorithmic problems: binary search, Quicksort, and caches. We show experimentally that PVars are able to improve over the commonly used heuristics and lead to a better performance than the original algorithms. As opposed to previous work applying ML to algorithmic problems, PVars have the advantage that they can be used within the existing frameworks and do not require the existing domain knowledge to be replaced. PVars allow for a seamless integration of ML into existing systems and algorithms. Our PVars implementation currently relies on standard Reinforcement Learning (RL) methods. To learn faster, PVars use the heuristic function, which they are replacing, as an initial function. We show that PVars quickly pick up the behavior of the initial function and then improve performance beyond that without ever performing substantially worse — allowing for a safe deployment in critical applications. Predicting Application Resilience Using Machine Learning(PARIS) Extreme-scale scientific applications can be more vulnerable to soft errors (transient faults) as high-performance computing systems increase in scale. The common practice to evaluate the resilience to faults of an application is random fault injection, a method that can be highly time consuming. While resilience prediction modeling has been recently proposed to predict application resilience in a faster way than fault injection, it can only predict a single class of fault manifestation (SDC) and there is no evidence demonstrating that it can work on previously unseen programs, which greatly limits its re-usability. We present PARIS, a resilience prediction method that addresses the problems of existing prediction methods using machine learning. Using carefully-selected features and a machine learning model, our method is able to make resilience predictions of three classes of fault manifestations (success, SDC, and interruption) as opposed to one class like in current resilience prediction modeling. The generality of our approach allows us to make prediction on new applications, i.e., previously unseen applications, providing large applicability to our model. Our evaluation on 125 programs shows that PARIS provides high prediction accuracy, 82% and 77% on average for predicting the rate of success and interruption, respectively, while the state-of-the-art resilience prediction model cannot predict them. When predicting the rate of SDC, PARIS provides much better accuracy than the state-of-the-art (38% vs. -273%). PARIS is much faster (up to 450x speedup) than the traditional method (random fault injection). Prediction Advantage(PA) We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels. Prediction Difference Analysis This article presents the prediction difference analysis method for visualizing the response of a deep neural network to a specific input. When classifying images, the method highlights areas in a given input image that provide evidence for or against a certain class. It overcomes several shortcoming of previous methods and provides great additional insight into the decision making process of classifiers. Making neural network decisions interpretable through visualization is important both to improve models and to accelerate the adoption of black-box classifiers in application areas such as medicine. We illustrate the method in experiments on natural images (ImageNet data), as well as medical images (MRI brain scans). Prediction Factory In this paper, we present a data science automation system called Prediction Factory. The system uses several key automation algorithms to enable data scientists to rapidly develop predictive models and share them with domain experts. To assess the system’s impact, we implemented 3 different interfaces for creating predictive modeling projects: baseline automation, full automation, and optional automation. With a dataset of online grocery shopper behaviors, we divided data scientists among the interfaces to specify prediction problems, learn and evaluate models, and write a report for domain experts to judge whether or not to fund to continue working on. In total, 22 data scientists created 94 reports that were judged 296 times by 26 experts. In a head-to-head trial, reports generated utilizing full data science automation interface reports were funded 57.5% of the time, while the ones that used baseline automation were only funded 42.5% of the time. An intermediate interface which supports optional automation generated reports were funded 58.6% more often compared to the baseline. Full automation and optional automation reports were funded about equally when put head-to-head. These results demonstrate that Prediction Factory has implemented a critical amount of automation to augment the role of data scientists and improve business outcomes. Prediction Interval In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis. Prediction intervals are used in both frequentist statistics and Bayesian statistics: a prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter: prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed. Prediction Interval, the wider sister of Confidence Interval Prediction Specific Calibration Error(PSCE) The reliability of a machine learning model’s confidence in its predictions is critical for highrisk applications. Calibration-the idea that a model’s predicted probabilities of outcomes reflect true probabilities of those outcomes-formalizes this notion. While analyzing the calibration of deep neural networks, we’ve identified core problems with the way calibration is currently measured. We design the Thresholded Adaptive Calibration Error (TACE) metric to resolve these pathologies and show that it outperforms other metrics, especially in settings where predictions beyond the maximum prediction that is chosen as the output class matter. There are many cases where what a practitioner cares about is the calibration of a specific prediction, and so we introduce a dynamic programming based Prediction Specific Calibration Error (PSCE) that smoothly considers the calibration of nearby predictions to give an estimate of the calibration error of a specific prediction. Prediction Tournament Paradox In a prediction tournament, contestants ‘forecast’ by asserting a numerical probability for each of (say) 100 future real-world events. The scoring system is designed so that (regardless of the unknown true probabilities) more accurate forecasters will likely score better. This is true for one-on-one comparisons between contestants. But consider a realistic-size tournament with many contestants, with a range of accuracies. It may seem self-evident that the winner will likely be one of the most accurate forecasters. But, in the setting where the range extends to very accurate forecasters, simulations show this is mathematically false, within a somewhat plausible model. Even outside that setting the winner is less likely than intuition suggests to be one of the handful of best forecasters. Though implicit in recent technical papers, this paradox has apparently not been explicitly pointed out before, though is easily explained. It perhaps has implications for the ongoing IARPA-sponsored research programs involving forecasting. Prediction with Unpredictable Feature Evolution(PUFE) Feature space can change or evolve when learning with streaming data. Several recent works have studied feature evolvable learning. They usually assume that features would not vanish or appear in an arbitrary way. For example, when knowing the battery lifespan, old features and new features represented by data gathered by sensors will disappear and emerge at the same time along with the sensors exchanging simultaneously. However, different sensors would have different lifespans, and thus the feature evolution can be unpredictable. In this paper, we propose a novel paradigm: Prediction with Unpredictable Feature Evolution (PUFE). We first complete the unpredictable overlapping period into an organized matrix and give a theoretical bound on the least number of observed entries. Then we learn the mapping from the completed matrix to recover the data from old feature space when observing the data from new feature space. With predictions on the recovered data, our model can make use of the advantage of old feature space and is always comparable with any combinations of the predictions on the current instance. Experiments on the synthetic and real datasets validate the effectiveness of our method. PredictionIO PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery. Prediction-Performance-Plot ➘ “Receiver Operating Characteristic” ROCR Prediction-Tracking-Segmentation We introduce a prediction driven method for visual tracking and segmentation in videos. Instead of solely relying on matching with appearance cues for tracking, we build a predictive model which guides finding more accurate tracking regions efficiently. With the proposed prediction mechanism, we improve the model robustness against distractions and occlusions during tracking. We demonstrate significant improvements over state-of-the-art methods not only on visual tracking tasks (VOT 2016 and VOT 2018) but also on video segmentation datasets (DAVIS 2016 and DAVIS 2017). Predictive Analysis Library(PAL) The Predictive Analysis Library (PAL) defines functions that can be called from within SQLScript procedures to perform analytic algorithms. This release of PAL includes classic and universal predictive analysis algorithms in eight data-mining categories: · Clustering · Classification · Association · Time Series · Preprocessing · Statistics · Social Network Analysis · Miscellaneous Predictive Analytics / Predictive Analysis(PA) Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. Predictive analytics is used in actuarial science, marketing, financial services, insurance, telecommunications, retail, travel, healthcare, pharmaceuticals and other fields. One of the most well known applications is credit scoring, which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time. A well-known example is FICO Predictive CLR In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression. Predictive Clustering We show how to convert any clustering into a prediction set. This has the effect of converting the clustering into a (possibly overlapping) union of spheres or ellipsoids. The tuning parameters can be chosen to minimize the size of the prediction set. When applied to k-means clustering, this method solves several problems: the method tells us how to choose k, how to merge clusters and how to replace the Voronoi partition with more natural shapes. We show that the same reasoning can be applied to other clustering methods. Predictive Ensemble Learning(PEL) Deep learning based approaches have achieved significant progresses in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs are varying in quantity and unable to be simply compared, the ensemble of multiple models become difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning to a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks. Predictive Maintenance(PdM) Predictive maintenance (PdM) techniques are designed to help determine the condition of in-service equipment in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. The main promise of Predicted Maintenance is to allow convenient scheduling of corrective maintenance, and to prevent unexpected equipment failures. The key is ‘the right information in the right time’. By knowing which equipment needs maintenance, maintenance work can be better planned (spare parts, people, etc.) and what would have been ‘unplanned stops’ are transformed to shorter and fewer ‘planned stops’, thus increasing plant availability. Other potential advantages include increased equipment lifetime, increased plant safety, fewer accidents with negative impact on environment, and optimized spare parts handling. ➚ “Condition Monitoring” Predictive Model Markup Language(PMML) The Predictive Model Markup Language (PMML) is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks. Since PMML is an XML-based standard, the specification comes in the form of an XML schema. http://…/pmml-revolution.html Predictive Neural Network(PNN) Recurrent neural networks are a powerful means to cope with time series. We show that already linearly activated recurrent neural networks can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore the network size can be reduced by taking only the most relevant components of the network. Thus, in contrast to others, our approach not only learns network weights but also the network architecture. The networks have interesting properties: In the stationary case they end up in ellipse trajectories in the long run, and they allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO) and robotic soccer. Predictive neural networks outperform the previous state-of-the-art for the MSO task with a minimal number of units. Predictive Personalization Predictive personalization is defined as the ability to predict customer behavior, needs or wants – and tailor offers and communications very precisely. Social data is one source of providing this predictive analysis, particularly social data that is structured. Predictive personalization is a much more recent means of personalization and can be used well to augment current personalization offerings. Predictive Probabilistic Merging of Policies(PPMP) Deep Reinforcement Learning has enabled the control of increasingly complex and high-dimensional problems. However, the need of vast amounts of data before reasonable performance is attained prevents its widespread application. We employ binary corrective feedback as a general and intuitive manner to incorporate human intuition and domain knowledge in model-free machine learning. The uncertainty in the policy and the corrective feedback is combined directly in the action space as probabilistic conditional exploration. As a result, the greatest part of the otherwise ignorant learning process can be avoided. We demonstrate the proposed method, Predictive Probabilistic Merging of Policies (PPMP), in combination with DDPG. In experiments on continuous control problems of the OpenAI Gym, we achieve drastic improvements in sample efficiency, final performance, and robustness to erroneous feedback, both for human and synthetic feedback. Additionally, we show solutions beyond the demonstrated knowledge. Predictive Quality and Maintenance(PQM) PQM solutions, which harness data gathered by both the Internet of Things (IoT) and data from traditional legacy systems, focus on detecting and addressing quality and maintenance issues before they turn into serious problems-for example, problems that can cause unplanned downtime. Predictive State Recurrent Neural Networks(PSRNN) We present a new model, called Predictive State Recurrent Neural Networks (PSRNNs), for filtering and prediction in dynamical systems. PSRNNs draw on insights from both Recurrent Neural Networks (RNNs) and Predictive State Representations (PSRs), and inherit advantages from both types of models. Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources, so that one source can act as a gate for another. These bilinear functions arise naturally from the connection to state updates in Bayes filters like PSRs, in which observations can be viewed as gating belief states. We show that PSRNNs can be learned effectively by combining backpropogation through time (BPTT) with an initialization based on a statistically consistent learning algorithm for PSRs called two-stage regression (2SR). We also show that PSRNNs can be can be factorized using tensor decomposition, reducing model size and suggesting interesting theoretical connections to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets, and showed that we outperform several popular alternative approaches to modeling dynamical systems in all cases. Predictive State Representation(PSR) In computer science, a predictive state representation (PSR) is a way to model a state of controlled dynamical system from a history of actions taken and resulting observations. PSR captures the state of a system as a vector of predictions for future tests (experiments) that can be done on the system. A test is a sequence of action-observation pairs and its prediction is the probability of the test’s observation- sequence happening if the test’s action-sequence were to be executed on the system. One of the advantage of using PSR is that the predictions are directly related to observable quantities. This is in contrast to other models of dynamical systems, such as partially observable Markov decision processes (POMDPs) where the state of the system is represented as a probability distribution over unobserved nominal states. Predictive, Descriptive, Relevant Framework(PDR) Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods. Predictor-Corrector Policy Optimization(PicCoLO) We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new ‘PicCoLOed’ algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information. We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PicCoLO. Predictron One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple ‘imagined’ planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures. PredRNN++ We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of a greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, which is named Causal LSTM for re-organizing the spatial and temporal memories in a cascaded mechanism. However, there is still a dilemma in video predictive learning: increasingly deep-in-time models have been designed for capturing complex variations, while introducing more difficulties in the gradient back-propagation. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult objects occlusion scenario. Preference Mapping Preference Mapping allows to build maps which are useful in a variety of domains. A preference map is a decision support tool in analyses where a configuration of objects has been obtained from a first analysis (PCA, MCA, MDS), and where a table with complementary data describing the objects is available (attributes or preference data). There are two types of preference mapping methods: 1.External preference mapping or PREFMAP 2.Internal preference mapping SensoMineR Preference Neural Network(PNN) This paper proposes a preference neural network (PNN) to address the problem of indifference preferences orders with new activation function. PNN also solves the Multi-label ranking problem, where labels may have indifference preference orders or subgroups are equally ranked. PNN follows a multi-layer feedforward architecture with fully connected neurons. Each neuron contains a novel smooth stairstep activation function based on the number of preference orders. PNN inputs represent data features and output neurons represent label indexes. The proposed PNN is evaluated using new preference mining dataset that contains repeated label values which have not experimented before. PNN outperforms five previously proposed methods for strict label ranking in terms of accurate results with high computational efficiency. Preference-Informed Individual Fairness(PIIF) As algorithms are increasingly used to make important decisions pertaining to individuals, algorithmic discrimination is becoming a prominent concern. The seminal work of Dwork et al. [ITCS 2012] introduced the notion of individual fairness (IF): given a task-specific similarity metric, every pair of similar individuals should receive similar outcomes. In this work, we study fairness when individuals have diverse preferences over the possible outcomes. We show that in such settings, individual fairness can be too restrictive: requiring individual fairness can lead to less-preferred outcomes for the very individuals that IF aims to protect (e.g. a protected minority group). We introduce and study a new notion of preference-informed individual fairness (PIIF), a relaxation of individual fairness that allows for outcomes that deviate from IF, provided the deviations are in line with individuals’ preferences. We show that PIIF can allow for solutions that are considerably more beneficial to individuals than the best IF solution. We further show how to efficiently optimize any convex objective over the outcomes subject to PIIF, for a rich class of individual preferences. Motivated by fairness concerns in targeted advertising, we apply this new fairness notion to the multiple-task setting introduced by Dwork and Ilvento [ITCS 2019]. We show that, in this setting too, PIIF can allow for considerably more beneficial solutions, and we extend our efficient optimization algorithm to this setting. Preferential Attachment(PA) A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. ‘Preferential attachment’ is only the most recent of many names that have been given to such processes. They are also referred to under the names ‘Yule process’, ‘cumulative advantage’, ‘the rich get richer’, and, less correctly, the ‘Matthew effect’. They are also related to Gibrat’s law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. Pre-Partitioned Generalized Matrix-Vector Multiplication(PMV) How can we analyze enormous networks including the Web and social networks which have hundreds of billions of nodes and edges? Network analyses have been conducted by various graph mining methods including shortest path computation, PageRank, connected component computation, random walk with restart, etc. These graph mining methods can be expressed as generalized matrix-vector multiplication which consists of few operations inspired by typical matrix-vector multiplication. Recently, several graph processing systems based on matrix-vector multiplication or their own primitives have been proposed to deal with large graphs; however, they all have failed on Web-scale graphs due to insufficient memory space or the lack of consideration for I/O costs. In this paper, we propose PMV (Pre-partitioned generalized Matrix-Vector multiplication), a scalable distributed graph mining method based on generalized matrix-vector multiplication on distributed systems. PMV significantly decreases the communication cost, which is the main bottleneck of distributed systems, by partitioning the input graph in advance and judiciously applying execution strategies based on the density of the pre-partitioned sub-matrices. Experiments show that PMV succeeds in processing up to 16x larger graphs than existing distributed memory-based graph mining methods, and requires 9x less time than previous disk-based graph mining methods by reducing I/O costs significantly. Prescriptive Analytics Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics suggests decision options on how to take advantage of a future opportunity or mitigate a future risk and shows the implication of each decision option. Prescriptive analytics can continually take in new data to re-predict and re-prescribe, thus automatically improving prediction accuracy and prescribing better decision options. Prescriptive analytics ingests hybrid data, a combination of structured (numbers, categories) and unstructured data (videos, images, sounds, texts), and business rules to predict what lies ahead and to prescribe how to take advantage of this predicted future without compromising other priorities. PRESISTANT Data pre-processing is one of the most time consuming and relevant steps in a data analysis process (e.g., classification task). A given data pre-processing operator (e.g., transformation) can have positive, negative or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right pre-processing operators. However, when it comes to non-experts, they are overwhelmed by the amount of pre-processing operators and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only ‘syntactically’ applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim at providing assistance to non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of 5 different classification algorithms, such as J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the recommendations provided by our tool, show that PRESISTANT can effectively help non-experts in order to achieve improved results in their analytical tasks. PRESS Nonlinear models are frequently applied to determine the optimal supply natural gas to a given residential unit based on economical and technical factors, or used to fit biochemical and pharmaceutical assay nonlinear data. In this article we propose PRESS statistics and prediction coefficients for a class of nonlinear beta regression models, namely $P^2$ statistics. We aim at using both prediction coefficients and goodness-of-fit measures as a scheme of model select criteria. In this sense, we introduce for beta regression models under nonlinearity the use of the model selection criteria based on robust pseudo-$R^2$ statistics. Monte Carlo simulation results on the finite sample behavior of both prediction-based model selection criteria $P^2$ and the pseudo-$R^2$ statistics are provided. Three applications for real data are presented. The linear application relates to the distribution of natural gas for home usage in S\~ao Paulo, Brazil. Faced with the economic risk of too overestimate or to underestimate the distribution of gas has been necessary to construct prediction limits and to select the best predicted and fitted model to construct best prediction limits it is the aim of the first application. Additionally, the two nonlinear applications presented also highlight the importance of considering both goodness-of-predictive and goodness-of-fit of the competitive models. PRESTO In query optimisation accurate cardinality estimation is essential for finding optimal query plans. It is especially challenging for RDF due to the lack of explicit schema and the excessive occurrence of joins in RDF queries. Existing approaches typically collect statistics based on the counts of triples and estimate the cardinality of a query as the product of its join components, where errors can accumulate even when the estimation of each component is accurate. As opposed to existing methods, we propose PRESTO, a cardinality estimation method that is based on the counts of subgraphs instead of triples and uses a probabilistic method to estimate cardinalities of RDF queries as a whole. PRESTO avoids some major issues of existing approaches and is able to accurately estimate arbitrary queries under a bound memory constraint. We evaluate PRESTO with YAGO and show that PRESTO is more accurate for both simple and complex queries. Pre-Synaptic Pool Modification(PSPM) A central question in neuroscience is how to develop realistic models that predict output firing behavior based on provided external stimulus. Given a set of external inputs and a set of output spike trains, the objective is to discover a network structure which can accomplish the transformation as accurately as possible. Due to the difficulty of this problem in its most general form, approximations have been made in previous work. Past approximations have sacrificed network size, recurrence, allowed spiked count, or have imposed layered network structure. Here we present a learning rule without these sacrifices, which produces a weight matrix of a leaky integrate-and-fire (LIF) network to match the output activity of both deterministic LIF networks as well as probabilistic integrate-and-fire (PIF) networks. Inspired by synaptic scaling, our pre-synaptic pool modification (PSPM) algorithm outputs deterministic, fully recurrent spiking neural networks that can provide a novel generative model for given spike trains. Similarity in output spike trains is evaluated with a variety of metrics including a van-Rossum like measure and a numerical comparison of inter-spike interval distributions. Application of our algorithm to randomly generated networks improves similarity to the reference spike trains on both of these stated measures. In addition, we generated LIF networks that operate near criticality when trained on critical PIF outputs. Our results establish that learning rules based on synaptic homeostasis can be used to represent input-output relationships in fully recurrent spiking neural networks. Pretty Quick Version of R(pqR) pqR is a new version of the R interpreter. It is based on R-2.15.0, distributed by the R Core Team (at r-project.org), but improves on it in many ways, mostly ways that speed it up, but also by implementing some new features and fixing some bugs. pqR is an open-source project licensed under the GPL. One notable improvement in pqR is that it is able to do some numeric computations in parallel with each other, and with other operations of the interpreter, on systems with multiple processors or processor cores. PRETZEL Machine Learning models are often composed of pipelines of transformations. While this design allows to efficiently execute single model components at training time, prediction serving has different requirements such as low latency, high throughput and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL is able to introduce performance improvements over different dimensions; compared to state-of-the-art approaches PRETZEL is on average able to reduce 99th percentile latency by 5.5x while reducing memory footprint by 25x, and increasing throughput by 4.7x. Price of Fairness(PoF) We introduce a flexible family of fairness regularizers for (linear and logistic) regression problems. These regularizers all enjoy convexity, permitting fast optimization, and they span the rang from notions of group fairness to strong individual fairness. By varying the weight on the fairness regularizer, we can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and we measure the severity of this trade-off via a numerical quantity we call the Price of Fairness (PoF). The centerpiece of our results is an extensive comparative study of the PoF across six different datasets in which fairness is a primary consideration. Prima Facie Prima facie is a Latin expression meaning on its first encounter or at first sight. The literal translation would be ‘at first face’ or ‘at first appearance’, from the feminine forms of primus (‘first’) and facies (‘face’), both in the ablative case. In modern, colloquial and conversational English, a common translation would be ‘on the face of it’. The term prima facie is used in modern legal English (including both civil law and criminal law) to signify that upon initial examination, sufficient corroborating evidence appears to exist to support a case. In common law jurisdictions, prima facie denotes evidence that, unless rebutted, would be sufficient to prove a particular proposition or fact. The term is used similarly in academic philosophy. Most legal proceedings, in most jurisdictions, require a prima facie case to exist, following which proceedings may then commence to test it, and create a ruling. Primal-Dual Active-Set(PDAS) Isotonic regression (IR) is a non-parametric calibration method used in supervised learning. For performing large-scale IR, we propose a primal-dual active-set (PDAS) algorithm which, in contrast to the state-of-the-art Pool Adjacent Violators (PAV) algorithm, can be parallized and is easily warm-started thus well-suited in the online settings. We prove that, like the PAV algorithm, our PDAS algorithm for IR is convergent and has a work complexity of O(n), though our numerical experiments suggest that our PDAS algorithm is often faster than PAV. In addition, we propose PDAS variants (with safeguarding to ensure convergence) for solving related trend filtering (TF) problems, providing the results of experiments to illustrate their effectiveness. Primal-Dual Group Convolutional Neural Networks(PDGCNets) In this paper, we present a simple and modularized neural network architecture, named primal-dual group convolutional neural networks (PDGCNets). The main point lies in a novel building block, a pair of two successive group convolutions: primal group convolution and dual group convolution. The two group convolutions are complementary: (i) the convolution on each primal partition in primal group convolution is a spatial convolution, while on each dual partition in dual group convolution, the convolution is a point-wise convolution; (ii) the channels in the same dual partition come from different primal partitions. We discuss one representative advantage: Wider than a regular convolution with the number of parameters and the computation complexity preserved. We also show that regular convolutions, group convolution with summation fusion (as used in ResNeXt), and the Xception block are special cases of primal-dual group convolutions. Empirical results over standard benchmarks, CIFAR-$10$, CIFAR-$100$, SVHN and ImageNet demonstrate that our networks are more efficient in using parameters and computation complexity with similar or higher accuracy. Prim’s Algorithm In computer science, Prim’s algorithm is a greedy algorithm that finds a minimum spanning tree for a connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník and later independently by computer scientist Robert C. Prim in 1957 and rediscovered by Edsger Dijkstra in 1959. Therefore it is also sometimes called the DJP algorithm, the Jarník algorithm, or the Prim-Jarník algorithm. Other algorithms for this problem include Kruskal’s algorithm and Borůvka’s algorithm. These algorithms find the minimum spanning forest in a possibly disconnected graph. By running Prim’s algorithm for each connected component of the graph, it can also be used to find the minimum spanning forest. ➚ “Minimum Spanning Tree” Principal Analysis by Conditional Estimation(PACE) Recovering the Underlying Trajectory from Sparse and Irregular Longitudinal Data Principal Component Analysis(PCA) Principal component analysis (PCA) is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables. ➚ “Independent Component Analysis” http://…roper-applications-of-principal-component pcaBootPlot,idm,rpca Principal Component Projection(PCP) Principal component projection is a mathematical procedure that projects high dimensional data onto a lower dimensional space. This lower dimensional space is defined by the principal components with the highest variance in the training data. Principal Component Projection with Low-Degree Polynomials Principal Component Pursuit(PCP) see section 1.2 ➘ “Robust Principal Component Analysis” Principal Component Regression(PCR) In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). Typically, it considers regressing the outcome (also known as the response or the dependent variable) on a set of covariates (also known as predictors, or explanatory variables, or independent variables) based on a standard linear regression model, but uses PCA for estimating the unknown regression coefficients in the model. In PCR, instead of regressing the dependent variable on the explanatory variables directly, the principal components of the explanatory variables are used as regressors. One typically uses only a subset of all the principal components for regression, thus making PCR some kind of a regularized procedure. Often the principal components with higher variances (the ones based on eigenvectors corresponding to the higher eigenvalues of the sample variance-covariance matrix of the explanatory variables) are selected as regressors. However, for the purpose of predicting the outcome, the principal components with low variances may also be important, in some cases even more important. One major use of PCR lies in overcoming the multicollinearity problem which arises when two or more of the explanatory variables are close to being collinear. PCR can aptly deal with such situations by excluding some of the low-variance principal components in the regression step. In addition, by usually regressing on only a subset of all the principal components, PCR can result in dimension reduction through substantially lowering the effective number of parameters characterizing the underlying model. This can be particularly useful in settings with high-dimensional covariates. Also, through appropriate selection of the principal components to be used for regression, PCR can lead to efficient prediction of the outcome based on the assumed model. Sketching for Principal Component Regression Principal Component Signal Recovery(PCSR) The high-dimensionality and volume of large scale multistream data has inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming common in manufacturing plants. In this paper, we propose an integrated M\&D approach for large scale streaming data. We developed a novel monitoring method named Adaptive Principal Component monitoring (APC) which adaptively chooses PCs that are most likely to vary due to the change for early detection. Importantly, we integrate a novel diagnostic approach, Principal Component Signal Recovery (PCSR), to enable a streamlined SPC. This diagnostics approach draws inspiration from Compressed Sensing and uses Adaptive Lasso for identifying the sparse change in the process. We theoretically motivate our approaches and do a performance evaluation of our integrated M&D method through simulations and case studies. Principal Component-Guided Sparse Regression(pcLasso) We propose a new method for supervised learning, especially suited to wide data where the number of features is much greater than the number of observations. The method combines the lasso (l 1 ) sparsity penalty with a quadratic penalty that shrinks the coefficient vector toward the leading principal components of the feature matrix. We call the proposed method the ‘principal components lasso’ (‘pcLasso’). The method can be especially powerful if the features are pre-assigned to groups (such as cell-pathways, assays or protein interaction networks). In that case, pcLasso shrinks each group-wise component of the solution toward the leading principal components of that group. In the process, it also carries out selection of the feature groups. We provide some theory for this method and illustrate it on a number of simulated and real data examples. Principal Covariates Regression(PCovR) A method for multivariate regression is proposed that is based on the simultaneous least-squares minimization of Y residuals and X residuals by a number of orthogonal X components. By lending increasing weight to the X variables relative to the Y variables, the procedure moves from ordinary least-squares regression to principal component regression, forming a relatively simple alternative for continuum regression. PCovR Principal Differences Analysis(PDA) We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions. The method operates by finding the projection that maximizes the Wasserstein divergence between the resulting univariate populations. Relying on the Cramer-Wold device, it requires no assumptions about the form of the underlying distributions, nor the nature of their inter-class differences. A sparse variant of the method is introduced to identify features responsible for the differences. We provide algorithms for both the original minimax formulation as well as its semidefinite relaxation. In addition to deriving some convergence results, we illustrate how the approach may be applied to identify differences between cell populations in the somatosensory cortex and hippocampus as manifested by single cell RNA-seq. Our broader framework extends beyond the specific choice of Wasserstein divergence. Principal Filter Analysis(PFA) Principal Filter Analysis (PFA), is an elegant, easy to implement, yet effective methodology for neural network compression. PFA exploits the intrinsic correlation between filter responses within network layers to recommend a smaller network footprint. Principal Model Analysis(PMA) Motivated by the Bagging Partial Least Squares (PLS) and Principal Component Analysis (PCA) algorithms, we propose a Principal Model Analysis (PMA) method in this paper. In the proposed PMA algorithm, the PCA and the PLS are combined. In the method, multiple PLS models are trained on sub-training sets, derived from the original training set based on the random sampling with replacement method. The regression coefficients of all the sub-PLS models are fused in a joint regression coefficient matrix. The final projection direction is then estimated by performing the PCA on the joint regression coefficient matrix. The proposed PMA method is compared with other traditional dimension reduction methods, such as PLS, Bagging PLS, Linear discriminant analysis (LDA) and PLS-LDA. Experimental results on six public datasets show that our proposed method can achieve better classification performance and is usually more stable. Principal Orthogonal ComplEment Thresholding(POET) Estimate large covariance matrices in approximate factor models by thresholding principal orthogonal complements. POET Principal Parameterization Insightful visualization of multidimensional scalar fields, in particular parameter spaces, is key to many fields in computational science and engineering. We propose a principal component-based approach to visualize such fields that accurately reflects their sensitivity to input parameters. The method performs dimensionality reduction on the vast $L^2$ Hilbert space formed by all possible partial functions (i.e., those defined by fixing one or more input parameters to specific values), which are projected to low-dimensional parameterized manifolds such as 3D curves, surfaces, and ensembles thereof. Our mapping provides a direct geometrical and visual interpretation in terms of Sobol’s celebrated method for variance-based sensitivity analysis. We furthermore contribute a practical realization of the proposed method by means of tensor decomposition, which enables accurate yet interactive integration and multilinear principal component analysis of high-dimensional models. Principal Points The k principal points of a p-variate random variable X are defined as those points x1,…xk which minimize the expected squared distance of X from the nearest of the xj. High Precision Numerical Computation of Principal Points For Univariate Distributions Principal Stratification Sensitivity Analyses sensitivityPStrat Principal Variance Component Analysis(PVCA) Often times ‘batch effects’ are present in microarray data due to any number of factors, including e.g. a poor experimental design or when the gene expression data is combined from different studies with limited standardization. To estimate the variability of experimental effects including batch, a novel hybrid approach known as principal variance component analysis (PVCA) has been developed. The approach leverages the strengths of two very popular data analysis methods: first, principal component analysis (PCA) is used to efficiently reduce data dimension with maintaining the majority of the variability in the data, and variance components analysis (VCA) fits a mixed linear model using factors of interest as random effects to estimate and partition the total variability. The PVCA approach can be used as a screening tool to determine which sources of variability (biological, technical or other) are most prominent in a given microarray data set. Using the eigenvalues associated with their corresponding eigenvectors as weights, associated variations of all factors are standardized and the magnitude of each source of variability (including each batch effect) is presented as a proportion of total variance. Although PVCA is a generic approach for quantifying the corresponding proportion of variation of each effect, it can be a handy assessment for estimating batch effect before and after batch normalization. Principle of Minimum Differentiation Hotelling’s law is an observation in economics that in many markets it is rational for producers to make their products as similar as possible. This is also referred to as the principle of minimum differentiation as well as Hotelling’s linear city model. The observation was made by Harold Hotelling (1895-1973) in the article ‘Stability in Competition’ in Economic Journal in 1929. The opposing phenomenon is product differentiation, which is usually considered to be a business advantage if executed properly. Principled Bayesian Workflow Experiments in research on memory, language, and in other areas of cognitive science are increasingly being analyzed using Bayesian methods. This has been facilitated by the development of probabilistic programming languages such as Stan, and easily accessible front-end packages such as brms. However, the utility of Bayesian methods ultimately depends on the relevance of the Bayesian model, in particular whether or not it accurately captures the structure of the data and the data analyst’s domain expertise. Even with powerful software, the analyst is responsible for verifying the utility of their model. To accomplish this, we introduce a principled Bayesian workflow (Betancourt, 2018) to cognitive science. Using a concrete working example, we describe basic questions one should ask about the model: prior predictive checks, computational faithfulness, model sensitivity, and posterior predictive checks. The running example for demonstrating the workflow is data on reading times with a linguistic manipulation of object versus subject relative sentences. This principled Bayesian workflow also demonstrates how to use domain knowledge to inform prior distributions. It provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model. Given the increasing use of Bayesian methods, we aim to discuss how these methods can be properly employed to obtain robust answers to scientific questions. Prior Network Ensemble of Neural Network (NN) models are known to yield improvements in accuracy. Furthermore, they have been empirically shown to yield robust measures of uncertainty, though without theoretical guarantees. However, ensembles come at high computational and memory cost, which may be prohibitive for certain application. There has been significant work done on the distillation of an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve accuracy comparable to that of an ensemble. However, information about the \emph{diversity} of the ensemble, which can yield estimates of \emph{knowledge uncertainty}, is lost. Recently, a new class of models, called Prior Networks, has been proposed, which allows a single neural network to explicitly model a distribution over output distributions, effectively emulating an ensemble. In this work ensembles and Prior Networks are combined to yield a novel approach called \emph{Ensemble Distribution Distillation} (EnD$^2$), which allows distilling an ensemble into a single Prior Network. This allows a single model to retain both the improved classification performance as well as measures of diversity of the ensemble. In this initial investigation the properties of EnD$^2$ have been investigated and confirmed on an artificial dataset. Prior Probability In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one’s uncertainty about p before some evidence is taken into account. For example, p could be the proportion of voters who will vote for a particular politician in a future election. It is meant to attribute uncertainty, rather than randomness, to the uncertain quantity. The unknown quantity may be a parameter or latent variable. One applies Bayes’ theorem, multiplying the prior by the likelihood function and then normalizing, to get the posterior probability distribution, which is the conditional distribution of the uncertain quantity, given the data. A prior is often the purely subjective assessment of an experienced expert. Some will choose a conjugate prior when they can, to make calculation of the posterior distribution easier. Parameters of prior distributions are called hyperparameters, to distinguish them from parameters of the model of the underlying data. Prior-Aware Dual Decomposition(PADD) Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and produces substantial gains in efficiency and provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently Thresholded Linear Inverse (TLI) is proposed to map the observed words of each document back to its topic composition. However, its linear characteristics limit the inference quality without considering the important prior information over topics. In this paper, we evaluate Simple Probabilistic Inverse (SPI) method and novel Prior-aware Dual Decomposition (PADD) that is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning quality topic compositions comparable to Gibbs sampling on various data. Priority Queue Training(PQT) We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs. PriPeARL Preserving privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner, while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture, and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, thereby demonstrating the tradeoffs between privacy and utility needs, and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn. Privacy Utility Trade-off(PUT) AI intensive systems that operate upon user data face the challenge of balancing data utility with privacy concerns. We propose the idea and present the prototype of an open-source tool called Privacy Utility Trade-off (PUT) Workbench which seeks to aid software practitioners to take such crucial decisions. We pick a simple privacy model that doesn’t require any background knowledge in Data Science and show how even that can achieve significant results over standard and real-life datasets. The tool and the source code is made freely available for extensions and usage. Privacy-Adversarial Framework Latent factor models for recommender systems represent users and items as low dimensional vectors. Privacy risks have been previously studied mostly in the context of recovery of personal information in the form of usage records from the training data. However, the user representations themselves may be used together with external data to recover private user information such as gender and age. In this paper we show that user vectors calculated by a common recommender system can be exploited in this way. We propose the privacy-adversarial framework to eliminate such leakage, and study the trade-off between recommender performance and leakage both theoretically and empirically using a benchmark dataset. We briefly discuss further applications of this method towards the generation of deeper and more insightful recommendations. Privacy-Preserving Adversarial Network(PPAN) We propose a data-driven framework for optimizing privacy-preserving data release mechanisms toward the information-theoretically optimal tradeoff between minimizing distortion of useful data and concealing sensitive information. Our approach employs adversarially-trained neural networks to implement randomized mechanisms and to perform a variational approximation of mutual information privacy. We empirically validate our Privacy-Preserving Adversarial Networks (PPAN) framework with experiments conducted on discrete and continuous synthetic data, as well as the MNIST handwritten digits dataset. With the synthetic data, we find that our model-agnostic PPAN approach achieves tradeoff points very close to the optimal tradeoffs that are analytically-derived from model knowledge. In experiments with the MNIST data, we visually demonstrate a learned tradeoff between minimizing the pixel-level distortion versus concealing the written digit. Privacy-Preserving Multi-Task Learning ➚ “Model-Protected Multi-Task Learning” Privacy-pREserving StochasTIc Gradual lEarning(PRESTIGE) It is challenging for stochastic optimizations to handle large-scale sensitive data safely. Recently, Duchi et al. proposed private sampling strategy to solve privacy leakage in stochastic optimizations. However, this strategy leads to robustness degeneration, since this strategy is equal to the noise injection on each gradient, which adversely affects updates of the primal variable. To address this challenge, we introduce a robust stochastic optimization under the framework of local privacy, which is called Privacy-pREserving StochasTIc Gradual lEarning (PRESTIGE). PRESTIGE bridges private updates of the primal variable (by private sampling) with the gradual curriculum learning (CL). Specifically, the noise injection leads to the issue of label noise, but the robust learning process of CL can combat with label noise. Thus, PRESTIGE yields ‘private but robust’ updates of the primal variable on the private curriculum, namely an reordered label sequence provided by CL. In theory, we reveal the convergence rate and maximum complexity of PRESTIGE. Empirical results on six datasets show that, PRESTIGE achieves a good tradeoff between privacy preservation and robustness over baselines. Privacy-Preserving-Summation-Consistent(PPSC) A distributed computing protocol consists of three components: (i) Data Localization: a network-wide dataset is decomposed into local datasets separately preserved at a network of nodes; (ii) Node Communication: the nodes hold individual dynamical states and communicate with the neighbors about these dynamical states; (iii) Local Computation: state recursions are computed at each individual node. Information about the local datasets enters the computation process through the node-to-node communication and the local computations, which may be leaked to dynamics eavesdroppers having access to global or local node states. In this paper, we systematically investigate this potential computational privacy risks in distributed computing protocols in the form of structured system identification, and then propose and thoroughly analyze a Privacy-Preserving-Summation-Consistent (PPSC) mechanism as a generic privacy encryption subroutine for consensus-based distributed computations. The central idea is that the consensus manifold is where we can both hide node privacy and achieve computational accuracy. In this first part of the paper, we demonstrate the computational privacy risks in distributed algorithms against dynamics eavesdroppers and particularly in distributed linear equation solvers, and then propose the PPSC mechanism and illustrate its usefulness. Privado Recently, cloud providers have extended support for trusted hardware primitives such as Intel SGX. Simultaneously, the field of deep learning is seeing enormous innovation and increase in adoption. In this paper, we therefore ask the question: ‘Can third-party cloud services use SGX to provide practical, yet secure DNN Inference-as-a-service? ‘ Our work addresses the three main challenges that SGX-based DNN inferencing faces, namely, security, ease-of-use, and performance. We first demonstrate that side-channel based attacks on DNN models are indeed possible. We show that, by observing access patterns, we can recover inputs to the DNN model. This motivates the need for Privado, a system we have designed for secure inference-as-a-service. Privado is input-oblivious: it transforms any deep learning framework written in C/C++ to be free of input-dependent access patterns. Privado is fully-automated and has a low TCB: with zero developer effort, given an ONNX description, it generates compact C code for the model which can run within SGX-enclaves. Privado has low performance overhead: we have used Privado with Torch, and have shown its overhead to be 20.77\% on average on 10 contemporary networks. Private ADMM(P-ADMM) Due to massive amounts of data distributed across multiple locations, distributed machine learning has attracted a lot of research interests. Alternating Direction Method of Multipliers (ADMM) is a powerful method of designing distributed machine learning algorithm, whereby each agent computes over local datasets and exchanges computation results with its neighbor agents in an iterative procedure. There exists significant privacy leakage during this iterative process if the local data is sensitive. In this paper, we propose a differentially private ADMM algorithm (P-ADMM) to provide dynamic zero-concentrated differential privacy (dynamic zCDP), by inserting Gaussian noise with linearly decaying variance. We prove that P-ADMM has the same convergence rate compared to the non-private counterpart, i.e., $\mathcal{O}(1/K)$ with $K$ being the number of iterations and linear convergence for general convex and strongly convex problems while providing differentially private guarantee. Moreover, through our experiments performed on real-world datasets, we empirically show that P-ADMM has the best-known performance among the existing differentially private ADMM based algorithms. Private Incremental Regression Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While from the results of [Bassily et al. 2014] this bound is tight in the worst-case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Privileged Multi-Label Learning(PrML) This paper presents privileged multi-label learning (PrML) to explore and exploit the relationship between labels in multi-label learning problems. We suggest that for each individual label, it cannot only be implicitly connected with other labels via the low-rank constraint over label predictors, but also its performance on examples can receive the explicit comments from other labels together acting as an \emph{Oracle teacher}. We generate privileged label feature for each example and its individual label, and then integrate it into the framework of low-rank based multi-label learning. The proposed algorithm can therefore comprehensively explore and exploit label relationships by inheriting all the merits of privileged information and low-rank constraints. We show that PrML can be efficiently solved by dual coordinate descent algorithm using iterative optimization strategy with cheap updates. Experiments on benchmark datasets show that through privileged label features, the performance can be significantly improved and PrML is superior to several competing methods in most cases. PrivyNet Massive data exist among user local platforms that usually cannot support deep neural network (DNN) training due to computation and storage resource constraints. Cloud-based training schemes can provide beneficial services, but rely on excessive user data collection, which can lead to potential privacy risks and violations. In this paper, we propose PrivyNet, a flexible framework to enable DNN training on the cloud while protecting the data privacy simultaneously. We propose to split the DNNs into two parts and deploy them separately onto the local platforms and the cloud. The local neural network (NN) is used for feature extraction. To avoid local training, we rely on the idea of transfer learning and derive the local NNs by extracting the initial layers from pre-trained NNs. We identify and compare three factors that determine the topology of the local NN, including the number of layers, the depth of output channels, and the subset of selected channels. We also propose a hierarchical strategy to determine the local NN topology, which is flexible to optimize the accuracy of the target learning task under the constraints on privacy loss, local computation, and storage. To validate PrivyNet, we use the convolutional NN (CNN) based image classification task as an example and characterize the dependency of privacy loss and accuracy on the local NN topology in detail. We also demonstrate that PrivyNet is efficient and can help explore and optimize the trade-off between privacy loss and accuracy. Probabilistic Adaptive Computation Time We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint. Probabilistic approach to neural ARchitecture SEarCh(PARSEC) In neural architecture search (NAS), the space of neural network architectures is automatically explored to maximize predictive accuracy for a given task. Despite the success of recent approaches, most existing methods cannot be directly applied to large scale problems because of their prohibitive computational complexity or high memory usage. In this work, we propose a Probabilistic approach to neural ARchitecture SEarCh (PARSEC) that drastically reduces memory requirements while maintaining state-of-the-art computational complexity, making it possible to directly search over more complex architectures and larger datasets. Our approach only requires as much memory as is needed to train a single architecture from our search space. This is due to a memory-efficient sampling procedure wherein we learn a probability distribution over high-performing neural network architectures. Importantly, this framework enables us to transfer the distribution of architectures learnt on smaller problems to larger ones, further reducing the computational cost. We showcase the advantages of our approach in applications to CIFAR-10 and ImageNet, where our approach outperforms methods with double its computational cost and matches the performance of methods with costs that are three orders of magnitude larger. Probabilistic Argumentation Probabilistic argumentation refers to different formal frameworks in the literature. All share the idea that qualitative aspects can be captured by an underlying logic, while quantitative aspects of uncertainty can be accounted for by probabilistic measures. Probabilistic Augmentation of Data Using Diffeomorphic Image Transformation(PADDIT) For proper generalization performance of convolutional neural networks (CNNs) in medical image segmentation, the learnt features should be invariant under particular non-linear shape variations of the input. To induce invariance in CNNs to such transformations, we propose Probabilistic Augmentation of Data using Diffeomorphic Image Transformation (PADDIT) — a systematic framework for generating realistic transformations that can be used to augment data for training CNNs. We show that CNNs trained with PADDIT outperforms CNNs trained without augmentation and with generic augmentation in segmenting white matter hyperintensities from T1 and FLAIR brain MRI scans. Probabilistic Automaton In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton generalizes the concept of a Markov chain or subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable. Probabilistic Binary Neural Network(BLRNet) Low bit-width weights and activations are an effective way of combating the increasing need for both memory and compute power of Deep Neural Networks. In this work, we present a probabilistic training method for Neural Network with both binary weights and activations, called BLRNet. By embracing stochasticity during training, we circumvent the need to approximate the gradient of non-differentiable functions such as sign(), while still obtaining a fully Binary Neural Network at test time. Moreover, it allows for anytime ensemble predictions for improved performance and uncertainty estimates by sampling from the weight distribution. Since all operations in a layer of the BLRNet operate on random variables, we introduce stochastic versions of Batch Normalization and max pooling, which transfer well to a deterministic network at test time. We evaluate the BLRNet on multiple standardized benchmarks. Probabilistic Causation Probabilistic causation is a concept in a group of philosophical theories that aim to characterize the relationship between cause and effect using the tools of probability theory. The central idea behind these theories is that causes raise the probabilities of their effects, all else being equal. Interpreting causation as a deterministic relation means that if A causes B, then A must always be followed by B. In this sense, war does not cause deaths, nor does smoking cause cancer. As a result, many turn to a notion of probabilistic causation. Informally, A probabilistically causes B if A’s occurrence increases the probability of B. This is sometimes interpreted to reflect imperfect knowledge of a deterministic system but other times interpreted to mean that the causal system under study has an inherently indeterministic nature. (Propensity probability is an analogous idea, according to which probabilities have an objective existence and are not just limitations in a subject’s knowledge). Philosophers such as Hugh Mellor and Patrick Suppes have defined causation in terms of a cause preceding and increasing the probability of the effect. Probabilistic Computation Tree Logic(PCTL) In this paper, we develop approximate dynamic programming methods for stochastic systems modeled as Markov Decision Processes, given both soft performance criteria and hard constraints in a class of probabilistic temporal logic called Probabilistic Computation Tree Logic (PCTL). Our approach consists of two steps: First, we show how to transform a class of PCTL formulas into chance constraints that can be enforced during planning in stochastic systems. Second, by integrating randomized optimization and entropy-regulated dynamic programming, we devise a novel trajectory sampling-based approximate value iteration method to iteratively solve for an upper bound on the value function while ensuring the constraints that PCTL specifications are satisfied. Particularly, we show that by the on-policy sampling of the trajectories, a tight bound can be achieved between the upper bound given by the approximation and the true value function. The correctness and efficiency of the method are demonstrated using robotic motion planning examples. Probabilistic Computing The MIT Probabilistic Computing Project aims to build software and hardware systems that augment human and machine intelligence. We are currently focused on probabilistic programming. Probabilistic programming is an emerging field that draws on probability theory, programming languages, and systems programming to provide concise, expressive languages for modeling and general-purpose inference engines that both humans and machines can use. Our research projects include BayesDB and Picture, domain-specific probabilistic programming platforms aimed at augmenting intelligence in the fields of data science and computer vision, respectively. BayesDB, which is open source and in use by organizations like the Bill & Melinda Gates Foundation and JPMorgan, lets users who lack statistics training understand the probable implications of data by writing queries in a simple, SQL-like language. Picture, a probabilistic language being developed in collaboration with Microsoft, lets users solve hard computer vision problems such as inferring 3D models of faces, human bodies and novel generic objects from single images by writing short (<50 line) computer graphics programs that generate and render random scenes. Unlike bottom-up vision algorithms, Picture programs build on prior knowledge about scene structure and produce complete 3D wireframes that people can manipulate using ordinary graphics software. The core platform for our research is Venture, an interactive platform suitable for teaching and applications in fields ranging from statistics to robotics. ➚ “BayesDB” Probabilistic Conditional Preference Network(PCP-net) In order to represent the preferences of a group of individuals, we introduce Probabilistic CP-nets (PCP-nets). PCP-nets provide a compact language for representing probability distributions over preference orderings. We argue that they are useful for aggregating preferences or modelling noisy preferences. Then we give efficient algorithms for the main reasoning problems, namely for computing the probability that a given outcome is preferred to another one, and the probability that a given outcome is optimal. As a by-product, we obtain an unexpected linear-time algorithm for checking dominance in a standard, tree-structured CP-net. Probabilistic Data Structure Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored but errors can be well-controlled under certain threshold. Comparing with error-free approaches, these algorithms use much less memory and have constant query time. They usually support union and intersection operations and therefore can be easily parallelized. http://…/Category:Probabilistic_data_structures Probabilistic Database Most real databases contain data whose correctness is uncertain. In order to work with such data, there is a need to quantify the integrity of the data. This is achieved by using probabilistic databases. A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. ‘While there are currently no commercial probabilistic database systems, several research prototypes exist…’ Probabilistic databases distinguish between the logical data model and the physical representation of the data much like relational databases do in the ANSI-SPARC Architecture. In probabilistic databases this is even more crucial since such databases have to represent very large numbers of possible worlds, often exponential in the size of one world (a classical database), succinctly. On Constrained Open-World Probabilistic Databases Probabilistic D-Clustering We present a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the distance from (the center of) the cluster in question. This assumption is our working principle. The method is a generalization, to several centers, of theWeiszfeld method for solving the Fermat-Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations stop when the centers stop moving. Progress is monitored by the joint distance function, a measure of distance from all cluster centers, that evolves during the iterations, and captures the data in its low contours. The method is simple, fast (requiring a small number of cheap iterations) and insensitive to outliers. Probabilistic Deep Hashing(PDH) With the growth of image on the web, research on hashing which enables high-speed image retrieval has been actively studied. In recent years, various hashing methods based on deep neural networks have been proposed and achieved higher precision than the other hashing methods. In these methods, multiple losses for hash codes and the parameters of neural networks are defined. They generate hash codes that minimize the weighted sum of the losses. Therefore, an expert has to tune the weights for the losses heuristically, and the probabilistic optimality of the loss function cannot be explained. In order to generate explainable hash codes without weight tuning, we theoretically derive a single loss function with no hyperparameters for the hash code from the probability distribution of the images. By generating hash codes that minimize this loss function, highly accurate image retrieval with probabilistic optimality is performed. We evaluate the performance of hashing using MNIST, CIFAR-10, SVHN and show that the proposed method outperforms the state-of-the-art hashing methods. Probabilistic Dependency Networks ➚ “Dependency Network” Probabilistic Distance Clustering(PD-clustering) Probabilistic distance clustering (PD-clustering) is an iterative, distribution free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Facto PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a cluster optimizing the PD-clustering criterion. It allows clustering of high dimensional data sets. Probabilistic Eigenvalue Shaping(PES) We consider a nonlinear Fourier transform (NFT)-based transmission scheme, where data is embedded into the imaginary part of the nonlinear discrete spectrum. Inspired by probabilistic amplitude shaping, we propose a probabilistic eigenvalue shaping (PES) scheme as a means to increase the data rate of the system. We exploit the fact that for an NFTbased transmission scheme the pulses in the time domain are of unequal duration by transmitting them with a dynamic symbol interval and find a capacity-achieving distribution. The PES scheme shapes the information symbols according to the capacity-achieving distribution and transmits them together with the parity symbols at the output of a low-density parity-check encoder, suitably modulated, via time-sharing. We furthermore derive an achievable rate for the proposed PES scheme. We verify our results with simulations of the discrete-time model as well as with split-step Fourier simulations. Probabilistic Event Calculus(PEC) We present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP implementation of PEC and show the sense in which this is sound and complete. Probabilistic Face Embedding(PFE) Embedding methods have achieved success in face recognition by comparing facial features in a latent semantic space. However, in a fully unconstrained face setting, the features learned by the embedding model could be ambiguous or may not even be present in the input face, leading to noisy representations. We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. The mean of the distribution estimates the most likely feature values while the variance shows the uncertainty in the feature values. Probabilistic solutions can then be naturally derived for matching and fusing PFEs using the uncertainty information. Empirical evaluation on different baseline models, training datasets and benchmarks show that the proposed method can improve the face recognition performance of deterministic embeddings by converting them into PFEs. The uncertainties estimated by PFEs also serve as good indicators of the potential matching accuracy, which are important for a risk-controlled recognition system. Probabilistic Generative Adversarial Network(PGAN) We introduce the Probabilistic Generative Adversarial Network (PGAN), a new GAN variant based on a new kind of objective function. The central idea is to integrate a probabilistic model (a Gaussian Mixture Model, in our case) into the GAN framework which supports a new kind of loss function (based on likelihood rather than classification loss), and at the same time gives a meaningful measure of the quality of the outputs generated by the network. Experiments with MNIST show that the model learns to generate realistic images, and at the same time computes likelihoods that are correlated with the quality of the generated images. We show that PGAN is better able to cope with instability problems that are usually observed in the GAN training procedure. We investigate this from three aspects: the probability landscape of the discriminator, gradients of the generator, and the perfect discriminator problem. Probabilistic Graphical Model(PGM) Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. ➚ “Graphical Model” Probabilistic Inference for Particle-Based Policy Search(PIPPS) Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by $10^6$ times. Probabilistic Kernel Support Vector Machine We propose a probabilistic enhancement of standard Kernel Support Vector Machines for binary classification, in order to address the case when, along with given data sets, a description of uncertainty (e.g., error bounds) may be available on each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thereby, our data consist of pairs $(x_i,\Sigma_i)$, $i\in ,…,N$, along with an indicator $y_i\in(-1,1)$ to declare membership in one of two categories for each pair. These pairs may be viewed to represent the mean and covariance, respectively, of random vectors $\xi_i$ taking values in a suitable linear space (typically ${\mathbb R}^n$). Thus, our setting may also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard ‘kernel trick’. The main contribution of this work is to point out a suitable kernel function for applying Support Vector techniques to the setting of uncertain data for which a detailed uncertainty description is also available (herein, ‘Gaussian points’). Probabilistic Latent Feature Models Probabilistic Latent Feature Models assume that objects and attributes can be represented as a set of binary latent features and that the strength of object-attribute associations can be explained as a non-compensatory (e.g., disjunctive or conjunctive) mapping of latent features. plfm Probabilistic Latent Semantic Analysis(PLSA) We consider the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. This problem has appeared in the literature as probabilistic latent semantic analysis (pLSA), and has connections to non-negative matrix factorization. When the simplicity of the variable is measured through its cardinality, we show that a solution to this latent variable discovery problem can be used to distinguish direct causal relations from spurious correlations among almost all joint distributions on simple causal graphs with two observed variables. Conjecturing a similar identifiability result holds with Shannon entropy, we study a loss function that trades-off between entropy of the latent variable and the conditional mutual information of the observed variables. We then propose a latent variable discovery algorithm — LatentSearch — and show that its stationary points are the stationary points of our loss function. We experimentally show that LatentSearch can indeed be used to distinguish direct causal relations from spurious correlations. Entropic Latent Variable Discovery Probabilistic Learning in Control(PILCO) Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning Probabilistic Matrix Factorization(PMF) Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system. Probabilistic Metric Space A probabilistic metric space is a generalization of metric spaces where the distance is no longer valued in non-negative real numbers, but instead is valued in distribution functions. Probabilistic Model-Agnostic Meta-Learning Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks be learned from small amounts of data. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to acquire a single model (e.g., a classifier) for that task that is accurate. In this paper, we propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. Our approach extends model-agnostic meta-learning, which adapts to new tasks via gradient descent, to incorporate a parameter distribution that is trained via a variational lower bound. At meta-test time, our algorithm adapts via a simple procedure that injects noise into gradient descent, and at meta-training time, the model is trained such that this stochastic adaptation procedure produces samples from the approximate model posterior. Our experimental results show that our method can sample plausible classifiers and regressors in ambiguous few-shot learning problems. Probabilistic Multilayer Network Here we introduce probabilistic weighted and unweighted multilayer networks as derived from information theoretical correlation measures on large multidimensional datasets. We present the fundamentals of the formal application of probabilistic inference on problems embedded in multilayered environments, providing examples taken from the analysis of biological and social systems: cancer genomics and drug-related violence. Probabilistic Neural Network(PNN) A probabilistic neural network (PNN) is a feedforward neural network, which was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. It was introduced by D.F. Specht in the early 1990s. In a PNN, the operations are organized into a multilayered feedforward network with four layers: · Input layer · Hidden layer · Pattern layer/Summation layer · Output layer Probabilistic Neural Programs We present probabilistic neural programs, a framework for program induction that permits flexible specification of both a computational model and inference algorithm while simultaneously enabling the use of deep neural networks. Probabilistic neural programs combine a computation graph for specifying a neural network with an operator for weighted nondeterministic choice. Thus, a program describes both a collection of decisions as well as the neural network architecture used to make each one. We evaluate our approach on a challenging diagram question answering task where probabilistic neural programs correctly execute nearly twice as many programs as a baseline model. Probabilistic PARAFAC2 The PARAFAC2 is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example because of differences in signal sampling or batch sizes. A fully probabilistic treatment of the PARAFAC2 is desirable in order to improve robustness to noise and provide a well founded principle for determining the number of factors, but challenging because the factor loadings are constrained to be orthogonal. We develop two probabilistic formulations of the PARAFAC2 along with variational procedures for inference: In the one approach, the mean values of the factor loadings are orthogonal leading to closed form variational updates, and in the other, the factor loadings themselves are orthogonal using a matrix Von Mises-Fisher distribution. We contrast our probabilistic formulation to the conventional direct fitting algorithm based on maximum likelihood. On simulated data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, we compare our approach to the conventional PARAFAC2 model estimation and find that the probabilistic formulation is more robust to noise and model order misspecification. The probabilistic PARAFAC2 thus forms a promising framework for modeling multi-way data accounting for uncertainty. PRObabilistic Parametric rEgression Loss(PROPEL) Recently, Convolutional Neural Networks (CNNs) have dominated the field of computer vision. Their widespread success has been attributed to their representation learning capabilities. For classification tasks, CNNs have widely employed probabilistic output and have shown the significance of providing additional confidence for predictions. However, such probabilistic methodologies are not widely applicable for addressing regression problems using CNNs, as regression involves learning unconstrained continuous and, in many cases, multi-variate target variables. We propose a PRObabilistic Parametric rEgression Loss (PROPEL) that enables probabilistic regression using CNNs. PROPEL is fully differentiable and, hence, can be easily incorporated for end-to-end training of existing regressive CNN architectures. The proposed method is flexible as it learns complex unconstrained probabilities while being generalizable to higher dimensional multi-variate regression problems. We utilize a PROPEL-based CNN to address the problem of learning hand and head orientation from uncalibrated color images. Comprehensive experimental validation and comparisons with existing CNN regression loss functions are provided. Our experimental results indicate that PROPEL significantly improves the performance of a CNN, while reducing model parameters by 10x as compared to the existing state-of-the-art. Probabilistic Partial Least Squares(PPLS) With a rapid increase in volume and complexity of data sets there is a need for methods that can extract useful information in these data sets. Dimension reduction approaches such as Partial least squares (PLS) are increasingly being utilized for finding relationships between two data sets. However these methods often lack a probabilistic formulation, hampering development of more flexible models. Moreover dimension reduction methods in general suffer from identifiability problems, causing difficulties in combining and comparing results from multiple studies. We propose Probabilistic PLS (PPLS) as an extension of PLS to model the overlap between two data sets. The likelihood formulation provides opportunities to address issues typically present in data, such as missing entries and heterogeneity between subjects. We show that the PPLS parameters are identifiable up to sign. We derive Maximum Likelihood estimators that respect the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. A simulation study is conducted and we observe a good performance of the PPLS estimates in various scenarios, when compared to PLS estimates. Most notably the estimates seem to be robust against departures from normality. To illustrate the PPLS model, we apply it to real IgG glycan data from two cohorts. We infer the contributions of each variable to the correlated part and observe very similar behavior across cohorts. Probabilistic Personalization of Black-Box Sequence Model(PROPS) We present PROPS, a lightweight transfer learning mechanism for sequential data. PROPS learns probabilistic perturbations around the predictions of one or more arbitrarily complex, pre-trained black box models (such as recurrent neural networks). The technique pins the black-box prediction functions to ‘source nodes’ of a hidden Markov model (HMM), and uses the remaining nodes as ‘perturbation nodes’ for learning customized perturbations around those predictions. In this paper, we describe the PROPS model, provide an algorithm for online learning of its parameters, and demonstrate the consistency of this estimation. We also explore the utility of PROPS in the context of personalized language modeling. In particular, we construct a baseline language model by training a LSTM on the entire Wikipedia corpus of 2.5 million articles (around 6.6 billion words), and then use PROPS to provide lightweight customization into a personalized language model of President Donald J. Trump’s tweeting. We achieved good customization after only 2,000 additional words, and find that the PROPS model, being fully probabilistic, provides insight into when President Trump’s speech departs from generic patterns in the Wikipedia corpus. Python code (for both the PROPS training algorithm as well as experiment reproducibility) is available at https://…/perturbed-sequence-model. Probabilistic Programming A probabilistic programming language is a high-level language that makes it easy for a developer to define probability models and then ‘solve’ these models automatically. These languages incorporate random events as primitives and their runtime environment handles inference. Now, it is a matter of programming that enables a clean separation between modeling and inference. This can vastly reduce the time and effort associated with implementing new models and understanding data. Just as high-level programming languages transformed developer productivity by abstracting away the details of the processor and memory architecture, probabilistic languages promise to free the developer from the complexities of high-performance probabilistic inference. Probabilistic-Programming.org What is probabilistic programming? Probabilistic Programming for Advancing Machine Learning(PPAML) Machine learning – the ability of computers to understand data, manage results and infer insights from uncertain information – is the force behind many recent revolutions in computing. Email spam filters, smartphone personal assistants and self-driving vehicles are all based on research advances in machine learning. Unfortunately, even as the demand for these capabilities is accelerating, every new application requires a Herculean effort. Teams of hard-to-find experts must build expensive, custom tools that are often painfully slow and can perform unpredictably against large, complex data sets. The Probabilistic Programming for Advancing Machine Learning (PPAML) program aims to address these challenges. Probabilistic programming is a new programming paradigm for managing uncertain information. Using probabilistic programming languages, PPAML seeks to greatly increase the number of people who can successfully build machine learning applications and make machine learning experts radically more effective. Moreover, the program seeks to create more economical, robust and powerful applications that need less data to produce more accurate results – features inconceivable with today’s technology. http://…/wiki Probabilistic Programming Language(PPL) A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and then perform inference in those models. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. Probabilistic programming represents an attempt to ‘ general purpose programming with probabilistic modeling.’ Probabilistic reasoning is a foundational technology of machine learning. It is used by companies such as Google, Amazon.com and Microsoft. Probabilistic reasoning has been used for predicting stock prices, recommending movies, diagnosing computers, detecting cyber intrusions and image detection. PPLs often extend from a basic language. The choice of underlying basic language depends on the similarity of the model to the basic language’s ontology, as well as commercial considerations and personal preference. For instance, Dimple and Chimple are based on Java, Infer.NET is based on .NET framework, while PRISM extends from Prolog. However, some PPLs such as WinBUGS and Stan offer a self-contained language, with no obvious origin in another language. Several PPLs are in active development, including some in beta test. Probabilistic Record Linkage Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach, providing both a fast algorithm for deriving point estimates of the linkage structure that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying prior specifications. We demonstrate the methods on simulated data and an application to a post-enumeration survey for coverage estimation in the Italian census. Probabilistic Relational Agent-based Model(PRAM) PRAM puts agent-based models on a sound probabilistic footing as a basis for integrating agent-based and probabilistic models. It extends the themes of probabilistic relational models and lifted inference to incorporate dynamical models and simulation. It can also be much more efficient than agent-based simulation. Probabilistic Robustness Neural networks are becoming increasingly prevalent in software, and it is therefore important to be able to verify their behavior. Because verifying the correctness of neural networks is extremely challenging, it is common to focus on the verification of other properties of these systems. One important property, in particular, is robustness. Most existing definitions of robustness, however, focus on the worst-case scenario where the inputs are adversarial. Such notions of robustness are too strong, and unlikely to be satisfied by-and verifiable for-practical neural networks. Observing that real-world inputs to neural networks are drawn from non-adversarial probability distributions, we propose a novel notion of robustness: probabilistic robustness, which requires the neural network to be robust with at least $(1 – \epsilon)$ probability with respect to the input distribution. This probabilistic approach is practical and provides a principled way of estimating the robustness of a neural network. We also present an algorithm, based on abstract interpretation and importance sampling, for checking whether a neural network is probabilistically robust. Our algorithm uses abstract interpretation to approximate the behavior of a neural network and compute an overapproximation of the input regions that violate robustness. It then uses importance sampling to counter the effect of such overapproximation and compute an accurate estimate of the probability that the neural network violates the robustness property. Probabilistic Supervised Learning Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies – e.g., kernel methods, random forests, deep learning aka neural networks – being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions’ uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads (c) to a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy – be it frequentist, Bayesian, or other. Probabilistic Surface Optimization(PSO) In this paper we contribute a novel algorithm family, which generalizes many unsupervised techniques including unnormalized and energy models, and allows to infer different statistical modalities (e.g.~data likelihood and ratio between densities) from data samples. The proposed unsupervised technique Probabilistic Surface Optimization (PSO) views a neural network (NN) as a flexible surface which can be pushed according to loss-specific virtual stochastic forces, where a dynamical equilibrium is achieved when the point-wise forces on the surface become equal. Concretely, the surface is pushed up and down at points sampled from two different distributions, with overall up and down forces becoming functions of these two distribution densities and of force intensity magnitudes defined by loss of a particular PSO instance. The eventual force equilibrium upon convergence enforces the NN to be equal to various statistical functions depending on the used magnitude functions, such as data density. Furthermore, this dynamical-statistical equilibrium is extremely intuitive and useful, providing many implications and possible usages in probabilistic inference. Further, we provide new PSO-based approaches as demonstration of PSO exceptional usability. We also analyze PSO convergence and optimization stability, and relate them to the gradient similarity function over NN input space. Further, we propose new ways to improve the above stability. Finally, we present new instances of PSO, termed PSO-LDE, for data density estimation on logarithmic scale and also provide a new NN block-diagonal architecture for increased surface flexibility, which significantly improves estimation accuracy. Both PSO-LDE and the new architecture are combined together as a new density estimation technique. In our experiments we demonstrate this technique to produce highly accurate density estimation for 20D data. PRObabilistically VErify Neural networks with statistical guarantees(PROVEN) With deep neural networks providing state-of-the-art machine learning models for numerous machine learning tasks, quantifying the robustness of these models has become an important area of research. However, most of the research literature merely focuses on the \textit{worst-case} setting where the input of the neural network is perturbed with noises that are constrained within an $\ell_p$ ball; and several algorithms have been proposed to compute certified lower bounds of minimum adversarial distortion based on such worst-case analysis. In this paper, we address these limitations and extend the approach to a \textit{probabilistic} setting where the additive noises can follow a given distributional characterization. We propose a novel probabilistic framework PROVEN to PRObabilistically VErify Neural networks with statistical guarantees — i.e., PROVEN certifies the probability that the classifier’s top-1 prediction cannot be altered under any constrained $\ell_p$ norm perturbation to a given input. Importantly, we show that it is possible to derive closed-form probabilistic certificates based on current state-of-the-art neural network robustness verification frameworks. Hence, the probabilistic certificates provided by PROVEN come naturally and with almost no overhead when obtaining the worst-case certified lower bounds from existing methods such as Fast-Lin, CROWN and CNN-Cert. Experiments on small and large MNIST and CIFAR neural network models demonstrate our probabilistic approach can achieve up to around $75\%$ improvement in the robustness certification with at least a $99.99\%$ confidence compared with the worst-case robustness certificate delivered by CROWN. Probability Calibration Tree Obtaining accurate and well calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability estimates are applied globally, ignoring the potential for improvements by applying a more fine-grained model. We propose probability calibration trees, a modification of logistic model trees that identifies regions of the input space in which different probability calibration models are learned to improve performance. We compare probability calibration trees to two widely used calibration methods—isotonic regression and Platt scaling—and show that our method results in lower root mean squared error on average than both methods, for estimates produced by a variety of base learners. Probability Collectives(PC) Probability Collectives is a broad framework for analyzing and controlling distributed systems. It is based on deep formal connections relating game theory, statistical physics, and distributed control/optimization. http://…/Library http://…/978-3-319-15999-7 Probability Density Function(PDF) In probability theory, a probability density function (pdf), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value. The probability of the random variable falling within a particular range of values is given by the integral of this variable’s density over that range – that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one. Probability Functional Descent(PFD) The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier. Probability Mass Function(PMF) In probability theory and statistics, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete. A probability mass function differs from a probability density function (pdf) in that the latter is associated with continuous rather than discrete random variables; the values of the latter are not probabilities as such: a pdf must be integrated over an interval to yield a probability. Ake Probability of Default(PD) Probability of default (PD) is a financial term describing the likelihood of a default over a particular time horizon. It provides an estimate of the likelihood that a borrower will be unable to meet its debt obligations. PD is used in a variety of credit analyses and risk management frameworks. Under Basel II, it is a key parameter used in the calculation of economic capital or regulatory capital for a banking institution. LDPD Probability of Default Calibration ➘ “Probability of Default” LDPD Probability of Exceedance(POE) The ‘probability of exceedance’ curves give the forecast probability that a temperature or precipitation quantity, shown on the horizontal axis, will be exceeded at the location in question, for the given season at the given lead time. http://…5868_calculate-exceedance-probability.htm https://…/risk.pdf http://…/INTR.html http://…/Cumulative_frequency_analysis http://…/poe_index.php?lead=1&var=t http://…/Survival_function Probability of Informed Trading(PIN) Introduced by Easley et. al. (1996) . InfoTrad Probability of query Completeness(PC) Nowadays, query optimization has been highly concerned in big data management, especially in NoSQL databases. Approximate queries boost query performance by loss of accuracy, for example, sampling approaches trade off query completeness for efficiency. Different from them, we propose an uncertainty of query completeness, called Probability of query Completeness (PC for short). PC refers to the possibility that query results contain all satisfied records. For example PC=0.95, it guarantees that there are no more than 5 incomplete queries among 100 ones, but not guarantees how incomplete they are. We trade off PC for query performance, and experiments show that a small loss of PC doubles query performance. The proposed Probery (PROBability-based data quERY) adopts the uncertainty of query completeness to accelerate OLTP queries. This paper illustrates the data and probability models, the probability based data placement and query processing, and the Apache Drill-based implementation of Probery. In experiments, we first prove that the percentage of complete queries is larger than the given PC confidence for various cases, namely that the PC guarantee is validate. Then Probery is compared with Drill, Impala and Hive in terms of query performance. The results indicate that Drill-based Probery performs as fast as Drill with complete query, while averagely 1.8x, 1.3x and 1.6x faster than Drill, Impala and Hive with possible complete query, respectively. Probability Theory Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. If an individual coin toss or the roll of dice is considered to be a random event, then if repeated many times the sequence of random events will exhibit certain patterns, which can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Probability-based Detection Quality(PDQ) We propose a new visual object detector evaluation measure which not only assesses detection quality, but also accounts for the spatial and label uncertainties produced by object detection systems. Current evaluation measures such as mean average precision (mAP) do not take these two aspects into account, accepting detections with no spatial uncertainty and using only the label with the winning score instead of a full class probability distribution to rank detections. To overcome these limitations, we propose the probability-based detection quality (PDQ) measure which evaluates both spatial and label probabilities, requires no thresholds to be predefined, and optimally assigns ground-truth objects to detections. Our experimental evaluation shows that PDQ rewards detections with accurate spatial probabilities and explicitly evaluates label probability to determine detection quality. PDQ aims to encourage the development of new object detection approaches that provide meaningful spatial and label uncertainty measures. Probably Approximately Correct(PAC) Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999). ➘ “Probably Approximately Correct Learning” Data-dependent PAC-Bayes priors via differential privacy Probably Approximately Correct Learning(PAC Learning,WARL) In computational learning theory, probably approximately correct learning (PAC learning) is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the “probably” part), the selected function will have low generalization error (the “approximately correct” part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds). Probably Certifiably Correct Algorithm(PCC) Many optimization problems of interest are known to be intractable, and while there are often heuristics that are known to work on typical instances, it is usually not easy to determine a posteriori whether the optimal solution was found. In this short note, we discuss algorithms that not only solve the problem on typical instances, but also provide a posteriori certificates of optimality, probably certifiably correct (PCC) algorithms. As an illustrative example, we present a fast PCC algorithm for minimum bisection under the stochastic block model and briefly discuss other examples. probit In probability theory and statistics, the probit function is the quantile function associated with the standard normal distribution. It has applications in exploratory statistical graphics and specialized regression modeling of binary response variables. Probit Model In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit.[1] The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model. A probit model is a popular specification for an ordinal[2] or a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. The probit model, which employs a probit link function, is most often estimated using the standard maximum likelihood procedure, such an estimation being called a probit regression. ProBO Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that we may want to use to capture complex systems and reduce the number of queries. Probabilistic programs (PPs) are modern tools that allow for flexible model composition, incorporation of prior information, and automatic inference. In this paper, we develop ProBO, a framework for BO using only standard operations common to most PPs. This allows a user to drop in an arbitrary PP implementation and use it directly in BO. To do this, we describe black box versions of popular acquisition functions that can be used in our framework automatically, without model-specific derivation, and show how to optimize these functions. We also introduce a model, which we term the Bayesian Product of Experts, that integrates into ProBO and can be used to combine information from multiple models implemented with different PPs. We show empirical results using multiple PP implementations, and compare against standard BO methods. Procedural Content Generation via Machine Learning(PCGML) This survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content. As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style-transfer, parameter tuning, and PCG as a game mechanic. Process Mining Process mining is a process management technique that allows for the analysis of business processes based on event logs. The basic idea is to extract knowledge from event logs recorded by an information system. Process mining aims at improving this by providing techniques and tools for discovering process, control, data, organizational, and social structures from event logs. Process Mining Procrustes Analysis In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name Procrustes refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off. ProdSumNet We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3×10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters. Product Community Question Answering(PCQA) Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product’s compatibility issues with other products are frequently discussed in PCQA and such issues are more frequently addressed in accessories, i.e., via a yes/no question ‘Does this mouse work with windows 10?’. In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem can naturally have a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of yes/no answers to assign the complementary entities a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage and a 3-class classifier by combining a distant PU-learning method (learning from positive and unlabeled examples) together with a binary classifier for the second stage. The benefit of using distant PU-learning is that it can help to expand more implicit yes/no answers without using any human annotated data. We conduct experiments on 4 products to show that the proposed method is effective. Product Intelligence What makes Product Intelligence interesting to us as a field of focus is that it is a superb application for Big Data – providing highly targeted, real time intelligence that serves up insights INSIDE of the new product development process at the exact moment when conclusive, authoritative insight is most needed. When it’s literally make or break. What Is It? What differentiates product intelligence from other research? It provides real-time, data-driven insights for new product development decisions and innovation initiatives based on the large multiples – the scale of big data. What features will attract consumers to my product? How do customers perceive it relative to competing products? In which geographic markets will it be the most successful? Product intelligence can tell you this. Imagine this…you’re developing a personal hair care product and you’re looking for a particular niche, let’s say hair color in China, which could be called a mature market. You can listen to 25 people or perhaps 500 or 5000 in focus groups or online panels. Or you can listen to 500,000. That’s the unique advantage and why big data got the name Big. Product Logarithm ➚ “Lambert W Function” Product Unit Neural Network(PUNN) Leveraging Product as an Activation Function in Deep Networks Productive Machine Learning(Pro-ML) The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack. As we mapped out the effort, we kept a set of key ideas in place to constrain the solution space and focus our efforts. • We will leverage and improve best-of-breed components from our existing code base to the maximum extent feasible. We are unlikely to rewrite our entire tech stack, but any particular component is fair game. • The state of the art is constantly evolving with new algorithms and open source frameworks – we need to be flexible to support our existing major ML algorithms as well as new ones that will emerge. • We will use an agile-inspired strategy so that each step we take is delivering value by making at least one product line better or providing generally useable improvements to existing components. • The ability to run the models in real-time is as important as the ability to author or train them. The services hosting the models must be able to be independently upgraded without breaking their downstream or upstream services. • New models, retrained models, and models using new technologies must be A/B testable in production. • We must build GDPR privacy requirements into every stage of the solution. Professor Forcing The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network’s own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar. Proficiency Rank Yask is an online social collaborative network for practicing languages in a framework that includes requests, answers, and votes. Since measuring linguistic competence using current approaches is difficult, expensive and in many cases imprecise, we present a new alternative approach based on social networks. Our method, called Proficiency Rank, extends the well-known Page Rank algorithm to measure the reputation of users in a collaborative social graph. First, we extended Page Rank so that it not only considers positive links (votes) but also negative links. Second, in addition to using explicit links, we also incorporate other 4 types of signals implicit in the social graph. These extensions allow Proficiency Rank to produce proficiency rankings for almost all users in the data set used, where only a minority contributes by answering, while the majority contributes only by voting. This overcomes the intrinsic limitation of Page Rank of only being able to rank the nodes that have incoming links. Our experimental validation showed that the reputation/importance of the users in Yask is significantly correlated with their language proficiency. In contrast, their written production was poorly correlated with the vocabulary profiles of the Common European Framework of Reference. In addition, we found that negative signals (votes) are considerably more informative than positive ones. We concluded that the use of this technology is a promising tool for measuring second language proficiency, even for relatively small groups of people. Profile Closeness We introduce a new centrality measure, known as profile closeness, for complex networks. This network attribute originates from the graph-theoretic analysis of consensus problems. We also demonstrate its relevance in inferring the evolution of network communities. Profiling In software engineering, profiling (“program profiling”, “software profiling”) is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization. Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). A number of different techniques may be used by profilers, such as event-based, statistical, instrumented, and simulation methods. Profit-Maximizing A/B Test Marketers often use A/B testing as a tactical tool to compare marketing treatments in a test stage and then deploy the better-performing treatment to the remainder of the consumer population. While these tests have traditionally been analyzed using hypothesis testing, we re-frame such tactical tests as an explicit trade-off between the opportunity cost of the test (where some customers receive a sub-optimal treatment) and the potential losses associated with deploying a sub-optimal treatment to the remainder of the population. We derive a closed-form expression for the profit-maximizing test size and show that it is substantially smaller than that typically recommended for a hypothesis test, particularly when the response is noisy or when the total population is small. The common practice of using small holdout groups can be rationalized by asymmetric priors. The proposed test design achieves nearly the same expected regret as the flexible, yet harder-to-implement multi-armed bandit. We demonstrate the benefits of the method in three different marketing contexts — website design, display advertising and catalog tests — in which we estimate priors from past data. In all three cases, the optimal sample sizes are substantially smaller than for a traditional hypothesis test, resulting in higher profit. Programming Word Problem A programming word problem is a problem written in natural language, which can be solved using an algorithm or a program. Progressive Augmentation of GAN(PA-GAN) Training of Generative Adversarial Networks (GANs) is notoriously fragile, which partially attributed to the discriminator performing well very quickly; its loss converges to zero, providing no reliable backpropagation signal to the generator. In this work we introduce a new technique – progressive augmentation of GANs (PA-GAN) – that helps to mitigate this issue and thus improve the GAN training. The key idea is to gradually increase the task difficulty of the discriminator by progressively augmenting its input or feature space, enabling continuous learning of the generator. We show that the proposed progressive augmentation preserves the original GAN objective, does not bias the optimality of the discriminator and encourages the healthy competition between the generator and discriminator, leading to a better-performing generator. We experimentally demonstrate the effectiveness of PA-GAN across different architectures and on multiple benchmarks for the image generation task. Progressive Expectation Maximization(PEM) Progressive Feature Alignment Network(PFAN) Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a fully-unlabeled target domain. To tackle this task, recent approaches resort to discriminative domain transfer in virtue of pseudo-labels to enforce the class-level distribution alignment across the source and target domains. These methods, however, are vulnerable to the error accumulation and thus incapable of preserving cross-domain category consistency, as the pseudo-labeling accuracy is not guaranteed explicitly. In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, via exploiting the intra-class variation in the target domain. To be specific, we first develop an Easy-to-Hard Transfer Strategy (EHTS) and an Adaptive Prototype Alignment (APA) step to train our model iteratively and alternatively. Moreover, upon observing that a good domain adaptation usually requires a non-saturated source classifier, we consider a simple yet efficient way to retard the convergence speed of the source classification loss by further involving a temperature variate into the soft-max function. The extensive experimental results reveal that the proposed PFAN exceeds the state-of-the-art performance on three UDA datasets. Progressive Operational Perceptron With Memory Generalized Operational Perceptron (GOP) was proposed to generalize the linear neuron model in the traditional Multilayer Perceptron (MLP) and this model can mimic the synaptic connections of the biological neurons that have nonlinear neurochemical behaviours. Progressive Operational Perceptron (POP) is a multilayer network composing of GOPs which is formed layer-wise progressively. In this work, we propose major modifications that can accelerate as well as augment the progressive learning procedure of POP by incorporating an information-preserving, linear projection path from the input to the output layer at each progressive step. The proposed extensions can be interpreted as a mechanism that provides direct information extracted from the previously learned layers to the network, hence the term ‘memory’. This allows the network to learn deeper architectures with better data representations. An extensive set of experiments show that the proposed modifications can surpass the learning capability of the original POPs and other related algorithms. Progressive Sampling-Based Bayesian Optimization Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. Methods: To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. Results: We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. Conclusions: This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks. Progressive Scale Expansion Network(PSENet) Scene text detection has witnessed rapid progress especially with the recent development of convolutional neural networks. However, there still exists two challenges which prevent the algorithm into industry applications. On the one hand, most of the state-of-art algorithms require quadrangle bounding box which is in-accurate to locate the texts with arbitrary shape. On the other hand, two text instances which are close to each other may lead to a false detection which covers both instances. Traditionally, the segmentation-based approach can relieve the first problem but usually fail to solve the second challenge. To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates the different scale of kernels for each text instance, and gradually expands the minimal scale kernel to the text instance with the complete shape. Due to the fact that there are large geometrical margins among the minimal scale kernels, our method is effective to split the close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curve texts, PSENet achieves a F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-art algorithms by 6.6%. The code will be released in the future. Progressive Spatial Recurrent Neural Network(PS-RNN) Intra prediction is an important component of modern video codecs, which is able to efficiently squeeze out the spatial redundancy in video frames. With preceding pixels as the context, traditional intra prediction schemes generate linear predictions based on several predefined directions (i.e. modes) for blocks to be encoded. However, these modes are relatively simple and their predictions may fail when facing blocks with complex textures, which leads to additional bits encoding the residue. In this paper, we design a Progressive Spatial Recurrent Neural Network (PS-RNN) that learns to conduct intra prediction. Specifically, our PS-RNN consists of three spatial recurrent units and progressively generates predictions by passing information along from preceding contents to blocks to be encoded. To make our network generate predictions considering both distortion and bit-rate, we propose to use Sum of Absolute Transformed Difference (SATD) as the loss function to train PS-RNN since SATD is able to measure rate-distortion cost of encoding a residue block. Moreover, our method supports variable-block-size for intra prediction, which is more practical in real coding conditions. The proposed intra prediction scheme achieves on average 2.4% bit-rate reduction on variable-block-size settings under the same reconstruction quality compared with HEVC. Progressive Unscented Kalman Filter(progressive EKF) ➘ “Sparse Unscented Kalman Filter” Progressive Web Application(PWA) Progressive web applications (PWAs) are web applications that load like regular web pages or websites but can offer the user functionality such as working offline, push notifications, and device hardware access traditionally available only to native applications. PWAs combine the flexibility of the web with the experience of a native application. Progressively Growing Generative Autoencoder(PIONEER,Pioneer Network) We introduce a novel generative autoencoder network model that learns to encode and reconstruct images with high quality and resolution, and supports smooth random sampling from the latent space of the encoder. Generative adversarial networks (GANs) are known for their ability to simulate random high-quality images, but they cannot reconstruct existing images. Previous works have attempted to extend GANs to support such inference but, so far, have not delivered satisfactory high-quality results. Instead, we propose the Progressively Growing Generative Autoencoder (PIONEER) network which achieves high-quality reconstruction with $128{\times}128$ images without requiring a GAN discriminator. We merge recent techniques for progressively building up the parts of the network with the recently introduced adversarial encoder-generator network. The ability to reconstruct input images is crucial in many real-world applications, and allows for precise intelligent manipulation of existing images. We show promising results in image synthesis and inference, with state-of-the-art results in CelebA inference tasks. Project Analyzing Big Education Data(PABED) Cloud computing and big data have risen to become the most popular technologies of the modern world. Apparently, the reason behind their immense popularity is their wide range of applicability as far as the areas of interest are concerned. Education and research remain one of the most obvious and befitting application areas. This research paper introduces a big data analytics tool, PABED Project Analyzing Big Education Data, for the education sector that makes use of cloud-based technologies. This tool is implemented using Google BigQuery and R programming language and allows comparison of undergraduate enrollment data for different academic years. Although, there are many proposed applications of big data in education, there is a lack of tools that can actualize the concept into practice. PABED is an effort in this direction. The implementation and testing details of the project have been described in this paper. This tool validates the use of cloud computing and big data technologies in education and shall head start development of more sophisticated educational intelligence tools. Projected Data Assimilation We introduce a framework for Data Assimilation (DA) in which the data is split into multiple sets corresponding to low-rank projections of the state space. Algorithms are developed that assimilate some or all of the projected data, including an algorithm compatible with any generic DA method. The major application explored here is PF-AUS, a new implementation of Assimilation in the Unstable Subspace (AUS) for Particle Filters. The PF-AUS implementation assimilates highly informative but low-dimensional observations. In the context of particle filtering, the projected approach mitigates the collapse of particle ensembles in high dimensional DA problems while preserving as much relevant information as possible, as the unstable and neutral modes correspond to the most uncertain model predictions. In particular we formulate and numerically implement PF-AUS with the optimal proposal and compare to the standard optimal proposal and to the Local Ensemble Transform Kalman Filter. Projected Pólya Tree One way of defining probability distributions for circular variables (directions in two dimensions) is to radially project probability distributions, originally defined on $\mathbb{R}^2$, to the unit circle. Projected distributions have proved to be useful in the study of circular and directional data. Although any bivariate distribution can be used to produce a projected circular model, these distribution are typically parametric. In this article we consider a bivariate P\’olya tree on $\mathbb{R}^2$ and project it to the unit circle to define a new Bayesian nonparametric model for circular data. We study the properties of the proposed model, obtain its posterior characterisation and show its performance with simulated and real datasets. projected Stein variational Newton(pSVN) We propose a fast and scalable variational method for Bayesian inference in high-dimensional parameter space, which we call projected Stein variational Newton (pSVN) method. We exploit the intrinsic low-dimensional geometric structure of the posterior distribution in the high-dimensional parameter space via its Hessian (of the log posterior) operator and perform a parallel update of the parameter samples projected into a low-dimensional subspace by an SVN method. The subspace is adaptively constructed using the eigenvectors of the averaged Hessian at the current samples. We demonstrate fast convergence of the proposed method and its scalability with respect to the number of parameters, samples, and processor cores. Projection Matrix A projection matrix P is an nxn square matrix that gives a vector space projection from Rn to a subspace W. The columns of P are the projections of the standard basis vectors, and W is the image of P. A square matrix P is a projection matrix iff P^2 = P. Projection Pursuit(PP) Projection pursuit (PP) is a type of statistical technique which involves finding the most “interesting” possible projections in multidimensional data. Often, projections which deviate more from a normal distribution are considered to be more interesting. As each projection is found, the data are reduced by removing the component along that projection, and the process is repeated to find new projections; this is the “pursuit” aspect that motivated the technique known as matching pursuit. The idea of projection pursuit is to locate the projection or projections from high-dimensional space to low-dimensional space that reveal the most details about the structure of the data set. Once an interesting set of projections has been found, existing structures (clusters, surfaces, etc.) can be extracted and analyzed separately. Projection pursuit has been widely use for blind source separation, so it is very important in independent component analysis. Projection pursuit seek one projection at a time such that the extracted signal is as non-Gaussian as possible Projection Pursuit Classification Forest(PPforest) PPforest Projection Pursuit Classification Tree(PPtree) In this paper, we propose a new classification tree, the projection pursuit classification tree (PPtree). It combines tree structured methods with projection pursuit dimension reduction. This tree is originated from the projection pursuit method for classification. In each node, one of the projection pursuit indices using class information – LDA, L r or PDA indices – is maximized to find the projection with the most separated group view. On this optimized data projection, the tree splitting criteria are applied to separate the groups. These steps are iterated until the last two classes are separated. The main advantages of this tree is that it effectively uses correlation between variables to find separations, and it has visual representation of the differences between groups in a 1-dimensional space that can be used to interpret results. Also in each node of the tree, the projection coefficients represent the variable importance for the group separation. This information is very helpful to select variables in classification problems. PPtreeViz Projection Pursuit Random Forest(PPF) This paper presents a new ensemble learning method for classification problems called projection pursuit random forest (PPF). PPF uses the PPtree algorithm introduced in Lee et al. (2013). In PPF, trees are constructed by splitting on linear combinations of randomly chosen variables. Projection pursuit is used to choose a projection of the variables that best separates the classes. Utilizing linear combinations of variables to separate classes takes the correlation between variables into account which allows PPF to outperform a traditional random forest when separations between groups occurs in combinations of variables. The method presented here can be used in multi-class problems and is implemented into an R (R Core Team, 2018) package, PPforest, which is available on CRAN, with development versions at https://…/PPforest. Projection Weighted Canonical Correlation Analysis(projection weighted CCA) Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method. We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations. Projection-based Utility Mining(ProUM) In recent decade, utility mining has attracted a great attention, but most of the existing studies are developed to deal with itemset-based data. Different from the itemset-based data, the time-ordered sequence data is more commonly seen in real-world situations. Current utility mining algorithms have the limitation when dealing with sequence data since they are time-consuming and require large amount of memory usage. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. The utility-array structure is designed to store necessary information of sequence-order and utility. By utilizing the projection technique in generating utility-array, ProUM can significantly improve the mining efficiency, and effectively reduce the memory consumption. Besides, we propose a new upper bound named sequence extension utility. Several pruning strategies are further applied to improve the efficiency of ProUM. Experimental results show that the proposed ProUM algorithm significantly outperforms the state-of-the-art algorithms. ProjectionNet Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly ‘on’ device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks–a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler ‘projection’ network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question ‘how many neural bits are required to solve a given task?’ using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets. Projective Sparse Latent Space Network Models In typical latent-space network models, nodes have latent positions, which are all drawn independently from a common distribution. As a consequence, the number of edges in a network scales quadratically with the number of nodes, resulting in a dense graph sequence as the number of nodes grows. We propose an adjustment to latent-space network models which allows the number edges to scale linearly with the number of nodes, to scale quadratically, or at any intermediate rate. Our models also form projective families, making statistical inference and prediction well-defined. Built through point processes, our models are related to both the Poisson random connection model and the graphex framework. Prolificity Under what circumstances might every extension of a combinatorial structure contain more copies of another one than the original did? This property, which we call prolificity, holds universally in some cases (e.g., finite linear orders) and only trivially in others (e.g., permutations). Integer compositions, or equivalently layered permutations, provide a middle ground. In that setting, there are prolific compositions for a given pattern if and only if that pattern begins and ends with 1. For each pattern, there is an easily constructed automaton that recognises prolific compositions for that pattern. Some instances where there is a unique minimal prolific composition for a pattern are classified. ProLoNet Deep reinforcement learning has seen great success across a breadth of tasks such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily-available human domain experts’ knowledge that could help “warm start” the learning process. Further, learning from demonstration techniques are not yet sufficient to infer this knowledge through sampling-based mechanisms in large state and action spaces, or require immense amounts of data. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions that are amenable to gradient descent. We show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of canonical challenge problems for artificial intelligence. Proof-of-Deep-Learning(PoDL) An enormous amount of energy is wasted in Proofof-Work (PoW) mechanisms adopted by popular blockchain applications (e.g., PoW-based cryptocurrencies), because miners must conduct a large amount of computation. Owing to this, one serious rising concern is that the energy waste not only dilutes the value of the blockchain but also hinders its further application. In this paper, we propose a novel blockchain design that fully recycles the energy required for facilitating and maintaining it, which is re-invested to the computation of deep learning. We realize this by proposing Proof-of-Deep-Learning (PoDL) such that a valid proof for a new block can be generated if and only if a proper deep learning model is produced. We present a proof-of-concept design of PoDL that is compatible with the majority of the cryptocurrencies that are based on hash-based PoW mechanisms. Our benchmark and simulation results show that the proposed design is feasible for various popular cryptocurrencies such as Bitcoin, Bitcoin Cash, and Litecoin. Propagation Map Deep neural networks were shown to be vulnerable to single pixel modifications. However, the reason behind such phenomena has never been elucidated. Here, we propose Propagation Maps which show the influence of the perturbation in each layer of the network. Propagation Maps reveal that even in extremely deep networks such as Resnet, modification in one pixel easily propagates until the last layer. In fact, this initial local perturbation is also shown to spread becoming a global one and reaching absolute difference values that are close to the maximum value of the original feature maps in a given layer. Moreover, we do a locality analysis in which we demonstrate that nearby pixels of the perturbed one in the one-pixel attack tend to share the same vulnerability, revealing that the main vulnerability lies in neither neurons nor pixels but receptive fields. Hopefully, the analysis conducted in this work together with a new technique called propagation maps shall shed light into the inner workings of other adversarial samples and be the basis of new defense systems to come. Propagation Network(PropNet) There has been an increasing interest in learning dynamics simulators for model-based control. Compared with off-the-shelf physics engines, a learnable simulator can quickly adapt to unseen objects, scenes, and tasks. However, existing models like interaction networks only work for fully observable systems; they also only consider pairwise interactions within a single time step, both restricting their use in practical systems. We introduce Propagation Networks (PropNet), a differentiable, learnable dynamics model that handles partially observable scenarios and enables instantaneous propagation of signals beyond pairwise interactions. With these innovations, our propagation networks not only outperform current learnable physics engines in forward simulation, but also achieves superior performance on various control tasks. Compared with existing deep reinforcement learning algorithms, model-based control with propagation networks is more accurate, efficient, and generalizable to novel, partially observable scenes and tasks. Propagation, Decimation and Prediction(PDP) There have been recent efforts for incorporating Graph Neural Network models for learning full-stack solvers for constraint satisfaction problems (CSP) and particularly Boolean satisfiability (SAT). Despite the unique representational power of these neural embedding models, it is not clear how the search strategy in the learned models actually works. On the other hand, by fixing the search strategy (e.g. greedy search), we would effectively deprive the neural models of learning better strategies than those given. In this paper, we propose a generic neural framework for learning CSP solvers that can be described in terms of probabilistic inference and yet learn search strategies beyond greedy search. Our framework is based on the idea of propagation, decimation and prediction (and hence the name PDP) in graphical models, and can be trained directly toward solving CSP in a fully unsupervised manner via energy minimization, as shown in the paper. Our experimental results demonstrate the effectiveness of our framework for SAT solving compared to both neural and the state-of-the-art baselines. Propensity Score A propensity score is the probability of a unit (e.g., person, classroom, school) being assigned to a particular treatment given a set of observed covariates. Propensity scores are used to reduce selection bias by equating groups based on these covariates. Propensity Score Matching(PSM) Propensity Score Matching (PSM) is an useful method to reduce the impact of treatment-selection bias in the estimation of causal effects in observational studies. Despite the fact of the Treatment – Selection Bias reduction, the overall behaviour of this PSM compared with a Multivariate Regression Model (MRM) has never been tested. Monte Carlo Simulations are made to construct groups with different effects in order to compare the behaviour of PSM and MRM estimating this effects. Also the Treatment – Selection Bias reduction for the PSM is calculated. With the PSM a reduction in the Treatment – Selection Bias is achieved, with a reduction in the Relative Real Treatment Effect Estimation Error, but despite of this bias reduction and estimation error reduction, the MRM significantly reduces more this estimation error compared with the PSM. Also the PSM leads to a not insignificant reduction of the sample, which may lead to another not known bias, and thus, to the inaccurate of the effect estimation compared with the MRM. Property Graph The term property graph has come to denote an attributed, multi-relational graph. That is, a graph where the edges are labeled and both vertices and edges can have any number of key/value properties associated with them. Prophet Today Facebook is open sourcing Prophet, a forecasting tool available in Python and R. Forecasting is a data science task that is central to many activities within an organization. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline. Producing high quality forecasts is not an easy problem for either machines or for most analysts. We have observed two main themes in the practice of creating a variety of business forecasts: · Completely automatic forecasting techniques can be brittle and they are often too inflexible to incorporate useful assumptions or heuristics. · Analysts who can produce high quality forecasts are quite rare because forecasting is a specialized data science skill requiring substantial experience. {Link|https://github.com/facebook/prophet|Prophet: Automatic Forecasting Procedure} {Link|https://facebook.github.io/prophet/|Prophet: Forecasting at Scale} Propheticus Due to recent technological developments, Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been successfully used to process and extract knowledge from a variety of complex problems. However, a thorough ML approach is complex and highly dependent on the problem at hand. Additionally, implementing the logic required to execute the experiments is no small nor trivial deed, consequentially increasing the probability of faulty code which can compromise the results. Propheticus is a data-driven framework which results of the need for a tool that abstracts some of the inherent complexity of ML, whilst being easy to understand and use, as well as to adapt and expand to assist the user’s specific needs. Propheticus systematizes and enforces various complex concepts of an ML experiment workflow, taking into account the nature of both the problem and the data. It contains functionalities to execute all the different tasks, from data preprocessing, to results analysis and comparison. Notwithstanding, it can be fairly easily adapted to different problems due to its flexible architecture, and customized as needed to address the user’s needs. Proportional Degree Several algorithms have been proposed to filter information on a complete graph of correlations across stocks to build a stock-correlation network. Among them the planar maximally filtered graph (PMFG) algorithm uses $3n-6$ edges to build a graph whose features include a high frequency of small cliques and a good clustering of stocks. We propose a new algorithm which we call proportional degree (PD) to filter information on the complete graph of normalised mutual information (NMI) across stocks. Our results show that the PD algorithm produces a network showing better homogeneity with respect to cliques, as compared to economic sectoral classification than its PMFG counterpart. We also show that the partition of the PD network obtained through normalised spectral clustering (NSC) agrees better with the NSC of the complete graph than the corresponding one obtained from PMFG. Finally, we show that the clusters in the PD network are more robust with respect to the removal of random sets of edges than those in the PMFG network. Proportional Hazards Model Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one’s hazard rate for a stroke occurring, or, changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models such as accelerated failure time models do not exhibit proportional hazards. The accelerated failure time model describes a situation where the biological or mechanical life history of an event is accelerated. Proportional Subdistribution Hazards(PSH) The proportional hazards model for the subdistribution that Fine and Gray (1999) propose aims at modeling the cumulative incidence of an event of interest. ➘ “Proportional Hazards Model” crrp Proposal Cluster Learning(PCL) Weakly Supervised Object Detection (WSOD), using only image-level annotations to train object detectors, is of growing importance in object recognition. In this paper, we propose a novel end-to-end deep network for WSOD. Unlike previous networks that transfer the object detection problem to an image classification problem using Multiple Instance Learning (MIL), our strategy generates proposal clusters to learn refined instance classifiers by an iterative process. The proposals in the same cluster are spatially adjacent and associated with the same object. This prevents the network from concentrating too much on parts of objects instead of whole objects. We first show that instances can be assigned object or background labels directly based on proposal clusters for instance classifier refinement, and then show that treating each cluster as a small new bag yields fewer ambiguities than the directly assigning label method. The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one. Experiments are conducted on the PASCAL VOC and ImageNet detection benchmarks for WSOD. Results show that our method outperforms the previous state of the art significantly. Protocols and Structures for Inference(PSI) The Protocols and Structures for Inference (PSI) project has developed an architecture for presenting machine learning algorithms, their inputs (data) and outputs (predictors) as resource-oriented RESTful web services in order to make machine learning technology accessible to a broader range of people than just machine learning researchers. Currently, many machine learning implementations (e.g., in toolkits such as Weka, Orange, Elefant, Shogun, SciKit.Learn, etc.) are tied to specific choices of programming language, and data sets to particular formats (e.g., CSV, svmlight, ARFF). This limits their accessibility, since new users may have to learn a new programming language to run a learner or write a parser for a new data format, and their interoperability, requiring data format converters and multiple language platforms. While there is also a growing number of machine learning web services, each has its own API and is tailored to suit a different subset of machine learning activities. Standardizing the World of Machine Learning Web Service APIs Proto-MAML Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle this recently, we find the current procedure and datasets that are used to systematically assess progress in this setting lacking. To address this, we propose Meta-Dataset: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure robustness of current methods to variations in the number of available examples and the number of classes. Finally our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, Proto-MAML, which achieves improved performance on our benchmark. Prototype Propagation Network(PPN) A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, they do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose prototype propagation network (PPN) trained on few-shot tasks together with data annotated by coarse-label. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data. Prototype Reminding Continual learning is a critical ability of continually acquiring and transferring knowledge without catastrophically forgetting previously learned knowledge. However, enabling continual learning for AI remains a long-standing challenge. In this work, we propose a novel method, Prototype Reminding, that efficiently embeds and recalls previously learnt knowledge to tackle catastrophic forgetting issue. In particular, we consider continual learning in classification tasks. For each classification task, our method learns a metric space containing a set of prototypes where embedding of the samples from the same class cluster around prototypes and class-representative prototypes are separated apart. To alleviate catastrophic forgetting, our method preserves the embedding function from the samples to the previous metric space, through our proposed prototype reminding from previous tasks. Specifically, the reminding process is implemented by replaying a small number of samples from previous tasks and correspondingly matching their embedding to their nearest class-representative prototypes. Compared with recent continual learning methods, our contributions are fourfold: first, our method achieves the best memory retention capability while adapting quickly to new tasks. Second, our method uses metric learning for classification, and does not require adding in new neurons given new object classes. Third, our method is more memory efficient since only class-representative prototypes need to be recalled. Fourth, our method suggests a promising solution for few-shot continual learning. Without tampering with the performance on initial tasks, our method learns novel concepts given a few training examples of each class in new tasks. Prototype-Based Vector Quantization Provenance Unification through Graph(PUG) Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first- order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and Datalog query as an input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answer the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. We experimentally evaluate our approach demonstrating its efficiency. Proximal Alternating Direction Network Deep learning models have gained great success in many real-world applications. However, most existing networks are typically designed in heuristic manners, thus lack of rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, their resultant deep propagation networks do \emph{not} possess the nice convergence property as the original optimization scheme does. This paper provides a novel proximal unrolling framework to establish deep models by integrating experimentally verified network architectures and rich cues of the tasks. More importantly, we \emph{prove in theory} that 1) the propagation generated by our unrolled deep model globally converges to a critical-point of a given variational energy, and 2) the proposed framework is still able to learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for, unless stronger assumptions are enforced. Extensive experiments on various real-world applications verify the theoretical convergence and demonstrate the effectiveness of designed deep models. Proximal Meta-Policy Search(ProMP) Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. Our code is available at https://…/promp. Proximal Policy Optimization(PPO) We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a ‘surrogate’ objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time. Proximal Policy Optimization with Covariance Matrix Adaptation(PPO-CMA) Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, in continuous state and actions spaces and a Gaussian policy — common in computer animation and robotics — PPO is prone to getting stuck in local optima. In this paper, we observe a tendency of PPO to prematurely shrink the exploration variance, which naturally leads to slow progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for policy mean and variance and training the mean and variance in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks. Proximity Forest Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds 10 thousand time series only; which may explain why the primary research focus has been in creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another; allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than current state of the art models Elastic Ensemble and COTE. Proximity Measure In order to understand and act on situations that are represented by a set of objects, very often we are required to compare them. Humans perform this comparison subconsciously using the brain. In the context of artificial intelligence, however, we should be able to describe how the machine might perform this comparison. In this context, one of the basic elements that must be specified is the proximity measure between objects. proxy Proximity Variational Inference(PVI) Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance. Proximity-Ambiguity Sensitive(PAS) Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, have produced particularly impressive results, significantly speeding up the training process to enable word representation learning from largescale data. However, both CBOW and Skip-gram do not pay enough attention to word proximity in terms of model or word ambiguity in terms of linguistics. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e. PAS CBOW and PAS Skip-gram) to produce high quality distributed representations of words considering both word proximity and ambiguity. From the model perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximitysensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistics perspective, we train multiple representation vectors per word. Each representation vector corresponds to a particular group of POS tags of the word. By using PAS models, we achieved a 16.9% increase in accuracy over state-of-theart models. ProxSARAH In this paper, we propose a new stochastic algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al., 2017a) and consist of two steps: a proximal gradient step and an averaging step that are different from existing nonconvex proximal-type algorithms. The algorithms only require a smoothness assumption of the nonconvex objective term. In the finite-sum case, we show that our algorithm achieves optimal convergence rate by matching the lower-bound worst-case complexity, while in the expectation case, it attains the best-known convergence rate under only standard smoothness and bounded variance assumptions. One key step of our algorithms is a new constant step-size that helps to achieve desired convergence rate. Our step-size is much larger than existing methods including proximal SVRG schemes in the single sample case. We generalize our algorithm to mini-batches for both inner and outer loops, and adaptive step-sizes. We also specify the algorithm to the non-composite case that covers and dominates existing state-of-the-arts in terms of convergence rate. We test the proposed algorithms on two composite nonconvex optimization problems and feedforward neural networks using several well-known datasets. ProxylessNAS Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. $10^4$ GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6$\times$ fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2$\times$ faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design. PRUNE The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment ‘PRUNE’ and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance. Pruned Exact Linear Time(PELT) This approach is based on the algorithm of Jackson et al. (2005 (‘An algorithm for optimal partitioning of data on an interval’)) , but involves a pruning step within the dynamic program. This pruning reduces the computational cost of the method, but does not affect the exactness of the resulting segmentation. It can be applied to find changepoints under a range of statistical criteria such as penalised likelihood, quasi-likelihood (Braun et al., 2000 (‘Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation’)) and cumulative sum of squares (Inclan and Tiao, 1994 (‘Use of cumulative sums of squares for retrospective detection of changes of variance.’); Picard et al., 2011 (‘Joint segmentation, calling and normalization of multiple cgh profiles’)). In simulations we compare PELT with both Binary Segmentation and Optimal Partitioning. We show that PELT can be calculated orders of magnitude faster than Optimal Partitioning, particularly for long data sets. Whilst asymptotically PELT can be quicker, we find that in practice Binary Segmentation is quicker on the examples we consider, and we believe this would be the case in almost all applications. However, we show that PELT leads to a substantially more accurate segmentation than Binary Segmentation. changepoint PruneTrain Model pruning is a popular mechanism to make a network more efficient for inference. In this paper, we explore the use of pruning to also make the training of such neural networks more efficient. Unlike all prior model pruning methods that sparsify a pre-trained model and then prune it, we train the network from scratch, while gradually and structurally pruning parameters during the training. We build on our key observations: 1) once parameters are sparsified via regularization, they rarely re-appear in later steps, and 2) setting the appropriate regularization penalty at the beginning of training effectively converges the loss. We train ResNet and VGG networks on CIFAR10/100 and ImageNet datasets from scratch, and achieve 30-50% improvement in training FLOPs and 20-30% improvement in measured training time on modern GPUs. Pruning Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. The dual goal of pruning is reduced complexity of the final classifier as well as better predictive accuracy by the reduction of overfitting and removal of sections of a classifier that may be based on noisy or erroneous data. PruningKOSR Motivated by many practical applications in logistics and mobility-as-a-service, we study the top-k optimal sequenced routes (KOSR) querying on large, general graphs where the edge weights may not satisfy the triangle inequality, e.g., road network graphs with travel times as edge weights. The KOSR querying strives to find the top-k optimal routes (i.e., with the top-k minimal total costs) from a given source to a given destination, which must visit a number of vertices with specific vertex categories (e.g., gas stations, restaurants, and shopping malls) in a particular order (e.g., visiting gas stations before restaurants and then shopping malls). To efficiently find the top-k optimal sequenced routes, we propose two algorithms PruningKOSR and StarKOSR. In PruningKOSR, we define a dominance relationship between two partially-explored routes. The partially-explored routes that can be dominated by other partially-explored routes are postponed being extended, which leads to a smaller searching space and thus improves efficiency. In StarKOSR, we further improve the efficiency by extending routes in an A* manner. With the help of a judiciously designed heuristic estimation that works for general graphs, the cost of partially explored routes to the destination can be estimated such that the qualified complete routes can be found early. In addition, we demonstrate the high extensibility of the proposed algorithms by incorporating Hop Labeling, an effective label indexing technique for shortest path queries, to further improve efficiency. Extensive experiments on multiple real-world graphs demonstrate that the proposed methods significantly outperform the baseline method. Furthermore, when k=1, StarKOSR also outperforms the state-of-the-art method for the optimal sequenced route queries. PruningNet A kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning PSB_PG Many real-world networks known as attributed networks contain two types of information: topology information and node attributes. It is a challenging task on how to use these two types of information to explore structural regularities. In this paper, by characterizing potential relationship between link communities and node attributes, a principled statistical model named PSB_PG that generates link topology and node attributes is proposed. This model for generating links is based on the stochastic blockmodels following a Poisson distribution. Therefore, it is capable of detecting a wide range of network structures including community structures, bipartite structures and other mixture structures. The model for generating node attributes assumes that node attributes are high dimensional and sparse and also follow a Poisson distribution. This makes the model be uniform and the model parameters can be directly estimated by expectation-maximization (EM) algorithm. Experimental results on artificial networks and real networks containing various structures have shown that the proposed model PSB_PG is not only competitive with the state-of-the-art models, but also provides good semantic interpretation for each community via the learned relationship between the community and its related attributes. PS-DBSCAN We present PS-DBSCAN, a communication efficient parallel DBSCAN algorithm that combines the disjoint-set data structure and Parameter Server framework in Platform of AI (PAI). Since data points within the same cluster may be distributed over different workers which result in several disjoint-sets, merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint-sets to alleviate the communication burden. Experiments over the datasets of different scales demonstrate that PS-DBSCAN outperforms the PDSDBSCAN with 2-10 times speedup on communication efficiency. We have released our PS-DBSCAN in an algorithm platform called Platform of AI (PAI – https://pai.base.shuju.aliyun.com ) in Alibaba Cloud. We have also demonstrated how to use the method in PAI. PSDVec PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and can learn the embeddings of new words without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that has the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners. Pseudoinverse Learning(PIL) A Vest of the Pseudoinverse Learning Algorithm Pseudoinverse Learning Algorithm for Autoencoder(PILAE) In this work, a non-gradient descent learning scheme is proposed for deep feedforward neural networks (DNN). As we known, autoencoder can be used as the building blocks of the multi-layer perceptron (MLP) deep neural network. So, the MLP will be taken as an example to illustrate the proposed scheme of pseudoinverse learning algorithm for autoencoder (PILAE) training. The PILAE with low rank approximation is a non-gradient based learning algorithm, and the encoder weight matrix is set to be the low rank approximation of the pseudoinverse of the input matrix, while the decoder weight matrix is calculated by the pseudoinverse learning algorithm. It is worth to note that only few network structure hyperparameters need to be tuned. Hence, the proposed algorithm can be regarded as a quasi-automated training algorithm which can be utilized in autonomous machine learning research field. The experimental results show that the proposed learning scheme for DNN can achieve better performance on considering the tradeoff between training efficiency and classification accuracy. Pseudo-Kernel We propose a novel adaptive kernel based regression method for complex-valued signals: the generalized complex-valued kernel least-mean-square (gCKLMS). We borrow from the new results on widely linear reproducing kernel Hilbert space (WL-RKHS) for nonlinear regression and complex-valued signals, recently proposed by the authors. This paper shows that in the adaptive version of the kernel regression for complex-valued signals we need to include another kernel term, the so-called pseudo-kernel. This new solution is endowed with better representation capabilities in complex-valued fields, since it can efficiently decouple the learning of the real and the imaginary part. Also, we review previous realizations of the complex KLMS algorithm and its augmented version to prove that they can be rewritten as particular cases of the gCKLMS. Furthermore, important conclusions on the kernels design are drawn that help to greatly improve the convergence of the algorithms. In the experiments, we revisit the nonlinear channel equalization problem to highlight the better convergence of the gCKLMS compared to previous solutions. Also, the flexibility of the proposed generalized approach is tested in a second experiment with non-independent real and imaginary parts. The results illustrate the significant performance improvements of the gCKLMS approach when the complex-valued signals have different properties for the real and imaginary parts. PsiRec We propose PsiRec, a novel user preference propagation recommender that incorporates pseudo-implicit feedback for enriching the original sparse implicit feedback dataset. Three of the unique characteristics of PsiRec are: (i) it views user-item interactions as a bipartite graph and models pseudo-implicit feedback from this perspective; (ii) its random walks-based approach extracts graph structure information from this bipartite graph, toward estimating pseudo-implicit feedback; and (iii) it adopts a Skip-gram inspired measure of confidence in pseudo-implicit feedback that captures the pointwise mutual information between users and items. This pseudo-implicit feedback is ultimately incorporated into a new latent factor model to estimate user preference in cases of extreme sparsity. PsiRec results in improvements of 21.5% and 22.7% in terms of Precision@10 and Recall@10 over state-of-the-art Collaborative Denoising Auto-Encoders. Our implementation is available at https://…/PsiRecICDM2018. PT-ISABB Asymmetric Distributed Constraint Optimization Problems (ADCOPs) have emerged as an important formalism in multi-agent community due to their ability to capture personal preferences. However, the existing search-based complete algorithms for ADCOPs can only use local knowledge to compute lower bounds, which leads to inefficient pruning and prohibits them from solving large scale problems. On the other hand, inference-based complete algorithms (e.g., DPOP) for Distributed Constraint Optimization Problems (DCOPs) require only a linear number of messages, but they cannot be directly applied into ADCOPs due to a privacy concern. Therefore, in the paper, we consider the possibility of combining inference and search to effectively solve ADCOPs at an acceptable loss of privacy. Specifically, we propose a hybrid complete algorithm called PT-ISABB which uses a tailored inference algorithm to provide tight lower bounds and a tree-based complete search algorithm to exhaust the search space. We prove the correctness of our algorithm and the experimental results demonstrate its superiority over other state-of-the-art complete algorithms. P-Tree Programming We propose a novel method for automatic program synthesis. P-Tree Programming represents the program search space through a single probabilistic prototype tree. From this prototype tree we form program instances which we evaluate on a given problem. The error values from the evaluations are propagated through the prototype tree. We use them to update the probability distributions that determine the symbol choices of further instances. The iterative method is applied to several symbolic regression benchmarks from the literature. It outperforms standard Genetic Programming to a large extend. Furthermore, it relies on a concise set of parameters which are held constant for all problems. The algorithm can be employed for most of the typical computational intelligence tasks such as classification, automatic program induction, and symbolic regression. PublicSelf Model Most of agents that learn policy for tasks with reinforcement learning (RL) lack the ability to communicate with people, which makes human-agent collaboration challenging. We believe that, in order for RL agents to comprehend utterances from human colleagues, RL agents must infer the mental states that people attribute to them because people sometimes infer an interlocutor’s mental states and communicate on the basis of this mental inference. This paper proposes PublicSelf model, which is a model of a person who infers how the person’s own behavior appears to their colleagues. We implemented the PublicSelf model for an RL agent in a simulated environment and examined the inference of the model by comparing it with people’s judgment. The results showed that the agent’s intention that people attributed to the agent’s movement was correctly inferred by the model in scenes where people could find certain intentionality from the agent’s behavior. PUBSUB-SGX This paper presents PUBSUB-SGX, a content-based publish-subscribe system that exploits trusted execution environments (TEEs), such as Intel SGX, to guarantee confidentiality and integrity of data as well as anonymity and privacy of publishers and subscribers. We describe the technical details of our Python implementation, as well as the required system support introduced to deploy our system in a container-based runtime. Our evaluation results show that our approach is sound, while at the same time highlighting the performance and scalability trade-offs. In particular, by supporting just-in-time compilation inside of TEEs, Python programs inside of TEEs are in general faster than when executed natively using standard CPython. Pull Message Passing for Nonparametric Belief Propagation(PMPNBP) We present a ‘pull’ approach to approximate products of Gaussian mixtures within message updates for Nonparametric Belief Propagation (NBP) inference. Existing NBP methods often represent messages between continuous-valued latent variables as Gaussian mixture models. To avoid computational intractability in loopy graphs, NBP necessitates an approximation of the product of such mixtures. Sampling-based product approximations have shown effectiveness for NBP inference. However, such approximations used within the traditional ‘push’ message update procedures quickly become computationally prohibitive for multi-modal distributions over high-dimensional variables. In contrast, we propose a ‘pull’ method, as the Pull Message Passing for Nonparametric Belief propagation (PMPNBP) algorithm, and demonstrate its viability for efficient inference. We report results using an experiment from an existing NBP method, PAMPAS, for inferring the pose of an articulated structure in clutter. Results from this illustrative problem found PMPNBP has a greater ability to efficiently scale the number of components in its mixtures and, consequently, improve inference accuracy. PullNet We consider open-domain queston answering (QA) where answers are drawn from either a corpus, a knowledge base (KB), or a combination of both of these. We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., “multi-hop”) reasoning. We describe PullNet, an integrated framework for (1) learning what to retrieve (from the KB and/or corpus) and (2) reasoning with this heterogeneous information to find the best answer. PullNet uses an {iterative} process to construct a question-specific subgraph that contains information relevant to the question. In each iteration, a graph convolutional network (graph CNN) is used to identify subgraph nodes that should be expanded using retrieval (or “pull”) operations on the corpus and/or KB. After the subgraph is complete, a similar graph CNN is used to extract the answer from the subgraph. This retrieve-and-reason process allows us to answer multi-hop questions using large KBs and corpora. PullNet is weakly supervised, requiring question-answer pairs but not gold inference paths. Experimentally PullNet improves over the prior state-of-the art, and in the setting where a corpus is used with incomplete KB these improvements are often dramatic. PullNet is also often superior to prior systems in a KB-only setting or a text-only setting. Pumpout It is challenging to train deep neural networks robustly on the industrial-level data, since labels of such data are heavily noisy, and their label generation processes are normally agnostic. To handle these issues, by using the memorization effects of deep neural networks, we may train deep neural networks on the whole dataset only the first few iterations. Then, we may employ early stopping or the small-loss trick to train them on selected instances. However, in such training procedures, deep neural networks inevitably memorize some noisy labels, which will degrade their generalization. In this paper, we propose a meta algorithm called Pumpout to overcome the problem of memorizing noisy labels. By using scaled stochastic gradient ascent, Pumpout actively squeezes out the negative effects of noisy labels from the training model, instead of passively forgetting these effects. We leverage Pumpout to upgrade two representative methods: MentorNet and Backward Correction. Empirical results on benchmark datasets demonstrate that Pumpout can significantly improve the robustness of representative methods. PUN-list In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information about an itemset and utility upper bound for facilitating the processing of mining high utility itemsets. Based on PUN-lists, we present a method, called MIP (Mining high utility Itemset using PUN-Lists), for fast mining high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, PUN-list, which avoids costly, repeatedly utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset and the PUN-lists of long itemsets can be fast constructed by the PUN-lists of short itemsets. Third, by employing the utility upper bound lying in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, called set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that PUN-list is very effective since MIP is at least an order of magnitude faster than recently reported algorithms on average. Puppet In computing, Puppet is an open-source software configuration management tool. It runs on many Unix-like systems as well as on Microsoft Windows, and includes its own declarative language to describe system configuration. Puppet is designed to manage the configuration of Unix-like and Microsoft Windows systems declaratively. The user describes system resources and their state, either using Puppet’s declarative language or a Ruby DSL (domain-specific language). This information is stored in files called ‘Puppet manifests’. Puppet discovers the system information via a utility called Facter, and compiles the Puppet manifests into a system-specific catalog containing resources and resource dependency, which are applied against the target systems. Any actions taken by Puppet are then reported. Puppet consists of a custom declarative language to describe system configuration, which can be either applied directly on the system, or compiled into a catalog and distributed to the target system via client-server paradigm (using a REST API), and the agent uses system specific providers to enforce the resource specified in the manifests. The resource abstraction layer enables administrators to describe the configuration in high-level terms, such as users, services and packages without the need to specify OS specific commands (such as rpm, yum, apt). Puppet is model-driven, requiring limited programming knowledge to use. Puppet comes in two versions, Puppet Enterprise and Open Source Puppet. In addition to providing functionalities of Open Source Puppet, Puppet Enterprise also provides GUI, API and command line tools for node management. Purchase Intent Session-bAsed(PISA) Recommendation systems have become ubiquitous in today’s online world and are an integral part of practically every e-commerce platform. While traditional recommender systems use customer history, this approach is not feasible in ‘cold start’ scenarios. Such scenarios include the need to produce recommendations for new or unregistered users and the introduction of new items. In this study, we present the Purchase Intent Session-bAsed (PISA) algorithm, a content-based algorithm for predicting the purchase intent for cold start session-based scenarios. Our approach employs deep learning techniques both for modeling the content and purchase intent prediction. Our experiments show that PISA outperforms a well-known deep learning baseline when new items are introduced. In addition, while content-based approaches often fail to perform well in highly imbalanced datasets, our approach successfully handles such cases. Finally, our experiments show that combining PISA with the baseline in non-cold start scenarios further improves performance. Purifying Variational Autoencoder(PuVAE) Deep neural networks are widely used and exhibit excellent performance in many areas. However, they are vulnerable to adversarial attacks that compromise the network at the inference time by applying elaborately designed perturbation to input data. Although several defense methods have been proposed to address specific attacks, other attack methods can circumvent these defense mechanisms. Therefore, we propose Purifying Variational Autoencoder (PuVAE), a method to purify adversarial examples. The proposed method eliminates an adversarial perturbation by projecting an adversarial example on the manifold of each class, and determines the closest projection as a purified sample. We experimentally illustrate the robustness of PuVAE against various attack methods without any prior knowledge. In our experiments, the proposed method exhibits performances competitive with state-of-the-art defense methods, and the inference time is approximately 130 times faster than that of Defense-GAN that is the state-of-the art purifier model. Purpose Built Analytic Modules(PBAM) Highly tuned special purpose modules such as those for fraud detection. These are practically plug-and-play in the industries and applications for which they´re targeted. And they allow Citizen Data Scientists (aka business analysts and some LOB managers) to operate advanced ML without the need to extensively configure the underlying DS techniques. Push-Pull Layer We propose a new layer in Convolutional Neural Networks (CNNs) to increase their robustness to several types of noise perturbations of the input images. We call this a push-pull layer and compute its response as the combination of two half-wave rectified convolutions, with kernels of opposite polarity. It is based on a biologically-motivated non-linear model of certain neurons in the visual system that exhibit a response suppression phenomenon, known as push-pull inhibition. We validate our method by substituting the first convolutional layer of the LeNet-5 and WideResNet architectures with our push-pull layer. We train the networks on nonperturbed training images from the MNIST, CIFAR-10 and CIFAR-100 data sets, and test on images perturbed by noise that is unseen by the training process. We demonstrate that our push-pull layers contribute to a considerable improvement in robustness of classification of images perturbed by noise, while maintaining state-of-the-art performance on the original image classification task. PUTWorkbench ➚ “Privacy Utility Trade-off” PyCharm PyCharm is an Integrated Development Environment (IDE) used in computer programming, specifically for the Python language. It is developed by the Czech company JetBrains. It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems (VCSes), and supports web development with Django. PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community Edition is released under the Apache License, and there is also Professional Edition released under a proprietary license – this has extra features. Pycnophylactic Interpolation Thiessen polygon’s are an extreme case – we assume homogeneity within the polygons and abrupt changes at the borders This is unlikely to be correct – for example, precipitation or population totals don’t have abrupt changes at arbitrary borders Tobler developed pycnophylactic interpolation to overcome this problem. Here, values are reassigned by mass-preserving reallocation. The basic principle is that the volume of the attribute within a region remains the same. However, it is assumed that a better representation of the variation is a smooth surface. The volume (the sum within each region) remains constant, whilst the surface becomes smoother. The solution is iterative – the stopping point is arbitrary. pycno PyData PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. We aim to be an accessible, community-driven conference, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases. PyDCI This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification for which we had provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn and the SciPy stack. We here report on new experiments that we have carried out in order to test PyDCI, and in which we use as baselines new high-performing methods that have appeared after DCI was originally proposed. These experiments show that, thanks to a few subtle ways in which we have improved DCI, PyDCI outperforms both JaDCI and the above-mentioned high-performing methods, and delivers the best known results on the two popular benchmarks on which we had tested DCI, i.e., MultiDomainSentiment (a.k.a. MDS — for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code allowing to replicate our experiments, is available at https://…/pydci . PyMC3 Probabilistic Programming (PP) allows flexible specification of statistical Bayesian models in code. PyMC3 is a new, open-source PP framework with an intutive and readable, yet powerful, syntax that is close to the natural syntax statisticians use to describe models. It features next-generation Markov chain Monte Carlo (MCMC) sampling algorithms such as the No-U-Turn Sampler (NUTS; Hoffman, 2014), a self-tuning variant of Hamiltonian Monte Carlo (HMC; Duane, 1987). This class of samplers work well on high dimensional and complex posterior distributions and allows many complex models to be fit without specialized knowledge about fitting algorithms. HMC and NUTS take advantage of gradient information from the likelihood to achieve much faster convergence than traditional sampling methods, especially for larger models. NUTS also has several self-tuning strategies for adaptively setting the tunable parameters of Hamiltonian Monte Carlo, which means you usually don’t need to have specialized knowledge about how the algorithms work. PyMC3, Stan (Stan Development Team, 2014), and the LaplacesDemon package for R are currently the only PP packages to offer HMC. Pymc-learn $\textit{Pymc-learn}$ is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning. It is inspired by $\textit{scikit-learn}$ and focuses on bringing probabilistic machine learning to non-specialists. It uses a general-purpose high-level language that mimics $\textit{scikit-learn}$. Emphasis is put on ease of use, productivity, flexibility, performance, documentation, and an API consistent with $\textit{scikit-learn}$. It depends on $\textit{scikit-learn}$ and $\textit{pymc3}$ and is distributed under the new BSD-3 license, encouraging its use in both academia and industry. Source code, binaries, and documentation are available on http://…/pymc-learn. PyOD PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox’s development. PyOD is compatible with both Python 2 and 3 and can be installed through Python Package Index (PyPI) or https://…/pyod. Pyomo Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating, solving, and analyzing optimization models. A core capability of Pyomo is modeling structured optimization applications. Pyomo can be used to define general symbolic problems, create specific problem instances, and solve these instances using commercial and open-source solvers. Pyomo’s modeling objects are embedded within a full-featured high-level programming language providing a rich set of supporting libraries, which distinguishes Pyomo from other algebraic modeling languages like AMPL, AIMMS and GAMS. Pyomo supports a wide range of problem types, including: · Linear programming · Quadratic programming · Nonlinear programming · Mixed-integer linear programming · Mixed-integer quadratic programming · Mixed-integer nonlinear programming · Stochastic programming · Generalized disjunctive programming · Differential algebraic equations · Bilevel programming · Mathematical programs with equilibrium constraints Pyomo also supports iterative analysis and scripting capabilities within a full-featured programming language. Further, Pyomo has also proven an effective framework for developing high-level optimization and analysis tools. For example, the PySP package provides generic solvers for stochastic programming. PySP leverages the fact that Pyomo’s modeling objects are embedded within a full-featured high-level programming language, which allows for transparent parallelization of subproblems using Python parallel communication libraries. Pypeline Pypeline is a simple yet powerful python library for creating concurrent data pipelines. • Pypeline was designed to solve simple medium data tasks that require concurrency and parallelism but where using frameworks like Spark or Dask feel exaggerated or unnatural. • Pypeline exposes an easy to use, familiar, functional API. • Pypeline enables you to build pipelines using Processes, Threads and asyncio.Tasks via the exact same API. • Pypeline allows you to have control over the memory and cpu resources used at each stage of your pipeline. Pyramid Attention Network(PAN) A Pyramid Attention Network(PAN) is proposed to exploit the impact of global contextual information in semantic segmentation. Different from most existing works, we combine attention mechanism and spatial pyramid to extract precise dense features for pixel labeling instead of complicated dilated convolution and artificially designed decoder networks. Specifically, we introduce a Feature Pyramid Attention module to perform spatial pyramid attention structure on high-level output and combining global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer to provide global context as a guidance of low-level features to select category localization details. The proposed approach achieves state-of-the-art performance on PASCAL VOC 2012 and Cityscapes benchmarks with a new record of mIoU accuracy 84.0% on PASCAL VOC 2012, while training without COCO dataset. Pyramid Feature Selective Network Saliency detection is one of the basic challenges in computer vision. How to extract effective features is a critical point for saliency detection. Recent methods mainly adopt integrating multi-scale convolutional features indiscriminately. However, not all features are useful for saliency detection and some even cause interferences. To solve this problem, we propose Pyramid Feature Selective network to focus on effective high-level context features and low-level spatial structural features. First, we design Context-aware Pyramid Feature Extraction (CPFE) module for multi-scale high-level feature maps to capture rich context features. Second, we adopt channel-wise attention (CA) after CPFE feature maps and spatial attention (SA) after low-level feature maps, then fuse outputs of CA & SA together. Finally, we propose an edge preservation loss to guide network to learn more detailed information in boundary localization. Extensive evaluations on five benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art approaches under different evaluation metrics. Pyramid Mask Text Detector(PMTD) Scene text detection, an essential step of scene text recognition system, is to locate text instances in natural scene images automatically. Some recent attempts benefiting from Mask R-CNN formulate scene text detection task as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) to handle the scene text detection. Instead of binary text mask generated by the existing Mask R-CNN based methods, our PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. As for the generation of text boxes, PMTD reinterprets the obtained 2D soft mask into 3D space and introduces a novel plane clustering algorithm to derive the optimal text box on the basis of 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gain and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on ICDAR 2017 MLT dataset. Pyramidal Recurrent Unit(PRU) LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace the linear transformation in LSTMs with more sophisticated interactions including pyramidal and grouped linear transformations. This architecture gives strong results on word-level language modeling while reducing the number of parameters significantly. In particular, PRU improves the perplexity of a recent state-of-the-art language model Merity et al. (2018) by up to 1.3 points while learning 15-20% fewer parameters. For similar number of model parameters, PRU outperforms all previous RNN models that exploit different gating mechanisms and transformations. We provide a detailed examination of the PRU and its behavior on the language modeling tasks. Our code is open-source and available at https://…/PRU pyRecLab This paper introduces pyRecLab, a software library written in C++ with Python bindings which allows to quickly train, test and develop recommender systems. Although there are several software libraries for this purpose, only a few let developers to get quickly started with the most traditional methods, permitting them to try different parameters and approach several tasks without a significant loss of performance. Among the few libraries that have all these features, they are available in languages such as Java, Scala or C#, what is a disadvantage for less experienced programmers more used to the popular Python programming language. In this article we introduce details of pyRecLab, showing as well performance analysis in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec, showing similar results. We expect programmers with little experience and people interested in quickly prototyping recommender systems to be benefited from pyRecLab. PySnooper PySnooper is a poor man’s debugger. You’re trying to figure out why your Python code isn’t doing what you think it should be doing. You’d love to use a full-fledged debugger with breakpoints and watches, but you can’t be bothered to set one up right now. You want to know which lines are running and which aren’t, and what the values of the local variables are. Most people would use print lines, in strategic locations, some of them showing the values of variables. PySnooper lets you do the same, except instead of carefully crafting the right print lines, you just add one decorator line to the function you’re interested in. You’ll get a play-by-play log of your function, including which lines ran and when, and exactly when local variables were changed. What makes PySnooper stand out from all other code intelligence tools? You can use it in your shitty, sprawling enterprise codebase without having to do any setup. Just slap the decorator on, as shown below, and redirect the output to a dedicated log file by specifying its path as the first argument. PyStruct PyStruct aims at being an easy-to-use structured learning and prediction library. Currently it implements only max-margin methods and a perceptron, but other algorithms might follow. The learning algorithms implemented in PyStruct have various names, which are often used loosely or differently in different communities. Common names are conditional random fields (CRFs), maximum-margin Markov random fields (M3N) or structural support vector machines. If you are new to structured learning, have a look at What is structured learning?. The goal of PyStruct is to provide a well-documented tool for researchers as well as non-experts to make use of structured prediction algorithms. The design tries to stay as close as possible to the interface and conventions of scikit-learn. PyText We introduce PyText – a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale. Python Package Index(PyPI) The Python Package Index is a repository of software for the Python programming language. Python Reconstruction Operators in Neural Networks(PYRO-NN) Purpose: Recently, several attempts were conducted to transfer deep learning to medical image reconstruction. An increasingly number of publications follow the concept of embedding the CT reconstruction as a known operator into a neural network. However, most of the approaches presented lack an efficient CT reconstruction framework fully integrated into deep learning environments. As a result, many approaches are forced to use workarounds for mathematically unambiguously solvable problems. Methods: PYRO-NN is a generalized framework to embed known operators into the prevalent deep learning framework Tensorflow. The current status includes state-of-the-art parallel-, fan- and cone-beam projectors and back-projectors accelerated with CUDA provided as Tensorflow layers. On top, the framework provides a high level Python API to conduct FBP and iterative reconstruction experiments with data from real CT systems. Results: The framework provides all necessary algorithms and tools to design end-to-end neural network pipelines with integrated CT reconstruction algorithms. The high level Python API allows a simple use of the layers as known from Tensorflow. To demonstrate the capabilities of the layers, the framework comes with three baseline experiments showing a cone-beam short scan FDK reconstruction, a CT reconstruction filter learning setup, and a TV regularized iterative reconstruction. All algorithms and tools are referenced to a scientific publication and are compared to existing non deep learning reconstruction frameworks. The framework is available as open-source software at \url{https://…/PYRO-NN}. Conclusions: PYRO-NN comes with the prevalent deep learning framework Tensorflow and allows to setup end-to-end trainable neural networks in the medical image reconstruction context. We believe that the framework will be a step towards reproducible research PyTorch PyTorch is a python package that provides two high-level features: · Tensor computation (like numpy) with strong GPU acceleration · Deep Neural Networks built on a tape-based autograd system You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed. PyTorch-BigGraph(PBG) Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges. PyTorch-Kaldi Speech Recognition Toolkit The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these software, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug-in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly-released along with a rich documentation and is designed to properly work locally or on HPC clusters. Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers. PyUnfold PyUnfold is a Python package for incorporating imperfections of the measurement process into a data analysis pipeline. In an ideal world, we would have access to the perfect detector: an apparatus that makes no error in measuring a desired quantity. However, in real life, detectors have finite resolutions, characteristic biases that cannot be eliminated, less than full detection efficiencies, and statistical and systematic uncertainties. By building a matrix that encodes a detector’s smearing of the desired true quantity into the measured observable(s), a deconvolution can be performed that provides an estimate of the true variable. This deconvolution process is known as unfolding. The unfolding method implemented in PyUnfold accomplishes this deconvolution via an iterative procedure, providing results based on physical expectations of the desired quantity. Furthermore, tedious book-keeping for both statistical and systematic errors produces precise final uncertainty estimates. Pyxley Web-based dashboards are the most straightforward way to share insights with clients and business partners. For R users, Shiny provides a framework that allows data scientists to create interactive web applications without having to write Javascript, HTML, or CSS. Despite Shiny’s utility and success as a dashboard framework, there is no equivalent in Python. There are packages in development, such as Spyre, but nothing that matches Shiny’s level of customization. We have written a Python package, called Pyxley, to not only help simplify the development of web-applications, but to provide a way to easily incorporate custom Javascript for maximum flexibility. This is enabled through Flask, PyReact, and Pandas. Pyxley: Python Powered Dashboards PyXLL The Python Excel Add-In = python(‘in excel’)