Trajectory Mining google
Predicting transportation modes from GPS (Global Positioning System) records is a hot topic in the trajectory mining domain. Each GPS record is called a trajectory point and a trajectory is a sequence of these points. Trajectory mining has applications including but not limited to transportation mode detection, tourism, traffic congestion, smart cities management, animal behaviour analysis, environmental preservation, and traffic dynamics are some of the trajectory mining applications. Transportation modes prediction as one of the tasks in human mobility and vehicle mobility applications plays an important role in resource allocation, traffic management systems, tourism planning and accident detection. …

Statistical Matching Problem google
The statistical matching problem is a data integration problem with structured missing data. The general form involves the analysis of multiple datasets that only have a strict subset of variables jointly observed across all datasets. The simplest version involves two datasets, labelled A and B, with three variables of interest $X, Y$ and $Z$. Variables $X$ and $Y$ are observed in dataset A and variables $X$ and $Z$ are observed in dataset $B$. Statistical inference is complicated by the absence of joint $(Y, Z)$ observations. Parametric modelling can be challenging due to identifiability issues and the difficulty of parameter estimation. …

f-differential privacy google
Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. But it also has some well known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Renyi divergences. We propose an alternative relaxation of differential privacy, which we term ‘$f$-differential privacy’, which has a number of appealing properties and avoids some of the difficulties associated with divergence based relaxations. First, it preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to import existing tools from differential privacy, including privacy amplification by subsampling. We define a canonical single parameter family of definitions within our class which we call ‘Gaussian Differential Privacy’, defined based on the hypothesis testing of two shifted Gaussian distributions. We show that this family is focal by proving a central limit theorem, which shows that the privacy guarantees of \emph{any} hypothesis-testing based definition of privacy (including differential privacy) converges to Gaussian differential privacy in the limit under composition. We also prove a finite (Berry-Esseen style) version of the central limit theorem, which gives a useful tool for tractably analyzing the exact composition of potentially complicated expressions. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. …

K-Norm Gradient (KNG) google
This paper presents a new mechanism for producing sanitized statistical summaries that achieve \emph{differential privacy}, called the \emph{K-Norm Gradient} Mechanism, or KNG. This new approach maintains the strong flexibility of the exponential mechanism, while achieving the powerful utility performance of objective perturbation. KNG starts with an inherent objective function (often an empirical risk), and promotes summaries that are close to minimizing the objective by weighting according to how far the gradient of the objective function is from zero. Working with the gradient instead of the original objective function allows for additional flexibility as one can penalize using different norms. We show that, unlike the exponential mechanism, the noise added by KNG is asymptotically negligible compared to the statistical error for many problems. In addition to theoretical guarantees on privacy and utility, we confirm the utility of KNG empirically in the settings of linear and quantile regression through simulations. …