SyncGAN google
Generative adversarial networks (GANs) have achieved impressive success in cross-domain generation, but they face difficulty in cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GAN methods adopt a one-directional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named the synchronizer is proposed in this work to judge whether paired data are synchronous/corresponding or not, which constrains the latent space of the generators in the GAN. Our model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mappings of a generator and use it to generate data of the other modality. In addition, the proposed model supports semi-supervised learning, which makes it more flexible for practical applications. …
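The core idea, two generators driven by identical noise plus a synchronizer that scores whether a cross-modal pair expresses the same concept, can be sketched as a forward pass. This is a minimal numpy sketch under assumed toy dimensions and linear-plus-tanh "generators"; the paper's actual components are neural networks trained adversarially, and all names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a shared latent z and two modalities A and B.
z_dim, a_dim, b_dim = 8, 16, 12
Wa = rng.normal(scale=0.1, size=(z_dim, a_dim))      # toy generator G_A
Wb = rng.normal(scale=0.1, size=(z_dim, b_dim))      # toy generator G_B
Ws = rng.normal(scale=0.1, size=(a_dim + b_dim, 1))  # toy synchronizer S

def g_a(z):
    return np.tanh(z @ Wa)

def g_b(z):
    return np.tanh(z @ Wb)

def synchronizer(xa, xb):
    """Score in (0, 1): estimated probability that the pair (xa, xb)
    came from the same latent code, i.e. is synchronous."""
    logits = np.concatenate([xa, xb], axis=-1) @ Ws
    return 1.0 / (1.0 + np.exp(-logits))

z = rng.normal(size=(4, z_dim))     # identical noise fed to both generators
xa, xb = g_a(z), g_b(z)             # synchronous pair
s_sync = synchronizer(xa, xb)

z2 = rng.normal(size=(4, z_dim))    # mismatched noise
s_async = synchronizer(xa, g_b(z2))

# Synchronizer objective (binary cross-entropy): push s_sync toward 1
# and s_async toward 0, which constrains the generators' latent space.
loss = -np.mean(np.log(s_sync) + np.log(1.0 - s_async))
```

Training would alternate this synchronizer loss with the usual per-modality adversarial losses, so that identical noise yields corresponding outputs in both modalities.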

Cat2Vec google
This paper presents a method of learning distributed representations for multi-field categorical data, a common data format with applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models, e.g., factorisation machines and boosted trees, has proved the potential of exploring the interactions among inter-field categories. Inspired by Word2Vec, the distributed representation model for natural language, we propose the Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is automatically learned for each category in each field. The interactions among inter-field categories are further explored by different neural gates, and the most informative ones are selected by pooling layers. In our experiments, with the exploration of interactions between pairwise categories over layers, the model achieves a significant improvement over state-of-the-art models in a supervised learning task, e.g., click prediction, while capturing the most significant interactions from the data.
Deep embeddings for categorical variables (Cat2Vec)
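One layer of the pipeline described above, embed each field's category, combine pairs of fields with a neural gate, and pool to keep the most informative interactions, can be sketched as follows. This is a minimal numpy sketch with invented vocabulary sizes and an element-wise product as the gate and max-pooling as the selection step; the paper explores several gate and pooling choices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical setup: 3 categorical fields with vocabulary sizes 10, 5, 8;
# each category in each field gets its own d-dimensional embedding.
vocab_sizes, d = [10, 5, 8], 4
tables = [rng.normal(scale=0.1, size=(v, d)) for v in vocab_sizes]

def cat2vec_layer(sample):
    """One interaction layer: look up each field's category embedding,
    form all pairwise interactions with an element-wise product gate,
    then max-pool across pairs to keep the strongest interactions."""
    embeds = [tables[f][c] for f, c in enumerate(sample)]  # one vector per field
    pairs = [a * b for a, b in combinations(embeds, 2)]    # product gate per pair
    return np.max(np.stack(pairs), axis=0)                 # pooling over pairs

x = [3, 1, 7]             # one sample: a category index for each field
h = cat2vec_layer(x)      # d-dimensional interaction representation
```

Stacking such layers lets higher layers explore interactions of interactions, and the pooled representation feeds a supervised head (e.g., for click prediction).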


WarpFlow google
WarpFlow is a fast, interactive data querying and processing system with a focus on petabyte-scale spatiotemporal datasets and Tesseract queries. With the rapid growth in smartphones and mobile navigation services, we now have an opportunity to radically improve urban mobility and reduce friction in how people and packages move globally every minute-mile, with data. WarpFlow speeds up three key metrics for data engineers working on such datasets — time-to-first-result, time-to-full-scale-result, and time-to-trained-model for machine learning. …

Variance Inflation Factor (VIF) google
A result from a standard linear models course is that the variance of the ordinary least squares (OLS) coefficient of a variable never decreases when we add additional covariates. The variance inflation factor (VIF) measures this increase in variance. Another result from a standard linear models or experimental design course is that including additional covariates in a linear model of the outcome on the treatment indicator never increases the variance of the OLS coefficient of the treatment, at least asymptotically. This technique, called the analysis of covariance (ANCOVA), is often used to improve the efficiency of treatment effect estimation. So we have two paradoxical results: adding covariates never decreases the variance in the first result but never increases the variance in the second. In fact, these two results are derived under different assumptions: the VIF result conditions on the treatment indicators, whereas the ANCOVA result requires random treatment indicators. In a completely randomized experiment, the estimator that does not adjust for additional covariates has smaller conditional variance at the cost of a larger conditional bias, compared to the estimator that adjusts for them. Thus, there is no real paradox. …
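The VIF itself is concrete: for column j of the design matrix, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing column j on all the other columns. A minimal numpy sketch (the data below are synthetic, constructed so that the first two columns are nearly collinear):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the design matrix X:
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1.0 - (resid @ resid) / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)   # nearly collinear with x1
x3 = rng.normal(size=500)              # independent of the others
v = vif(np.column_stack([x1, x2, x3]))
```

Here the collinear pair gets a VIF far above 1 (roughly 100 for this noise level), while the independent column stays near the minimum value of 1, which is exactly the variance inflation the first result describes.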