Productive Machine Learning (Pro-ML)
The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack. As we mapped out the effort, we kept a set of key ideas in place to constrain the solution space and focus our efforts.
• We will leverage and improve best-of-breed components from our existing code base to the maximum extent feasible. We are unlikely to rewrite our entire tech stack, but any particular component is fair game.
• The state of the art is constantly evolving with new algorithms and open source frameworks – we need the flexibility to support our existing major ML algorithms as well as new ones as they emerge.
• We will use an agile-inspired strategy so that each step we take delivers value, either by making at least one product line better or by providing generally usable improvements to existing components.
• The ability to run the models in real-time is as important as the ability to author or train them. The services hosting the models must be independently upgradable without breaking their downstream or upstream services.
• New models, retrained models, and models using new technologies must be A/B testable in production.
• We must build GDPR privacy requirements into every stage of the solution. …


Random Projection Forest (rpForest)
K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, measured by the fast decay of both the missing rate of kNNs and the discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by an ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. …
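
The aggregation idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Gaussian projection directions and a median split at every node, whereas the abstract speaks of carefully chosen projections. All function names (`build_forest`, `knn_forest`, and so on) and the `leaf_size` parameter are hypothetical. Each tree routes the query to one leaf, the leaves from all trees are pooled into a candidate set, and only the candidates are ranked by exact distance.

```python
import math
import random


def project(direction, point):
    """Dot product of a point with a projection direction."""
    return sum(d * x for d, x in zip(direction, point))


def build_tree(points, indices, leaf_size, rng):
    """Recursively split `indices` at the median of a random 1-D projection.

    Assumption: Gaussian directions and median splits, chosen for simplicity.
    """
    if len(indices) <= leaf_size:
        return ("leaf", list(indices))
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    ordered = sorted(indices, key=lambda i: project(direction, points[i]))
    mid = len(ordered) // 2
    threshold = project(direction, points[ordered[mid]])
    return (
        "node",
        direction,
        threshold,
        build_tree(points, ordered[:mid], leaf_size, rng),
        build_tree(points, ordered[mid:], leaf_size, rng),
    )


def build_forest(points, n_trees, leaf_size, seed=0):
    """Grow `n_trees` independent random projection trees over all points."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    rng = random.Random(seed)
    all_indices = list(range(len(points)))
    return [build_tree(points, all_indices, leaf_size, rng) for _ in range(n_trees)]


def leaf_for(tree, query):
    """Route the query down one tree and return the indices in its leaf."""
    while tree[0] == "node":
        _, direction, threshold, left, right = tree
        tree = left if project(direction, query) < threshold else right
    return tree[1]


def knn_forest(points, forest, query, k):
    """Pool the query's leaf from every tree, then rank candidates exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, query))
    ranked = sorted(candidates, key=lambda i: (math.dist(query, points[i]), i))
    return ranked[:k]


def knn_exact(points, query, k):
    """Brute-force reference: rank every point by distance to the query."""
    ranked = sorted(range(len(points)), key=lambda i: (math.dist(query, points[i]), i))
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(3)] for _ in range(500)]
    forest = build_forest(data, n_trees=10, leaf_size=20, seed=2)
    query = [0.5, 0.5, 0.5]
    found = set(knn_forest(data, forest, query, 5))
    print("recall against brute force:", len(found & set(knn_exact(data, query, 5))) / 5)
```

Because the trees are built independently, the list comprehension in `build_forest` is the natural place to parallelize, which matches the abstract's remark about multicore and clustered execution. Adding trees can only enlarge the candidate set, so the missing rate cannot increase with ensemble size in this sketch.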

IPMAN
We present a new methodology, called IPMAN, that combines interior point methods and generative adversarial networks to solve constrained optimization problems with feasible sets that are non-convex or not explicitly defined. Our methodology produces ε-optimal solutions and demonstrates that, when there are multiple global optima, it learns a distribution over the optimal set. We apply our approach to synthetic examples to demonstrate its effectiveness and to a problem in radiation therapy treatment optimization with a non-convex feasible set. …

Fuzzy Bayesian Learning
In this paper we propose a novel approach for learning from data using rule-based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic datasets and also a real-world example in the financial services industry. Then we demonstrate how the method can be extended for knowledge extraction to select, in a Bayesian way, the individual rules that best explain the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful. …
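
To make the combination of fuzzy rules and MCMC concrete, here is a toy sketch under strong assumptions; it is not the paper's model. It assumes a two-rule system on an input in [0, 1] ("IF x is LOW THEN y = c_low", "IF x is HIGH THEN y = c_high") with triangular memberships, a known Gaussian noise level, wide Gaussian priors on the two rule consequents, and a random-walk Metropolis sampler. Every name and numeric setting below is hypothetical.

```python
import math
import random


def fuzzy_predict(x, c_low, c_high):
    """Membership-weighted average of the two rule consequents, x in [0, 1]."""
    mu_low, mu_high = 1.0 - x, x  # assumed triangular memberships
    return (mu_low * c_low + mu_high * c_high) / (mu_low + mu_high)


def log_posterior(params, data, noise_sd=0.2, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian prior plus Gaussian likelihood."""
    c_low, c_high = params
    log_prior = -(c_low ** 2 + c_high ** 2) / (2.0 * prior_sd ** 2)
    squared_error = sum((y - fuzzy_predict(x, c_low, c_high)) ** 2 for x, y in data)
    return log_prior - squared_error / (2.0 * noise_sd ** 2)


def metropolis(data, n_steps=5000, step=0.1, seed=0):
    """Random-walk Metropolis over the two rule consequents."""
    rng = random.Random(seed)
    current = (0.0, 0.0)
    current_lp = log_posterior(current, data)
    samples = []
    for _ in range(n_steps):
        proposal = tuple(c + rng.gauss(0.0, step) for c in current)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def make_data(n=40, c_low=1.0, c_high=3.0, noise_sd=0.2, seed=1):
    """Synthetic regression data generated from the same two-rule system."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, fuzzy_predict(x, c_low, c_high) + rng.gauss(0.0, noise_sd)) for x in xs]


if __name__ == "__main__":
    draws = metropolis(make_data())[1000:]  # discard burn-in
    print("posterior mean c_low :", sum(s[0] for s in draws) / len(draws))
    print("posterior mean c_high:", sum(s[1] for s in draws) / len(draws))
```

The retained samples approximate a posterior distribution over the rule consequents rather than a single point estimate, which is what allows uncertainty about each rule to be reported alongside its fitted value.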