Kanri Distance approach for translating Predictive Models to Actions

Kanri’s proprietary combination of patented statistical and process methods is built to evaluate large data sets with many variables. While many tools evaluate patterns and dynamics in large data, only the Kanri Distance Calculator lets users see where they stand relative to a desired target state and how much each variable contributes to the overall distance from that state. The Kanri model calculates the relationship of each variable to the overall data set and, more importantly, mathematically teases out the interactions between the variables. This combination of relational insights fuels Kanri’s distance calculator. It answers the question “In a world of exponentially expanding data, how do I find the variables that will solve my problem?” and helps you reach that answer quickly. But the Kanri model does not stop there: it tells you, as an explicit formula, how much each variable contributes. The Kanri Distance Calculator opens up new possibilities for solution development by applying the power of massive data sets to an individual, or to an individualized objective.
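
To make the idea of "distance from a target state, split into per-variable contributions" concrete, here is a minimal sketch. It is not Kanri’s proprietary method, which this post does not disclose. It assumes a Mahalanobis-style squared distance as a stand-in, because that distance accounts for correlations between variables and decomposes exactly into one term per variable. The function name `distance_contributions` and the synthetic data are hypothetical.

```python
import numpy as np

def distance_contributions(reference, point):
    """Squared Mahalanobis-style distance of `point` from the centre of
    `reference` (rows = observations, columns = variables), plus the
    share of that distance attributable to each variable.
    Assumption: this stands in for, and is not, the Kanri calculation."""
    reference = np.asarray(reference, dtype=float)
    point = np.asarray(point, dtype=float)
    mean = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    # Pseudo-inverse keeps the sketch usable when variables are collinear.
    precision = np.linalg.pinv(cov)
    delta = point - mean
    # Term i is delta_i * (precision @ delta)_i; the terms sum to the
    # squared distance and include the cross-variable interactions.
    contributions = delta * (precision @ delta)
    return float(contributions.sum()), contributions

# Hypothetical target-state population: two correlated variables
# and one independent variable.
rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 1))
target_state = np.hstack([
    shared + 0.3 * rng.normal(size=(500, 1)),
    shared + 0.3 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

d2, contrib = distance_contributions(target_state, [2.0, -2.0, 0.0])
print("squared distance:", round(d2, 2))
print("per-variable contributions:", np.round(contrib, 2))
```

The contributions always sum to the squared distance. In this example the first two variables should dominate, because the point moves them in opposite directions although they are strongly correlated in the target population. An individual contribution can be negative when a correlated variable partly offsets it.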

Lessons from 2MM machine learning models on Kaggle.com

• XGBoost is the engine of choice for structured problems (where feature manufacturing is the key). It is available as a Python package <http://xgboost.readthedocs.org>. Under the hood are the usual suspects: gradient boosted decision trees, close relatives of Random Forest. Hyperparameter tuning, however, adds only a few percentage points of accuracy on top; the major breakthroughs in predictive power come from feature manufacturing;
• Feature manufacturing is the key process for structured problems: permuting and combining features to find the most predictive combination. It is done either by trying various approaches iteratively by hand (as the thousands of individual contributors to Kaggle.com competitions do) or automatically (as DataRobot <http://www.datarobot.com> does; incidentally, DataRobot is based partly in Boston and partly in Ukraine). Some Amazon engineers who attended from Seattle commented that they, too, are building a platform that iteratively permutes features at random (in a ‘genetic algorithm’ fashion) to find the best features for structured problems;
• For unstructured problems (images, text, sound), neural networks run the show, along with their deep learning variants, which extract features automatically. A great example was the application of neural networks to the Diabetic Retinopathy problem at Kaggle.com, which surpassed commercially available products in accuracy;
• Kaggle.com is really suitable for two types of problems: A. problems that are already solved but for which a more accurate solution is highly desirable, because any fraction of a percent of accuracy turns into millions of dollars (e.g. loan default rate prediction); or B. problems that have never been tackled with machine learning, to see whether ML can help solve them (e.g. using EEG readings to predict epilepsy);
• Don’t expect data scientists to perform best in the office! Anthony described his first successful 24-hour data science hackathon, in which his senior colleague spent each hour guiding him for 5 minutes, coding for 15 minutes and playing basketball for 40 minutes. Personally, I find that walking, gardening and running are great creativity boosters. How will you work tomorrow? 🙂
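
The automated feature manufacturing mentioned in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not DataRobot’s or Amazon’s actual approach: it uses plain random search with greedy acceptance rather than a full genetic algorithm, scores candidates with an ordinary least squares fit on a hold-out half of the data, and relies only on NumPy. The names `holdout_r2`, `random_feature_search` and `OPS`, and the synthetic data, are hypothetical.

```python
import numpy as np

# Candidate ways of combining two existing columns into a new feature.
OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}

def holdout_r2(X, y, split):
    """Fit least squares on the first `split` rows, return R^2 on the rest."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    resid = y[split:] - A[split:] @ coef
    total = y[split:] - y[split:].mean()
    return 1.0 - (resid @ resid) / (total @ total)

def random_feature_search(X, y, n_trials=200, min_gain=1e-3, seed=0):
    """Randomly propose pairwise features; keep those that raise hold-out R^2."""
    rng = np.random.default_rng(seed)
    split = len(X) // 2
    best_X, best_score, kept = X, holdout_r2(X, y, split), []
    for _ in range(n_trials):
        i, j = rng.choice(X.shape[1], size=2, replace=False)
        op = str(rng.choice(list(OPS)))
        candidate = np.column_stack([best_X, OPS[op](X[:, i], X[:, j])])
        score = holdout_r2(candidate, y, split)
        if score > best_score + min_gain:  # greedy: accept only improvements
            best_X, best_score = candidate, score
            kept.append((op, int(i), int(j)))
    return best_score, kept

# Synthetic structured problem: the target depends on an interaction
# that no single raw column captures.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=400)

baseline = holdout_r2(X, y, len(X) // 2)
best, kept = random_feature_search(X, y)
print("hold-out R^2 on raw features:", round(baseline, 3))
print("hold-out R^2 after search:", round(best, 3))
print("features kept:", kept)
```

On this toy data the raw columns explain almost nothing, while the search should pick up the product of columns 0 and 1. That mirrors the point made above: the manufactured feature, not the tuning of the model, delivers the jump in accuracy.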