TCAV: Interpretability Beyond Feature Attribution

The emphasis today is slowly moving towards model interpretability rather than model predictions alone. The real essence of interpretability, however, should be to make machine learning models more understandable for humans, especially for those who don’t know much about machine learning. Machine learning is a very powerful tool, and with such power comes a responsibility to ensure that values like fairness are well reflected within the models. It is also important to ensure that AI models do not reinforce the bias that exists in the real world. To tackle such issues, Google AI researchers are working on a solution called TCAV (Testing with Concept Activation Vectors) to understand what signals neural network models use for their predictions.

An End-to-End AutoML Solution for Tabular Data at KaggleDays

Machine learning (ML) for tabular data (e.g. spreadsheet data) is one of the most active research areas in both ML research and business applications. Solutions to tabular data problems, such as fraud detection and inventory prediction, are critical for many business sectors, including retail, supply chain, finance, manufacturing, marketing and others. Current ML-based solutions to these problems can be achieved by those with significant ML expertise, including manual feature engineering and hyper-parameter tuning, to create a good model. However, the lack of broad availability of these skills limits the efficiency of business improvements through ML.

In Big Data, It’s Not the Volume that’s Important, It’s the Granularity

In many of my presentations and lectures, I have made the following declaration: In Big Data, it isn’t the volume of data that’s interesting, it’s the granularity; it’s the ability to build detailed analytic or behavioral profiles on every human and every device that ultimately drives monetization. Ever-larger volumes of aggregated data enable organizations to spot trends – what products are hot, what movies or TV shows are trendy, what restaurants or destinations are popular, etc. This is all interesting information, but how do I monetize these trends? I can build more products or create more TV episodes or promote select destinations, but to make those trends actionable – to monetize these trends – I need to get down to the granularity of the individual. I need detailed, individual insights with regard to who is interested and the conditions (price, location, time of day/day of week, weather, season, etc.) that attract them. I need to understand each individual’s behaviors (tendencies, propensities, inclinations, patterns, trends, associations, relationships) in order to target my efforts to drive the most value at the least cost. I need to be able to codify (turn into math) customer, product and operational behaviors such as individual patterns, trends, associations, and relationships.

Breaking Into Data Science in 2019

I remember thinking about breaking into data science as if it were yesterday. I had just started my semester abroad in Shanghai and attended several talks and guest lectures about data science and machine learning. However, I had never coded before (except for some basic SQL) and did not really know where to start. Initial web searches resulted in more confusion than insight as many people recommended many different paths into data science. Some even suggested that becoming a data scientist without a Ph.D. is not possible. This article takes a different approach. I am not going to attempt to provide a one-fits-all path into data science. Instead, I am going to elaborate on my experiences while trying to break into data science, which I hope may be of use to aspiring data scientists.

Scaling Transformer-XL to 128 GPUs

One of the difficulties of researching language models is that you often don’t know if your ideas work until you try them on real-world datasets. However, training on such datasets on one machine can take weeks.
Fortunately, there’s a straightforward recipe to speed up this process:
1. Find a good single machine model
2. Run N copies of the model on N machines in parallel, synchronizing at each step
3. Solve all remaining technical challenges
We used this recipe to reduce ImageNet training time from 2 weeks to 18 minutes. You could also apply the same optimization to train a model in 2 weeks that would originally require 4 years, so you can choose to scale up your research in scope instead of iteration time.
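Step 2 of the recipe can be illustrated with a minimal, single-process sketch of synchronous data-parallel training. This is not the authors' code: the one-parameter model, the toy data, and the names `gradient`, `all_reduce_mean` and `train_data_parallel` are assumptions made for illustration, and the "workers" are just list shards rather than real machines.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the toy model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for the all-reduce that synchronizes workers at each step."""
    return sum(values) / len(values)


def train_data_parallel(data, n_workers, steps=200, lr=0.01):
    """Simulate N model copies that each see one shard and share averaged gradients."""
    # Each worker gets an equally sized slice of the data.
    shards = [data[i::n_workers] for i in range(n_workers)]
    w = 0.0  # every replica starts from the same weights
    for _ in range(steps):
        # Each worker computes a gradient on its own shard (in parallel on real hardware).
        local_grads = [gradient(w, shard) for shard in shards]
        # Synchronize: all replicas apply the same averaged gradient.
        w -= lr * all_reduce_mean(local_grads)
    return w


# Toy data generated from y = 3x (hypothetical, for illustration only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_parallel = train_data_parallel(data, n_workers=4)
w_single = train_data_parallel(data, n_workers=1)
```

Because the shards are the same size, averaging the per-worker gradients gives the full-batch gradient, so the four-worker run follows the same trajectory as the single-worker run. On real clusters the difficulty lies in step 3: communication cost, stragglers and large-batch tuning.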

What’s new in R 3.6.0

A major update to the open-source R language, R 3.6.0, was released on April 26 and is now available for download for Windows, Mac and Linux. As a major update, it has many new features, user-visible changes and bug fixes. You can read the details in the release announcement, and in this blog post I’ll highlight the most significant ones.

How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls

In my other posts, I have covered topics such as how to combine machine learning and physics, and how machine learning can be used for production optimization as well as anomaly detection and condition monitoring. But in this post, I will discuss some of the common pitfalls of machine learning for time series forecasting. Time series forecasting is an important area of machine learning. It is important because there are so many prediction problems that involve a time component. However, while the time component adds additional information, it also makes time series problems more difficult to handle compared to many other prediction tasks. This post will go through the task of time series forecasting using machine learning, and how to avoid some of the common pitfalls. Through a concrete example, I will demonstrate how one could seemingly have a good model and decide to put it into production, whereas in reality, the model might have no predictive power whatsoever. More specifically, I will focus on how to evaluate your model accuracy, and show how relying simply on common error metrics such as mean percentage error, R2 score etc. can be very misleading if they are applied without caution.
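One way such a pitfall can arise is sketched below. It is an illustrative example, not necessarily the one used in the post: the series is a simulated random walk, and the "model" simply repeats the previous value. The helper names `r2_score` and `random_walk` are assumptions written for this sketch, not library calls.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def random_walk(n, seed=0):
    """Cumulative sum of Gaussian noise: by construction, the next step is unpredictable."""
    rng = random.Random(seed)
    walk = [0.0]
    for _ in range(n - 1):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))
    return walk


series = random_walk(2000)

# "Model": predict that the next value equals the current one (persistence forecast).
actual = series[1:]
predicted = series[:-1]
r2_levels = r2_score(actual, predicted)

# Same forecast, evaluated on the step-to-step change instead of the level.
actual_diff = [b - a for a, b in zip(series[:-1], series[1:])]
predicted_diff = [0.0] * len(actual_diff)  # persistence means "no change"
r2_diff = r2_score(actual_diff, predicted_diff)

print(f"R2 on levels:      {r2_levels:.3f}")
print(f"R2 on differences: {r2_diff:.3f}")
```

The score on levels comes out close to 1 even though the forecast carries no information about the next step, while the score on differences is at or below zero. Comparing any candidate model against this kind of naive baseline, and evaluating on changes rather than levels, is one way to catch the problem before production.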

What is Cognitive Computing? How are Enterprises benefitting from Cognitive Technology?

AI has truly been a far-flung goal ever since the conception of computing, and every day we seem to be getting closer and closer to that goal with new cognitive computing models. Coming from the amalgamation of cognitive science and based on the basic premise of simulating the human thought process, the concept, as well as applications of cognitive computing, are bound to have far-reaching impacts on not just our private lives, but also industries like healthcare, insurance and more. The advantages of cognitive technology are well and truly a step beyond the conventional AI systems. According to David Kenny, General Manager, IBM Watson – the most advanced cognitive computing framework, ‘AI can only be as smart as the people teaching it.’ The same is not true for the latest cognitive revolution. The cognitive computing process uses a blend of artificial intelligence, neural networks, machine learning, natural language processing, sentiment analysis and contextual awareness to solve day-to-day problems just like humans. IBM defines cognitive computing as an advanced system that learns at scale, reasons with purpose and interacts with humans in a natural form.

Metrics for Imbalanced Classification

The notion of metrics in Data Science is extremely important. If you don’t know how to evaluate current results properly, you are unable to improve them either. A wrong understanding of metrics also leads to a wrong estimate of the model’s capacity and a wrong insight into the state of the problem. This story explains the nature of popular metrics for classification problems. All discussed metrics will be implemented in NumPy to get a hands-on feel for them. The discussed metrics are: precision, recall, F1, MCC and ROC-AUC.
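For orientation, here is a minimal sketch of the five metrics using their standard textbook definitions. It is written in plain Python rather than NumPy to stay dependency-free, and it is not the article's implementation. Function names and the zero-denominator conventions are assumptions of this sketch.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(y_true, y_pred):
    """Share of predicted positives that are truly positive (0.0 if none predicted)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """Share of true positives that were found (0.0 if there are no positives)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; uses all four confusion-matrix cells."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(precision(y_true, y_pred), recall(y_true, y_pred), f1(y_true, y_pred))
print(mcc(y_true, y_pred), roc_auc(y_true, scores))
```

In this toy example accuracy is 0.8, yet precision, recall and F1 are all 0.5 and MCC is 0.375, which is why accuracy alone is a poor guide on imbalanced data. The pairwise ROC-AUC shown here is quadratic in the number of examples; a rank-based formulation scales better.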

Robot Thinking Will Power New Frontiers in Deep Learning AI

Deep learning has advanced to the point where we’re seeing computers do things that would have been considered science fiction just a few years ago. Areas such as language translation, image captioning, picture generation, and facial recognition display major advances on a regular basis. But certain artificial intelligence problems don’t mesh well with deep learning’s traditional training algorithms, and these areas might require new ways of thinking. Neural networks learn by taking tiny steps in the direction of an adequate solution (Figure 1). This means that the path neural networks navigate – called the loss function – needs to be relatively smooth. But many real-life situations don’t provide anything close to the continuous loss function that neural networks require (Figure 2). For instance, natural language processing (NLP) poses many challenges that can’t be solved through traditional machine learning gradient descent. Let’s say we want an AI system to rewrite text into a more elegant form, and that a hypothetical ‘language effectiveness score’ measures how clear, concise, and polished a sentence is.
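The need for a smooth loss can be illustrated with a small sketch. Both loss functions below are hypothetical toys, not the article's ‘language effectiveness score’: one is a smooth bowl, the other is the same bowl quantized into flat plateaus, and plain gradient descent is run on each with a numerically estimated gradient.

```python
import math


def numerical_gradient(f, x, eps=1e-4):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def gradient_descent(f, x0, lr=0.1, steps=100):
    """Take small steps downhill along the estimated gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * numerical_gradient(f, x)
    return x


def smooth_loss(x):
    """Smooth bowl with its minimum at x = 2."""
    return (x - 2.0) ** 2


def stepped_loss(x):
    """Same bowl rounded down to whole numbers: flat plateaus separated by jumps."""
    return math.floor((x - 2.0) ** 2)


x_smooth = gradient_descent(smooth_loss, x0=5.25)
x_stepped = gradient_descent(stepped_loss, x0=5.25)

print(f"smooth loss:  ended at x = {x_smooth:.6f}")
print(f"stepped loss: ended at x = {x_stepped:.6f}")
```

On the smooth loss the iterate converges to the minimum at 2. On the stepped loss the gradient is zero everywhere on a plateau, so the iterate never leaves its starting point, which is the kind of situation where gradient-free search methods become relevant.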

Artificial Intelligence Standards

The February 11, 2019, Executive Order on Maintaining American Leadership in Artificial Intelligence (AI) directs the National Institute of Standards and Technology (NIST) to create a plan for Federal engagement in the development of technical standards and related tools in support of reliable, robust, and trustworthy systems that use AI technologies (Plan). This notice requests information to help NIST understand the current state, plans, challenges, and opportunities regarding the development and availability of AI technical standards and related tools, as well as priority areas for federal involvement in AI standards-related activities. To assist in developing the Plan, NIST will consult with Federal agencies, the private sector, academia, non-governmental entities, and other stakeholders with interest in and expertise relating to AI.

NIST Requests Information on Artificial Intelligence Technical Standards and Tools

The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) is seeking information about technical standards and related tools for artificial intelligence (AI). The Request for Information (RFI), published today in the Federal Register, is in response to the Feb. 11, 2019, Executive Order on Maintaining American Leadership in Artificial Intelligence. The executive order directs NIST to create a plan for federal engagement in the development of these standards and tools in support of reliable, robust and trustworthy systems that use AI technologies. ‘The inputs of the U.S. stakeholder community are essential to inform development of a plan that will support continued American leadership in AI,’ said Under Secretary of Commerce for Standards and Technology and NIST Director Walter G. Copan. ‘Sound technical standards, performance metrics and tools are needed to foster public trust and confidence in AI technologies, enabling the market adoption of the next wave of innovations that will contribute to the economic and national security of the United States.’ To develop the plan, NIST will engage with other federal agencies, the private sector, academic institutions, nongovernmental organizations and other stakeholders with an interest and expertise in AI and related standards.

Center for Data Innovation comments in response to the National Institute of Standards and Technology’s request for information on artificial intelligence (AI) standards.

The Center for Data Innovation is the leading think tank studying the intersection of data, technology, and public policy. With staff in Washington, D.C., and Brussels, the Center formulates and promotes pragmatic public policies designed to maximize the benefits of data-driven innovation in the public and private sectors. It educates policymakers and the public about the opportunities and challenges associated with data, as well as important data-related technology trends. The Center is a non-profit, non-partisan research institute affiliated with the Information Technology and Innovation Foundation. Robust technical standards for AI will be crucial to the success of the technology in the United States and abroad because they can serve as authoritative guidelines and benchmarks for the development and evaluation of AI. However, thus far, concerns about the oversight of AI have stymied productive discussions about standards development by causing policymakers to prioritize oversight at the expense of technical understanding. NIST should shift this focus back to technical standards development to provide a sound scientific underpinning for any future efforts to increase oversight of AI. Additionally, NIST should strengthen U.S. leadership in developing AI standards and encouraging their broad adoption to ensure a globally competitive marketplace.

JMLR Volume 19

1. Numerical Analysis near Singularities in RBF Networks
2. A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations
3. Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection
4. A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference
5. Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models
6. RSG: Beating Subgradient Method without Smoothness and Strong Convexity
7. Patchwork Kriging for Large-scale Gaussian Process Regression
8. Scalable Bayes via Barycenter in Wasserstein Space
9. Experience Selection in Deep Reinforcement Learning for Control
10. Change-Point Computation for Large Graphical Models: A Scalable Algorithm for Gaussian Graphical Models with Change-Points
11. Statistical Analysis and Parameter Selection for Mapper
12. A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization
13. Model-Free Trajectory-based Policy Optimization with Monotonic Improvement
14. Regularized Optimal Transport and the Rot Mover’s Distance
15. ELFI: Engine for Likelihood-Free Inference
16. Streaming kernel regression with provably adaptive mean, variance, and regularization
17. Distributed Proximal Gradient Algorithm for Partially Asynchronous Computer Clusters
18. Refining the Confidence Level for Optimistic Bandit Strategies
19. ThunderSVM: A Fast SVM Library on GPUs and CPUs
20. Robust Synthetic Control
21. Reverse Iterative Volume Sampling for Linear Regression
22. Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems
23. Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations
24. OpenEnsembles: A Python Resource for Ensemble Clustering
25. Importance Sampling for Minibatches
26. Generalized Rank-Breaking: Computational and Statistical Tradeoffs
27. Gradient Descent Learns Linear Dynamical Systems
28. Parallelizing Spectrally Regularized Kernel Algorithms
29. A Direct Approach for Sparse Quadratic Discriminant Analysis
30. Distribution-Specific Hardness of Learning Neural Networks
31. Goodness-of-Fit Tests for Random Partitions via Symmetric Polynomials
32. A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms
33. Kernel Density Estimation for Dynamical Systems
34. Invariant Models for Causal Transfer Learning
35. The xyz algorithm for fast interaction search in high-dimensional data
36. Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
37. State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models
38. Learning from Comparisons and Choices
39. Connections with Robust PCA and the Role of Emergent Sparsity in Variational Autoencoder Models
40. An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach
41. Markov Blanket and Markov Boundary of Multiple Variables
42. Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
43. Random Forests, Decision Trees, and Categorical Predictors: The “Absent Levels” Problem
44. On Tight Bounds for the Lasso
45. Harmonic Mean Iteratively Reweighted Least Squares for Low-Rank Matrix Recovery
46. On Generalized Bellman Equations and Temporal-Difference Learning
47. Design and Analysis of the NIPS 2016 Review Process
48. Emergence of Invariance and Disentanglement in Deep Representations
49. Covariances, Robustness, and Variational Bayes
50. Accelerating Cross-Validation in Multinomial Logistic Regression with l1-Regularization
51. Profile-Based Bandit with Unknown Profiles
52. How Deep Are Deep Gaussian Processes?
53. Fast MCMC Sampling Algorithms on Polytopes
54. Modular Proximal Optimization for Multidimensional Total-Variation Regularization
55. On Semiparametric Exponential Family Graphical Models
56. Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier
57. Maximum Selection and Sorting with Adversarial Comparators
58. A New and Flexible Approach to the Analysis of Paired Comparison
59. Simple Classification Using Binary Data
60. Hinge-Minimax Learner for the Ensemble of Hyperplanes
61. Short-term Sparse Portfolio Optimization Based on Alternating Direction Method of Multipliers
62. Scaling up Data Augmentation MCMC via Calibration
63. Extrapolating Expected Accuracies for Large Multi-Class Problems
64. Inference via Low-Dimensional Couplings
65. Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes
66. Multivariate Bayesian Structural Time Series Model
67. Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling
68. The Implicit Bias of Gradient Descent on Separable Data
69. Optimal Quantum Sample Complexity of Learning Algorithms
70. Scikit-Multiflow: A Multi-output Streaming Framework
71. Optimal Bounds for Johnson-Lindenstrauss Transformations
72. An efficient distributed learning algorithm based on effective local functional approximations
73. Sparse Estimation in Ising Model via Penalized Monte Carlo Methods
74. Using Side Information to Reliably Learn Low-Rank Matrices from Missing and Corrupted Observations
75. A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation
76. Online Bootstrap Confidence Intervals for the Stochastic Gradient Descent Estimator
77. A Random Matrix Analysis and Improvement of Semi-Supervised Learning for Large Dimensional Data
78. Robust PCA by Manifold Optimization
79. Improved Asynchronous Parallel Optimization Analysis for Stochastic Incremental Methods
80. Clustering is semidefinitely not that hard: Nonnegative SDP for manifold disentangling
81. Seglearn: A Python Package for Learning Sequences and Time Series
82. DALEX: Explainers for Complex Predictive Models in R

Basic Git/GitHub Cheat Sheet

If you aren’t already familiar with version control and incorporating it into your daily workflow, now is the time to get started. This is a barebones basic guide to get you started with Git and give you a solid foundation to develop further. Git is almost certainly used in any professional environment, and the more you familiarize yourself with it early on, the more valuable you will be to employers. Also, this will make your personal experience better by being able to switch computers without having to worry about saving your project on a flash drive. Working on group projects will become so much easier to manage. Ever messed up your code so bad you felt like it would just be easier to start from scratch? With version control, you can just revert to a stable version free from all of those crazy ideas you wanted to implement at 2 am.

Your Mobile Banking App has a Problem (and I’m Not Sure Anyone Knows About it)

The technology behind mobile banking is pretty incredible, but what happens when there’s a mistake? What happens if we don’t see the mistake? We’re living in a world where so many technological advancements have been made that they almost blend into the background. We’ve gotten used to the idea that we can let our phones and computers do the little things for us. It’s easy to forget how new all of this technology really is. But it is new. It’s changing every day. There are algorithms behind most of the basic things that you take for granted, from social media and entertainment to banking and finances. They are constantly evolving. They are not perfect.

Cool Factor: How to Steal Styles with Machine Learning, Turi Create, and ResNet

I was excited when I first heard that Turi Create was acquired by Apple and then later open-sourced to the greater machine learning community! Earlier this year, I wrote about how Turi Create is Disrupting the Machine Learning Landscape. Then came WWDC18 and a host of improvements to Turi Create, including a beta version 5.0.