AI investment activity – trends of 2018

AI hype slowdown, building the cognitive tech stack, vertical integration, and other observations. This review highlights trends in launching and investing in artificial intelligence (AI) startups in 2018. It contains an analysis of 47 AI startups that were launched in 2018 and managed to raise at least $1M each, along with a broader overview of some longer-term trends. The review covers the US, Canada, Europe, and Israel.

How to troubleshoot your Azure Data Science project in production

Suppose a model is running in production and produces predictions that are not understood. Troubleshooting is then needed, and at least the following must be traceable:
• The version of the model deployed in production
• The metrics and statistics of the model
• The data and algorithm used in the model
• The person/team that created the model
In this blog, an Azure data science project is defined, and it is then discussed how troubleshooting can be done. If you are interested in how the project is implemented in detail, refer to my previous blog. Note that this blog is standalone; it is not necessary to read the previous blog first.
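The traceability items listed above can be bundled into a small deployment manifest stored alongside the model. The sketch below is purely illustrative (the field names and values are assumptions, not the Azure-specific setup described in the blog):

```python
import json
from datetime import datetime, timezone

def build_model_manifest(model_name, version, metrics, data_ref, algorithm, owner):
    """Bundle the traceability items into one record stored next to the deployed model."""
    return {
        "model": model_name,
        "version": version,            # which model build is live in production
        "metrics": metrics,            # e.g. test-set AUC, accuracy
        "data": data_ref,              # pointer to the training-data snapshot
        "algorithm": algorithm,        # algorithm used to train the model
        "owner": owner,                # person/team that created the model
        "deployed_at": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical example values, for illustration only.
manifest = build_model_manifest(
    "churn-classifier", "1.4.2",
    {"auc": 0.91}, "datalake://churn/2019-05-01", "gradient boosting", "team-ds",
)
print(json.dumps(manifest, indent=2))
```

When a prediction later needs to be explained, the manifest answers all four questions at once instead of requiring a search through deployment logs.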

Complementing A/B Testing with Machine Learning and Feature Importance

In this post I want to focus on using machine learning to complement other data science tasks, in particular A/B testing. This is meant to be a practical post rather than a theoretical discussion, and I assume that you are at least somewhat familiar with A/B testing, Random Forests, and feature importance. I recently came across an interesting challenge that required me to run an A/B test to check whether one auction type sold cars faster than another. Simply running the A/B test would have concluded that one auction type did indeed sell cars faster than the other, but it turned out that the auction type wasn’t the primary driver behind faster selling times: it was the lower prices associated with that auction type. This could have had dire consequences if the company selling these cars had decided to focus on selling via that one auction type instead of focusing on pricing first. I uncovered this by running a Random Forest on the dataset and inspecting the feature importances. In fact, I generally believe that machine learning is a great complementary tool for Exploratory Data Analysis (EDA) as well as A/B testing.
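A minimal sketch of this feature-importance check, using synthetic data that mimics the scenario (not the author's dataset): auction type B tends to have lower prices, but price, not auction type, actually drives days-to-sell.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000

# Auction type B (1) tends to have lower prices; price drives days-to-sell.
auction_type = rng.integers(0, 2, n)                      # 0 = A, 1 = B
price = 20000 - 4000 * auction_type + rng.normal(0, 2000, n)
days_to_sell = 0.002 * price + rng.normal(0, 2, n)

X = np.column_stack([auction_type, price])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, days_to_sell)

# Price should dominate the importances, flagging it as the real driver.
for name, imp in zip(["auction_type", "price"], model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Here a naive A/B comparison would show type B selling faster, while the feature importances reveal that the effect flows through price.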

Data Science in Inventory Management: Real case in managing a warehouse

Hello and welcome back to another story about Data Science! This is a real-world case, not just theory or an academic post, so we will run a testing experiment on a real-life problem. If you have any questions, drop a comment and we can discuss together!

Will Self-Supervised Learning Help?

Self-Supervision is in the air. Explaining the difference between self-, un-, weakly-, semi-, distantly-, and fully-supervised learning (and of course, RL) just got 100 times tougher. 🙂 Nevertheless, we are going to try. The problem, in context, is to encode an object (a word, sentence, image, video, audio, …) into a general-enough representation (blobs of numbers) which is useful (preserves enough object features) for solving multiple tasks, e.g., finding the sentiment of a sentence, translating it into another language, locating things in an image, making it higher-resolution, detecting the text being spoken, identifying speaker switches, and so on. Given how diverse images or videos or speech can be, we must often make do with representations tied to a few tasks (or even a single one), which break down if we encounter new examples or new tasks. Learning more, and repeatedly, and continuously, from new examples (inputs labeled with expected outputs) is our go-to strategy (supervised learning). We’ve secretly (and ambitiously) wished that this tiresome, repeated learning process would eventually go away and we’d learn good universal representations for these objects. Learn once, reuse forever. But the so-called unsupervised learning paradigm (only-input-no-labels) hasn’t delivered much (with mild exceptions like GANs and learn-to-cluster models).

Principal Component Analysis for Dimensionality Reduction

In the modern age of technology, increasing amounts of data are produced and collected. In machine learning, however, too much data can be a bad thing. At a certain point, more features or dimensions can decrease a model’s accuracy, since there is more data that needs to be generalized; this is known as the curse of dimensionality. Dimensionality reduction is a way to reduce the complexity of a model and avoid overfitting. There are two main categories of dimensionality reduction: feature selection and feature extraction. Via feature selection, we select a subset of the original features, whereas in feature extraction, we derive information from the feature set to construct a new feature subspace. In this tutorial we will explore feature extraction. In practice, feature extraction is not only used to improve storage space or the computational efficiency of the learning algorithm, but can also improve predictive performance by reducing the curse of dimensionality – especially if we are working with non-regularized models.
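As a minimal sketch of feature extraction with PCA (synthetic data, illustrative only): five observed features are generated from two latent directions, so two principal components recover most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic data: 5 observed features driven by 2 latent directions plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(scale=0.1, size=(200, 5))

# Standardize first: PCA is sensitive to feature scale.
X_std = StandardScaler().fit_transform(X)

# Project onto the top 2 principal components (feature extraction).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

The `explained_variance_ratio_` attribute shows how much of the original variance each retained component captures, which guides the choice of `n_components`.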

#Fail: Artificial Intelligence is a Science

Science is messy. I don’t think people outside the science field appreciate the ratio of failures to successes. In my work, I very often fail to develop an idea to completion. Sometimes the model doesn’t work. Sometimes the idea is just wrong. Sometimes the idea needs to change. Artificial intelligence is more about experimentation and iteration than about building strong, clear solutions from paper to production.

An Easy Guide to Gauge Equivariant Convolutional Networks

Geometric deep learning is a very exciting new field, but its mathematics is slowly drifting into the territory of algebraic topology and theoretical physics. This is especially true for the paper ‘Gauge Equivariant Convolutional Networks and the Icosahedral CNN’ by Cohen et al. (https://…/1902.04615 ), which I want to explore in this article. The paper uses the language of gauge theory, which lies at the center of anything in physics that likes to use the words ‘quantum’ and ‘field’ together. It promises to give an intuitive understanding of the basics of gauge theory, and I must say it delivers: it is probably the nicest introduction I have seen so far. But it still remains a difficult subject. What I want to do here is give a purely intuitive understanding, with no math. While I don’t follow the structure of the paper exactly, you can still open the paper side by side, as I will try to highlight all the important terminology. In the following I’ll assume you know how convolutional neural networks (CNNs) work, but have no idea what they have to do with manifolds. So let’s go!

Maximizing Scarce Maintenance Resources with Data

A common scenario facing governmental and non-governmental organizations is how to deploy finite resources to maximum impact. Often these organizations must make decisions without full information transparency. This essay demonstrates how data analysis and modeling can be used to help organizations in this scenario maximize their resources.

A Lite Introduction to Markov Chain

Previously I wrote a lite introduction to Latent Dirichlet Allocation as part of a deep dive into what makes Natural Language Processing so great. Natural Language Processing will be picking up steam over the next couple of years (until 2025, to be exact) as Artificial Intelligence continues to grow. NLP is one of the key drivers, and as it continues to be a part of our daily lives, maybe it’s important to understand how it works behind the scenes. There are a ton of different concepts, mathematics, and models that go into different Natural Language Processing use cases, but we are only going to cover one at a basic level, and that is Markov Chains.
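To make the idea concrete, a minimal Markov chain fits in a few lines of Python. The two-state weather example below is a standard illustration (not from the article): the next state is sampled using only the current state, which is the Markov property.

```python
import random

random.seed(0)

# A tiny two-state Markov chain, defined by P(next state | current state).
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state):
    """Sample the next state using only the current one (the Markov property)."""
    options = list(transitions[state])
    weights = [transitions[state][s] for s in options]
    return random.choices(options, weights=weights)[0]

# Simulate 10 transitions starting from "sunny".
state, path = "sunny", ["sunny"]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)
```

In NLP the same machinery applies with words as states and transition probabilities estimated from a corpus, which is the basis of simple Markov text generators.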

How to Hide your API keys in Python

In Data Science, it is important to document your work. Documenting your work is how others can understand what is going on, after all. For the same reason that researchers publish full reports and scientists post their studies, it helps validate your findings and lets other people build off them. Another key point in Data Science is, well, data. While in a professional capacity you might be working with data that is already available, this is not always the case; in fact, you will often have to acquire your own data. Probably one of the easiest ways to obtain a lot of structured data is via APIs. APIs help everyone get along, and part of that getting along is being able to authenticate who is retrieving what data and how much. Often this authentication is based on a public/private key model; in effect, especially when you want the full power of an API, it means having a pair of ‘key values’, one public and one secret. Part of documenting your data science projects means documenting how you obtain your data, and if you are interfacing with an API, you can expose your API keys when you publish that code!
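One common way to keep keys out of published code is to read them from environment variables rather than hard-coding them. The sketch below uses a hypothetical variable name, `API_SECRET_KEY`; the article may well use a different mechanism (such as a config file excluded from version control).

```python
import os

# For demonstration only: normally you would set the variable in your shell
# (`export API_SECRET_KEY=...`) or in a .env file kept out of version control.
os.environ.setdefault("API_SECRET_KEY", "demo-secret")

# The published script only ever references the variable NAME,
# never the secret value itself.
api_key = os.environ["API_SECRET_KEY"]
print("key loaded:", bool(api_key))
```

This way the notebook or script you publish documents *how* the data was obtained without revealing the credential itself.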

Bayesian, and Non-Bayesian, Cause-Specific Competing-Risk Analysis for Parametric and Nonparametric Survival Functions: The R Package CFC

The R package CFC performs cause-specific, competing-risk survival analysis by computing cumulative incidence functions from unadjusted, cause-specific survival functions. A high-level API in CFC enables end-to-end survival and competing-risk analysis, using a single-line function call, based on the parametric survival regression models in the survival package. A low-level API allows users to achieve more flexibility by supplying their custom survival functions, perhaps in a Bayesian setting. Utility methods for summarizing and plotting the output allow population-average cumulative incidence functions to be calculated, visualized and compared to unadjusted survival curves. Numerical and computational optimization strategies are employed for efficient and reliable computation of the coupled integrals involved. To address potential integrable singularities caused by infinite cause-specific hazards, particularly near time-from-index of zero, integrals are transformed to remove their dependency on hazard functions, making them solely functions of cause-specific, unadjusted survival functions. This implicit variable transformation also provides for easier extensibility of CFC to handle custom survival models since it only requires the users to implement a maximum of one function per cause. The transformed integrals are numerically calculated using a generalization of Simpson’s rule to handle the implicit change of variable from time to survival, while a generalized trapezoidal rule is used as reference for error calculation. An OpenMP-parallelized, efficient C++ implementation – using packages Rcpp and RcppArmadillo – makes the application of CFC in Bayesian settings practical, where a potentially large number of samples represent the posterior distribution of cause-specific survival functions.

How to Improve Clinical Trials Using AWS

Amazon Web Services (AWS) is a cloud infrastructure-as-a-service platform that has freed organizations from the burden of setting up infrastructure for their applications: it provides services for a wide range of applications at minimal cost, paid only for the specific services consumed. Moreover, AWS provides robust, scalable and secure services, generally better than what a company could host itself, and its data centers all over the world help ensure no data is lost. AWS comprises various components, such as EC2 (the Elastic Compute Cloud), Elastic Load Balancing, Amazon CloudFront, and so on; a description of all of them is beyond the scope of this blog. Amazon Web Services has reduced management and maintenance overhead, and its resources are reliable and available everywhere. The right tools can improve productivity and scalability as well. One of the key areas where Amazon Web Services has had an impact is healthcare. In this article, we will learn how AWS has helped to improve clinical trials and the process that is followed under the hood.

Forty-two countries adopt new OECD Principles on Artificial Intelligence

22/05/2019 – OECD and partner countries formally adopted the first set of intergovernmental policy guidelines on Artificial Intelligence (AI) today, agreeing to uphold international standards that aim to ensure AI systems are designed to be robust, safe, fair and trustworthy. The OECD’s 36 member countries, along with Argentina, Brazil, Colombia, Costa Rica, Peru and Romania, signed up to the OECD Principles on Artificial Intelligence at the Organisation’s annual Ministerial Council Meeting, taking place today and tomorrow in Paris and focused this year on ‘Harnessing the Digital Transition for Sustainable Development’. Elaborated with guidance from an expert group formed by more than 50 members from governments, academia, business, civil society, international bodies, the tech community and trade unions, the Principles comprise five values-based principles for the responsible deployment of trustworthy AI and five recommendations for public policy and international co-operation. They aim to guide governments, organisations and individuals in designing and running AI systems in a way that puts people’s best interests first and ensuring that designers and operators are held accountable for their proper functioning. ‘Artificial Intelligence is revolutionising the way we live and work, and offering extraordinary benefits for our societies and economies. Yet, it raises new challenges and is also fuelling anxieties and ethical concerns. This puts the onus on governments to ensure that AI systems are designed in a way that respects our values and laws, so people can trust that their safety and privacy will be paramount,’ said OECD Secretary-General Angel Gurría. ‘These Principles will be a global reference point for trustworthy AI so that we can harness its opportunities in a way that delivers the best outcomes for all.’ (Read the full speech.) 
The AI Principles have the backing of the European Commission, whose high-level expert group has produced Ethics Guidelines for Trustworthy AI, and they will be part of the discussion at the forthcoming G20 Leaders’ Summit in Japan. The OECD’s digital policy experts will build on the Principles in the months ahead to produce practical guidance for implementing them. While not legally binding, existing OECD Principles in other policy areas have proved highly influential in setting international standards and helping governments to design national legislation. For example, the OECD Privacy Guidelines, which set limits to the collection and use of personal data, underlie many privacy laws and frameworks in the United States, Europe and Asia. The G20-endorsed OECD Principles of Corporate Governance have become an international benchmark for policy makers, investors, companies and other stakeholders working on institutional and regulatory frameworks for corporate governance.

Ten Simple Rules for Better Figures

Scientific visualization is classically defined as the process of graphically displaying scientific data. However, this process is far from direct or automatic. There are so many different ways to represent the same data: scatter plots, linear plots, bar plots, and pie charts, to name just a few. Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure. A more accurate definition for scientific visualization would be a graphical interface between people and data. In this short article, we do not pretend to explain everything about this interface (see [1], [2] for introductory work). Instead, we aim to provide a basic set of rules to improve figure design and to explain some of the common pitfalls.