Building Interactive Histograms with Bokeh

You are probably familiar with Matplotlib and Seaborn, two excellent (and highly related) Python plotting libraries. The purpose of this article is to get you started with Bokeh if you are not yet familiar with it. You will learn how to write a custom Python class to simplify plotting interactive histograms with Bokeh.
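
As a taste of what such a class might look like, here is a minimal sketch; the class name InteractiveHistogram and its parameters are illustrative, not the article's actual code, and it assumes Bokeh and NumPy are installed.

```python
import numpy as np
from bokeh.plotting import figure, show


class InteractiveHistogram:
    """Minimal wrapper that turns a 1-D array into an interactive Bokeh histogram."""

    def __init__(self, data, bins=30, title="Histogram"):
        self.data = np.asarray(data)
        self.bins = bins
        self.title = title

    def plot(self):
        # Compute bin counts and edges, then draw them as quad glyphs.
        hist, edges = np.histogram(self.data, bins=self.bins)
        p = figure(title=self.title, tools="pan,wheel_zoom,box_zoom,reset,hover")
        p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
               fill_color="steelblue", line_color="white")
        return p


# Usage: show(InteractiveHistogram(np.random.randn(1000)).plot())
```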


Seamlessly Integrated Deep Learning Environment with Terraform, Google Cloud, GitLab and Docker

When you start serious deep learning projects, you usually run into the problem that you need a proper GPU. Buying a workstation suitable for deep learning workloads can easily become very expensive. Luckily, there are some options in the cloud. One that I tried out was the wonderful Google Compute Engine. GPUs are available in GCE as external accelerators attached to an instance. Currently, there are these GPUs available (prices for us-central1).


Implementing a ResNet model from scratch.

When implementing the ResNet architecture in a deep learning project I was working on, it was a huge leap from the basic, simple convolutional neural networks I was used to. One prominent feature of ResNet is that it utilizes a micro-architecture within its larger macro-architecture: residual blocks! I decided to look into the model myself to gain a better understanding of it, as well as to look into why it was so successful at ILSVRC. I implemented the same ResNet model class as in Deep Learning for Computer Vision with Python by Dr. Adrian Rosebrock, which follows the ResNet model from the 2015 academic publication, Deep Residual Learning for Image Recognition by He et al.
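
To make the idea of a residual block concrete, here is a minimal Keras sketch of a pre-activation bottleneck block. It follows the general pattern from He et al., not Dr. Rosebrock's exact class; the function name residual_block and the tiny example stack are only illustrative.

```python
from tensorflow.keras.layers import (Activation, Add, BatchNormalization,
                                     Conv2D, Input)
from tensorflow.keras.models import Model


def residual_block(x, filters, stride=1):
    """Pre-activation bottleneck block: BN -> ReLU -> Conv, plus a shortcut."""
    shortcut = x

    # 1x1 -> 3x3 -> 1x1 bottleneck, each convolution preceded by BN + ReLU.
    y = BatchNormalization()(x)
    y = Activation("relu")(y)
    y = Conv2D(filters // 4, (1, 1), strides=stride, use_bias=False)(y)

    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters // 4, (3, 3), padding="same", use_bias=False)(y)

    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, (1, 1), use_bias=False)(y)

    # Project the shortcut when the spatial size or channel depth changes.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = Conv2D(filters, (1, 1), strides=stride, use_bias=False)(x)

    # The residual connection: output = F(x) + x.
    return Add()([y, shortcut])


# Example: a tiny stack of residual blocks on 32x32 RGB inputs.
inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding="same")(inputs)
x = residual_block(x, 64)
x = residual_block(x, 128, stride=2)
model = Model(inputs, x)
```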


The Tentpoles of Data Science

When I ask myself the question ‘What is data science?’ I tend to think of the following five components. Data science is
• the application of design thinking to data problems;
• the creation and management of workflows for transforming and processing data;
• the negotiation of human relationships to identify context, allocate resources, and characterize audiences for data analysis products;
• the application of statistical methods to quantify evidence; and
• the transformation of data analytic information into coherent narratives and stories.


AI Policy 101: An Introduction to the 10 Key Aspects of AI Policy

What in the world is AI policy? First, a definition: AI policy refers to public policies that maximize the benefits of AI while minimizing its potential costs and risks. From this perspective, the purpose of AI policy is twofold. On the one hand, governments should invest in the development and adoption of AI to secure its many benefits for the economy and society. Governments can do this by investing in fundamental and applied research, the development of specialized AI and ‘AI + X’ talent, digital infrastructure and related technologies, and programs to help the private and public sectors adopt and apply new AI technologies. On the other hand, governments also need to respond to the economic and societal challenges brought on by advances in AI. Automation, algorithmic bias, data exploitation, and income inequality are just a few of the many challenges for which governments around the world need to develop policy solutions. These policies include investments in skills development, the creation of new regulations and standards, and targeted efforts to remove bias from AI algorithms and data sets.


A Common Data Science Mistake: Prediction/Recommendation by Manipulating Model Inputs

‘We trained a machine learning model with high performance. However, it did not work and was not useful in practice.’ I have heard this sentence several times, and each time I was eager to find out the reason. There can be different reasons why a model fails to work in practice. As these issues are not usually addressed in data science courses, in this article I address one of the common mistakes in designing and deploying a machine learning model. In the rest of this article, first, I will discuss the confusion between Correlation and Causation that leads to the misuse of machine learning models. I will illustrate the discussion with an example. After that, different possibilities between inputs and outputs of the model are shown. Finally, I provide some suggestions to avoid this mistake.
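
To see why this matters, here is a toy simulation of my own (not the article's example): a hidden confounder drives both an observed feature and the outcome, the model learns a strong association, and a "recommendation" obtained by manipulating that input promises an effect that never materializes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data-generating process: a hidden confounder drives both the observed
# feature and the outcome; the feature itself has no causal effect.
rng = np.random.default_rng(0)
confounder = rng.normal(size=5000)
feature = confounder + rng.normal(scale=0.1, size=5000)
outcome = 2.0 * confounder + rng.normal(scale=0.1, size=5000)

model = LinearRegression().fit(feature.reshape(-1, 1), outcome)

# The fitted model suggests that raising the feature by 1 raises the outcome by ~2.
lift = model.predict((feature + 1).reshape(-1, 1)) - model.predict(feature.reshape(-1, 1))
print("lift the model promises:", lift.mean())   # roughly 2.0

# In reality the outcome ignores the feature, so acting on that "recommendation"
# changes nothing: the model captured correlation, not causation.
```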


Gini Regressions and Heteroskedasticity

We propose an Aitken estimator for Gini regression. The suggested A-Gini estimator is proven to be a U-statistic. Monte Carlo simulations are provided to deal with heteroskedasticity and to make some comparisons between generalized least squares and the Gini regression. A Gini-White test is proposed; it shows better power than the usual White test when outlying observations contaminate the data.


How to Monitor Machine Learning Models in Real-Time

We present practical methods for near real-time monitoring of machine learning systems that detect system-level or model-level faults and can tell when the world the model operates in changes.
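
The article's methods go further, but as a minimal illustration of noticing "when the world changes", a simple drift check can compare the live distribution of a feature against the training distribution, for example with a two-sample Kolmogorov-Smirnov test. The function name and the threshold below are arbitrary placeholders, not the article's code.

```python
import numpy as np
from scipy.stats import ks_2samp


def feature_drifted(train_values, live_values, p_threshold=0.01):
    """Flag drift when the live feature distribution differs from the training one."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold


# Example: the live data has shifted, so the check fires.
rng = np.random.default_rng(0)
print(feature_drifted(rng.normal(0, 1, 10_000), rng.normal(0.5, 1, 10_000)))  # True
```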


Everything You Need to Know About Decision Trees

Tree-based methods can be used for regression or classification. They involve segmenting the predictor space into a number of simple regions. The set of splitting rules can be summarized in a tree, hence the name decision tree methods. A single decision tree is often not as performant as linear regression, logistic regression, LDA, etc. However, introducing bagging, random forests, and boosting can yield dramatic improvements in prediction accuracy at the expense of some loss in interpretability. In this post, we introduce everything you need to know about decision trees, bagging, random forests, and boosting. It will be a long read, but it will be worth it!
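
As a quick illustration of the accuracy gain the post describes, here is a small scikit-learn comparison of a single tree against a random forest; the dataset and hyperparameters are arbitrary choices for the example, not taken from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One unpruned tree versus a bagged ensemble of trees with random feature subsets.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```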


Combining supervised learning and unsupervised learning to improve word vectors

To achieve state-of-the-art results in NLP tasks, researchers have tried a tremendous number of ways to make machines understand language and solve downstream tasks such as textual entailment and semantic classification. OpenAI released a new model named Generative Pre-Training (GPT). After reading this article, you will understand:
• Finetuned Transformer LM Design
• Architecture
• Experiments
• Implementation
• Take Away


Prediction Task with Multivariate Time Series and a VAR Model

Time series data can be confusing, but it is very interesting to explore. The reason this sort of data grabbed my attention is that it can be found in almost every business (sales, deliveries, weather conditions, etc.). For instance, using Google BigQuery …
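
As a minimal sketch of what the modelling step can look like, statsmodels provides a VAR implementation; the DataFrame below is made-up stand-in data, not the article's BigQuery dataset, and the lag order is chosen arbitrarily.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Made-up multivariate series standing in for real business data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 2)).cumsum(axis=0),
                  columns=["sales", "deliveries"])

# Work on differenced data so each series is (roughly) stationary.
diffed = df.diff().dropna()

model = VAR(diffed)
results = model.fit(2)  # fixed lag order here; ic="aic" can select it instead

# Forecast the next 5 steps from the last observed lags.
lag_order = results.k_ar
forecast = results.forecast(diffed.values[-lag_order:], steps=5)
print(forecast)
```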


Ethics Commission: Automated and Connected Driving

Throughout the world, mobility is becoming increasingly shaped by the digital revolution. The ‘automation’ of private transport operating in the public road environment is taken to mean technological driving aids that relieve the pressure on drivers, assist or even replace them in part or in whole. The partial automation of driving is already standard equipment in new vehicles. Conditionally and highly automated systems which, without human intervention, can autonomously change lanes, brake and steer are available or about to go into mass production. In both Germany and the US, there are test tracks on which conditionally automated vehicles can operate. For local public transport, driverless robot taxis or buses are being developed and trialled.

Today, processors are already available or are being developed that are able, by means of appropriate sensors, to detect in real time the traffic situation in the immediate surroundings of a car, determine the car’s own position on appropriate mapping material and dynamically plan and modify the car’s route and adapt it to the traffic conditions. As the ‘perception’ of the vehicle’s surroundings becomes increasingly perfected, there is likely to be an ever better differentiation of road users, obstacles and hazardous situations. This makes it likely that it will be possible to significantly enhance road safety. Indeed, it cannot be ruled out that, at the end of this development, there will be motor vehicles that are inherently safe, in other words will never be involved in an accident under any circumstances. Nevertheless, at the level of what is technologically possible today, and given the realities of heterogeneous and non-connected road traffic, it will not be possible to prevent accidents completely.

This makes it essential that decisions be taken when programming the software of conditionally and highly automated driving systems. The technological developments are forcing government and society to reflect on the emerging changes. The decision that has to be taken is whether the licensing of automated driving systems is ethically justifiable or possibly even imperative. If these systems are licensed – and it is already apparent that this is happening at international level – everything hinges on the conditions in which they are used and the way in which they are designed.

At the fundamental level, it all comes down to the following questions. How much dependence on technologically complex systems – which in the future will be based on artificial intelligence, possibly with machine learning capabilities – are we willing to accept in order to achieve, in return, more safety, mobility and convenience? What precautions need to be taken to ensure controllability, transparency and data autonomy? What technological development guidelines are required to ensure that we do not blur the contours of a human society that places individuals, their freedom of development, their physical and intellectual integrity and their entitlement to social respect at the heart of its legal regime?


An Evaluation of Early Warning Models for Systemic Banking Crises: Does Machine Learning Improve Predictions?

This paper compares the out-of-sample predictive performance of different early warning models for systemic banking crises using a sample of advanced economies covering the past 45 years. We compare a benchmark logit approach to several machine learning approaches recently proposed in the literature. We find that while machine learning methods often attain a very high in-sample fit, they are outperformed by the logit approach in recursive out-of-sample evaluations. This result is robust to the choice of performance measure, crisis definition, preference parameter, and sample length, as well as to using different sets of variables and data transformations. Thus, our paper suggests that further enhancements to machine learning early warning models are needed before they are able to offer a substantial value-added for predicting systemic banking crises. Conventional logit models appear to use the available information already fairly efficiently, and would for instance have been able to predict the 2007/2008 financial crisis out-of-sample for many countries. In line with economic intuition, these models identify credit expansions, asset price booms and external imbalances as key predictors of systemic banking crises.
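
As a rough sketch of what a recursive out-of-sample evaluation means in practice: refit each model on an expanding window and score it only on years it has not yet seen. The toy data, models and years below are placeholders, not the paper's dataset or specifications.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy panel: yearly country-level observations with a binary crisis label.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1975, 2020), 20)
X = rng.normal(size=(len(years), 5))
y = rng.binomial(1, 0.08, size=len(years))

models = {"logit": LogisticRegression(max_iter=1000),
          "random forest": RandomForestClassifier(n_estimators=200, random_state=0)}

# Recursive evaluation: train only on data strictly before each test year.
for name, model in models.items():
    preds, truth = [], []
    for year in range(1990, 2020):
        train, test = years < year, years == year
        model.fit(X[train], y[train])
        preds.extend(model.predict_proba(X[test])[:, 1])
        truth.extend(y[test])
    print(name, "out-of-sample AUC:", round(roc_auc_score(truth, preds), 3))
```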