Principal Component Analysis (PCA) is unsupervised learning technique and it is used to reduce the dimension of the data with minimum loss of information. PCA is used in an application like face recognition and image compression.
This Monday, February the 12th, we launched a public beta of Datalore – an intelligent web application for data analysis and visualization in Python, brought to you by JetBrains. This tool turns the data science workflow into a delightful experience with the help of smart coding assistance, incremental computations, and built-in tools for machine learning.
On a regular basis people tell me about their impressive achievements using AI. 99% of these things are completely stupid. This post may come off as a rant, but that’s not so much its intent, as it is to point out why we went from having very few AI experts, to having so many in so little time. Also to convey that most of these experts only seem experty because so few people know how to call them on their bull shit.
Imagine your Aunt Ida is in an autonomous vehicle (AV) – a self-driving car – on a city street closed to human-driven vehicles. Imagine a swarm of puppies drops from an overpass, a sinkhole opens up beneath a bus full of mathematical geniuses, or Beethoven (or Tupac) jumps into the street from the left as Mozart (or Biggie) jumps in from the right. Whatever the dilemma, imagine that the least worst option for the network of AVs is to drive the car containing your Aunt Ida into a concrete abutment. Even if the system made the right choice – all other options would have resulted in more deaths – you’d probably want an explanation.
A reinforcement learning algorithm engaging in policy improvement from a continuous stream of experience needs to solve an opportunity-cost problem. (The RL lingo for opportunity-cost is “advantage”.) Thinking about this in the context of a 2-person game, at a given state, with your existing rollout policy, is taking the first action leading to a win 1/2 the time good or bad? It could be good since the player is well behind and every other action is worse. Or it could be bad since the player is well ahead and every other action is better. Understanding one action’s long term value relative to another’s is the essence of the opportunity cost trade-off at the core of many reinforcement learning algorithms.
In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and (most importantly) analyzing large graphs. There are many problems and use cases that lend themselves naturally to graphs, and recent advances in hardware and software building blocks have made large-scale analytics possible.
Deep learning has proven its effectiveness in many fields, such as computer vision, natural language processing (NLP), text translation, or speech to text. It takes its name from the high number of layers used to build the neural network performing machine learning tasks. There are several types of layers as well as overall network architectures, but the general rule holds that the deeper the network is, the more complexity it can grasp. This article will explain fundamental concepts of neural network layers and walk through the process of creating several types using TensorFlow. TensorFlow is the platform that contributed to making artificial intelligence (AI) available to the broader public. It’s an open source library with a vast community and great support. TensorFlow provides a set of tools for building neural network architectures, and then training and serving the models. It offers different levels of abstraction, so you can use it for cut-and-dried machine learning processes at a high level or go more in-depth and write the low-level calculations yourself.
Yesterday I was part of an introductory session on machine learning and unsurprisingly, the issue of supervised vs. unsupervised learning came up. In social sciences, there is a definite tendency for the former; there is more or less always a target outcome or measure that we want to optimise the performance of our models for. This reminded me of a draft that I had written the code a couple of months ago but for some reason never converted into a blog post until now. This will also allow me to take a break from conflict forecasting for a bit and go back to my usual topic of UK. My frequent usage of all things UK is at such a level that my Google Search Console insights lists r/CasualUK as the top referrer. Cheers, mates!
This blog post is about my newly released RGF package (the blog post consists mainly of the package Vignette). The RGF package is a wrapper of the Regularized Greedy Forest python package, which also includes a Multi-core implementation (FastRGF). Portability from Python to R was made possible using the reticulate package and the installation requires basic knowledge of Python. Except for the Linux Operating System, the installation on Macintosh and Windows might be somehow cumbersome (on windows the package currently can be used only from within the command prompt). Detailed installation instructions for all three Operating Systems can be found in the README.md file and in the rgf_python Github repository.