Measuring the Progress of AI Research

This pilot project collects problems and metrics/datasets from the AI research literature and tracks progress on them. You can use this Notebook to see how things are progressing in specific subfields or in AI/ML as a whole, to report new results you’ve obtained, to look for problems that might benefit from having new datasets/metrics designed for them, or as a source to build on for data science projects. At EFF, we’re ultimately most interested in how this data can influence our understanding of the likely implications of AI. To begin with, we’re focused on gathering it.

Using csvkit to Summarize Data: A Quick Example

As data analysts, we’re frequently handed comma-separated value (CSV) files and tasked with reporting insights. While it’s tempting to import that data directly into R or Python for data munging and exploratory data analysis, there are also a number of command-line utilities to examine, fix, slice, transform, and summarize data. In particular, csvkit is a suite of Python-based utilities for working with CSV files from the terminal. In this post, we will grab data using wget, subset rows containing a particular value, and summarize the data in different ways. The goal is to take data on criminal activity, group it by offense type, and compute counts to understand the frequency distribution.
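For comparison, the same subset-and-count idea can be sketched in plain Python with the standard `csv` module and `collections.Counter`. This is not csvkit and does not reproduce its commands; the column names `offense` and `district` and the sample rows are hypothetical placeholders, since the dataset’s schema isn’t given here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded crime CSV.
# The column names "offense" and "district" are assumptions.
SAMPLE_CSV = """offense,district
THEFT,1
BURGLARY,2
THEFT,2
ASSAULT,1
THEFT,1
"""


def offense_counts(csv_text, column="offense", where=None):
    """Count values in `column`, keeping only rows where every
    (key, value) pair in `where` matches (a simple row subset)."""
    where = where or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        if all(row.get(k) == v for k, v in where.items()):
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Frequency distribution over all rows, most common first.
    for offense, n in offense_counts(SAMPLE_CSV).most_common():
        print(f"{offense}: {n}")
    # The same counts restricted to rows where district == "1".
    print(offense_counts(SAMPLE_CSV, where={"district": "1"}))
```

To run this against a real file, replace `io.StringIO(csv_text)` with a file opened via `open(path, newline="")`.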

Julia vs R and Python: what does Stack Overflow Developer Survey 2017 tell us?

TL;DR: Most Julia programmers also use Python. However, among all languages, R is the one whose users are most likely to also develop in Julia.

Stack Overflow recently made public the results of its Developer Survey 2017, and it is an interesting data set. In this post I analyze the answers to the question ‘Which of the following languages have you done extensive development work in over the past year, and which do you want to work in over the next year?’, comparing Julia against other programming languages. The question yields two variables of interest: 1) which languages were used and 2) which languages respondents plan to use.
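The two halves of the TL;DR are different conditional shares: the share of Julia users who use language X, and the share of X users who also use Julia. A minimal sketch of both, assuming each respondent’s answer is a set of language names; the responses below are made-up toy data, not survey data, and `overlap_with` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy answers, NOT Stack Overflow survey data: each set holds
# the languages one respondent reported extensive development work in.
RESPONSES = [
    {"Python", "Julia"},
    {"Python", "Julia"},
    {"Python", "R", "Julia"},
    {"R", "Julia"},
    {"Python"},
    {"Python"},
    {"Python", "JavaScript"},
    {"R"},
    {"JavaScript"},
]


def overlap_with(responses, target="Julia"):
    """Return two dicts keyed by every other language X:
    among_target[X] = P(uses X | uses target)
    among_x[X]      = P(uses target | uses X)
    """
    users = Counter()  # respondents per language
    joint = Counter()  # respondents who use both X and the target
    n_target = 0
    for langs in responses:
        has_target = target in langs
        if has_target:
            n_target += 1
        for lang in langs:
            users[lang] += 1
            if has_target:
                joint[lang] += 1
    if n_target == 0:
        raise ValueError(f"no respondents use {target}")
    others = [x for x in users if x != target]
    among_target = {x: joint[x] / n_target for x in others}
    among_x = {x: joint[x] / users[x] for x in others}
    return among_target, among_x


if __name__ == "__main__":
    among_julia, julia_share = overlap_with(RESPONSES)
    print("Most common language among Julia users:",
          max(among_julia, key=among_julia.get))
    print("Language whose users most often also use Julia:",
          max(julia_share, key=julia_share.get))
```

The two rankings can disagree because the second ratio divides by each language’s own user count, so a very popular language can dominate among Julia users while a smaller community shows a higher rate of Julia use. The same function can be run separately on the “want to work in” answers to get the second variable.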

dbplyr 1.1.0

I’m pleased to announce the release of the dbplyr package, which now contains all dplyr code related to connecting to databases. This shouldn’t affect you as a user much, but it makes dplyr simpler and makes it easier to release improvements just for database-related code.

Using the TensorFlow API: An Introductory Tutorial Series

This post summarizes and links to a great multi-part tutorial series on learning the TensorFlow API for building a variety of neural networks, as well as a bonus tutorial on backpropagation from scratch.
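As a companion to the backpropagation topic, here is a minimal sketch in plain Python (not TensorFlow) of backprop for a tiny network: one sigmoid hidden unit, a linear output, and squared-error loss. The backward pass applies the chain rule one operation at a time, and `numerical_grads` checks it against central finite differences. The network shape, learning rate, and function names are illustrative choices, not taken from the tutorial series.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_backward(params, x, y):
    """Loss and chain-rule gradients for y_hat = w2 * sigmoid(w1*x + b1) + b2."""
    w1, b1, w2, b2 = params
    # Forward pass
    z1 = w1 * x + b1
    h = sigmoid(z1)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2
    # Backward pass: push dL/dy_hat back through each operation
    d_yhat = y_hat - y
    d_w2 = d_yhat * h
    d_b2 = d_yhat
    d_h = d_yhat * w2
    d_z1 = d_h * h * (1.0 - h)  # sigmoid'(z) = s(z) * (1 - s(z))
    d_w1 = d_z1 * x
    d_b1 = d_z1
    return loss, [d_w1, d_b1, d_w2, d_b2]


def numerical_grads(params, x, y, eps=1e-6):
    """Central finite differences, used to sanity-check the backward pass."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        plus[i] += eps
        minus = list(params)
        minus[i] -= eps
        grads.append((forward_backward(plus, x, y)[0]
                      - forward_backward(minus, x, y)[0]) / (2 * eps))
    return grads


def train(params, data, lr=0.5, epochs=200):
    """Plain stochastic gradient descent using the backprop gradients."""
    params = list(params)
    for _ in range(epochs):
        for x, y in data:
            _, g = forward_backward(params, x, y)
            params = [p - lr * gi for p, gi in zip(params, g)]
    return params


if __name__ == "__main__":
    p = [0.3, -0.1, 0.8, 0.05]
    _, analytic = forward_backward(p, 1.5, 0.7)
    numeric = numerical_grads(p, 1.5, 0.7)
    print("max gradient difference:",
          max(abs(a - n) for a, n in zip(analytic, numeric)))
```

The finite-difference check is the usual way to catch a mistake in a hand-written backward pass before trusting it for training.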

GLM with H2O in R

This post shows how to fit a Generalized Linear Model with H2O in R. The output is much more comprehensive than that of R’s generic glm().
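As background on what fitting a GLM involves, here is a minimal sketch of iteratively reweighted least squares (IRLS), the fitting method behind R’s glm(), for a logistic regression with an intercept and one predictor. It is plain Python, not H2O’s API or solver, and the data are made up for illustration.

```python
import math


def fit_logistic_irls(xs, ys, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by IRLS (Newton / Fisher scoring).
    xs: predictor values; ys: 0/1 responses. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate X^T W X (2x2) and the score X^T (y - p), with X = [1, x]
        s00 = s01 = s11 = 0.0
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)  # binomial variance, used as the IRLS weight
            s00 += w
            s01 += w * x
            s11 += w * x * x
            g0 += y - p
            g1 += (y - p) * x
        det = s00 * s11 - s01 * s01
        if det == 0.0:
            raise ValueError("singular information matrix")
        # Solve (X^T W X) delta = X^T (y - p) with the closed-form 2x2 inverse
        d0 = (s11 * g0 - s01 * g1) / det
        d1 = (s00 * g1 - s01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    print(fit_logistic_irls(xs, ys))
```

For the logistic (canonical-link) case, IRLS and Newton’s method coincide. The inverse of X^T W X evaluated at the estimates gives the estimated covariance of (b0, b1), which is where the standard errors in a GLM summary come from.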