Correctness in Data Science – Data Science Pop-up Seattle


Simple Sequential A/B Testing

Stopping an A/B test early because the results are statistically significant is usually a bad idea. In this post, I will describe a simple procedure for analyzing data in a continuous fashion via sequential sampling. Sequential sampling allows the experimenter to stop the trial early if the treatment appears to be a winner; it therefore addresses the ‘peeking’ problem associated with eager experimenters who use (abuse) traditional fixed-sample methods.
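The excerpt above does not spell out the stopping rule, so the following is only a minimal sketch of one simple one-sided sequential rule of this kind. It assumes a conversion budget N is fixed in advance, traffic is split evenly, and the test stops when treatment conversions lead control by 2√N (treatment wins) or when total conversions reach N (no winner). The function name, the 'T'/'C' labels, and the return format are hypothetical choices for illustration.

```python
import math


def sequential_ab_test(conversions, n):
    """Run a simple one-sided sequential test over a stream of conversions.

    conversions: iterable of "T" (treatment) or "C" (control), one label
                 per conversion, in arrival order (hypothetical format).
    n:           conversion budget chosen before the experiment starts.
    Returns (decision, treatment_count, control_count).
    """
    threshold = 2 * math.sqrt(n)  # assumed lead required to declare a winner
    t = c = 0
    for label in conversions:
        if label == "T":
            t += 1
        elif label == "C":
            c += 1
        else:
            raise ValueError(f"unknown group label: {label!r}")
        # Check the winning condition first, then the budget.
        if t - c >= threshold:
            return ("treatment wins", t, c)
        if t + c >= n:
            return ("no winner", t, c)
    # The stream ended before either stopping condition was met.
    return ("undecided", t, c)
```

Because both the budget and the threshold are fixed before any data arrive, checking the counts after every conversion is part of the design rather than an abuse of it, which is the point the excerpt makes about peeking.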

Course: Reinforcement Learning

You should take this course if you have an interest in machine learning and the desire to engage with it from a theoretical perspective. Through a combination of classic papers and more recent work, you will explore automated decision-making from a computer-science perspective. You will examine efficient algorithms, where they exist, for single-agent and multi-agent planning as well as approaches to learning near-optimal decisions from experience. At the end of the course, you will replicate a result from a published paper in reinforcement learning.

Smoothing Techniques using basis functions: Fourier Basis

In this post we introduce Fourier basis functions in the context of Functional Data Analysis (FDA). A Fourier basis is a way to smooth data that vary over a continuum and exhibit a cyclical trend. Smoothing techniques play an important role in FDA because they provide insight into the functional behavior of a stochastic process. Both the mathematical background and an application are covered. The R packages used here are fda and fda.usc. Anyone interested in this topic should check out this website: Functional Data Analysis.
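To make the idea concrete, here is a minimal sketch of Fourier basis smoothing in plain Python. It is not the API of the fda or fda.usc packages; the function names are hypothetical. It assumes the observations lie on a uniform grid covering exactly one period, in which case the basis columns are orthogonal and the least-squares coefficients reduce to simple projections.

```python
import math


def fourier_basis(t, period, n_harmonics):
    """Evaluate [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...] at time t."""
    omega = 2 * math.pi / period
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        row.append(math.sin(k * omega * t))
        row.append(math.cos(k * omega * t))
    return row


def fourier_smooth(y, n_harmonics):
    """Least-squares Fourier smooth of y, sampled at t = 0, 1, ..., n-1.

    Assumes the n samples cover exactly one period (period = n).
    Returns (coefficients, fitted_values).
    """
    n = len(y)
    if 2 * n_harmonics >= n:
        raise ValueError("need n_harmonics < n / 2 for an orthogonal basis")
    rows = [fourier_basis(i, float(n), n_harmonics) for i in range(n)]
    coefs = []
    for j in range(2 * n_harmonics + 1):
        # Squared norm of each basis column on the uniform grid:
        # n for the constant term, n / 2 for every sine and cosine.
        norm = n if j == 0 else n / 2
        coefs.append(sum(rows[i][j] * y[i] for i in range(n)) / norm)
    fitted = [sum(c * b for c, b in zip(coefs, row)) for row in rows]
    return coefs, fitted
```

Keeping only a few harmonics is what produces the smoothing: components that oscillate faster than the highest retained harmonic are dropped from the fitted curve. For irregularly spaced data the columns are no longer orthogonal and a general least-squares solver would be needed instead of the projections used here.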

Gradient-based Optimization of Hyperparameters through Reversible Learning

Grasp-and-Lift EEG Winners’ interview: 1st place, Cat & Dog

Team Cat & Dog took first place in the Grasp-and-Lift EEG Detection competition ahead of 378 other teams. The pair also comprised 2/3 of the first place team from another recent EEG-focused competition on Kaggle, BCI Challenge @ NER 2015. Domain knowledge and a strong collaborative relationship have made Alexandre Barachant (aka Cat) and Rafał Cycoń (aka Dog) successful in both competitions. In this blog, they share best practices for working with EEG data, as well as the tools and code that took them to the top of the Grasp-and-Lift EEG Detection leaderboard. They also tip their hats to all the Kagglers who shared scripts during the competition. The code shared and models developed during this challenge were huge contributions to the WAY Consortium’s work in developing prosthetic devices for patients who have lost hand function due to neurological disabilities or amputation.

Do more with Python: Creating a graph application with Python, Neo4j, Gephi, and Linkurious

Here is how to build a neat app with graph visualization of Python and related topics from Packt and StackOverflow, combining Gephi, Linkurious, and Neo4j.

The Network Underlying Consumer Perceptions of the European Car Market

The nodes have been assigned a color by the author so that the underlying distinctions are more pronounced. Cars that are perceived as Economical (in aquamarine) are not seen as Sporty or Powerful (in cyan). The red edges connecting these attributes indicate negative relationships. Similarly, a Practical car (in light goldenrod) is not Technically Advanced (in light pink). This network of feature associations replicates both the economical to luxury and the practical to advanced differentiations so commonly found in the car market. North Americans living in the suburbs may need to be reminded that Europe has many older cities with less parking and narrower streets, which explains the inclusion of the city focus feature.

Minimal R Package Check List

1. Install most recent version of R
2. Install most recent version of RStudio
3. Open RStudio
4. Install devtools package
5. Click on Project –> New Project… –> New Directory –> R package
6. Enter package name
7. Delete boilerplate code and ‘hello.R’ file
8. Go to the ‘man’ directory and delete the ‘hello.Rd’ file
9. In File browser, click on package name to go to the top level directory
10. Click ‘Build’ tab in environment browser
11. Click ‘Configure Build Tools…’
12. Check ‘Generate documentation with Roxygen’
13. Check ‘Build & Reload’ when Roxygen Options window opens –> Click OK
14. Click OK in Project Options window

Baking priors

There remains a bit of a two-way snobbery that Frequentist statistics is what we teach (as so-called objective statistics remain the same no matter who works with them) and Bayesian statistics is what we do (as it tends to directly estimate posterior probabilities we are actually interested in). Nina Zumel hit the nail on the head when she wrote an article explaining that the appropriateness of a type of statistical theory depends on the type of question you are trying to answer, not on your personal prejudices. We will discuss a few more examples that have been on our minds, including one I am calling “baking priors.” This final example will demonstrate some of the advantages of allowing researchers to document their priors.