Return to the Comedy Hour: P-values vs posterior probabilities (1)

Some recent criticisms of statistical tests of significance have breathed brand new life into some very old howlers, many of which have been discussed on this blog. One variant, which I think returns to the scene every decade (for 50+ years?), takes a “disagreement on numbers” to show a problem with significance tests even from a “frequentist” perspective.

Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism

At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. The talk was inspired by the book ‘Wine Science: The Application of Science in Winemaking’.

HarvardX / MITx Online Courses – Year 1

Contributed by John Montroy. John took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program with Christopher Markis, Luke Lin, Sam Kamin, and Zeyu Zhang from Sept 23 to Dec 18, 2015. The post is based on his second class project (due in the 4th week of the program).

Introduction to Circular Statistics – Rao’s Spacing Test

Today will be a brief introduction to circular statistics (sometimes referred to as directional statistics). Circular statistics is an interesting subdivision of statistics involving observations taken as vectors around a unit circle. As an example, imagine measuring birth times at a hospital over a 24-hour cycle, or the directional dispersion of a group of migratory animals. This type of data arises in a variety of fields, such as ecology, climatology, and biochemistry. The nature of measuring observations around a unit circle necessitates a different approach to hypothesis testing. Distributions need to be “wrapped” around the circle to be of use, and conventional estimators such as the sample mean or sample variance hold no water.
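
To see why the ordinary sample mean fails on a circle, here is a minimal Python sketch (not taken from the linked post). The three angles are made-up illustration data, and the helper names `circular_mean_deg` and `mean_resultant_length` are hypothetical. It averages the observations as unit vectors instead of as plain numbers, which is the usual way a mean direction is defined for circular data.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in [0, 360), from the summed unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def mean_resultant_length(angles_deg):
    """Length of the average unit vector: near 1 = concentrated, near 0 = dispersed."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.hypot(s, c)

# Made-up directions (degrees) clustered around 10, straddling the 0/360 wrap.
angles = [350.0, 10.0, 30.0]

arithmetic_mean = sum(angles) / len(angles)   # 130, far away from every observation
circular_mean = circular_mean_deg(angles)     # approximately 10
resultant_length = mean_resultant_length(angles)

print(f"arithmetic mean: {arithmetic_mean:.1f}")
print(f"circular mean:   {circular_mean:.1f}")
print(f"mean resultant length: {resultant_length:.3f}")
```

The arithmetic mean lands at 130 degrees, nowhere near the data, while the vector-based mean recovers the centre of the cluster. One minus the mean resultant length is commonly used as a circular analogue of the variance.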

Additional thoughts about ‘Lorenz curves’ to compare models

A few months ago, I mentioned a graph of so-called Lorenz curves used to compare regression models.

Bidirectional Helmholtz Machines

Here we propose a new method, referred to as bidirectional Helmholtz machine (BiHM), that is based on the idea that the generative model should be close to the class of distributions that can be modeled by our approximate inference distribution and that both the top-down and bottom-up distributions should contribute to the model.

Vowpal Wabbit (Fast Learning)

This is a project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms.

Sixer – R package cricketr’s new Shiny avatar

In this post I create a Shiny app, Sixer, based on my R package cricketr. I developed the R package cricketr a few months back for analyzing the performances of batsmen and bowlers in all formats of the game (Test, ODI and Twenty 20). This package uses the statistics available in ESPN Cricinfo Statsguru. I had written a series of posts using the cricketr package in which I chose a few batsmen and bowlers and compared their performances. Here I have created a complete Shiny app with a lot more players and with almost all the features of the cricketr package.

Top 6 Data Modeling Tools

1. PowerDesigner
2. ER/Studio
3. Sparx Enterprise Architect
4. Oracle SQL Developer Data Modeler
5. CA ERwin
6. IBM – InfoSphere Data Architect