The Paradox of Replication, and the vindication of the P-value (but she can go deeper)

Critic 1: It’s much too easy to get small P-values.
Critic 2: We find it very difficult to get small P-values; only 36 of 100 psychology experiments were found to yield small P-values in the Open Science Collaboration’s recent replication effort.
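One way both critics can be describing real data is through selection effects. The simulation below is a minimal sketch under assumptions of our own, not the post's analysis. It supposes (hypothetically) that each study measures `k_outcomes` independent outcomes, that every null hypothesis is true, and that only the smallest two-sided z-test P-value is reported. Reporting the best of several tests pushes the rate of "small" P-values well above alpha, while a single prespecified test, as in a replication attempt, stays near alpha. The names `two_sided_p` and `small_p_rate` are hypothetical.

```python
import math
import random

def two_sided_p(z):
    """Two-sided P-value for a standard-normal test statistic z."""
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

def small_p_rate(k_outcomes, trials=20000, alpha=0.05, seed=1):
    """Fraction of null studies reporting p < alpha when only the best of k tests is reported.

    Hypothetical setup: every outcome is pure noise, so the null is true throughout.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one z-statistic per outcome and keep only the smallest P-value.
        best_p = min(two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(k_outcomes))
        if best_p < alpha:
            hits += 1
    return hits / trials

print(small_p_rate(1), small_p_rate(5))
```

With one prespecified outcome the rate should land near 0.05; with five outcomes and the best one reported it should land near 1 − 0.95⁵ ≈ 0.23.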

Data Visualization Tools for Data Scientists & Analysts


Logistic Regression in R – Part One

Logistic regression is used to analyze the relationship between a dichotomous dependent variable and one or more categorical or continuous independent variables. It models the probability of the response as a function of the predictors. The model is expressed as $\ln\left[\frac{p}{1-p}\right] = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$, where the $\beta_i$ are the parameters and the $x_i$ are the independent variables. The log(odds), or logit, is defined as $\ln\left[\frac{p}{1-p}\right]$: the natural logarithm of the ratio of the probability that an event will occur, $p(Y=1)$, to the probability that it will not occur, $p(Y=0) = 1 - p$.
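To make the log(odds) link concrete, here is a minimal sketch in Python rather than R, assuming a single predictor $x_1$. It fits $\beta_0$ and $\beta_1$ by Newton-Raphson on the log-likelihood, the same idea behind the iteratively reweighted least squares that R's `glm()` uses for `family = binomial`. The function names and the toy data are hypothetical and only for illustration.

```python
import math

def inv_logit(z):
    """Map a log-odds value back to a probability: p = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """ln[p / (1 - p)] for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def fit_logistic(x, y, iters=25):
    """Fit ln[p / (1 - p)] = b0 + b1*x by Newton-Raphson (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # entries of X'WX (negative Hessian)
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^(-1) X'(y - p), with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Toy data (made up): a binary outcome that becomes more likely as x grows.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, inv_logit(b0 + b1 * 2.5))
```

The toy data are deliberately not perfectly separable; if they were, the maximum-likelihood estimates would not exist and the coefficients would diverge.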

Correction For Spatial And Temporal Auto-Correlation In Panel Data: Using R To Estimate Spatial HAC Errors Per Conley

Economists and political scientists often employ panel data that track units (e.g., firms or villages) over time. When estimating regression models using such data, we often need to be concerned about two forms of auto-correlation: serial (within units over time) and spatial (across nearby units). As Cameron and Miller (2013) note in their excellent guide to cluster-robust inference, failure to account for such dependence can lead to incorrect conclusions: “[f]ailure to control for within-cluster error correlation can lead to very misleadingly small standard errors…” (p. 4).
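The core of a Conley-style correction is a sandwich estimator whose "meat" adds cross-products of score contributions for every pair of observations within a distance cutoff. The sketch below covers only the spatial part, for an OLS model with an intercept and one regressor. It assumes Euclidean distances and a uniform kernel (weight 1 inside the cutoff, 0 outside); the post's R implementation may use different distances, kernels, and a serial (temporal) component as well. `ols_2` and `conley_se` are hypothetical names.

```python
import math

def ols_2(x, y):
    """OLS for y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]

def conley_se(x, y, coords, cutoff):
    """Spatial HAC standard errors for (a, b), uniform kernel within `cutoff` (assumption)."""
    a, b, e = ols_2(x, y)
    n = len(x)
    # Bread: (X'X)^(-1) for X = [1, x], as an explicit 2x2 inverse.
    s0, s1, s2 = n, sum(x), sum(xi * xi for xi in x)
    det = s0 * s2 - s1 * s1
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    # Meat: sum over all pairs (i, j) within the cutoff of (x_i e_i)(x_j e_j)'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        gi = (e[i], x[i] * e[i])
        for j in range(n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gj = (e[j], x[j] * e[j])
                for r in range(2):
                    for c in range(2):
                        meat[r][c] += gi[r] * gj[c]
    # Sandwich: V = inv * meat * inv.
    tmp = [[sum(inv[r][k] * meat[k][c] for k in range(2)) for c in range(2)] for r in range(2)]
    v = [[sum(tmp[r][k] * inv[k][c] for k in range(2)) for c in range(2)] for r in range(2)]
    # A uniform kernel does not guarantee a positive semi-definite V, so clamp at 0.
    return (a, b), (math.sqrt(max(v[0][0], 0.0)), math.sqrt(max(v[1][1], 0.0)))

# Made-up example: two spatial clusters of three units each.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
print(conley_se(x, y, coords, cutoff=1.5))
```

With `cutoff=0` and distinct locations, only the `i == j` terms survive and the result reduces to the heteroskedasticity-robust (HC0) standard errors; widening the cutoff lets nearby units' errors co-vary.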

R: Why it’s the next programming language you should learn

• R in business
• R in higher education
• R is profitable
• R has a diverse community
• R is fun

SAP Unveils New Cloud Platform Services and In-Memory Innovation on Hadoop to Accelerate Digital Transformation

SAP HANA Vora is a new in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop. As companies take part in their digital transformation journey, they face complex hurdles in dealing with distributed Big Data everywhere, compounded by the lack of business process awareness across enterprise apps, analytics, Big Data and Internet of Things (IoT) sources.

Data Driven Digest for August 28: Treemaps

The first treemap was created around 1990, for a reason that seems laughable today: 14 people in a University of Maryland computer lab shared an 80-megabyte disk drive, and one of them, Professor Ben Shneiderman, wanted to know which individuals, and which files, took up the most space. After considering circular, triangular, and rectangular representations, Prof. Shneiderman came up with the nested, colored rectangle format we use today. (His history of treemaps is fun reading if you want to learn more.) Of course, treemaps have proven valuable for much more than determining who’s hogging the hard drive, as evidenced by the examples below.
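Shneiderman's original treemaps used what is usually called the slice-and-dice layout: each node's rectangle is divided among its children in proportion to their sizes, alternating the split direction at each level of the tree. The sketch below is a minimal version of that idea, not a reproduction of any particular tool; the tree format, function names, and the disk-usage numbers are hypothetical.

```python
def total(node):
    """Size of a node: a leaf's own size, or the sum of its children's sizes."""
    name, content = node
    if isinstance(content, list):
        return sum(total(child) for child in content)
    return content

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, depth, x, y, w, h) for every node, nested inside its parent.

    `node` is (name, size) for a leaf or (name, [children]) for a directory.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, depth, x, y, w, h))
    if isinstance(content, list):
        size = total(node)
        offset = 0.0  # fraction of the parent already used by earlier children
        for child in content:
            frac = total(child) / size
            if depth % 2 == 0:
                # Even depth: slice along the x axis (children side by side).
                slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:
                # Odd depth: slice along the y axis (children stacked).
                slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
    return out

# Hypothetical disk: two users, three files, sizes in MB.
disk = ("disk", [("alice", [("a.txt", 30), ("b.dat", 10)]), ("bob", [("c.log", 60)])])
for rect in slice_and_dice(disk, 0.0, 0.0, 100.0, 50.0):
    print(rect)
```

Each rectangle's area is proportional to its size, so the biggest space hogs stand out at a glance. Later layouts such as squarified treemaps keep this property while producing less elongated rectangles.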