Visualising the predictive distribution of a log-transformed linear model

Today I will take a closer look at the log-transformed linear model and use Stan/rstan, not only to model the sales statistics, but also to generate samples from the posterior predictive distribution. The posterior predictive distribution is what I am most interested in. From the simulations I can get the 95% prediction interval, which will be slightly wider than the theoretical 95% interval, as it takes into account the parameter uncertainty as well.
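To make the interval comparison concrete, here is a minimal sketch in plain Python rather than Stan/rstan. It is not the post's model or data: the numbers are made up, and the posterior is sampled directly under an assumed flat prior on the intercept, slope and log(sigma), which gives closed-form draws. The "theoretical" interval is taken here to mean the plug-in interval that treats the fitted parameters as known.

```python
import math
import random

random.seed(1)

# Hypothetical data (made up for illustration): log(sales) is linear in x.
n = 12
x = [10.0 + 15.0 * i / (n - 1) for i in range(n)]
log_y = [4.0 + 0.08 * xi + random.gauss(0.0, 0.15) for xi in x]

# Ordinary least squares fit of log(y) = a + b * x.
x_bar = sum(x) / n
y_bar = sum(log_y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_y)) / sxx
a_hat = y_bar - b_hat * x_bar
dof = n - 2
s2 = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, log_y)) / dof

x_new = 20.0
mu_hat = a_hat + b_hat * x_new

# Plug-in 95% interval: treats a_hat, b_hat and s2 as if they were known.
half_width = 1.96 * math.sqrt(s2)
plug_lo = math.exp(mu_hat - half_width)
plug_hi = math.exp(mu_hat + half_width)

# Posterior predictive draws under the assumed flat prior:
#   sigma^2 | data        ~ dof * s2 / chi-square(dof)
#   mean at x_new | sigma ~ Normal(mu_hat, sigma^2 * h)
h = 1.0 / n + (x_new - x_bar) ** 2 / sxx
draws = []
for _ in range(20000):
    # chi-square(dof) is Gamma(shape=dof/2, scale=2)
    sigma2 = dof * s2 / random.gammavariate(dof / 2.0, 2.0)
    mu = random.gauss(mu_hat, math.sqrt(sigma2 * h))
    # Simulate a new observation on the log scale, then back-transform.
    draws.append(math.exp(random.gauss(mu, math.sqrt(sigma2))))

draws.sort()
sim_lo = draws[int(0.025 * len(draws))]
sim_hi = draws[int(0.975 * len(draws))]

print("plug-in interval:  ", plug_lo, plug_hi)
print("simulated interval:", sim_lo, sim_hi)
```

The simulated interval comes out wider because each draw uses its own intercept, slope and sigma instead of the point estimates. A Stan model would produce comparable draws in its generated quantities block, with whatever priors the model specifies.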

Review: Circular Statistics in R

Circular Statistics in R by Pewsey, Neuhaeuser and Ruxton provides an accessible and R-centric introduction to circular statistics, together with an online resource furnishing the R code used in the book. Circular statistics is a topic whose global usefulness is matched by its poor coverage in statistical texts: just now I scanned my bookshelf of undergraduate-level statistics textbooks but not one even mentioned circular data and indeed the authors claim that only six books have been published that treat circular data in depth. Perhaps only compositional data (Aitchison 1986) shares this peculiar status of being ubiquitous yet so frequently overlooked.

New Package “docxtractr” – Easily Extract Tables From Microsoft Word Docs

This is more of a follow-up from yesterday’s post. The hack and function in said post were fine, but they were limited to uniform tables and made you do more work than you had to. So, there’s now a devtools-installable package on GitHub that makes it way easier to get information about the tables in a Word document and extract them, uniform or not.

Comparing the leading big data analytics software options

Leading vendors of big data analytics software
• Alteryx, which consists of a Designer module for designing analytics applications, a Server component for scaling across the organization and an Analytics Gallery for sharing applications with external partners.
• IBM, which provides SPSS Modeler, a tool targeted to users with little or no analytical background. IBM also has SPSS Statistics, which is geared toward more sophisticated analysts.
• KNIME, an open source product commercialized by a software vendor, which includes an analytics platform and a number of commercial extensions for big data, cluster operations and collaboration.
• Microsoft Revolution Analytics, which spans two products — Revolution R Open, a free download that’s an enhanced version of the R programming language, and Revolution R Enterprise, which supports the use of R in clustered environments (like Hadoop).
• Oracle Advanced Analytics, which includes Oracle Data Miner, Oracle R Advanced Analytics for Hadoop and Oracle Big Data Discovery, as well as connectors and interfaces for SQL and R.
• RapidMiner, which provides a Studio component for design, a Server component, a Hadoop connector called Radoop and a component for stream processing.
• SAP Predictive Analytics, which comprises two versions, Automated Analytics (for business users without a formal background) and Expert Analytics (targeted to professional data analysts and data scientists).
• SAS Enterprise Miner, which is intended to help users quickly develop descriptive and predictive models, including components for predictive modeling and in-database scoring.
• The Teradata Aster Discovery Platform, which is a framework offered by Teradata with its Aster database, Discovery Portfolio with built-in analytics functions, a graph processing engine, MapReduce and a version of R.

Python, Machine Learning, and Language Wars. A Highly Subjective Point of View

In the following paragraphs, I really don’t mean to tell you why you or anyone else should use Python. To be honest, I really hate those types of questions: “Which * is the best?” (* insert “programming language, text editor, IDE, operating system, computer manufacturer” here). This is really a nonsense question and discussion. It can sometimes be fun and entertaining, but I recommend saving this question for our occasional after-work beer or coffee with friends and colleagues.

The Bias of Certain Elasticity Estimators

In a recent post I discussed some aspects of estimating elasticities from regression models, and the interpretation of these values. That discussion should be kept in mind in reading what follows. One thing that a lot of practitioners seem to be unaware of (or they choose to ignore it) is that in many of the common situations where we use regression analysis to estimate elasticities, these estimators are biased. And that’s true even if all of the conditions needed for the coefficient estimator (e.g., OLS) to be unbiased are fully satisfied. Let’s look at some common situations leading to the estimation of elasticities and marginal effects, and see if we can summarize what’s going on.
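One way to see the point is a small Monte Carlo sketch. It is a hypothetical case chosen for illustration, not one of the post's own examples: a linear model y = a + b*x + e with fixed regressors, where the elasticity at the sample mean is estimated as b_hat * x_bar / y_bar. All parameter values below are assumptions. The OLS slope is unbiased here, but the elasticity estimator divides by the random quantity y_bar, so its average does not equal the true elasticity.

```python
import random

random.seed(7)

# Assumed "true" parameters and fixed regressor values (illustrative only).
a, b, sigma = 1.0, 0.5, 2.0
x = [float(i) for i in range(1, 11)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# True elasticity evaluated at x_bar: b * x_bar / E[y | x_bar].
true_elasticity = b * x_bar / (a + b * x_bar)

reps = 50000
sum_b_hat = 0.0
sum_elasticity_hat = 0.0
for _ in range(reps):
    # Draw a fresh sample with normal errors; x stays fixed across samples.
    y = [a + b * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = sum(y) / n
    # OLS slope estimate.
    b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    sum_b_hat += b_hat
    # Elasticity estimate at the sample means: a ratio of random quantities.
    sum_elasticity_hat += b_hat * x_bar / y_bar

mean_b_hat = sum_b_hat / reps
mean_elasticity_hat = sum_elasticity_hat / reps

print("true slope:", b, " average OLS slope:", mean_b_hat)
print("true elasticity:", true_elasticity,
      " average estimated elasticity:", mean_elasticity_hat)
```

In this setup the average slope estimate sits close to b, while the average elasticity estimate drifts above the true value, because the expectation of 1 / y_bar exceeds 1 / E[y_bar] when y_bar is positive (Jensen's inequality). The size and direction of the bias depend on the model and the assumed values.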