Sankey diagrams are mostly used to visualize energy flows, material flows and trade-offs. But SentimentBuilder has rediscovered them for use with unstructured text, based on their online NLP tool!
It is a bit of a contradiction. Kaggle provides competitions on data science, while Stan is clearly part of (Bayesian) statistics. Yet after using random forests, boosting and bagging, I also think this problem has a suitable size for Stan, which I understand can handle larger problems than older Bayesian software such as JAGS. What I aim to do is enter a load of variables in the Stan model. Aliasing will be ignored, and I hope the hierarchical model will provide suitable shrinkage for terms which are not relevant.
Population growth, shortages in housing supply, internal migration, immigration, cheap money, and foreign investors are just a few of the claimed causes of House Price Inflation (HPI) in New Zealand in recent years. The most notorious example of HPI in action is NZ’s largest city – Auckland.
With the rapid growth of data and the shift towards rapid development solutions, much data is being stored in NoSQL stores such as Hadoop and MongoDB. The infrastructure built upon relational databases that have been used for decades cannot keep up with the volume and scope of data being captured. At the same time, SQL is a very good and very widely used method for extracting and analysing data. In short, it will not be replaced by hierarchical query techniques such as XPath any time soon.
Modeling and prediction problems occur in different domains and data situations. One type of situation involves sequences of events. For instance, you may want to model the behaviour of customers on your website, looking at the pages they land on or enter by, the links they click, and so on. You may want to do this to understand common issues and needs, and then redesign your website to address them. On the other hand, you may want to promote certain sections or products on the website and need to understand the right page architecture and layout. In another example, you may be interested in predicting a patient’s next medical visit based on previous visits, or a customer’s next purchase based on previous purchases. While traditional classification-based prediction methods may apply, an additional class of algorithms becomes available if you can represent actions as a finite set of discrete events.
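The excerpt does not name the class of algorithms it has in mind. One family that fits the description, and the one assumed here, is the first-order Markov chain: estimate how often each event follows each other event, then predict the most frequent successor. A minimal sketch, with made-up page names and toy sessions (`fit_transitions` and `predict_next` are hypothetical helper names, not from any library):

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every (current event -> next event) pair in the session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


def predict_next(probs, state):
    """Return the most likely next event, or None if the state was never left."""
    options = probs.get(state)
    if not options:
        return None
    return max(options, key=options.get)


# Toy click-stream data (illustrative only, not from the original post).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["search", "product", "cart", "checkout"],
    ["home", "search", "product"],
]

transitions = fit_transitions(sessions)
print(predict_next(transitions, "home"))
```

The same idea carries over to the medical-visit or next-purchase examples once each visit or product is encoded as one of a finite set of events. Higher-order chains or hidden Markov models would be natural extensions, but they are beyond this sketch.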
Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = height, say, we know that the assumption of homoscedasticity can’t be true, even approximately.
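To make the definition concrete, here is a small simulation sketch. The data are synthetic, and the assumption that the noise standard deviation grows linearly with height is purely illustrative, as are all the coefficients. It estimates Var(Y | X = t) by computing the variance of Y within bins of X:

```python
import random
import statistics

random.seed(0)


def simulate(n=2000):
    """Draw synthetic (height, weight) pairs whose noise grows with height."""
    data = []
    for _ in range(n):
        height = random.uniform(150, 200)  # cm
        noise_sd = 0.2 * (height - 140)    # assumed: spread increases with height
        weight = -90 + 0.9 * height + random.gauss(0, noise_sd)
        data.append((height, weight))
    return data


def variance_by_bin(data, edges):
    """Sample variance of Y within consecutive bins of X."""
    result = []
    for lo, hi in zip(edges, edges[1:]):
        ys = [y for x, y in data if lo <= x < hi]
        result.append(statistics.variance(ys))
    return result


data = simulate()
edges = [150, 160, 170, 180, 190, 200]
bin_variances = variance_by_bin(data, edges)
for (lo, hi), v in zip(zip(edges, edges[1:]), bin_variances):
    print(f"height in [{lo}, {hi}): Var(weight) ~ {v:.1f}")
```

Under homoscedasticity the printed variances would be roughly equal across bins. In this simulated setting they rise with height, which is the pattern the excerpt argues is the norm in real data.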
The goal of this challenge was to detect 6 different events related to hand movement during a task of grasping and lifting an object, using only the EEG signal. We were asked to provide probabilities for each of the 6 events at every time sample. The evaluation metric for this challenge was the area under the ROC curve (AUC), averaged over the 6 event types.
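The metric can be sketched as follows: compute one AUC per event column, then take the plain mean. This is a minimal pure-Python illustration under the assumption that the average is unweighted. The function names and the toy data (two event columns rather than six) are made up for the example, and the pairwise AUC below is quadratic in the number of samples, so it is only suitable for small inputs.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true, y_prob):
    """Average the per-event AUC over all event columns."""
    n_events = len(y_true[0])
    per_event = []
    for j in range(n_events):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_prob]
        per_event.append(auc(labels, scores))
    return sum(per_event) / n_events


# Toy example: rows are time samples, columns are event types.
y_true = [[1, 0], [0, 0], [1, 1], [0, 1]]
y_prob = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.7], [0.3, 0.3]]
print(mean_columnwise_auc(y_true, y_prob))  # per-event AUCs are 1.0 and 0.75, mean 0.875
```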
This question on Cross Validated got me interested. I gave a fairly inadequate answer and want to explore a few of the issues. Actually, I have a plan for an effective technique, which is what I think the original post was asking for, but I need to check out a few things first.
When writing a long function which has to deal with multiple checks and complex processes, it is valuable to put comments in the function body. This allows readers (including you) to grasp the workflow without going into details. I’m going to present a way in which those comments can be neatly reused for documentation purposes. The post is structured around the package development process. The inline function body comments will be used to generate a documentation file which stores a task list.
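The original post works within a package development workflow and its own tooling, which is not shown in this excerpt. Purely to illustrate the general idea, here is a Python sketch that pulls specially marked comments out of a function body and turns them into a numbered task list. The `#:` marker and the `extract_task_list` helper are assumptions made for this example, not the post's convention.

```python
import io
import tokenize


def extract_task_list(source, marker="#:"):
    """Collect comments starting with `marker` from source code, in order."""
    tasks = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith(marker):
            tasks.append(tok.string[len(marker):].strip())
    return tasks


# Hypothetical function whose documented steps are tagged with "#:".
example_source = '''
def load_data(path):
    #: Check that a path was supplied
    if not path:
        raise ValueError("path is empty")
    # internal note, not part of the docs
    #: Read the file contents
    with open(path) as handle:
        return handle.read()
'''

task_list = extract_task_list(example_source)
for number, task in enumerate(task_list, start=1):
    print(f"{number}. {task}")
```

Using a distinct marker keeps ordinary comments out of the generated documentation, so only the steps the author chose to expose end up in the task list.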