Consistent and Clear Reporting of Results from Diverse Modeling Techniques: The A3 Method

The measurement and reporting of model error is of basic importance when constructing models. Here, a general method and an R package, A3, are presented to support the assessment and communication of the quality of a model t along with metrics of variable importance. The presented method is accurate, robust, and adaptable to a wide range of predictive modeling algorithms. The method is described along with case studies and a usage guide. It is shown how the method can be used to obtain more accurate models for prediction and how this may simultaneously lead to altered inferences and conclusions about the impact of potential drivers within a system.

SSMMATLAB: A Set of MATLAB Programs for the Statistical Analysis of State Space Models

This article discusses and describes SSMMATLAB, a set of programs written by the author in MATLAB for the statistical analysis of state space models. The state space model considered is very general. It may have univariate or multivariate observations, time-varying system matrices, exogenous inputs, regression e ects, incompletely speci ed initial conditions, such as those that arise with cointegrated VARMA models, and missing values. There are functions to put frequently used models, such as multiplicative VARMA models, VARMAX models in echelon form, cointegrated VARMA models, and univariate structural or ARIMA model-based unobserved components models, into state space form. There are also functions to implement the Hillmer-Tiao canonical decomposition and the smooth trend and cycle estimation proposed by G omez (2001). Once the model is in state space form, other functions can be used for likelihood evaluation, model estimation, forecasting and smoothing. A set of examples is presented in the SSMMATLAB manual to illustrate the use of these functions.

Interactive Data Visualization using Bokeh (in Python)

Bokeh is a Python library for interactive visualization that targets web browsers for representation. This is the core difference between Bokeh and other visualization libraries. Look at the snapshot below, which explains the process flow of how Bokeh helps to present data to a web browser.

Understanding LSTM Networks

Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence. Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones. Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

A Beginner’s Guide to SQL

Maybe it is hard to believe, but SQL is used everywhere around us. Every application that is manipulating any kind of data needs to store that data somewhere. Whether it’s Big Data or just a table with few simple rows, a government or a small startups, or a big database that spans over multiple servers or a mobile phone that runs its own small database, SQL is ubiquitous. But what is a SQL? SQL stands for Structured Query Language, and usually is pronounced as “ess-que-el”. SQL is the language of databases, and is specifically built to communicate with databases. SQL is a simple language and is similar to the English language, as commands are structured almost like English sentences. Those sentences are structured like declared statements, thus SQL is also called a declarative language.

Plotting Time Series in R using Yahoo Finance data

I recently rediscovered the Timely Portfolio post on R Financial Time Series Plotting. If you are not familiar with this gem, it is well-worth the time to stop and have a look at it now. Not only does it contain some useful examples of time series plots mixing different combinations of time series packages (ts, zoo, xts) with multiple plotting systems (base R, lattice, etc.) but it provides an instructive, historical perspective that illustrates the non linear nature of progress in software development: new code is written to solve certain technical problems with the current software. Progress is made, and the new code makes it possible to do some things that couldn’t be done before, but there were tradeoffs. Design choices for the new system make it a little more difficult to do something that was easy before. The net result: all of the software continues to advance in a messy mix, confusing the newcomer and providing critics with the opportunity to complain that there is not just one way to solve a problem.

Spatio-Temporal Kriging in R

This is the first time I considered spatio-temporal interpolation. Even though many datasets are indexed in both space and time, in the majority of cases time is not really taken into account for the interpolation. As an example we can consider temperature observations measured hourly from various stations in a determined study area. There are several different things we can do with such a dataset. We could for instance create a series of maps with the average daily or monthly temperatures. Time is clearly considered in these studies, but not explicitly during the interpolation phase. If we want to compute daily averages we first perform the averaging and then kriging. However, the temporal interactions are not considered in the kriging model.

Feature Elimination in Support Vector Machines and Empirical Risk Minimization

We develop an approach for feature elimination in support vector machines (and empirical risk minimization), based on recursive elimination of features. We present theoretical properties of this method and show that this is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present case studies to show that the assumptions are met in most practical situations and also present simulation studies to demonstrate performance of the proposed approach.