The notion of threading, for those who may not have this background, refers to several instances of a program, in this case, several instances of R, sharing global variables but otherwise running independently. As Bartnik points out, this can make I/O programming easier and clearer; see my Python tutorial, Chapter 4, for a network sockets example, in which the code must deal with situations in which data may come from one of many sockets, but without foreknowledge of which socket will be next. By having a separate process devoted to each socket, but storing the incoming data in a shared variable, the problem is neatly solved (and more conveniently than using nonblocking I/O).
Shiny is a package from RStudio that can be used to build interactive web pages. RStudio is an open source set of integrated tools designed to help you be more productive with R, and it is available for download. Use the examples in this tutorial to “take a first bite” and prepare for the exercises that follow, which will help you build your first Shiny application from scratch. This is the first part of the series, so we will just create the interface, apply some HTML formatting, and add an image to our application. Specifically, we will start creating a Shiny application that analyzes the famous (Fisher’s or Anderson’s) iris data set, which gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica. Let’s go!
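Before building any interface, it can help to confirm what the data set looks like. The following is a minimal sketch that uses only base R (no Shiny code yet); the column names are those of the built-in iris data frame.

```r
# iris ships with base R (package 'datasets'), so nothing needs to be installed.
data(iris)

# Inspect the structure: four numeric measurements plus the Species factor.
str(iris)

# Count the flowers recorded for each species.
table(iris$Species)
```

The same data frame can later be passed to the application code without any further preparation.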
This chapter assumes that you’re trying to figure out how to extract a portion of a data frame so you can run a particular statistical test. If you don’t have any data yet, and you want to try this, let’s load in ChickWeight, one of R’s built-in datasets. This contains the weights of little chickens at 12 different times throughout their lives. The chickens are on different diets, numbered 1, 2, 3, and 4. Using the str command, we find that there are 578 observations in this data frame, and two different categorical variables: Chick and Diet.
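As a rough sketch of what this looks like in practice, the code below loads ChickWeight, inspects it with str, and extracts two portions of the data frame. The object names diet1 and final_12, and the choice of diets 1 and 2 at the last weigh-in, are illustrative assumptions rather than part of the chapter.

```r
# ChickWeight ships with base R (package 'datasets'), so no download is needed.
data(ChickWeight)

# Inspect the structure: 578 observations of weight, Time, Chick, and Diet.
str(ChickWeight)

# Keep only the rows for chicks on diet 1 (hypothetical choice of diet).
diet1 <- subset(ChickWeight, Diet == "1")

# Keep the last weigh-in (Time == 21) for diets 1 and 2,
# retaining only the columns a two-group comparison would need.
keep_rows <- ChickWeight$Time == 21 & ChickWeight$Diet %in% c("1", "2")
final_12 <- ChickWeight[keep_rows, c("weight", "Diet")]

# Check the first few rows of the extracted portion.
head(final_12)
```

Both subset() and bracket indexing return a data frame, so either result can be handed directly to a test function that accepts a data argument.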
Big data is nowadays one of the most common buzzwords you might have heard. There are many ways to define what big data is, which is probably why it remains a really difficult concept to grasp. Some describe big data as datasets larger than a certain threshold, e.g., over a terabyte (Driscoll, 2010), while others look at big data as datasets that crash conventional analytical tools like Microsoft Excel. Better-known works, though, identify big data as data that display the features of Variety, Velocity, and Volume (Laney, 2001; McAfee and Brynjolfsson, 2012; IBM, 2013; Marr, 2015). All of these definitions are true to some extent, although I think they are incomplete.
Will Automated Predictive Analytics be a boon to professional data scientists, or a dangerous diversion that allows well-meaning, motivated, but amateur users to try to implement predictive analytics? More on the conversation started last week about new One-Click Data-In Model-Out platforms.