Introduction to Probabilistic Data Structures
Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored, but the resulting errors can be kept below a chosen threshold. Compared with error-free approaches, these algorithms use much less memory and have constant query time. They usually support union and intersection operations and can therefore be easily parallelized.
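As a minimal sketch of the idea, here is a toy Bloom filter, one of the best-known probabilistic data structures: it hashes each item to a few bit positions, ignores collisions, and in exchange for a small, tunable false-positive rate never reports a false negative. The sizes and hash scheme below are illustrative choices, not from the post.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: compact set membership with no false negatives
    and a tunable false-positive rate."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _indexes(self, item):
        # Derive num_hashes positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[idx] for idx in self._indexes(item))

bf = BloomFilter()
bf.add("apple")
print("apple" in bf)  # True: items that were added are always found
```

Memory use is fixed by `size` regardless of how many items are added, and both `add` and lookup cost a constant number of hash evaluations, which is the trade-off the paragraph above describes. Unioning two filters of the same size is just a bitwise OR of their bit arrays.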

Modeling the Latent Structure That Shapes Brand Learning
What is a brand? Metaphorically, the brand is the white sphere in the middle of this figure, that is, the ball surrounded by the radiating black cones. Of course, no ball has been drawn, just the conic thorns positioned so that we construct the sphere as an organizing structure (a form of reification in Gestalt psychology). Both perception and cognition organize input into Gestalts or wholes, generalizing previously learned forms and configurations. It is because we are familiar with pictures like this that we impose an organization on the black objects and ‘see’ a ball with spikes. You did not need to be thinking about ‘spiky balls’; the figure recruits its own interpretative frame. Similarly, brands and product categories impose structure on feature sets. The brand may be an associative net (what comes to mind when I say ‘McDonald’s’), but that network is built on a scaffolding that we can model using R.

Wakefield: Random Data Set (Part II)
This post is Part II of a series detailing wakefield, a GitHub package for generating random data sets. The first post (Part I) was a test run to gauge user interest. I received positive feedback and some ideas for improvements, which I’ll share below.

Short story on scaling an NLP problem without using a ton of hardware.
The basic idea of relation extraction is to detect things mentioned in text (so-called mentions, or entity occurrences) and then decide whether the target relation between each pair of those things is expressed in the text. In our case, we needed to find where companies were mentioned, and then determine whether a given sentence said that Company-A was funding Company-B.
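The two-step pipeline described above can be sketched as follows. This is a deliberately crude illustration, not the post's actual system: the company list, sentence, and funding-cue patterns are all made-up stand-ins for what would really be a named-entity recognizer and a trained relation classifier.

```python
import re

# Hypothetical company gazetteer; a real pipeline would use NER instead.
COMPANIES = ["Acme Corp", "Globex"]

def find_mentions(sentence, companies=COMPANIES):
    """Step 1: locate entity occurrences (mentions) in the sentence."""
    return [(c, m.start())
            for c in companies
            for m in re.finditer(re.escape(c), sentence)]

def expresses_funding(sentence, a, b):
    """Step 2: decide whether the sentence expresses 'a funds b',
    here via a naive keyword check on the text between the mentions."""
    between = re.search(re.escape(a) + r"(.*?)" + re.escape(b), sentence)
    if not between:
        return False
    return bool(re.search(r"\bfund(?:s|ed|ing)?\b|\binvest(?:s|ed)?\b",
                          between.group(1)))

sentence = "Acme Corp funded Globex in a Series A round."
print([c for c, _ in find_mentions(sentence)])          # both companies found
print(expresses_funding(sentence, "Acme Corp", "Globex"))  # True
print(expresses_funding(sentence, "Globex", "Acme Corp"))  # False
```

The point of the structure, not the keyword heuristic, is what matters: mention detection and relation classification are separate stages, so each can be scaled or swapped out independently.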

OpenStreetMap Visualization Case Study & Sample Code
OpenStreetMap (OSM) is a tool for creating and sharing map information. Anyone can contribute to OSM; the maps are stored on the internet, and anyone can access them at any time. The growth of OSM has been phenomenal over the years, and the number of registered users contributing to OSM is nearly 2 million.

Visualizing fits, inference, implications of (G)LMMs with Jaime Ashander
A couple of weeks ago at the Davis R Users’ Group, Jaime Ashander gave a presentation on visualizing and diagnosing (G)LMMs in R. Here’s the video …