Loops in R and Python: Which Is Faster?

This post compares R and Python in terms of the time they require to loop and generate pseudo-random numbers. To accomplish the task, the following steps were performed in both languages: (1) loop 100,000 times, with i as the loop index; (2) generate a random integer from the integers 1 to the current loop index i (i + 1 for Python); (3) output the elapsed time at the probe steps where i (i + 1 for Python) is in [10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000].
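As a rough illustration, here is a minimal R sketch of the benchmark just described. The probe values come from the list above; the timing mechanics (Sys.time() deltas) are an assumption about how the measurement might be done, not necessarily the post's exact code.

```r
n <- 100000
probes <- c(10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000)

start <- Sys.time()
for (i in 1:n) {
  x <- sample.int(i, 1)  # random integer from 1..i
  if (i %in% probes) {
    # assumed measurement: wall-clock seconds since the loop started
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    cat(sprintf("i = %6d: %.3f s elapsed\n", i, elapsed))
  }
}
```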

Does It Make Sense to Do Big Data with Small Nodes?

It’s been about ten years since NoSQL showed up on the scene. With its scale-out architecture, NoSQL enables even small organizations to handle huge amounts of data on commodity hardware. Need to do more? No problem, just add more hardware. As companies continue to use more and more data, the ability to scale out becomes more critical. It’s also important to note that commodity hardware has changed a lot since the rise of NoSQL. Back then, Intel had only recently introduced its Core and Core Duo architecture, which first put two cores on the same die. Jump forward to the present, where so many of us carry around a phone with an 8-core processor. In this age of big data and powerful commodity hardware there’s an ongoing debate about node size. Does it make sense to use a lot of small nodes to handle big data workloads? Or should we instead use only a handful of very big nodes? If we need to process 200TB of data, for example, is it better to do so with 200 nodes with 4 cores and 1 terabyte each, or with 20 nodes with 40 cores and 10 terabytes each?
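One detail worth making explicit is that the two configurations in that question add up to identical aggregate resources, so the trade-off is really about per-node overhead and failure domains. The small R sketch below just does the arithmetic; the configuration values come from the question above, and everything else is illustrative.

```r
# Two hypothetical cluster layouts for the same 200TB workload (values from the text)
small_nodes <- list(nodes = 200, cores_per_node = 4,  tb_per_node = 1)
big_nodes   <- list(nodes = 20,  cores_per_node = 40, tb_per_node = 10)

totals <- function(cfg) {
  c(total_cores = cfg$nodes * cfg$cores_per_node,
    total_tb    = cfg$nodes * cfg$tb_per_node)
}

totals(small_nodes)  # 800 cores, 200 TB
totals(big_nodes)    # 800 cores, 200 TB -- same totals, different node sizes
```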

Control Structures in R: Using If-Else Statements and Loops

Control structures allow you to control the flow of execution of your code. They are extremely useful if you want to run a piece of code multiple times, or if you want to run a piece of code only when a certain condition is met.
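A minimal sketch of the two ideas in R (the values here are made up for illustration):

```r
x <- 7

# if-else: run a block only when a condition holds
if (x %% 2 == 0) {
  message(x, " is even")
} else {
  message(x, " is odd")
}

# for loop: repeat a block a fixed number of times
for (i in 1:3) {
  print(i^2)
}

# while loop: repeat until a condition fails
i <- 1
while (i <= 3) {
  print(i^2)
  i <- i + 1
}
```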

An introduction to joint modeling in R

Joint modeling basically combines (joins) the probability distributions from a linear mixed-effects model with random effects (which takes care of the longitudinal data) and a Cox survival model (which calculates the hazard of an event from the censored data).
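As a hedged sketch of what this looks like in practice, here is a joint model fit with the JM package in R, using the aids and aids.id datasets that ship with it. The package choice and the model formulas are assumptions for illustration, not necessarily the ones used in the post.

```r
library(JM)        # joint modeling of longitudinal and survival data
library(nlme)
library(survival)

# Longitudinal submodel: linear mixed-effects fit with random effects per patient
lmeFit <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)

# Survival submodel: Cox fit on one row per patient (x = TRUE keeps the design matrix)
coxFit <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)

# Joint model: links the two submodels through the shared random effects
jointFit <- jointModel(lmeFit, coxFit, timeVar = "obstime")
summary(jointFit)
```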

Whys and Hows of the Apply Family of Functions in R: An Introduction to the Looping System

Imagine you were asked to perform a simple task, say calculating the column sums of a 3×3 matrix. What do you think is the best way? Calculating it directly using traditional methods such as a calculator, or even pen and paper, doesn’t sound like a bad approach. A lot of us may prefer to just calculate it manually instead of writing an entire piece of code for such a small dataset. Now, if the dataset were a 10×10 matrix, would you do the same? Not sure. And if the dataset were bigger still, say a 100×100, 1000×1000, or 5000×5000 matrix, would you even think of doing it manually? I wouldn’t.
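To make the point concrete, here is a small sketch using apply() on a made-up matrix; colSums() is shown only as an optimized cross-check.

```r
set.seed(42)
m <- matrix(rnorm(1000 * 1000), nrow = 1000)  # a made-up 1000x1000 matrix

# apply() with MARGIN = 2 runs sum() over each column
col_totals <- apply(m, 2, sum)

# colSums() is the optimized built-in equivalent
stopifnot(all.equal(col_totals, colSums(m)))
```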

Benchmarking Google’s new TPUv2

Nine months after the initial announcement, Google last week finally released TPUv2 to early beta users on the Google Cloud Platform. At RiseML, we got our hands on them and ran a couple of quick benchmarks. Below, we’d like to share our experience and preliminary results.