How to host an R Shiny App on the AWS cloud in 7 simple steps

An R Shiny app is an interactive web interface. A Shiny app has two components: a user interface object (ui.R) and a server function (server.R). The two components are passed as arguments to the shinyApp() function, which creates a Shiny app object.
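As a minimal sketch of that structure (assuming the shiny package is installed; the histogram app itself is just an illustrative example, not from the article):

```r
# Minimal Shiny app: a UI object and a server function passed to shinyApp().
library(shiny)

ui <- fluidPage(
  numericInput("n", "Number of observations:", value = 50, min = 1),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normals")
  })
}

app <- shinyApp(ui = ui, server = server)  # creates a shiny.appobj
# runApp(app) launches it locally; deployment to AWS comes after this works.
```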

Visualising SSH attacks with R

If you have a machine with an SSH server open to the world and you take a look at its logs, you may be alarmed by the sheer number of login attempts from unknown IP addresses. DenyHosts is a pretty neat service for Unix-based systems that works in the background, reviewing such logs and appending the offending addresses to the hosts.deny file, thus blocking brute-force attacks.
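Before visualising anything, you need the attack counts out of the log. A small base-R sketch of that step, assuming the common OpenSSH auth.log format ("Failed password ... from <ip> port ..."); the sample lines are fabricated for illustration:

```r
# Tally failed SSH login attempts per source IP from auth.log-style lines.
lines <- c(
  "sshd[123]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
  "sshd[124]: Failed password for root from 198.51.100.7 port 4243 ssh2",
  "sshd[125]: Failed password for invalid user test from 203.0.113.5 port 4244 ssh2"
)

failed <- grep("Failed password", lines, value = TRUE)       # keep failed logins
ips    <- sub(".* from ([0-9.]+) port.*", "\\1", failed)      # extract source IP
attempts <- sort(table(ips), decreasing = TRUE)               # attempts per IP
# attempts is now ready for plotting, e.g. barplot(attempts)
```

With a real server, `lines <- readLines("/var/log/auth.log")` replaces the sample vector.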

R 3.4.3 released

R 3.4.3 has been released, as announced by the R Core team today. As of this writing, only the source distribution (for those who build R themselves) is available, but binaries for Windows, Mac and Linux should appear on your local CRAN mirror within the next day or so. This is primarily a bug-fix release. It fixes an issue with incorrect time zones on macOS High Sierra, and some issues with handling Unicode characters. (Incidentally, representing international and special characters is something that R takes great care to handle properly. It’s not an easy task: a 2003 essay by Joel Spolsky describes the minefield that is character representation, and not much has changed since then.) You can check out the complete list of changes here. Whatever your platform, R 3.4.3 should be backwards-compatible with other R versions in the R 3.4.x series, so your scripts and packages should continue to function as they did before. The codename for this release is ‘Kite-Eating Tree’, and as with all R codenames this is a reference to a classic Peanuts episode. If you’re interested in the source of other R release names, Lucy D’Agostino McGowan provides the Peanuts references for R release names back to R 2.14.0.

A quick introduction to using color in density plots

Right now, many people are pursuing data science because they want to learn artificial intelligence and machine learning. And for good reason. Machine learning is white hot right now, and it will probably reshape almost every sector of the world economy. Having said that, if you want to be a great machine learning expert, and a great data scientist in general, you need to master data visualization too. This is because data visualization is a critical prerequisite for advanced topics (like machine learning), and also because visualization is very useful for getting things done in its own right. So let’s talk a little more about data visualization. As you begin learning data visualization in R, you should master the basics: how to use ggplot2, how to think about data visualization, and how to make basic plots (like the bar chart, line chart, histogram, and scatterplot).
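As a small sketch of the topic in the title, here is a grouped density plot with color in ggplot2 (package assumed installed; the data is simulated for illustration):

```r
# Density plot colored by group: fill maps a categorical variable to color.
library(ggplot2)

df <- data.frame(
  value = c(rnorm(200, mean = 0), rnorm(200, mean = 2)),
  group = rep(c("A", "B"), each = 200)
)

p <- ggplot(df, aes(x = value, fill = group)) +
  geom_density(alpha = 0.5)   # alpha keeps overlapping densities visible

# print(p) draws the plot
```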

Introducing RITCH: Parsing ITCH Files in R (Finance & Market Microstructure)

Recently I was faced with a file compressed in NASDAQ’s ITCH protocol. As I wasn’t able to find an R package that parses and loads the file into R for me, I spent (probably) way too much time writing one, so here it is. But you might wonder: what exactly is ITCH and why should I care? Well, ITCH is the outbound protocol NASDAQ uses to communicate market data to its clients, that is, all information including market status, orders, trades, circuit breakers, etc., with nanosecond timestamps for each day and each exchange. It’s a must-have if you are looking into market microstructure, and a good-to-have-looked-into-it if you are interested in general finance and/or in data analytics on large structured datasets.

Deep Image Prior

Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, superresolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity.

Analysing iOS App Store iTunes Reviews in R

Unlike the Google Play Store Developer Console for Android apps, the iOS App Store’s iTunes Connect does not help developers with bulk downloads of App Store reviews. So if you are an iOS app developer or a mobile app product manager, you are left with no option but to get a paid subscription like App Annie to access and analyze the iTunes reviews of your app. itunesr is an R package to access and analyse iOS App Store iTunes reviews, and we will see how to use it to analyse the iTunes reviews of any iOS app for free.
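A hedged sketch of the basic fetch step: the getReviews() call below follows the package’s documented usage as I recall it, and the app ID is a hypothetical placeholder, so treat both as assumptions to check against the itunesr documentation (the call also needs network access):

```r
# Fetch the most recent App Store reviews for one app with itunesr.
library(itunesr)

app_id  <- "123456789"  # hypothetical iTunes app ID -- replace with your app's
country <- "us"         # App Store country code

reviews <- getReviews(app_id, country, 1)  # page 1 of the most recent reviews
head(reviews)                              # one row per review, with rating and text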

A General Approach to Preprocessing Text Data

Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to text data preprocessing, regardless of the specific textual data science task you have in mind.
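The usual shape of such a pipeline can be sketched in a few lines of base R; this is an illustrative minimal version (the steps, stopword list, and function name are my own, not from the article):

```r
# Generic text-preprocessing sketch: normalize case, strip noise,
# tokenize, and remove stopwords.
preprocess <- function(text, stopwords = c("the", "a", "is", "and", "of")) {
  text   <- tolower(text)                        # case normalization
  text   <- gsub("[^a-z ]", " ", text)           # drop punctuation and digits
  tokens <- strsplit(trimws(text), "\\s+")[[1]]  # tokenization on whitespace
  tokens[!tokens %in% stopwords]                 # stopword removal
}

preprocess("The 2 cats, and a dog!")
```

Real tasks usually swap in a proper tokenizer and a fuller stopword list (e.g. from the tm or tidytext packages), but the ordering of steps stays the same.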

A Data Scientist’s Real Job: Storytelling

Every morning, our computers greet us with a report containing over 350 million data points tracking our organization’s performance. Our challenge as data scientists is to translate this haystack of information into guidance for staff so they can make smart decisions — whether it’s choosing the right headline for today’s email blast (should we ask our members to “take action now” or “learn more”?) or determining the purpose of our summer volunteer campaign (food donation drive or recycling campaign?). In short, we’re tasked with transforming data into directives. Good analysis parses numerical outputs into an understanding of the organization. We “humanize” the data by turning raw numbers into a story about our performance. When many people hear “Big Data,” they think “Big Brother” (type “big data is…” into Google and one of the top recommendations is “…watching you.”). Central to this anxiety is a feeling that what it means to be human can’t be tracked or quantified by computers. This fear is well-founded. As the cost of collecting and storing data continues to decrease, the volume of raw data an organization has available can be overwhelming. Of all the data in existence, 90% was created in the last 2 years. Inundated organizations can lose sight of the difference between what’s statistically significant and what’s important for decision-making. Using Big Data successfully requires human translation and context, whether it’s for your staff or the people your organization is trying to reach. Without a human frame, like photos or words that make emotion salient, data will only confuse, and it certainly won’t lead to smart organizational behavior.