Time Series (referred to as TS from here on) is considered to be one of the lesser-known skills in the analytics space (even I had little clue about it a couple of days back). But since our inaugural Mini Hackathon is based on it, I set out to learn the basic steps for solving a Time Series problem, and here I am sharing them with you. These will definitely help you get a decent model in our hackathon today.
The standard plot function in R allows extensive tuning of every element being plotted. There are, however, many possible ways to do so, and the standard help files are hard to grasp at the beginning. In this article we will see how to control every aspect of the axis (labels, tick marks …) in the standard plot function.
Hello everyone! In this article I will show you how to run the random forest algorithm in R. We will use the wine quality data set (white) from the UCI Machine Learning Repository.
When people think of a data anomaly, they often think of an error — a random blip outside of the normal scope of things that can be considered but discarded. A data anomaly, to many, is little more than a data defect. In the world of business data intelligence, however, this view is not only usually wrong, but in many cases, it can also be damaging. A data anomaly is often much more than a blip — it’s a signal.
Along with being an electrical engineer and now attending the NYCDSA 12-week Data Science bootcamp to become a data scientist, I am also co-owner, with my wife, of a wedding and event coordination company known as LLG Events Inc. As you can imagine, my weeks are filled with exploring fun and interesting data sets and my weekends with running lavish events, so at some point during the bootcamp I sat back and thought to myself… hmm, wouldn’t it be nice to combine the two! Inspired to combine these two elements of my life, I examined the challenges that my wife and I face as business owners and realized that the most difficult part of a service-oriented business is meeting clients. We do not believe in costly advertising that deems vendors ‘Platinum Vendors’ just because they spend the most. We rely on our reputation to sell our services, but this also has its limitations… How do we meet new clients outside of our network? How do we expand our network?
The biggest impact on data science right now is not coming from a new algorithm or statistical method. It’s coming from Docker containers. Containers solve a bunch of tough problems simultaneously: they make it easy to use libraries with complicated setups; they make your output reproducible; they make it easier to share your work; and they can take the pain out of the Python data science stack.
With fewer than 500 North Atlantic right whales left in the world’s oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction. In the NOAA Right Whale Recognition challenge, 470 players on 364 teams competed to build a model that could identify any individual, living North Atlantic right whale from its aerial photographs. Felix Lau entered the competition with the goal of practicing new techniques in deep learning, and ended up taking second place. This blog shares his background, some of the limitations he ran into, and a high-level overview of his approach. For a more technical description of his winning solution, don’t miss the post on his personal blog.
Let’s say that we have estimated a linear regression model on time series data with lagged predictors. The goal is to estimate sales as a function of inventory, search volume, and media spend from two months ago. After using the lm function to perform the linear regression, we predict sales using the predictor values from two months ago.
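To make the lag alignment concrete, here is a minimal sketch in Python rather than R. It is not the article's code: the helper names `align_with_lag` and `fit_line` are hypothetical, the numbers are made up, and it uses a single predictor (media spend) with a hand-rolled least-squares line instead of `lm` with all three predictors.

```python
def align_with_lag(predictor, target, lag=2):
    """Pair each target[t] with predictor[t - lag]; the first `lag` targets are dropped."""
    if len(predictor) != len(target):
        raise ValueError("predictor and target must have the same length")
    if not 0 <= lag < len(target):
        raise ValueError("lag must be non-negative and shorter than the series")
    n = len(target)
    return list(predictor[: n - lag]), list(target[lag:])


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up monthly data (assumption): from the third month on,
# sales depend on the media spend of two months earlier.
media_spend = [10, 20, 30, 40, 50, 60]
sales = [31, 28, 25, 45, 65, 85]

x_lagged, y = align_with_lag(media_spend, sales, lag=2)
slope, intercept = fit_line(x_lagged, y)

# The last two spend values have no matching sales yet, so they
# produce forecasts for the next two months.
forecast = [intercept + slope * x for x in media_spend[-2:]]
print(slope, intercept, forecast)
```

The point of the sketch is the alignment step: with a lag of two, the two most recent predictor rows are never used for fitting, which is exactly why they are available for predicting the two months ahead.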
In this project, I studied and re-implemented the technique for dependency parsing proposed by Hiroyasu Yamada and Yuji Matsumoto in ‘Statistical dependency analysis with support vector machines’. In addition to recreating the results, we also experimented with the biomedical data from the GENIA biomedical corpus (Yuka et al., 2005) and the Spanish universal dependency dataset (McDonald et al., 2013) to understand out-of-domain implications.
The course emphasizes how to design A/B tests using prior “guesstimates” of effect sizes (often you have these from prior campaigns, or somebody claims an effect size and it is merely your job to confirm it). It is fairly technical, and the emphasis is Bayesian: we are trying to get an actual estimate of the distribution of the unknown true expected payoff rate of the various campaigns (the so-called posteriors). We show how to design and evaluate sales campaigns for a product at two different price points.
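As a rough illustration of what such posteriors look like, the sketch below uses a Beta prior with binomial conversion data and compares two price points by simulation. It is an assumption-laden example, not the course's material: the function names, the uniform Beta(1, 1) prior, the prices, and the counts are all hypothetical, and payoff is taken to be price times conversion rate.

```python
import random


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Beta(prior_a, prior_b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def prob_b_beats_a(post_a, post_b, price_a, price_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(expected payoff of B > expected payoff of A)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        payoff_a = price_a * rng.betavariate(*post_a)
        payoff_b = price_b * rng.betavariate(*post_b)
        wins += payoff_b > payoff_a
    return wins / draws


# Made-up campaign results (assumption): 1000 offers at each price point.
post_a = beta_posterior(successes=60, trials=1000)  # price 10
post_b = beta_posterior(successes=50, trials=1000)  # price 15

p_b_better = prob_b_beats_a(post_a, post_b, price_a=10.0, price_b=15.0)
print(post_a, post_b, p_b_better)
```

A prior guesstimate from an earlier campaign would enter through `prior_a` and `prior_b`: a stronger prior belief means larger values, and more data is then needed to move the posterior.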
Today we’re practising functions! In the exercises below, you’re asked to write short R scripts that define functions aimed at specific tasks. The exercises start at an easy level, and gradually move towards slightly more complex functions.
Image classification is an important field in Computer Vision, not only because so many applications are associated with it, but also because many Computer Vision problems can be effectively reduced to image classification. The state-of-the-art tool in image classification is the Convolutional Neural Network (CNN). In this article, I am going to write a simple Neural Network with 2 layers (fully connected). First, I will train it to classify a set of 4-class 2D data and visualize the decision boundary. Second, I am going to train my NN on the famous MNIST data (https://…/digit-recognizer) and see its performance. The first part is inspired by the CS 231n course offered by Stanford: http://…/, which is taught in Python.
This is my first blog of the year…so I want it to be something really nice and huge. You know how much I love the R Programming Language…but I love other technologies as well…so taking a bunch of them and hooking them up together is what really brings me joy.
This is the sequel to the previous report “issues specific to cp932 locale, Japanese Shift-JIS, on Windows”. In this report, I will dig deeper into the issues to find out exactly what is happening.
We had a fantastic turnout for last week’s webinar, Introduction to Microsoft R Open. If you missed it, you can watch the replay below. In the talk, I give some background on the R language and its applications, describe the performance and reproducibility benefits of Microsoft R Open, and give a demonstration of the basics of the R language along with a more in-depth demo of producing a beautiful weather data chart with R.
A frequent question that we get here at Microsoft about MRO (Microsoft R Open) is: can it be used with RStudio? The short answer is absolutely yes! In fact, more than just being compatible, MRO is the perfect complement for the RStudio environment. MRO is a downstream distribution of open source R that supports multiple operating systems and provides features that enhance the performance and reproducible use of the R language. RStudio, being much more than a simple IDE, provides several features, such as the tight integration of knitr, RMarkdown and Shiny, that promote literate programming, the creation of reproducible code, and sharing and collaboration. Together, MRO and RStudio make a powerful combination. Before elaborating on this theme, I should just make it clear how to select MRO from the RStudio IDE. After you have installed MRO on your system, open RStudio, go to the ‘Tools’ tab at the top, and select ‘Global Options’. You should see a couple of pop-up windows like the screen capture below. If RStudio is not already pointing to MRO (like it is in the screen capture), browse to it and click ‘OK’.