DevOps Pipeline for a Machine Learning Project

Machine learning is getting more and more popular in applications and software products, from accounting to hot dog recognition apps. When you add machine learning techniques to exciting projects, you need to be ready for a number of difficulties. The Statsbot team asked Boris Tvaroska to tell us how to prepare a DevOps pipeline for an ML-based project.


GPU-accelerated, In-database Analytics for Operationalizing AI

This blog explores how the massive parallel processing power of the GPU is able to unify the entire AI pipeline on a single platform, and how this is both necessary and sufficient for overcoming the challenges to operationalizing AI.


I data scienced monitoring data and so can you

Rob Claire introduces the monitoring tools Pinterest uses and offers real-world examples of problem solving with data monitoring.


Bollinger Bands and their use in Stock Market Analysis (using Quandl & tidyverse in R)

Finding underlying patterns and making decisions is critical in the stock market. The same skill can be applied to many parallel domains. For example, I recently met someone who was doing the same thing with cryptocurrency. Risk and unemployment prediction in banks, customer churn in telecom, and spend analysis are all examples of similar problems. That is why I decided to create this series of articles. By following this series, you will understand some of the techniques used in the stock market. You can also apply them to the parallel domains I mentioned before. In the last article (Part I), we started with descriptive analysis for comparing stocks. In this post, we will focus on identifying patterns in order to know how a stock behaves. This behavior, as you will see later on, is very important for stock trading. In the latter part of the article, I will show how to predict stock prices using the conventional ARIMA (Auto-Regressive Integrated Moving Average) methodology from time series analysis and a regression model.
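The article itself works in R with Quandl data, which is not reproduced here. As a rough sketch of the idea behind Bollinger Bands, the snippet below uses plain Python and assumes the common convention: a middle band equal to the simple moving average over a window, with upper and lower bands a fixed number of (population) standard deviations away. The function name `bollinger_bands`, its parameters, and the sample prices are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (lower, middle, upper) tuples, one per full window of prices.

    Assumes the usual convention: middle band = simple moving average,
    outer bands = middle +/- num_std * population standard deviation.
    """
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]      # the most recent `window` prices
        middle = mean(chunk)                  # simple moving average
        spread = num_std * pstdev(chunk)      # band half-width
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Hypothetical closing prices, for illustration only (not real market data).
closes = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
for lower, middle, upper in bollinger_bands(closes, window=3):
    print(f"lower={lower:.2f} middle={middle:.2f} upper={upper:.2f}")
```

A price that moves outside the outer bands is unusually far from its recent average, which is the kind of pattern the blurb refers to. Real analyses typically use a 20-period window rather than the short one shown here.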


Time Series Analysis in R Part 3: Getting Data from Quandl

Generated data like that used in Parts 1 and 2 is great for the sake of example, but not very interesting to work with. So let’s get some real-world data that we can work with for the rest of this tutorial. There are countless sources of time series data that we can use, including some that are already included in R and some of its packages. We’ll use some of this data in examples. But I’d like to expand our horizons a bit.


A Gentle Introduction on Market Basket Analysis – Association Rules

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy. Association rules are widely used to analyze retail basket or transaction data; the goal is to identify strong rules in that data using measures of interestingness.
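The blurb does not name specific measures of interestingness. The three most commonly used for association rules are support, confidence, and lift, and the sketch below computes them in plain Python for a single rule. The helper names `support` and `rule_metrics` and the sample baskets are hypothetical, invented here for illustration; the linked article uses its own tooling.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    cons = support(transactions, consequent)
    # confidence: how often the consequent appears when the antecedent does
    confidence = both / ante if ante else 0.0
    # lift: confidence relative to the consequent's baseline frequency
    lift = confidence / cons if cons else 0.0
    return {"support": both, "confidence": confidence, "lift": lift}


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

A lift above 1 suggests the items are bought together more often than chance would predict, while a lift below 1 suggests the opposite. Production tools usually pair these measures with an algorithm such as Apriori to search the space of candidate rules.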


Data Analytics Basics (introduction)

You might have heard that Data Scientist ranked as the best job of 2017 in the USA (based on Glassdoor’s research). Recently, many IT professionals have started considering moving their career paths towards Data Science or Data Analytics. University students are looking for data-related internships, even if their major is not related to it. And even project and product managers want to learn the basics of data analytics to make better data-informed decisions. Are you interested in learning more about the basics of data analytics too? Then this article is for you! I’ll summarize here the most fundamental topics for first-timers.


Cleaning data with Pandas

Pandas is a data analysis library in Python, widely used by data scientists. From data wrangling to data cleaning, it offers multiple functionalities to make life easy when working with data. It’s a must-know for anyone looking to get started in the field. Pandas comes with a wide variety of features; here we’ll focus mainly on the data cleaning aspect.


Understanding Machine Learning Algorithms

Machine learning algorithms aren’t difficult to grasp if you understand the basic concepts. Here, a SAS data scientist describes the foundations for some of today’s popular algorithms.


Question answering with TensorFlow

Using advanced neural networks to tackle challenging natural language tasks.


Sentiment analysis with Apache MXNet

Using deep neural networks to make sense of unstructured text.


Create Powerpoint presentations from R with the OfficeR package

For many of us data scientists, whatever the tools we use to conduct research or perform an analysis, our superiors are going to want the results as a Microsoft Office document. Most likely it’s a Word document or a PowerPoint presentation, and it probably has to follow corporate branding guidelines to boot. The OfficeR package, by David Gohel, addresses this problem by allowing you to take a Word or PowerPoint template and programmatically insert text, tables and charts generated by R into the template to create a complete document. (The OfficeR package also represents a leap forward from the similar ReporteRs package: it’s faster, and no longer has a dependency on a Java installation.)


Dashboard Design: 8 Types of Online Dashboards

What type of online dashboard will work best for your data? This post reviews eight types of online dashboards to assist you in choosing the right approach for your next dashboard. Note that there may well be more than eight types of dashboards, and I am sure I will miss a few. If so, please tell me in the comments section of this post.


googleLanguageR – Analysing language through the Google Cloud Machine Learning APIs

One of the greatest assets human beings possess is the power of speech and language, from which almost all our other accomplishments flow. To be able to analyse communication offers us a chance to gain a greater understanding of one another.


A “Pre-Training” R Survey

Recently I’ve been working with a client to help their analysts improve their proficiency with R. A major challenge in engagements like this is figuring out the needs of the analysts, as well as their general attitude to the training.


Neural Networks: Innumerable Architectures, One Fundamental Idea

At the end of this post, you’ll be able to implement a neural network to identify handwritten digits using the MNIST dataset and have a rough idea about how to build your own neural networks.