Forecasting: Principles and Practice

Welcome to our online textbook on forecasting. This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don't attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details. The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective. We use it ourselves for a third-year subject for students undertaking a Bachelor of Commerce or a Bachelor of Business degree at Monash University, Australia.

How to create an #Enterprise #AI Business case driven by Data

Value creation by AI is subjective, but a number of considerations apply:
• Is the value reflected in business metrics like conversion, churn and cost savings?
• Do we get significantly improved performance over existing machine learning or rule-based algorithms?
• Does AI improve business processes and thereby create new value?
• Is there a cost benefit in terms of optimizing employee costs?
• Can the AI identify hidden rules or hierarchies?
• Can AI provide near-human, or ideally better-than-human, levels of performance?
• Does training data exist? Is it labelled?
• Does the application require a high level of trust (e.g. self-driving cars)?
• Does the application need a high level of control? (human intervention as needed)
• Domain complexity – for example the need for extensive feature engineering
• The usage of IoT with AI
• Developing proprietary algorithms
• Impact of regulation on business models including GDPR
• Regulatory transparency – e.g. explainable AI
• Risk of adoption, especially in the non-consumer space. Many existing applications of AI are in the consumer space, where the risks are relatively lower (e.g. chatbots). As AI expands into enterprise and healthcare domains, the risks of failure and liability increase substantially.

A Winning Game Plan For Building Your Data Science Team

One of the most exciting challenges I have at Hitachi as the Vice-Chairman of Hitachi's 'Data Science ??' is to help lead the development of Hitachi's data science capabilities. We have a target number of people whom we want trained and operational by 2020, so there is definitely a sense of urgency. And I like urgency, because it's required to sweep aside the inhibitors and resisters of change.

Building competitive data advantage

Several years ago, my company faced a significant challenge: a wave of small new entrants, relying heavily on data and artificial intelligence, provided services faster, cheaper, and more flexibly than we could. They were not slowed down by legacy information systems, archaic business processes, or an outdated workforce. To add insult to injury, the new entrants would use our customer-facing transparency opportunistically to pick the low-hanging fruit, and gradually started to compete with us on our core practices. At the same time, other incumbent market participants had started to innovate. In this article I would like to share our lessons learned, and discuss how data assets can be used and should be protected so as to build a defensible competitive data advantage. This perspective is based on my experience in capital markets and other industries that place innovative technology in the active business foreground as a critical success factor, rather than in a passive support role.

An End-to-End Project on Time Series Analysis and Forecasting with Python

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Time series analysis is widely used for non-stationary data, such as economic, weather, stock price, and retail sales data. In this post, we will demonstrate different approaches to forecasting a retail sales time series. Let's get started!
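As a minimal sketch of one forecasting approach such a post might cover, here is simple exponential smoothing implemented from scratch; the monthly sales figures are invented for illustration, and the smoothing parameter `alpha` is an arbitrary choice:

```python
def exp_smooth_forecast(series, alpha=0.5):
    """Return the one-step-ahead forecast after smoothing the series."""
    level = series[0]                            # initialise with the first value
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    return level

monthly_sales = [112, 118, 132, 129, 121, 135, 148, 148]  # hypothetical data
print(round(exp_smooth_forecast(monthly_sales, alpha=0.3), 2))
```

A larger `alpha` weights recent observations more heavily; in practice a library such as statsmodels would also estimate `alpha` from the data rather than fixing it by hand.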

R Code – Best practices

Nothing is more frustrating than a long piece of code with no standard way of naming elements, presenting code, or organizing files. It's not only unreadable but, more importantly, not reusable. Unfortunately, unlike other programming languages, R has no widely accepted coding best practices. Instead there have been various attempts to put together a few sets of rules. This post tries to fill the gap by summarizing and/or extracting what I found relevant in those various attempts. It also includes some tips I came up with after years of using R on a daily basis.

Guide to a high-performance, powerful R installation

New Course: Bayesian Modeling with RJAGS

The Bayesian approach to statistics and machine learning is logical, flexible, and intuitive. In this course, you will engineer and analyze a family of foundational, generalizable Bayesian models. These range in scope from fundamental one-parameter models to intermediate multivariate & generalized linear regression models. The popularity of such Bayesian models has grown along with the availability of computing resources required for their implementation. You will utilize one of these resources – the rjags package in R. Combining the power of R with the JAGS (Just Another Gibbs Sampler) engine, rjags provides a framework for Bayesian modeling, inference, and prediction.
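rjags itself runs in R, so the following is only a language-neutral sketch of the kind of one-parameter model such a course starts with: inferring a success probability p from binomial data (6 successes in 9 trials, invented numbers) under a uniform Beta(1,1) prior, using a simple Metropolis sampler rather than JAGS's Gibbs engine:

```python
import math
import random

def log_posterior(p, successes=6, trials=9):
    if not 0.0 < p < 1.0:
        return float("-inf")
    # With a uniform prior, the log-posterior is the binomial
    # log-likelihood up to an additive constant.
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def metropolis(n_samples=5000, step=0.1, seed=42):
    rng = random.Random(seed)
    p, samples = 0.5, []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_posterior(proposal) - log_posterior(p):
            p = proposal
        samples.append(p)
    return samples

draws = metropolis()
posterior_mean = sum(draws) / len(draws)
print(posterior_mean)  # should land near the exact Beta(7, 4) mean, 7/11
```

Because the Beta prior is conjugate to the binomial likelihood, the exact posterior here is known, which makes this toy model a useful sanity check before moving to models where samplers like JAGS are genuinely needed.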

How to select the Right Evaluation Metric for Machine Learning Models: Part 2 Regression Metrics

In this article, I will discuss the usefulness of each regression metric depending on the objective and the problem we are trying to solve. Part 1 presented the first four metrics, while the remaining metrics are presented in this article.
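As a minimal sketch of the kind of metrics such a series covers, here are four common regression metrics computed from scratch; the true and predicted values are invented for illustration:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R-squared for paired lists of values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    mse = sum(e * e for e in errors) / n             # mean squared error
    rmse = math.sqrt(mse)                            # root mean squared error
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - sum(e * e for e in errors) / ss_tot     # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(regression_metrics(y_true, y_pred))
```

MAE penalizes all errors linearly, while MSE/RMSE punish large errors more heavily, which is one axis along which the choice of metric depends on the problem.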

Build & Deploy Data Science Projects at Lightspeed

One of the biggest challenges in data science development today is quickly sharing and deploying code. Unless you're working within a business where data science & analytics is deeply embedded within the company culture and ecosystem (and GitHub), chances are you've experienced the fun of email chains containing lines of SQL, endless .txt snippets on Slack, and enough similarly named R environments to cause a real headache. It can be challenging to share code in these environments with fellow data scientists, analysts, and engineers. In some instances, data teams still sit too far apart from traditional engineering departments, meaning it's a case of begging, borrowing, and stealing code patterns and best practices.

Recurrent Neural Networks: The Powerhouse of Language Modeling

During the spring semester of my junior year in college, I had the opportunity to study abroad in Copenhagen, Denmark. I had never been to Europe before that, so I was incredibly excited to immerse myself in a new culture, meet new people, travel to new places, and, most importantly, encounter a new language. Now although English is not my native language (Vietnamese is), I have learned and spoken it since early childhood, so it has become second nature. Danish, on the other hand, is an incredibly complicated language with a very different sentence structure and grammatical constructions. Before my trip, I tried to learn a bit of Danish using the app Duolingo; however, I only got a hold of simple phrases such as Hello (Hej) and Good Morning (God Morgen).

jupyter and tensorboard in tmux

In recent posts, I described how you can set up your personal deep learning workstation and how you can switch it on and access it remotely. In this short article, I explain how I usually set up my remote working environment with tmux and check the progress of my calculations on my phone. tmux is a terminal multiplexer, allowing a user to access multiple separate terminal sessions inside a single terminal window or remote terminal session.