Problems in Estimating GARCH Parameters in R (Part 2; rugarch)

Now here is a blog post that has been sitting on the shelf far longer than it should have. Over a year ago I wrote an article about problems I was having when estimating the parameters of a GARCH(1,1) model in R. I documented the behavior of parameter estimates (with a focus on \beta) and perceived pathological behavior when those estimates are computed using fGarch. I called for help from the R community, including sending out the blog post over the R Finance mailing list.

Scaling H2O analytics with AWS and p(f)urrr (Part 3)

This is the final installment of a three part series that looks at how we can leverage AWS, H2O and purrr in R to build analytical pipelines. In the previous posts I looked at starting up the environment through the EC2 dashboard on AWS’ website. The other aspect we looked at, in Part II, was how we can use purrr to train models using H2O’s awesome api.

Introduction to Unsupervised Learning

Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses to each observation. Instead, we are interested in finding an interesting way to visualize data or in discovering subgroups of similar observations. Unsupervised learning tends to be more challenging, because there is no clear objective for the analysis, and it is often subjective. Additionally, it is hard to assess if the obtained results are good, since there is no accepted mechanism for performing cross-validation or validating results on an independent dataset, because we do not know the true answer. Two techniques will be the focus of this article: principal component analysis and clustering.

Uncertainty estimation for Neural Network – Dropout as Bayesian Approximation

This article is mainly about how I start with the Uber’s paper Deep and Confident Prediction for Time Series at Uber. Model interpretation with neural networks has not been an easy task, knowing the confidence of a neural network could be very important for business. Despite a lot of complicated proof in these series of papers, they all trying to answer a simple question.

The Unreasonable Effectiveness of Method Chaining in Pandas

Method chaining is a programmatic style of invoking multiple method calls sequentially with each call performing an action on the same object and returning it. It eliminates the cognitive burden of naming variables at each intermediate step. Fluent Interface, a method of creating object-oriented API relies on method cascading (aka method chaining). This is akin to piping in Unix systems. Method chaining substantially increases the readability of the code. If you don’t believe me, let’s ask Jack and Jill.

Analytics Building Blocks: Regression

This article summarizes and explains key modules of my regression block (One of the simple modularized notebooks I am developing to execute common analysis tasks). The notebook is intended to facilitate quicker experimentation for the users with a moderate understanding of regression models and Python programming. GitHub link to the notebook is at the bottom of the article!

How to Learn More in Less Time with Natural Language Processing (Part 2)

With the nifty extractive text summarizer we created in Part 1, we were able to take news articles and cut them down to half their size or more! Now it is time to take these articles and classify them by subject. In this part we will go through how to create a bag of words NLP classifier to do this!

Data Science and AI for good – An overview

Humanity is reaching an inflection point in terms of its technological development. At the same time, we are reevaluating our place on Earth and rethinking how to build a fairer society. Can artificial intelligence (AI, machine learning, statistical learning or however you want to call it) serve to tackle societal and environmental challenges? Yes, absolutely. In fact, the same algorithms used to recommend products on an e-commerce website, or choose the ads shown to you, can be applied to solve real human problems. All data scientists, from aspiring ones to researchers, have the opportunity (and even the responsibility) to take advantage of the current data revolution to improve our world. But let’s be clear, this is a complex endeavour, which requires cross-disciplinary and multi-sector collaborations involving governments, NGOs, private organizations and academia. Otherwise, it will be too easy to fall for the AI hype instead of understanding it as a tool to augment our abilities. Only solutions backed up by research and deep understanding of the respective problem domain will be robust (remember the scientific method) and effective in the long term. It requires cross-disciplinary and multi-sector collaborations, involving governments, NGOs, private organizations and academia, to use AI for good.

Complete Machine Learning solution (Part 1|3): Create Flask Application

While learning Machine Learning I used to practice on Python Notebook environment and always used to think about how can we make our solution usable by end-user. Since I already have worked on Client/Server applications so I knew that, one way can be to implement our solution and build the model in such a way that we can make it usable via API calls and these APIs can be used by various user facing platforms. This entire approach is know as API-First Approach.

Principal Component Analysis (PCA) 101, using R

Improving predictability and classification one dimension at a time! ‘Visualize’ 30 dimensions using a 2D-plot!

The future of data visualisation

Data visualization – as I’ve observed – has been changed due to developments in three broad areas. Streaming instead of static data, changing context and better data processing and design tools. This will profoundly change how data visualization will influence our future lives.

Neural Networks Intuitions: 1.Balanced Cross Entropy

This is my first medium blog post. The objective to create this series, ‘Neural Networks Intuitions’ is to gain a better understanding of things ranging from basics of neural networks to loss functions, neural network architectures and more(without diving deep into math). Ever since I started my career in deep learning, I have always felt that forming intuitions helped a lot in making progress during model engineering or solving problems/debugging when it comes to neural networks. The main aim of this series of blog posts(which I hope to continue later on) is to get insights into what’s happening even if the math part of it is tough to understand.

My Open-Source Machine Learning Journey Begins

In technology and much else, there’s an important connection between the research and developments of the past and the actual happenings of the present. Exploring that connection is essential to the process of learning. History is a great teacher, especially because, since things don’t happen in a vacuum, it’s usually wise to consider the present in relation to the past. There’s a second, equally important connection, something of an instructive question mark between what’s actual today and what’s possible tomorrow, that can help to accelerate the learning process. To understand and guide what you do today, you should also ask about what you might do tomorrow.

Text Matching with Deep Learning

In our daily life, we always want to know whether or not they are similar things. One of the typical example is Face ID. Apple launched a face recognition system for unlocking your iPhone X. You have to take several photos as a golden images. When you want to unlock your iPhone, iPhone compute current photo matches the pre-defined photos or not.


Today’s deep learning applications include complex, multi-stage pre-processing data pipelines that include compute-intensive steps mainly carried out on the CPU. For instance, steps such as load data from disk, decode, crop, random resize, color and spatial augmentations and format conversions are carried out on the CPUs, limiting the performance and scalability of training and inference tasks. In addition, the deep learning frameworks today have multiple data pre-processing implementations, resulting in challenges such as portability of training and inference workflows and code maintainability. NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine to accelerate input data pre-processing for deep learning applications. DALI provides both performance and flexibility of accelerating different data pipelines, as a single library, that can be easily integrated into different deep learning training and inference applications.

Unprovability comes to machine learning

Scenarios have been discovered in which it is impossible to prove whether or not a machine-learning algorithm could solve a particular problem. This finding might have implications for both established and future learning algorithms. During the twentieth century, discoveries in mathematical logic revolutionized our understanding of the very foundations of mathematics. In 1931, the logician Kurt Gödel showed that, in any system of axioms that is expressive enough to model arithmetic, some true statements will be unprovable1. And in the following decades, it was demonstrated that the continuum hypothesis – which states that no set of distinct objects has a size larger than that of the integers but smaller than that of the real numbers – can be neither proved nor refuted using the standard axioms of mathematics. Writing in Nature Machine Intelligence, Ben-David et al.5 show that the field of machine learning, although seemingly distant from mathematical logic, shares this limitation. They identify a machine-learning problem whose fate depends on the continuum hypothesis, leaving its resolution forever beyond reach.

The Path to Predictive Analytics and Machine Learning

Why data pipelines are vital to predictive analytics, machine learning, and AI. How uniting transaction and analytics processing in a single database enables predictive analytics. What you gain by moving to predictive analytics, machine learning, and AI.

Evolving Data Infrastructure

How are companies using or exploring AI, big data, and the cloud for advanced analytics and automation? In an O’Reilly survey conducted in October 2018, more than 3,200 companies throughout the world – located primarily in North America, Europe, and Asia – revealed their choices of tools, technologies, and practices for pursuing sophisticated cloud-based data solutions. In this report, data science experts Ben Lorica and Paco Nathan break down the results of the survey to reveal how companies are using, building, or evaluating data infrastructure in the cloud. One group of respondents has had a cloud-based data infrastructure for years, while a second group is still exploring the possibilities. Early adopters currently building a cloud-based data infrastructure represent the largest of these three groups.