Essentials of Deep Learning: Introduction to Long Short Term Memory

Sequence prediction problems have been around for a long time and are considered among the hardest problems to solve in the data science industry. They cover a wide range of tasks: predicting sales, finding patterns in stock market data, understanding movie plots, recognizing speech, translating between languages, and predicting the next word on your iPhone’s keyboard. With the recent breakthroughs in data science, Long Short Term Memory networks (a.k.a. LSTMs) have emerged as the most effective solution for almost all of these sequence prediction problems. LSTMs have an edge over conventional feed-forward neural networks and plain RNNs in many ways, owing to their ability to selectively remember patterns over long durations of time. The purpose of this article is to explain LSTMs and enable you to use them in real-life problems. Let’s have a look!
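To make the "selective remembering" concrete, here is a minimal NumPy sketch of a single LSTM step (not the article's code): a forget gate decides what to keep from the old cell state, an input gate decides what new information to write, and an output gate decides what to expose. Weight shapes and the toy sequence below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (sketch).
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])        # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2*H])      # input gate: how much new info to write
    o = sigmoid(z[2*H:3*H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c = f * c_prev + i * g     # cell state: the long-term memory
    h = o * np.tanh(c)         # hidden state: the short-term output
    return h, c

# Run a toy 5-step sequence through the cell (random weights, illustrative only)
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

The key design point is the additive cell update `c = f * c_prev + i * g`: because gradients flow through that sum rather than through repeated matrix multiplications, information can survive many time steps, which is what gives LSTMs their edge on long sequences.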

Operationalizing Data Science

In the video presentation below, Joel Horwitz, Vice President, Partnerships, Digital Business Group for IBM Analytics, discusses what it means to “operationalize data science” – essentially, to harden the ops behind running data science platforms. Over the past 3-4 years, IBM has partnered with and invested in its clients to help them marshal their valuable data and then use data science to build insights and models that create business value. The market is now shifting toward operationalizing these data science investments in production applications. The demands created by the vast volume and blinding velocity of data can only be addressed via reactive principles. IBM and Lightbend are working with clients who are ready to make strategic investments in cognitive applications, using the latest architectures for building and running distributed Reactive systems with Akka, Kafka, Spark, and more.

Joel Horwitz graduated from the University of Washington in Seattle with a Master’s in Nanotechnology with a focus on Molecular Electronics. He also holds an International MBA in Product Marketing and Financial Management from the University of Pittsburgh. Joel designed, built, and launched new products at Intel and Datameer, resulting in breakthrough innovations. He set and executed strategies at AVG Technologies that led to accretive acquisitions, and he established a big data science team and the first Hadoop cluster in Europe. Most recently, he spearheaded new branding, positioning, and business development strategies for several startups in the area of data science and AI, including Alpine Data Labs. He launched IBM | Spark and the Watson Data Platform and is now focused on building strategic partnerships and an ecosystem for the IBM Watson and Cloud Platform businesses.

Explanation of One-shot Learning with Memory-Augmented Neural Networks

In an earlier post, I wrote about the need for massive amounts of data to train deep neural networks. In contrast, humans require comparatively little data to learn a new behavior or to rapidly shift away from an old behavior. For example, after running into a handful of street signs, the modern teenager quickly learns to be careful texting while walking. As Santoro et al. write, ‘This kind of flexible adaptation is a celebrated aspect of human learning (Jankowski et al., 2011), manifesting in settings ranging from motor control (Braun et al., 2009) to the acquisition of abstract concepts (Lake et al., 2015). Generating novel behavior based on inference from a few scraps of information – e.g., inferring the full range of applicability for a new word, heard in only one or two contexts – is something that has remained stubbornly beyond the reach of contemporary machine intelligence.’ The term one-shot learning has been introduced to capture this phenomenon of rapid behavior change following a small number of experiences, or even just one experience. In an earlier paper, a neural network was given an external memory and the ability to learn how to use its new memory in solving specific tasks. This paper classifies that previous model, the Neural Turing Machine (NTM), as a subclass of the more general class of Memory-Augmented Neural Networks (MANNs), and suggests an alternative memory system capable of outperforming humans in certain one-shot learning tasks.
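The core mechanism behind NTMs and MANNs is an external memory matrix that the controller network reads from by content: it emits a key, compares it against every memory slot, and takes a similarity-weighted average. The sketch below illustrates that content-based addressing idea with NumPy; the function name, sharpness parameter `beta`, and the toy memory are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_read(memory, key, beta=10.0):
    """Content-based read from an external memory (NTM/MANN-style sketch).
    memory: (N, M) matrix of N slots; key: (M,) query from the controller."""
    # cosine similarity between the key and each memory row
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    )
    w = np.exp(beta * sims)
    w /= w.sum()              # softmax attention weights over the slots
    return w @ memory, w      # read vector and the read weights

# Toy memory with three 2-dimensional slots
memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
read_vec, w = cosine_read(memory, key=np.array([1.0, 0.1]))
print(w.argmax())  # 0 -- the first slot best matches the key
```

Because the read is a differentiable function of the key, the controller can learn by gradient descent where to store and retrieve information, which is what lets a MANN bind a new example to memory after a single exposure.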