**Essentials of Deep Learning – Sequence to Sequence modelling with Attention (using python)**

Deep Learning at scale is disrupting many industries by creating chatbots and bots never seen before. On the other hand, a person just starting out on Deep Learning would read about Basics of Neural Networks and its various architectures like CNN and RNN. But there seems like a big jump from the simple concepts to industrial applications of Deep Learning. Concepts such as Batch Normalization, Dropout and Attention are almost a requirement to know in building deep learning applications. In this article, we will cover two important concepts used in the current state of the art applications in Speech Recognition and Natural Language Processing – viz Sequence to Sequence modelling and Attention models. Just to give you a sneak peek of the potential application of these two techniques – Baidu’s AI system uses them to clone your voice It replicates a persons voice by understanding his voice in just three seconds of training.You can check out some audio samples provided by Baidu’s Research team which consist of original and synthesized voices.

**Top 5 Data Science & Machine Learning Repositories on GitHub in Feb 2018**

• FastPhotoStyle

• Twitter Scraper

• Handwriting Synthesis

• ENAS PyTorch

• Sign Language

• Twitter Scraper

• Handwriting Synthesis

• ENAS PyTorch

• Sign Language

**A Simple Introduction to Complex Stochastic Processes – Part 2**

In my first article on this topic (see here) I introduced some of the complex stochastic processes used by Wall Street data scientists, using a simple approach that can be understood by people with no statistics background other than a first course such as stats 101. I defined and illustrated the continuous Brownian motion (the mother of all these stochastic processes) using approximations by discrete random walks, simply re-scaling the X-axis and the Y-axis appropriately, and making time increments (the X-axis) smaller and smaller, so that the limiting process is a time-continuous one. This was done without using any complicated mathematics such as measure theory or filtrations. Here I am going one step further, introducing the integral and derivative of such processes, using rudimentary mathematics. All the articles that I’ve found on this subject are full of complicated equations and formulas. It is not the case here. Not only do I explain this material in simple English, but I also provide pictures to show how an Integrated Brownian motion looks like (I could not find such illustrations in the literature), how to compute its variance, and focus on applications, especially to number theory, Fintech and cryptography problems. Along the way, I discuss moving averages in a theoretical but basic framework (again with pictures), discussing what the optimal window should be for these (time-continuous or discrete) time series.

**Neural network classification of data using Smile**

Data classification is the central data-mining technique used for sorting data, understanding of data and for performing outcome predictions. In this small blog we will use a library Smilecthat includes many methods for supervising and non-supervising data classification methods. We will make a small Python-like code using Jython top build a complex Multilayer Perceptron Neural Network for data classification. It will have large number of inputs, several outputs, and can be easily extended for cases with many hidden layers. We will write a few lines of Jython code (most of our codding will deal with how to prepare an interface for reading data, rather than with Neural Network programming).

**Introduction to Numpy – Part I**

Numpy is a math library for python. It enables us to do computation (between an array, matrix, tensor etc..) efficiently and effectively. In this article, I’m just going to introduce you to the basics of what is mostly required for Machine Learning and Data science (and Deep Learning !).

Markov chains are a fairly common, and relatively simple, way to statistically model random processes. They have been used in many different domains, ranging from text generation to financial modeling. A popular example is r/SubredditSimulator, which uses Markov chains to automate the creation of content for an entire subreddit. Overall, Markov Chains are conceptually quite intuitive, and are very accessible in that they can be implemented without the use of any advanced statistical or mathematical concepts. They are a great way to start learning about probabilistic modeling and data science techniques.

**A Beginner’s Guide to Data Engineering – Part II**

In A Beginner’s Guide to Data Engineering – Part I, I explained that an organization’s analytics capability is built layers upon layers. From collecting raw data and building data warehouses to applying Machine Learning, we saw why data engineering plays a critical role in all of these areas. One of any data engineer’s most highly sought-after skills is the ability to design, build, and maintain data warehouses. I defined what data warehousing is and discussed its three common building blocks – Extract, Transform, and Load, where the name ETL comes from. For those who are new to ETL processes, I introduced a few popular open source frameworks built by companies like LinkedIn, Pinterest, Spotify, and highlight Airbnb’s own open-sourced tool Airflow. Finally, I argued that data scientist can learn data engineering much more effectively with the SQL-based ETL paradigm.

**How to train and deploy deep learning at scale**

We discussed using and deploying deep learning at scale. This is an empirical era for machine learning, and, as I noted in an earlier article, as successful as deep learning has been, our level of understanding of why it works so well is still lacking. In practice, machine learning engineers need to explore and experiment using different architectures and hyperparameters before they settle on a model that works for their specific use case. Training a single model usually involves big (labeled) data and big models; as such, exploring the space of possible model architectures and parameters can take days, weeks, or even months. Talwalkar has spent the last few years grappling with this problem as an academic researcher and as an entrepreneur. In this episode, he describes some of his related work on hyperparameter tuning, systems, and more.

Advertisements