What Is Dimension Reduction In Data Science?

We now have access to large amounts of data. This abundance can lead us to take every piece of data available and feed it into a forecasting model to predict our target variable. This article explains the common issues that arise when a large set of features is introduced and offers solutions we can use to resolve them.
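The blurb does not say which techniques the article covers, so the following is only one illustrative option: a minimal sketch of a correlation filter that drops a feature when it is almost a copy of one already kept. The function names, the 0.95 threshold, and the toy data are all assumptions made for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not strongly correlated with any kept feature.

    `features` maps a feature name to its list of values (hypothetical layout).
    """
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Made-up data: height_in is height_cm in different units, so it adds nothing new.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [35.0, 22.0, 41.0, 29.0, 33.0],
}
reduced = drop_correlated(features)
print(list(reduced))
```

A filter like this only removes redundant columns; projection methods such as PCA build new, combined features instead and are a separate approach.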

Web Development of NLP Model in Python & Deployed in Flask

Consider a system that uses machine learning to detect spam SMS text messages. Our ML system's workflow is: train offline -> make the model available as a service -> predict online.
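The three stages of that workflow can be sketched with the standard library alone. This is a simplified illustration, not the article's implementation: the word-count "model", the JSON file format, and all function names are assumptions, and the HTTP layer that Flask would provide is left out so the example stays self-contained.

```python
import json
import os
import tempfile
from collections import Counter

def train(messages):
    """Offline step: count how often each word appears in spam and in ham."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        counts[label].update(text.lower().split())
    return {label: dict(counter) for label, counter in counts.items()}

def save_model(model, path):
    """Persist the trained model so a separate service process can load it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)

def load_model(path):
    """Service start-up step: read the model back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def predict(model, text):
    """Online step: pick the class whose training words match the message more."""
    words = text.lower().split()
    spam_hits = sum(model["spam"].get(w, 0) for w in words)
    ham_hits = sum(model["ham"].get(w, 0) for w in words)
    return "spam" if spam_hits > ham_hits else "ham"

# Made-up training messages for illustration only.
training = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you are home", "ham"),
]

model_path = os.path.join(tempfile.mkdtemp(), "spam_model.json")
save_model(train(training), model_path)      # train offline
served_model = load_model(model_path)        # make the model available
print(predict(served_model, "claim your free prize"))   # predict online
print(predict(served_model, "are you home for lunch"))
```

In a deployed version, `predict` would sit behind a web route and `load_model` would run once when the service starts.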

Watch if R is running from Shiny

Today I discovered that the <html> tag of a Shiny app gets the shiny-busy class while computation is running in the R process. This means you can potentially use JavaScript to watch whether the R process is busy.

Value Investing with Machine Learning

This article will show you how you can easily increase your certainty with transparent and interpretable Machine Learning. To do this, we will use Mind Foundry’s Automated Data Science platform, AuDaS, that augments Analysts and transforms them into Data Science Heroes.

Using Tensorflow Serving GRPC

Once you have trained your TensorFlow or Keras based model, you need to think about how to deploy and use it in production. You may want to Dockerize it as a microservice that implements a custom gRPC (or REST, or other) interface, then deploy it to a server or Kubernetes cluster and have other client microservices call it. Google's TensorFlow Serving library helps here: you save your model to disk, then load it and serve a gRPC or RESTful interface to interact with it.

Unsupervised Feature Learning

Deep Convolutional Networks on image tasks take in image matrices of the form (height x width x channels) and process them into low-dimensional features through a series of parametric functions. Supervised and Unsupervised Learning tasks both aim to learn a semantically meaningful representation of features from raw data. Training Deep Supervised Learning models requires a massive amount of data in the form of labeled (x, y) pairs. Unsupervised Learning does not require the corresponding labels (y); the most common example is the auto-encoder. Auto-encoders take x as input, pass it through a series of layers to compress the dimensionality, and are then scored on how well they can reconstruct x. Auto-encoders eventually learn a set of features that describe the data x; however, these features are unlikely to be very useful for Supervised Learning or Discriminative tasks.
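To make the compress-then-reconstruct objective concrete, here is a minimal sketch of a linear auto-encoder written in plain Python. It is a toy under stated assumptions, not a deep convolutional model: the 2-D input, the 1-D code, the learning rate, the step count, and the data are all chosen for illustration.

```python
def reconstruct(point, enc, dec):
    """Encode a 2-D point into a single number, then decode it back to 2-D."""
    code = sum(w * x for w, x in zip(enc, point))
    return code, [v * code for v in dec]

def mean_squared_error(data, enc, dec):
    """Average squared reconstruction error over the dataset."""
    total = 0.0
    for point in data:
        _, recon = reconstruct(point, enc, dec)
        total += sum((r - x) ** 2 for r, x in zip(recon, point))
    return total / len(data)

def train(data, enc, dec, lr=0.05, steps=200):
    """Plain gradient descent on the reconstruction error; no labels are used."""
    enc, dec = list(enc), list(dec)
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0] * len(enc)
        grad_dec = [0.0] * len(dec)
        for point in data:
            code, recon = reconstruct(point, enc, dec)
            errors = [r - x for r, x in zip(recon, point)]
            # Gradient with respect to each decoder weight.
            for j, err in enumerate(errors):
                grad_dec[j] += 2 * err * code / n
            # Error propagated back through the decoder to the encoder weights.
            back = sum(err * v for err, v in zip(errors, dec))
            for k, x in enumerate(point):
                grad_enc[k] += 2 * back * x / n
        enc = [w - lr * g for w, g in zip(enc, grad_enc)]
        dec = [v - lr * g for v, g in zip(dec, grad_dec)]
    return enc, dec

# Made-up data lying on the line y = 2x, so one number can describe each point.
data = [(x, 2 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_enc, initial_dec = [0.1, 0.1], [0.1, 0.1]
loss_before = mean_squared_error(data, initial_enc, initial_dec)
enc, dec = train(data, initial_enc, initial_dec)
loss_after = mean_squared_error(data, enc, dec)
print(loss_before, loss_after)
```

The learned code is whatever best reconstructs x; nothing in the objective pushes it to separate classes, which is the limitation the paragraph above points to.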

Tutorial: Sequential Pattern Mining in R for Business Recommendations

In this tutorial, Allison Koenecke demonstrates how Microsoft could recommend to customers the next set of services they should acquire as they expand their use of the Azure Cloud, by using a temporal extension to conventional Market Basket Analysis.

Trending Deep Learning Github Repositories

Check out this pair of resources on trending and top GitHub deep learning repositories for some new ideas on what to look out for.

Travelling in the BlockChain Ecosystem with Python

With more than 2,500 active blockchain projects around the globe, each with its own unique statistical characteristics, we rarely see a top-level analysis of the overall crypto market, because collecting and cleaning the time series is too time-consuming. And on the retail side, we don't have a clear set of functions to collect, clean, and explore the critical data needed to customize portfolios from the blockchain ecosystem. The following blog post aims to serve as a strong foundation for more in-depth quantitative approaches related to things like volatility, clustering, forecasting, and log-return-based portfolios using data science and quant strategies. We now have access to aggregate and exchange-specific trading data through CryptoCompare and their excellent API, which is free for retail investors.
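As a small illustration of the log returns and volatility mentioned above, the sketch below computes both from a list of closing prices. The prices are made up for this example and do not come from CryptoCompare; the function names are assumptions as well.

```python
import math
import statistics

def log_returns(prices):
    """Log return between each pair of consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def volatility(returns):
    """Sample standard deviation of the returns, per period (not annualized)."""
    return statistics.stdev(returns)

# Hypothetical daily closing prices.
closes = [100.0, 110.0, 99.0, 108.9]
returns = log_returns(closes)

# Log returns add up across periods, which is why they suit portfolio math:
# their sum equals the log return over the whole window.
total_log_return = sum(returns)
print(returns)
print(total_log_return, math.log(closes[-1] / closes[0]))
print(volatility(returns))
```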

Transfer Learning using ELMO Embedding

Last year, the major developments in Natural Language Processing were about Transfer Learning. Basically, Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model as the starting point for learning another target task. Transfer Learning became popular in NLP thanks to the state-of-the-art performance of algorithms such as ULMFiT, Skip-Gram, ELMo, and BERT.

Time Travel with RStudio Package Manager 1.0.4

We all love packages. We don’t love when broken package environments prevent us from reproducing our work. In version 1.0.4 of RStudio Package Manager, individuals and teams can navigate through repository checkpoints, making it easy to recreate environments and reproduce work. The new release also adds important security updates, improvements for Git sources, further access to retired packages, and beta support for Amazon S3 storage.

Three steps for a successful machine learning project

As people and companies venture into machine learning (ML), it is common for some to expect to dive right into building models and generating useful output. And while some parts of ML feel like technical wizardry with magical predictions, there are other aspects that are less technical and arguably far more important. Taking sufficient time to define the right question, properly preprocess data, and consider the impact of using your model can greatly improve the success of your ML project. My hope is that, whether you are a company, manager, or engineer looking to leverage ML, these tips will save you time up front and help you prioritize your future efforts.

The working of Naive Bayes algorithm

Naïve Bayes is a probabilistic machine learning algorithm used in a wide range of classification tasks. In this article, I’m going to present a complete overview of the Naïve Bayes algorithm and how it is built and used in the real world.
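As a rough idea of how such a classifier can be built, here is a minimal sketch of multinomial Naïve Bayes for short texts, with add-one (Laplace) smoothing and log probabilities. It is not taken from the article; the function names and the tiny review dataset are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: how common the class is in the training data.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen word/class pairs from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training reviews.
docs = [
    ("great film loved it", "pos"),
    ("loved the acting great fun", "pos"),
    ("boring plot hated it", "neg"),
    ("hated the slow boring film", "neg"),
]
model = train_naive_bayes(docs)
print(classify(model, "loved the fun film"))
print(classify(model, "slow boring plot"))
```

The "naïve" part is the assumption that words are independent given the class, which is what lets the per-word log likelihoods simply be added together.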