Forward and Reverse Containment Logic – Introduction to Relational Data Logistics

The question of how to structure or arrange data in order to gain worthwhile insights is quite different from the question of what data to include or how it should be analyzed. I often find myself preoccupied with structure and organization, in part because I believe that distinct logistics can lead to different types of insights. Holding data as large tables is popular these days, to the point of industry dominance. If we peer into the code of the application handling those tables, bytes are in all likelihood being moved to and from many different locations in memory and files. In my line of work, data is often transferred between different tables. This sometimes occurs manually – if cut-and-paste can be described as manual. Why not get a machine to do it? I am certainly not opposed to the suggestion; then it would be a machine moving data between tables. The important point is that something, somebody, or both must handle the data logistics. It just so happens that when a computer does the job, the logistics are preset in code, whereas when I handle the logistics manually, they can be shaped on the fly. When people think about this movement of data, I believe they tend to focus on the movement itself rather than the logic behind it. I suggest that data logistics – or perhaps just logistics more generally – is more about the reasoning behind the structure – the structure giving rise to movement.


The ‘lxml’ Package and xpath Expressions for Web Scraping

The Jupyter notebook at this link contains a tutorial that explains how to use the lxml package and xpath expressions to extract data from a webpage.
The tutorial consists of two sections:
• A basic example that demonstrates the process of downloading a webpage, extracting data with lxml and xpath, and analysing it with pandas (see the sketch after this list)
• A comprehensive review of xpath expressions, covering topics such as axes and operators
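To give a feel for that basic workflow before opening the notebook, here is a minimal sketch; the URL and the XPath expressions are placeholders chosen for illustration, not the ones used in the tutorial.

```python
import requests
import pandas as pd
from lxml import html

# Download the page (placeholder URL, not the one from the tutorial)
page = requests.get("https://example.com/table-page")

# Parse the HTML and select table rows with an XPath expression
tree = html.fromstring(page.content)
rows = tree.xpath("//table//tr")

# Extract the text of each cell and load the result into pandas
data = [[cell.text_content().strip() for cell in row.xpath("./td")] for row in rows]
df = pd.DataFrame([r for r in data if r])
print(df.head())
```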


Running Fast.ai course notebooks on Kaggle Kernel

Kaggle Kernels offer an ML-optimized Docker environment, a Tesla K80 GPU, internet access, and uninterrupted six-hour sessions. Any Clouderizer project can now be run seamlessly on Kaggle Kernels, which means our community project for Fast.ai can now be run on Kaggle Kernels just as easily. Below are the steps.


Master Python through building real-world applications (Part 4)

Every once in a while a new programming language comes along, and with it a great community to support it. Python has been around for a while now, so it is safe for me to say that Python is not just a language, it is a religion. Do you want to print hello world? It’s there. Make database applications? There. Make GUI-based applications? Yup. Visualization? Check. Complex machine learning algorithms? Python’s got you covered. If you can think of something that is programmable, you can do it with Python. There is one field, though, where Python is underrated: the back-end of web development. Soon enough that will change too, and here we take the first step towards learning more about it.


The 15 Most Popular Data Science and Machine Learning Articles on Analytics Vidhya in 2018

As we draw the curtains on a wonderful 2018, we wanted to share the best of the year with our wonderful community. This article is part of that series and looks at the articles which you, dear reader, enjoyed the most.


Leaf Plant Classification: Statistical Learning Model – Part 2

In this post, I am going to build a statistical learning model based upon the plant leaf datasets introduced in part one of this tutorial. We have three datasets available, each providing sixteen samples of each of one hundred plant species.


Activation Functions in Neural Networks

The post discusses the various linear and non-linear activation functions used in deep learning and neural networks. We also take a look at how each function performs in different situations and the advantages and disadvantages of each, before concluding with one last activation function that…
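As a quick companion to the post, here is a small NumPy sketch of a few commonly used non-linear activation functions of the kind it covers; which function the post singles out at the end is left to the article itself.

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs to (0, 1); saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centred relative of the sigmoid, output in (-1, 1)
    return np.tanh(x)

def relu(x):
    # Rectified Linear Unit: cheap to compute, no saturation for x > 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small gradient for negative inputs to avoid "dead" units
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-5, 5, 11)
print(relu(x))
print(leaky_relu(x))
```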


Small Steps: An Experimental Case for Compound Prediction

Uncovering abstract, non-linear relationships between variables is the utility that machine learning provides. However, does jumping from known inputs directly to the abstractly related desired values yield the most accurate results? What happens in cases where a series of more closely related variables can first be predicted and fed back into the model to predict the final output?
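The article explores that chained idea in detail; purely as a rough illustration of what "predict an intermediate variable first, then feed it back in" can look like, here is a sketch on synthetic data with made-up variable names, not the article's own setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: inputs X, an intermediate variable z, and a final target y
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
z = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)    # "more closely related" intermediate
y = z ** 2 + X[:, 1] + 0.1 * rng.normal(size=1000)   # final, more abstractly related target

X_train, X_test, z_train, z_test, y_train, y_test = train_test_split(X, z, y, random_state=0)

# Step 1: predict the intermediate variable from the raw inputs
stage1 = RandomForestRegressor(random_state=0).fit(X_train, z_train)

# Step 2: feed the stage-1 prediction back in as an extra feature for the final target
X_train_aug = np.column_stack([X_train, stage1.predict(X_train)])
X_test_aug = np.column_stack([X_test, stage1.predict(X_test)])
stage2 = RandomForestRegressor(random_state=0).fit(X_train_aug, y_train)

print("R^2 on final target:", stage2.score(X_test_aug, y_test))
```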


K-nearest Neighbors Algorithm with Examples in R (Simply Explained knn)

In this post I am going to explain what the k-nearest neighbors algorithm is and how it helps us. But before we move ahead, be aware that my target audience is the reader who wants an intuitive understanding of the concept rather than a very in-depth one; that is why I have avoided being too pedantic about the topic, with less focus on theory. Let’s get started.
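The post works its examples in R; for readers more comfortable in Python, the same intuition looks roughly like the scikit-learn sketch below. The dataset and the choice of k here are arbitrary and are not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A small labelled dataset, split into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Classify each test point by a majority vote of its k nearest training neighbours
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("accuracy:", knn.score(X_test, y_test))
```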


Building a Chat Bot With Image Recognition and OCR

In part 1 of this series, we gave our bot the ability to detect sentiment from text and respond accordingly. But that’s about all it can do, which is admittedly quite boring. In a real chat, of course, we often send a multitude of media: text, images, videos, gifs, and anything else. So in this next step of our journey, let’s give our bot vision. The goal of this tutorial is to allow our bot to receive images and reply to them, and eventually give us a crude description of what it determines to be the main object in the image. Let’s get started!
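The tutorial wires this into the bot itself; purely as an illustration of the vision and OCR pieces on their own, a minimal Python sketch against the Google Cloud Vision client might look like the following. It assumes the google-cloud-vision package and credentials are already set up, and the image path is a placeholder; the bot in the series may use a different client or language entirely.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Read a local image (placeholder path) and wrap it for the API
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Label detection: a crude description of the main objects in the image
labels = client.label_detection(image=image).label_annotations
print([label.description for label in labels[:3]])

# Text detection (OCR): pull out any text the image contains
texts = client.text_detection(image=image).text_annotations
if texts:
    print(texts[0].description)
```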


Use AWS Glue and/or Databricks’ Spark-xml to process XML data

Playing with unstructured data can sometimes be cumbersome, and keeping control over it can become a mammoth task if you have strict rules on the quality and structure of the data. In this article I will share my experience of processing XML files with Glue transforms versus the Databricks spark-xml library.
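As a point of reference for the Databricks side of the comparison, reading XML with the spark-xml package typically looks like the PySpark sketch below; the package version, file path, and row tag are placeholders, and the Glue-transform side (DynamicFrames) is not shown here.

```python
from pyspark.sql import SparkSession

# Requires the spark-xml package on the classpath, e.g. launched with
#   spark-submit --packages com.databricks:spark-xml_2.12:0.14.0 ...
spark = SparkSession.builder.appName("xml-demo").getOrCreate()

# rowTag tells spark-xml which XML element corresponds to one row (placeholder values)
df = (spark.read.format("com.databricks.spark.xml")
      .option("rowTag", "record")
      .load("data/sample.xml"))

df.printSchema()
df.show(5)
```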


Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and, finally, having the basis of an evolving personality. This is part 1 of that series.
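The series builds the bot in Ruby; shown here in Python purely for illustration, the sentiment-detection step against the Cloud Natural Language API is roughly the call below. It assumes the google-cloud-language package and credentials are configured, and the message text is made up.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

# Wrap a chat message as a plain-text document
document = language_v1.Document(
    content="I love this chat bot!",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Score ranges from -1 (negative) to +1 (positive); magnitude reflects overall strength
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)
```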


Setting Up Kaggle in Google Colab

You know where all those datasets are and you know where you want them to go, but how do you easily move your datasets from Kaggle into Google Colab without a lot of complicated madness? Let me show you!
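The article walks through the details; the gist, assuming you have downloaded a Kaggle API token (kaggle.json) from your account page, is usually along these lines in a Colab cell. The dataset slug is a placeholder.

```python
# In a Colab cell: install the Kaggle CLI and upload your kaggle.json API token
!pip install -q kaggle

from google.colab import files
files.upload()  # choose the kaggle.json file downloaded from your Kaggle account page

# Put the token where the CLI expects it, with restrictive permissions
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download and unzip a dataset (placeholder slug) straight into the Colab VM
!kaggle datasets download -d some-user/some-dataset
!unzip -q some-dataset.zip -d data/
```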


Graph Embeddings – The Summary

This article presents what graph embeddings are, how they are used, and a comparison of the most commonly used graph embedding approaches.


Build Log Analytics Application using Apache Spark

A step-by-step process for developing a real-world application using Apache Spark, with the main focus on explaining the architecture of Spark.
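To anchor the idea, here is a hedged PySpark sketch of the kind of parsing and aggregation a log analytics application performs; the log path and the assumption of a common access-log format are placeholders, not details from the article.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-analytics").getOrCreate()

# Read raw access-log lines (placeholder path) as a single-column DataFrame
logs = spark.read.text("logs/access.log")

# Pull out the HTTP status code with a regular expression
# (assumes a common log format such as: ... "GET /path HTTP/1.1" 200 1234)
status = logs.select(F.regexp_extract("value", r'"\s(\d{3})\s', 1).alias("status"))

# Count requests per status code, the kind of aggregate a log dashboard is built on
status.groupBy("status").count().orderBy(F.desc("count")).show()
```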


Various ways to evaluate a machine learning model's performance

In this blog, we will discuss the various ways to check the performance of a machine learning or deep learning model and why to use one in place of another. We will discuss the following terms (a short illustrative sketch follows the list):
1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. Specificity
6. F1 score
7. Precision-Recall or PR curve
8. ROC (Receiver Operating Characteristics) curve
9. PR vs ROC curve.
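A compact way to see most of these terms in one place, purely as an illustration on a toy dataset rather than anything from the blog itself, is the scikit-learn sketch below.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Toy binary-classification problem and a simple baseline model
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # scores needed for the ROC curve

print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("accuracy:  ", accuracy_score(y_test, y_pred))
print("precision: ", precision_score(y_test, y_pred))
print("recall:    ", recall_score(y_test, y_pred))
print("F1 score:  ", f1_score(y_test, y_pred))
print("ROC AUC:   ", roc_auc_score(y_test, y_prob))

# Specificity = TN / (TN + FP), read off the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("specificity:", tn / (tn + fp))
```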


Default Risk using Deep Learning

Many people struggle to get loans due to insufficient or non-existent credit histories, and, unfortunately, this population is often taken advantage of by untrustworthy lenders. Companies like Home Credit strive to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. To make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data (e.g., telco and transactional information) to predict their clients’ repayment abilities. By using various statistical, machine learning, and deep learning methods, we unlock the full potential of this data. Doing so helps ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower clients to be successful.


Supercharging word vectors

A simple technique to boost fastText and other word vectors in your NLP projects
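The article describes its own technique; one common related trick, shown here only as a generic illustration and not necessarily the article's method, is to build sentence vectors as a weighted average of pre-trained fastText word vectors. The vector file path and the example weights below are placeholders.

```python
import numpy as np
from gensim.models import KeyedVectors

# Load pre-trained fastText vectors in word2vec text format (placeholder path)
kv = KeyedVectors.load_word2vec_format("cc.en.300.vec")

def sentence_vector(tokens, weights=None):
    """Weighted average of word vectors; weights default to 1.0 per token."""
    vecs, w = [], []
    for tok in tokens:
        if tok in kv:
            vecs.append(kv[tok])
            w.append(1.0 if weights is None else weights.get(tok, 1.0))
    if not vecs:
        return np.zeros(kv.vector_size)
    return np.average(vecs, axis=0, weights=w)

# Example: up-weight informative words, down-weight common ones (weights are made up)
vec = sentence_vector("the quick brown fox".split(), weights={"fox": 3.0, "the": 0.2})
print(vec.shape)
```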