Timeliner is a personal data aggregation utility. It collects all your digital things from pretty much anywhere and stores them on your own computer, indexes them, and projects them onto a single, unified timeline. The intended purpose of this tool is to help preserve personal and family history. Things that are stored by Timeliner are easily accessible in a SQLite database or, for media items like photos and videos, are simply plopped onto your disk as regular files, organized in folders by date and data source. WIP Notice: This project works as documented for the most part, but is still very young. Please consider this experimental until stable releases. The documentation needs a lot of work too, I know, so feel free to contribute.
There has been increasing interest in research for applying machine learning to medical domains. For instance, computer vision has proved to be a valuable tool for providing analytical information for physicians and medical researchers. I wanted to explore if intelligent machines could directly take on the role of a physician?-?autonomously making medical decisions. In this project, I created a reinforcement learning based intelligent physician that would produce optimal chemotherapy treatment regimens given a patient’s cancer progression.
Under the manifold assumption, real-world high-dimensional data concentrates close to a non-linear low-dimensional manifold. In other words, data lies approximately on a manifold of much lower dimension than the input space, a manifold that can be retrieved/learned The manifold assumption is crucial in order to deal with the curse of dimensionality: many machine learning models problems seem hopeless if we expect the machine learning algorithm to learn functions with interesting variations across an highly dimensional space Fortunately, it has been empirically proven that ANNs capture the geometric regularities of commonplace data thanks to their hierarchical, layered structure. shows experiments to prove the ability to cope with data that lies on or near a low-dimensional manifold. However, how do ANN layers identify the mappings (representations) between the original data space to suitable lower dimensional manifolds?
Visually representing the content of a text document is one of the most important tasks in the field of text mining. As a data scientist or NLP specialist, not only we explore the content of documents from different aspects and at different levels of details, but also we summarize a single document, show the words and topics, detect events, and create storylines. However, there are some gaps between visualizing unstructured (text) data and structured date. For example, many text visualizations do not represent the text directly, they represent an output of a language model(word count, character length, word sequences, etc.). In this post, we will use Womens Clothing E-Commerce Reviews data set, and try to explore and visualize as much as we can, using Plotly’s Python graphing library and Bokeh visualization library. Not only we are going to explore text data, but also we will visualize numeric and categorical features. Let’s get started!
Motivation: Time-series being an important concept in statistics and machine learning is often less explored by data enthusiasts like us. To change the winds, we decided to work on one of the most burning time series problem of today’s day and era, ‘predicting web traffic’. This blog mirrors our brain storming involved in Web Traffic Time Series Forecasting, also a competition hosted by Kaggle. We believe that this forecasting can help website servers a great deal in effectively handling outages. The technique we implemented can be extended to diverse applications in financial markets, weather forecasts, audio and video processing. Not just that, understanding your website’s traffic trajectory can open up business opportunities too!
Big Data and Data Science is now in everyone’s mind. But not everyone clearly understands that not all data is the same, and has a clear vision of the types of applications and technologies available from Data Science. Data Science, Artificial Intelligence and Machine learning are often considered as quite equivalent. It is critical to understand that not all data is the same in order to understand that all data science techniques are not equivalent.
For quite some while, I feel content training my model on a single GTX 1070 graphics card which is rated around 8.18 TFlops single precision, then Google opens up their free Tesla K80 GPU on Colab which comes with 12GB RAM, and rated at slightly faster 8.73 TFlops. Until recently, the Cloud TPU option with 180 TFlops pops up in Colab’s runtime type selector. In this quick tutorial, you will learn how to take your existing Keras model, turn it into a TPU model and train on Colab x20 faster compared to training on my GTX1070 for free. We are going to build an easy to understand yet complex enough to train Keras model so we can warm up the Cloud TPU a little bit. Training an LSTM model on the IMDB sentiment classification task could be a great example because LSTM can be more computationally expensive to train than other layers like Dense and convolutional.
Recurrent Neural Nets (RNNs) are at the core of the most common AI applications in use today but we are rapidly recognizing broad time series problem types where they don’t fit well. Several alternatives are already in use and one that’s just been introduced, ODE net is a radical departure from our way of thinking about the solution.
This article will discuss a central component that is driving progress in Neural Network design, namely in Convolutional Networks for Computer Vision tasks. Classical CNNs such as LeNet-5, AlexNet, and VGG all have a connectivity pattern in which the output from the previous layer serves as the input to the next layer. This sequential processing pattern is easy to comprehend and it is a great way to begin learning about Neural Networks.
A beginner’s guide to dimensionality reduction in Machine Learning. Here, I’ll be giving a quick overview of what dimensionality reduction is, why we need it and how to do it.
Taking machine learning courses and reading articles about it doesn’t necessarily tell you which machine learning model to use. They just give you an intuition on how these models work which may leave you in the hassle of choosing the suitable model for your problem. At the beginning of my journey with ML, on solving a problem, I would try many ML models and use what works best, and I still do that now but I follow some best practices?-?on how to choose a machine learning model?-?that I learned from experience, intuition and colleagues, these best practices make things easier, here is what I had collected. I’ll tell you which machine learning model to use according to the nature of your problem, and I’ll try to explain some concepts.
In this blog post, I will provide a step-by-step tutorial on how to build a reporting dashboard using Dash, a Python framework for building analytical web applications. Rather than go over the basics of building a Dash app, I provide a detailed guide to building a multi-page dashboard with data tables and graphs. I built the reporting dashboard as a multi-page app in order to break up the dashboard into different pages so it less overwhelming and to present data in an organized fashion. On each dashboard page, there are two data tables, a date range selector, a data download link, as well as a set of graphs below the two dashboards. I ran into several technical challenges while building the dashboard and I describe in detail how I overcame these challenges.
Causal effect evaluation and causal network learning are two main research areas in causal inference. For causal effect evaluation, we review the two problems of confounders and surrogates. The Yule-Simpson paradox is the idea that the association between two variables may be changed dramatically due to ignoring confounders. We review criteria for confounders and methods of adjustment for observed and unobserved confounders. The surrogate paradox occurs when a treatment has a positive causal effect on a surrogate endpoint, which, in turn, has a positive causal effect on a true endpoint, but the treatment may have a negative causal effect on the true endpoint. Some of the existing criteria for surrogates are subject to the surrogate paradox, and we review criteria for consistent surrogates to avoid the surrogate paradox. Causal networks are used to depict the causal relationships among multiple variables. Rather than discovering a global causal network, researchers are often interested in discovering the causes and effects of a given variable. We review some algorithms for local structure learning of causal networks centering around a given variable.
Over the past decade, neural networks have achieved state-of-the-art results in a wide variety of prediction tasks, including image classification, machine translation, and speech recognition. These successes have been driven, at least in part, by hardware and software improvements that have significantly accelerated neural network training. Faster training has directly resulted in dramatic improvements to model quality, both by allowing more training data to be processed and by allowing researchers to try new ideas and configurations more rapidly. Today, hardware developments like Cloud TPU Pods are rapidly increasing the amount of computation available for neural network training, which raises the possibility of harnessing additional computation to make neural networks train even faster and facilitate even greater improvements to model quality. But how exactly should we harness this unprecedented amount of computation, and should we always expect more computation to facilitate faster training?
CatBoost is a fast implementation of GBDT with GPU support out-of-the-box. Google Colaboratory is a very useful tool with free GPU support. Data Science Salon NYC, June 13 – Register Now Data Science Salon NYC, June 13 –
Sentiment analysis labels a body of text as expressing either a positive or negative opinion, as in summarizing the content of an online product review. In this sense, sentiment analysis can be considered the challenge of building a classifier from text. Sentiment analysis can be done by counting the words from a dictionary of emotional terms, by fitting traditional classifiers such as logistic regression to word counts, or, most recently, by employing sophisticated neural networks. These methods progressively improve classification at the cost of increased computation and reduced transparency. A common sentiment analysis task, the classification of IMDb (Internet Movie Database) movie reviews, is used to illustrate the methods on a common task that appears frequently in the literature.
Learn how to apply L2 regularization and dropout to neural networks. You just built your neural network and notice that it performs incredibly well on the test set, but not nearly as good on the test set. This is a sign of overfitting. Your neural network has a very high variance and it cannot generalize well to data it has not been trained on.
Productionizing your ML model using the Flask web framework. Even though pushing your Machine Learning model to production is one of the most important steps of building a Machine Learning application there aren’t many tutorials out there showing how to do so. Especially not for small machine or deep learning libraries like Uber Ludwig. Therefore in this article, I will go over how to productionize a Ludwig model by building a Rest-API as well as a normal website using the Flask web micro-framework. The same process can be applied to almost any other machine learning framework with only a few little changes.
The following demonstrates how to use the low-level TensorFlow Core to create Convolutional Neural Network (ConvNet) models without high-level APIs such as Keras. The goal of this tutorial is to provide a better understanding of the background processes in a deep neural network and to demonstrate concepts on how use TensorFlow to create custom code. This tutorial will show how to load the MNIST handwritten digit dataset into a data iterator, use graphs and sessions, create a novel ConvNet architecture, train the model with different options, make predictions, and save the trained model. A complete code will then be provided along with the equivalent model in Keras to allow a direct comparison for further insight for those who use Keras, and to show how powerful high-level APIs are for creating neural networks.