Polyaxon is a platform for managing the whole life cycle of machine learning (ML) and deep learning (DL) applications. Today, we are pleased to announce the 0.4 release, our most stable release to date. This release brings many new features, integrations, improvements, and fixes. For over a year now, Polyaxon has been delivering software that enables many teams and organizations to be more productive, iterate faster on their research and ideas, and ship robust models to production.
Training a machine learning algorithm on raw data is not always a suitable choice. An algorithm trained on raw data has to mine features by itself in order to tell the different groups apart, and doing that automatically requires large amounts of data. For small datasets, it is preferable for the data scientist to do the feature mining step themselves and simply tell the machine learning algorithm which feature set to use. The chosen feature set has to be representative of the data samples, so we have to take care to select the best features. The data scientist proposes types of features that, based on previous experience, seem helpful in representing the data samples. Some features may prove robust in representing the samples and others may not.
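As a rough illustration of checking whether a hand-picked feature is representative, the sketch below ranks candidate features by how well they separate two classes. The feature names, the toy values, and the `separation_score` helper are all hypothetical; the score (gap between class means divided by the summed class spreads) is just one simple choice, not the method used in the article.

```python
from statistics import mean, pstdev

def separation_score(values, labels):
    """Gap between the two class means, scaled by the summed class spreads."""
    class_a = [v for v, y in zip(values, labels) if y == 0]
    class_b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(class_a) + pstdev(class_b)
    gap = abs(mean(class_a) - mean(class_b))
    # A zero spread would divide by zero; treat it as perfectly separated.
    return gap / spread if spread > 0 else float("inf")

# Hypothetical candidate features for six samples (three per class).
samples = {
    "mean_intensity": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85],
    "edge_count": [5, 9, 7, 6, 8, 7],
}
labels = [0, 0, 0, 1, 1, 1]

# Highest score first: features that separate the classes best.
ranked = sorted(
    samples,
    key=lambda name: separation_score(samples[name], labels),
    reverse=True,
)
print(ranked)
```

Here `mean_intensity` separates the classes cleanly, while `edge_count` has identical class means and scores zero, so it would be a poor candidate on this toy data.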
Deep feedforward networks, also known as multilayer perceptrons, are the foundation of most deep learning models. Networks such as CNNs are special cases of feedforward networks, while RNNs extend them with feedback connections. Feedforward networks are mostly used for supervised machine learning tasks, where we already know the target function, i.e. the result we want our network to achieve. They are extremely important in the practice of machine learning and form the basis of many commercial applications; areas such as computer vision and NLP have been greatly affected by these networks.
The building block of deep neural networks is the sigmoid neuron. Sigmoid neurons are similar to perceptrons, but they are slightly modified so that their output is much smoother than the step-function output of a perceptron. In this post, we will talk about the motivation behind the sigmoid neuron and how the sigmoid neuron model works.
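To make the contrast between the two outputs concrete, here is a minimal sketch, assuming a neuron with two inputs and made-up weights and bias. The `neuron` helper and the numbers are illustrative only and are not taken from the post.

```python
import math

def step(z):
    """Perceptron activation: a hard jump from 0 to 1 at z = 0."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Sigmoid activation: a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical parameters chosen so the two inputs sit on either side of z = 0.
weights, bias = [0.8, -0.4], -0.1

for x in ([0.10, 0.0], [0.15, 0.0]):
    hard = neuron(x, weights, bias, step)
    soft = neuron(x, weights, bias, sigmoid)
    print(x, hard, round(soft, 3))
```

A small change in the first input flips the perceptron output from 0 to 1, while the sigmoid output moves only slightly around 0.5.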
There are dozens of different hypothesis tests, so choosing one can be a little overwhelming. The good news is that one of the more popular tests will usually do the trick, unless you have unusual data or are working within very specific guidelines (e.g. in medical research). The following picture shows several tests for a single population, and what kind of data (nominal, ordinal, interval/ratio) is best suited to those tests.
This article is aimed at data science managers and decision makers, as well as junior professionals who want to become one at some point in their career. Deep thinking, unlike deep learning, is also more difficult to automate, so it provides better job security. Those automating deep learning are actually the new data science wizards, who can think outside the box. Much of what is described in this article is also data science wizardry, and is taught neither in standard textbooks nor in the classroom. By reading this tutorial, you will learn and be able to use these data science secrets, and possibly change your perspective on data science. Data science is like an iceberg: everyone knows and can see the tip of the iceberg (regression models, neural nets, cross-validation, clustering, Python, and so on, as presented in textbooks). Here I focus on the unseen bottom, using a statistical level almost accessible to the layman, avoiding jargon and complicated math formulas, yet discussing a few advanced concepts.
Big data operates in different ways than traditional relational database structures: indexes and keys are not usually present in big data systems, where distributed-systems concerns tend to have the upper hand. Nevertheless, there are specific ways to operate on big data, and understanding how best to work with this type of dataset can prove the key to unlocking insights.
Model explainability techniques show you what your model is learning, and seeing inside your model is even more useful than most people expect. I’ve interviewed many data scientists in the last 10 years, and model explainability techniques are my favorite topic to distinguish the very best data scientists from the average. Some people think machine learning models are black boxes, useful for making predictions but otherwise unintelligible; but the best data scientists know techniques to extract real-world insights from any model.
How to optimize Spark for analytics without tanking productivity. Apache Spark has seen broad adoption for big data processing. It was originally developed to speed up map-reduce operations for data stored in Hadoop. Today, it’s still best suited for batch-oriented, high-throughput operations on data. Although Spark continues to improve, it’s still at best an incomplete solution for analytics, especially when it comes to real-time interactive workloads on changing data, the kind needed by BI, data science and IoT applications. Software vendors like Databricks and AWS address this by making it easier to stitch together big data solutions, and in-house IT groups often deploy additional data management tools on top of Spark. But, as Darius Foroux points out, more technology does not equal more productivity. The missing link is a way to optimize Spark for BI users, data engineers, and data scientists, without piling on more non-Spark-based tools that drain productivity. A Unified Analytics Data Fabric (UADF) solves this problem. It adds support for streaming and transactional data and optimizes Spark for lightning-fast BI, data science and IoT applications. And because it’s native to Spark, you leverage the people skills, operational processes, and tools that you already have. A UADF improves productivity by extending Spark into a lightning-fast platform for BI, data science, and IoT applications. As the SVP of analytics at TIBCO, I see our customers struggle with this challenge. We think SnappyData, a UADF created by the visionary team behind Gemfire, helps overcome the shortcomings of Spark for analytics. This article explains how it can help you get more from Spark while increasing your productivity at the same time.
Why we should worry about gender inequality in Natural Language Processing techniques.
Augmented reality (AR) helps you do more with what you see by overlaying digital content and information on top of the physical world. For example, AR features coming to Google Maps will let you find your way with directions overlaid on top of your real world. With Playground, a creative mode in the Pixel camera, you can use AR to see the world differently. And with the latest release of YouTube Stories and ARCore’s new Augmented Faces API you can add objects like animated masks, glasses, 3D hats and more to your own selfies! One of the key challenges in making these AR features possible is proper anchoring of the virtual content to the real world, a process that requires a unique set of perceptive technologies able to track the highly dynamic surface geometry across every smile, frown or smirk.
In a previous post, I discussed k-means clustering as a way of summarising text data. I also talked about some of the limitations of k-means and in what situations it may not be the most appropriate solution. Probably the biggest limitation is that each cluster has the same diagonal covariance matrix. This produces spherical clusters that are quite inflexible in terms of the types of distributions they can model. In this post, I wanted to address some of those limitations and talk about one method in particular that can avoid these issues, Gaussian Mixture Modelling (GMM). The format of this post will be very similar to the last one where I explain the theory behind GMM and how it works. I then want to dive into coding the algorithm in Python and we can see how the results differ from k-means and why using GMM may be a good alternative.
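The difference between the two approaches can be sketched on a single one-dimensional point. This is a toy illustration under stated assumptions, not the implementation from the post: the means, variances, and mixing weights below are made-up values for an already-fitted model, and the helper names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def kmeans_assign(x, means):
    """k-means style hard assignment: index of the nearest cluster centre."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)

def gmm_responsibilities(x, weights, means, variances):
    """GMM style soft assignment: posterior probability of each component."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]

# Assumed parameters: one narrow cluster at 0 and one wide cluster at 4.
means = [0.0, 4.0]
variances = [0.25, 9.0]
weights = [0.5, 0.5]

x = 1.8
hard = kmeans_assign(x, means)
soft = gmm_responsibilities(x, weights, means, variances)
print(hard, [round(r, 3) for r in soft])
```

The point is closer to the first centre, so the distance-only rule picks cluster 0, but because the second component is much wider, the mixture model gives it almost all of the responsibility. Letting each component carry its own variance is what makes this possible.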
The analysis of time series data is an integral part of any data scientist’s job, more so in the quantitative trading world. Financial data is the most perplexing of time series data and often seems erratic. However, over these few articles, I will build a framework for analyzing such time series, first using well-established theories and then delving into more exotic, modern-day approaches such as machine learning. So let’s begin!
Swarm intelligence based optimal feature selection for enhanced predictive sentiment accuracy on twitter
This article will simplify the Kalman filter for you. Hopefully you’ll come away having demystified all the cryptic things you find on Wikipedia when you google Kalman filters. So let’s get started!
A guide to the less desirable aspects of deep learning environment configurations. Thanks to cheaper and bigger storage, we have more data than we had a couple of years back. We do owe our thanks to Big Data, no matter how much hype it has created. However, the real MVP here is faster and better computing, which made papers from the 1980s and 90s more relevant (LSTMs were actually invented in 1997)! We are finally able to leverage the true power of neural networks and deep learning thanks to better and faster CPUs and GPUs. Whether we like it or not, traditional statistical and machine learning models have severe limitations on problems with high dimensionality, unstructured data, more complexity and large volumes of data.
Natural Language Processing (NLP) applications have become ubiquitous these days. I seem to stumble across websites and applications regularly that are leveraging NLP in one form or another. In short, this is a wonderful time to be involved in the NLP domain. This rapid increase in NLP adoption has happened largely thanks to the concept of transfer learning enabled through pretrained models. Transfer learning, in the context of NLP, is essentially the ability to train a model on one dataset and then adapt that model to perform different NLP functions on a different dataset. This breakthrough has made things incredibly easy and simple for everyone, especially folks who don’t have the time or resources to build NLP models from scratch. It’s perfect for beginners as well who want to learn or transition into NLP.