Reviewing Python Visualization Packages

There are many ways to create a graph in Python, but which way is best? When we make visualizations, it is important to ask what the figure is meant to accomplish: Are you trying to get an initial feel for how your data looks? Are you trying to impress someone at a presentation? Or do you want a middle-of-the-road figure to show someone internally? In this post, I will walk through a number of popular Python visualization packages, their pros and cons, and the situations where each shines. I will scope this review to 2D plots, leaving 3D figures and dashboards for another time, though many of these packages support both quite well.
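As a minimal sketch of the "initial feel" use case (my own example with placeholder data, using matplotlib, which is presumably among the packages reviewed):

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for whatever you are exploring.
x = np.random.normal(size=500)
y = 2 * x + np.random.normal(scale=0.5, size=500)

# A quick-and-dirty scatter plot: fine for getting a feel for the data,
# but probably not polished enough for a presentation.
plt.scatter(x, y, alpha=0.4)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Quick exploratory look")
plt.show()
```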


An introduction to Reinforcement Learning

Reinforcement learning is an important type of machine learning in which an agent learns how to behave in an environment by performing actions and observing the results. In recent years, we have seen a lot of progress in this fascinating area of research. Examples include DeepMind's Deep Q-learning architecture in 2014, AlphaGo beating the champion of the game of Go in 2016, and OpenAI's PPO in 2017, among others.
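To make the agent-environment interaction concrete, here is a minimal sketch of that loop using OpenAI Gym (my own illustration, not from the article; it assumes the classic pre-0.26 Gym API, and the random action is a stand-in for a learned policy):

```python
import gym

# Classic agent-environment loop.
env = gym.make("CartPole-v1")
observation = env.reset()

for step in range(200):
    # A real agent would pick an action from a learned policy;
    # a random action is used here only as a placeholder.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()

env.close()
```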


How can you work faster in R Studio? Do you really want to know?

In this article, I would like to share some of my favorite productivity features of RStudio along with their respective shortcuts. I will also cover some other useful tools and techniques. I have also prepared some visuals so you can see some of them in action right away without needing to open RStudio.


How to use Jupyter on a Google Cloud VM

Recipes for Notebooks on Google Cloud Platform.


How to do batch predictions of TensorFlow models directly in BigQuery

If you have trained a model in TensorFlow and exported it as a SavedModel, you can now use the ML.PREDICT SQL function in BigQuery to make predictions. This is very useful if you want to make batch predictions (e.g., to make predictions for all the data collected in the past hour), since any SQL query can be scheduled in BigQuery.
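As a rough sketch of what this looks like (the dataset, model, and table names below are placeholders, and the exact OPTIONS may differ from the article's setup), you first register the exported SavedModel in BigQuery and then call ML.PREDICT, here via the Python client:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Register the exported SavedModel as a BigQuery ML model.
# Placeholder names: mydataset.txmodel and the GCS export path.
client.query("""
CREATE OR REPLACE MODEL `mydataset.txmodel`
OPTIONS (MODEL_TYPE='TENSORFLOW',
         MODEL_PATH='gs://my-bucket/export/saved_model/*')
""").result()

# Batch-predict over a table with ML.PREDICT; this query could also
# be set up as a scheduled query in BigQuery.
rows = client.query("""
SELECT *
FROM ML.PREDICT(MODEL `mydataset.txmodel`,
                (SELECT * FROM `mydataset.new_data`))
""").result()

for row in rows:
    print(dict(row))
```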


omega-ml: Deploying Data Pipelines & Machine Learning Models the Easy Way

Deploying data pipelines and machine learning models can take anywhere from weeks to months, involving engineers from many disciplines. Our open source data science platform omega|ml, written in Python and leveraging MongoDB, accomplishes the task in a matter of seconds – all it takes is a single line of code.
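The exact call depends on omega|ml's client API; the sketch below follows the `om.models.put` pattern from the project's documentation as I understand it, with placeholder names, and may differ in detail:

```python
import omegaml as om
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The "single line": store the fitted model in omega|ml, making it
# available to the runtime (the name 'iris-model' is a placeholder).
om.models.put(clf, 'iris-model')

# Predictions can then be requested from the runtime instead of running locally.
result = om.runtime.model('iris-model').predict(X[:5])
print(result.get())
```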


Free Datasets: Download Ready-Made or Customized

Webhose’s free datasets include data from a range of different sources, languages and categories. Leading organizations and universities around the world have used Webhose’s datasets for their predictive analytics, risk modeling, NLP, machine learning and sentiment analysis. Advanced filters allow you to conduct granular analysis to refine your queries according to names, keywords, specific authors, languages, publication dates, countries and more. Leverage these machine-readable datasets to mine data for important business analytic insights.


Time Series Momentum (aka Trend-Following): A Good Time for a Refresh

Similar to some better-known factors like size and value, time-series momentum is a factor that has historically demonstrated above-average excess returns. Time-series momentum, also called trend momentum or trend-following, is measured by a portfolio that is long assets with recent positive returns and short assets with recent negative returns.(1) Compare this to the traditional (cross-sectional) momentum factor, which considers recent asset performance only relative to other assets. The academic evidence suggests that including a strategy targeting time-series momentum in a portfolio improves the portfolio's risk-adjusted returns.

Strategies that attempt to capture the return premium offered by time-series momentum are often called 'managed futures,' as they take long and short positions in assets via futures markets, ideally in a multitude of futures markets around the globe. This piece dives into time-series momentum and examines some of the specific qualities that make a managed futures strategy a good portfolio diversifier (example shown here).

In general, an asset that has low (or negative) correlation with broad stocks and bonds provides good diversification benefits. Low or near-zero correlation between two assets means that there is no relationship in their performance: Asset A performing above average does not tell us anything about Asset B's expected performance relative to its average. Adding a low-correlation asset to a portfolio will, depending on the specific return and volatility properties of the asset, improve the portfolio's risk-adjusted returns by improving the portfolio's return, reducing the portfolio's volatility, or both.
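To make the signal concrete, here is my own small sketch (not from the article) of a simple time-series momentum rule: go long an asset when its trailing 12-month return is positive and short when it is negative, assuming a pandas series of monthly prices:

```python
import numpy as np
import pandas as pd

def tsmom_positions(prices: pd.Series, lookback: int = 12) -> pd.Series:
    """Long (+1) when the trailing `lookback`-period return is positive,
    short (-1) when it is negative. Assumes monthly prices."""
    trailing_return = prices.pct_change(lookback)
    return np.sign(trailing_return)

# Example with synthetic monthly prices (placeholder data).
idx = pd.date_range("2015-01-31", periods=60, freq="M")
prices = pd.Series(100 * np.exp(np.cumsum(np.random.normal(0.005, 0.04, 60))),
                   index=idx)

positions = tsmom_positions(prices)
# Next month's (unleveraged) strategy return is last month's position
# times the asset's return.
strategy_returns = positions.shift(1) * prices.pct_change()
```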


Setting up an R Admin Group

When I set up an R server for clients, they often want to be able to install packages so that all users on the machine have access to them. This requires them to install the packages onto the root filesystem rather than under their individual home directories. It would be easy enough to give them su access, but this is a risky approach: there are many other things on the system that they could break with that level of power. There is a useful alternative, though: simply set up an R Admin Group and add them to it.


Introducing LinkedIn’s Avro2TF

A New Feature Transformation Framework for TensorFlow. Feature extraction and transformation is one of the main elements of any large-scale machine learning solution. Conceptually, feature extraction and transformation is the process of deriving key pieces of information from a training dataset in a form that machine learning models can consume. If you are building an isolated machine learning model, it is easy to grasp the relevance of feature extraction and transformation, as the code that accomplishes it most likely lives alongside the model itself. However, feature transformation quickly becomes a nightmare in environments running multiple machine learning models. LinkedIn has been dealing with feature extraction and transformation challenges for years and recently open sourced Avro2TF, a new framework for transforming large datasets into TensorFlow-ready features.


Pro-ML – Productive Machine Learning

The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack. As we mapped out the effort, we kept a set of key ideas in place to constrain the solution space and focus our efforts.
• We will leverage and improve best-of-breed components from our existing code base to the maximum extent feasible. We are unlikely to rewrite our entire tech stack, but any particular component is fair game.
• The state of the art is constantly evolving with new algorithms and open source frameworks – we need to be flexible to support our existing major ML algorithms as well as new ones that will emerge.
• We will use an agile-inspired strategy so that each step we take delivers value by making at least one product line better or providing generally usable improvements to existing components.
• The ability to run the models in real-time is as important as the ability to author or train them. The services hosting the models must be able to be independently upgraded without breaking their downstream or upstream services.
• New models, retrained models, and models using new technologies must be A/B testable in production.
• We must build GDPR privacy requirements into every stage of the solution.


Photon ML – Photon Machine Learning

Photon ML is a machine learning library based on Apache Spark. It was originally developed by the LinkedIn Machine Learning Algorithms Team. Currently, Photon ML supports training different types of Generalized Linear Models (GLMs) and Generalized Linear Mixed Models (GLMMs/GLMix models): logistic, linear, and Poisson.


The Bias-Variance trade-off: Explanation and Demo

The Bias-Variance trade-off is a basic yet important concept in the field of data science and machine learning. Often, we encounter statements like ‘simpler models have high bias and low variance whereas more complex or sophisticated models have low bias and high variance’ or ‘high bias leads to under-fitting and high variance leads to over-fitting’. But what do bias and variance actually mean and how are they related to the accuracy and performance of a model? In this article, I will explain the intuitive and mathematical meaning of bias and variance, show the mathematical relation between bias, variance and performance of a model and finally demo the effects of varying the model complexity on bias and variance through a small example.
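As a small illustration of the kind of demo described (my own sketch, not the article's code), varying polynomial degree as a stand-in for model complexity shows the under-fitting/over-fitting pattern:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Low degree: high bias (both errors high, under-fitting).
    # High degree: high variance (train error low, test error high, over-fitting).
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```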


TEA

Tea is a high-level domain-specific language and runtime system that automates statistical test selection and execution. Tea achieves this by applying techniques and ideas from human-computer interaction, programming languages, and software engineering to statistical analysis. Our hope is that Tea opens up possibilities for new tools for statistical analysis, helps researchers in diverse empirical fields, and resolves a century-old question: 'Which test should I use to test my hypothesis?'


Transfer learning: the dos and don’ts

If you have recently started doing work in deep learning, especially image recognition, you might have seen the abundance of blog posts all over the internet promising to teach you how to build a world-class image classifier in a dozen or fewer lines of code and just a few minutes on a modern GPU. What's shocking is not the promise but the fact that most of these tutorials end up delivering on it. How is that possible? To those trained in 'conventional' machine learning techniques, the very idea that a model developed for one dataset could simply be applied to a different one sounds absurd. The answer is, of course, transfer learning, one of the most fascinating features of deep neural networks. In this post, we'll first look at what transfer learning is, when it will work, when it might work, and why it won't work in some cases, concluding with some pointers to best practices for transfer learning.
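As a minimal sketch of the standard pattern (my own example using tf.keras and ImageNet weights; the head size and data are placeholders, not from the article):

```python
import tensorflow as tf

NUM_CLASSES = 5  # placeholder for the number of classes in your own dataset

# Pre-trained convolutional base; freeze it so only the new head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # your own datasets here
```

The usual next step, when the new dataset is large enough, is to unfreeze some of the top layers of the base and fine-tune them at a low learning rate.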


Cloud Data Fusion (Beta)

Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open-source library of preconfigured connectors and transformations, Data Fusion shifts an organization’s focus away from code and integration to insights and action.