Apache NiFi

Put simply, NiFi was built to automate the flow of data between systems. While the term ‘dataflow’ is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data. The problems and solution patterns that emerged have been discussed and articulated extensively. A comprehensive and readily consumed form is found in the Enterprise Integration Patterns.

Python Setup: The Definitive Guide

In this tutorial, you’ll learn how to set up your computer for Python development and the basics of maintaining a sound application lifecycle.

Proteomics Data Analysis (2/3): Data Filtering and Missing Value Imputation

Welcome to Part Two of the three-part tutorial series on proteomics data analysis. The ultimate goal of this exercise is to identify proteins whose abundance differs between the drug-resistant cells and the control. In other words, we are looking for a list of differentially regulated proteins that may shed light on how cells escape the cancer-killing action of a drug. In Part One, I demonstrated the steps for acquiring a proteomics data set and pre-processing the data. We will pick up from the cleaned data set and confront the missing value problem in proteomics.

Introducing a New Framework for Flexible and Reproducible Reinforcement Learning Research

Reinforcement learning (RL) research has seen a number of significant advances over the past few years. These advances have allowed agents to play games at a super-human level – notable examples include DeepMind’s DQN on Atari games along with AlphaGo and AlphaGo Zero, as well as OpenAI Five. Specifically, the introduction of replay memories in DQN enabled leveraging previous agent experience, large-scale distributed training enabled distributing the learning process across multiple workers, and distributional methods allowed agents to model full distributions, rather than simply their expected values, to learn a more complete picture of their world. This type of progress is important, as the algorithms yielding these advances are also applicable to other domains, such as robotics (see our recent work on robotic manipulation and teaching robots to visually self-adapt).

Linear Regression In Real Life

We learn a lot of interesting and useful concepts in school, but sometimes it’s not very clear how we can use them in real life. One concept/tool that might be widely underestimated is Linear Regression.

How to Make Your Machine Learning Models Robust to Outliers

According to Wikipedia, an outlier is an observation point that is distant from other observations. This definition is vague because it doesn’t quantify the word ‘distant’. In this blog post, we’ll try to understand the different interpretations of this ‘distant’ notion. We will also look at outlier detection and treatment techniques and their impact on different types of machine learning models. Outliers arise due to changes in system behavior, fraudulent behavior, human error, instrument error, or simply through natural deviations in populations. A sample may also have been contaminated with elements from outside the population being examined.
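
To make the vagueness of ‘distant’ concrete, here is a minimal sketch of two common ways to quantify it: Tukey’s interquartile-range rule and a z-score cutoff. The function names, the sample data, and the thresholds (1.5 and 3.0) are illustrative assumptions, not taken from the article, and other definitions are equally valid.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing is 'distant'
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sample: eight similar readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]

print(iqr_outliers(data))     # [95]
print(zscore_outliers(data))  # []
```

On this small sample the two rules disagree: the extreme value inflates the mean and standard deviation enough that its z-score stays just under 3, while the quartile-based rule still flags it. That is one reason the choice of definition matters.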