Free Book: Process Improvement Using Data

Rethinking statistical learning theory: learning using statistical invariants

This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In the classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to construct the desired classification function, a learning machine computes statistical invariants that are specific to the problem, and then minimizes the expected error in a way that preserves these invariants; it is thus both data- and invariant-driven learning. From a mathematical point of view, methods of the classical paradigm employ mechanisms of strong convergence of approximations to the desired function, whereas methods of the new paradigm employ both strong and weak convergence mechanisms. This can significantly increase the rate of convergence.
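As a sketch of the distinction (my paraphrase of the standard definitions, not a quote from the paper): strong convergence requires the approximations to approach the desired function in norm, while weak convergence only requires agreement against every predicate function psi in a chosen set, which is what the statistical invariants encode:

```latex
% Strong (L2) convergence of approximations f_\ell to the desired f:
\lim_{\ell \to \infty} \int \bigl(f_\ell(x) - f(x)\bigr)^2 \, dP(x) = 0

% Weak convergence: agreement of f_\ell with f against every
% predicate \psi in a chosen set \Psi (the statistical invariants):
\lim_{\ell \to \infty} \int \bigl(f_\ell(x) - f(x)\bigr)\,\psi(x) \, dP(x) = 0,
\quad \forall \psi \in \Psi
```

Weak convergence is a less demanding requirement per predicate, which is why restricting the search to functions that already satisfy the invariants can speed up convergence.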

Bad Bots Are Stealing Data AND Ruining Customer Experience

Every online customer touchpoint – including websites, mobile apps, and APIs – is being attacked by bots. What are these bad bots doing? Interrupting good customer traffic, committing fraud, and stealing information – ad fraud alone is set to exceed $3.3 billion in 2018! If all that wasn't bad enough, bots are also trying to skew the data your company uses to make decisions. Your marketing and customer experience colleagues track user behavior to improve customer journeys or buy advertising. Unless you're actively defending against bad bots, these decisions could be way off base, and extremely costly.

Union Multiple Data.Frames with Different Column Names

On Friday, while working on a project in which I needed to union multiple data.frames with different column names, I realized that the base::rbind() function doesn't take data.frames with different column names, and therefore quickly drafted an rbind2() function on the fly to get the job done, based on the idea of MapReduce that I discussed before (https://…/playing-map-and-reduce-in-r-subsetting ).
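The post's rbind2() is an R function; as an illustrative cross-check of the same idea in Python (not code from the post), pandas performs this union of frames with different column names natively, padding the missing cells with NaN:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [3], "score": [0.9]})

# Unlike base::rbind in R, pd.concat unions differing column
# names, filling the cells that a frame lacks with NaN.
out = pd.concat([df1, df2], ignore_index=True, sort=False)
# out has columns id, name, score and three rows
```

The sort=False flag keeps the column order of the first frame, with new columns appended at the end.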

Manage your Data Science project structure in early stage.

Jupyter Notebook (or Colab, Databricks' notebooks, etc.) provides a very efficient way to build a project in a short time. We can create any Python class or function in the notebook without re-launching the kernel, which shortens the waiting time. This is good for small-scale projects and experiments. However, it may not be good for long-term growth.
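For long-term growth, notebook code is usually refactored into a package-style project layout. A common convention looks roughly like the following (this sketch follows the widely used cookiecutter-data-science style; the exact folder names are illustrative, not from the article):

```text
project/
├── data/             # raw and processed data, kept out of version control
├── notebooks/        # exploratory Jupyter notebooks
├── src/              # reusable Python modules imported by the notebooks
│   ├── __init__.py
│   └── features.py
├── tests/            # unit tests for src/
└── requirements.txt  # pinned dependencies
```

Keeping reusable code in src/ lets notebooks stay thin, and lets the same functions be imported by scripts, tests, and scheduled jobs.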

How to rapidly test dozens of deep learning models in Python

Optimizing machine learning (ML) models is not an exact science. The best model architecture, optimization algorithm, and hyperparameter settings depend on the data you're working with. Thus, being able to quickly test several model configurations is imperative for maximizing productivity and driving progress in your ML project. In this article, we'll create an easy-to-use interface which allows you to do this. We're essentially going to build an assembly line for ML models.
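A minimal sketch of such an assembly line (the grid keys and the scoring stub are illustrative assumptions, not the article's code): expand a grid of settings into config dicts, evaluate each one, and keep the best.

```python
from itertools import product

# Hypothetical hyperparameter grid -- names are illustrative.
grid = {
    "hidden_units": [32, 64],
    "learning_rate": [1e-2, 1e-3],
    "optimizer": ["adam", "sgd"],
}

def make_configs(grid):
    """Expand a dict of option lists into one config dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]

def evaluate(config):
    """Stand-in for train-and-validate; a real version would fit a model
    with these settings and return its validation score."""
    return config["hidden_units"] - 1000 * config["learning_rate"]

configs = make_configs(grid)
best = max(configs, key=evaluate)
```

Swapping evaluate() for a function that builds and fits a real model (e.g. a Keras network) turns this loop into the article's "assembly line" without changing its structure.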

Evolution of Spark Analytics

Apache Spark is an open-source, scalable, massively parallel, in-memory execution environment for running analytics applications. A data scientist is primarily responsible for building predictive analytic models and deriving insights. He will analyze data that has been cataloged and prepared by the data engineer, using machine learning tools like Watson Machine Learning, and will build applications using Jupyter Notebooks or RStudio. After the data scientist shares his analytical outputs, an application developer can build apps like a cognitive chatbot. As the chatbot engages with customers, it will continuously improve its knowledge and help uncover new insights.
Let's step into the shoes of a data scientist and see what I want to do as a data scientist:
• I want to run my analytic jobs, such as social media analytics or text analytics
• I want to run queries on demand
• I want to run R and Python scripts on Spark
• I want to submit Spark jobs
• I want to view the History Server logs of my application so that I can compare my jobs' performance and improve it further
• I want to see daemon logs for debugging
• I want to write notebooks
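Several of the items above map onto standard Spark tooling; a few illustrative invocations (the script names, cluster master, and host are placeholders, not from the article):

```shell
# Submit a Python script as a Spark job
spark-submit --master yarn --deploy-mode cluster my_analysis.py

# spark-submit also accepts R scripts (run via SparkR)
spark-submit my_analysis.R

# Completed jobs can be compared in the Spark History Server UI,
# which by default serves past application logs on port 18080:
#   http://<history-server-host>:18080
```

These commands assume a configured Spark installation; on a managed platform the same jobs are usually submitted through the platform's own UI or API.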

Portfolio Optimization with Deep Reinforcement Learning

Portfolio optimization, the process of assigning optimal weights to the assets in a financial portfolio, is a fundamental problem in financial engineering. There are many approaches one can follow: for passive investments, the most common are liquidity-based weighting and market-capitalization weighting. If one has no view on investment performance, one follows equal weighting. Following the Capital Asset Pricing Model, the most elegant solution is the Markowitz optimal portfolio, where risk-averse investors try to maximize return for their chosen level of risk.
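For reference, the unconstrained Markowitz mean-variance weights have a closed form, proportional to the inverse covariance matrix times the expected returns. A minimal sketch with toy numbers (the returns and covariances are made up for illustration):

```python
import numpy as np

# Toy expected returns and covariance matrix for three assets.
mu = np.array([0.08, 0.10, 0.06])
cov = np.array([[0.10, 0.02, 0.01],
                [0.02, 0.12, 0.03],
                [0.01, 0.03, 0.08]])

# Unconstrained mean-variance weights: w proportional to inv(cov) @ mu,
# normalized to sum to 1 (no short-sale or leverage constraints here).
raw = np.linalg.solve(cov, mu)
w = raw / raw.sum()

# Equal weighting, the "no view" baseline mentioned in the text.
w_equal = np.full(3, 1 / 3)
```

Real portfolios add constraints (long-only, position limits), at which point the closed form gives way to a quadratic-programming solver; the reinforcement-learning approach in the article sidesteps the explicit optimization entirely by learning the weights from data.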