Road Map for Choosing Between Statistical Modeling and Machine Learning

Data analysis methods may be described by their areas of application, but for this article I’m using definitions that are strictly methods-oriented. A statistical model (SM) is a data model that incorporates probabilities for the data generating mechanism and has identified unknown parameters that are usually interpretable and of special interest, e.g., effects of predictor variables and distributional parameters of the outcome variable. The most commonly used SMs are regression models, which potentially allow for a separation of the effects of competing predictor variables. SMs include ordinary regression, Bayesian regression, semiparametric models, generalized additive models, longitudinal models, time-to-event models, penalized regression, and others. Penalized regression includes ridge regression, lasso, and elastic net. Contrary to what some machine learning (ML) researchers believe, SMs easily allow for complexity (nonlinearity and second-order interactions) and an unlimited number of candidate features (if penalized maximum likelihood estimation or Bayesian models with sharp skeptical priors are used). It is especially easy, using regression splines, to allow every continuous predictor to have a smooth nonlinear effect.
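To make the combination of regression splines and penalization concrete, here is a minimal sketch in plain Python. It is an illustration only, not the article's method: it assumes a linear truncated-power basis (continuous but with kinks at the knots, whereas the restricted cubic splines usually used in practice are smooth), hypothetical knot locations, a hypothetical ridge penalty of 0.01 applied only to the spline terms, and a noiseless sine curve as stand-in data.

```python
import math


def spline_basis(x, knots):
    """Linear truncated-power basis: [1, x, (x - k1)+, (x - k2)+, ...]."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge_spline(xs, ys, knots, penalty):
    """Penalized least squares via the normal equations.

    The ridge penalty is added only to the spline terms, so the intercept
    and the linear slope are left unpenalized.
    """
    rows = [spline_basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for j in range(2, p):
        xtx[j][j] += penalty
    return solve(xtx, xty)


def predict(beta, x, knots):
    return sum(b * v for b, v in zip(beta, spline_basis(x, knots)))


# Stand-in data: a noiseless sine curve on [0, 6.2] (assumption for illustration).
xs = [i / 10 for i in range(63)]
ys = [math.sin(x) for x in xs]
knots = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical knot locations

beta = fit_ridge_spline(xs, ys, knots, penalty=0.01)
fit_error = max(abs(predict(beta, x, knots) - y) for x, y in zip(xs, ys))
```

Raising `penalty` shrinks the spline coefficients toward zero, so the fit moves toward a straight line; that is the sense in which penalization lets a model carry many candidate terms without overfitting.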

How to choose the best eDiscovery software in 2018

Three out of four corporate legal operations professionals say their entire legal departments would benefit from reducing their eDiscovery costs, according to the OpenText Corporate Legal Ops Survey. Around half of the respondents said they have either consolidated their eDiscovery solutions or are in the process of doing so. In addition, many enterprises are looking to in-source eDiscovery software to conquer litigation and due diligence challenges. But not all litigation profiles are the same, and each legal department must evaluate which platforms will best serve its needs. So, what should you look for in the best eDiscovery software?

eDiscovery Features
• Robust search capabilities
• Leverage metadata
• Machine learning tools
• Integration with existing systems
• Minimize data collection’s impact on staff
• Rolling data loads
• Defensible data
• Audit trails and reporting
• Comprehensive eDiscovery capabilities

Is Model Bias a Threat to Equal and Fair Treatment? Maybe, Maybe Not.

There is a great hue and cry about the danger of bias in our predictive models when they are applied to high-stakes decisions such as who gets a loan, insurance, a good school assignment, or bail. It’s not as simple as it seems, and here we try to take a more nuanced look. The result is not as threatening as many headlines make it seem.

TensorBoard Tutorial

This tutorial shows you how to use TensorBoard, a utility that lets you visualize data and how it behaves. You will see what you can use it for when training a neural network.

Classification from scratch, SVM 7/8

Seventh post in our series on classification from scratch. The previous one covered neural nets; today we discuss support vector machines (SVMs).

Human Interpretable Machine Learning (Part 1) — The Need and Importance of Model Interpretation

A brief introduction to machine learning model interpretation.

ioModel Machine Learning Research Platform – Open Source

This article introduces ioModel, an open source research platform that ingests data and automatically generates descriptive statistics on that data.

In the past, scientific researchers who strove to innovate have had to either learn the discipline of writing code or rely on computer or data scientists for complex model development and the integration of the models that were developed. The ioModel Research Platform challenges this traditional approach by putting the power of machine learning directly into the hands of subject matter experts, unlocking the potential for more rapid innovation at a significantly reduced cost with higher reliability.

Three techniques to improve machine learning model performance with imbalanced datasets

This project was part of a skill test for a recent job interview for a “Machine learning engineer” position. I had to complete the project in 48 hours, which included writing a 10-page report in LaTeX. The dataset’s classes are highly imbalanced. The primary objective of this project was to handle the data imbalance issue. In the following subsections, I describe three techniques I used to overcome the data imbalance problem.

Self-Regularization in Deep Neural Networks: a preview

Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly contradictory aspects of deep neural networks (DNNs). We apply RMT to several well-known pre-trained models: LeNet5, AlexNet, and Inception V3, as well as 2 small toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5 resembles the familiar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy-tailed self-regularization.

Strengthen your digital defenses with AI and supervised machine learning

In the context of security, there are many discussions on how artificial intelligence (AI) and machine learning (ML) can be leveraged for malicious purposes. For example, in the recently released report The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, you will find a list of scenarios where AI and ML have the potential to be used to attack business assets physically, politically, and digitally. Possible cybersecurity threat scenarios include sophisticated and automated malicious hacking, human-like denial-of-service attacks, prioritized and targeted ML attacks, and the exploitative use of AI in applications related to information security.

GDPR: After 25th May, What Medium And Long Term Actions?

The General Data Protection Regulation (GDPR) requires businesses to protect the personal data and privacy of EU citizens, and any non-compliance could cost them dearly. For those who are still unfamiliar with it, take a look at the infographic below. It will give you a basic understanding of GDPR and its aspects.

5 Free Programming and Machine Learning Books for Data Scientists

There is a lifelong learning curve for data scientists. You will learn more quickly by reading the right books and focusing on developing the right skills. The good news is that Packt has published a number of books that can help you out.

Introduction to Game Theory (Part 1)

Game theory generally refers to the study of mathematical models that describe the behavior of logical decision-makers. It is widely used in many fields such as economics, political science, politics, and computer science, and can be used to model many real-world scenarios. Generally, a game refers to a situation involving a set of players who each have a set of possible choices, in which the outcome for any individual player depends partially on the choices made by other players.
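As a small illustration of outcomes depending on other players' choices, the sketch below encodes a two-player game as a payoff table and searches for pure-strategy Nash equilibria, i.e. choice pairs where neither player gains by switching alone. The game (a prisoner's dilemma), the payoff values, and the function name are assumptions chosen for illustration and do not come from the article.

```python
from itertools import product

# payoffs[(row_choice, col_choice)] = (row_player_payoff, col_player_payoff)
# Illustrative prisoner's dilemma: payoffs are negative years in prison.
payoffs = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
choices = ["cooperate", "defect"]


def pure_nash_equilibria(payoffs, choices):
    """Return every choice pair where each player is already best-responding."""
    equilibria = []
    for a, b in product(choices, repeat=2):
        # Row player cannot do better by changing a while b stays fixed.
        row_best = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in choices)
        # Column player cannot do better by changing b while a stays fixed.
        col_best = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in choices)
        if row_best and col_best:
            equilibria.append((a, b))
    return equilibria


equilibria = pure_nash_equilibria(payoffs, choices)
```

With these payoffs the only equilibrium is mutual defection, even though both players would be better off cooperating, which is why the game is a standard first example.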

An overview on evolutionary algorithms for many-objective optimization problems

Multiobjective evolutionary algorithms (MOEAs) effectively solve several complex optimization problems with two or three objectives. However, when they are applied to many-objective optimization, that is, when more than three criteria are simultaneously considered, the performance of most MOEAs is severely affected. Several alternatives have been reported to reproduce the same performance level that MOEAs have achieved in problems with up to three objectives when considering problems with higher dimensions. This work briefly reviews the main search difficulties, visualization, evaluation of algorithms, and new procedures in many-objective optimization using evolutionary methods. Approaches for the development of evolutionary many-objective algorithms are classified into: (a) based on preference relations, (b) aggregation-based, (c) decomposition-based, (d) indicator-based, and (e) based on dimensionality reduction. The analysis of the reviewed works indicates the promising future of such methods, especially decomposition-based approaches; however, much still needs to be done to develop more robust, faster, and more predictable evolutionary many-objective algorithms.
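One commonly cited reason Pareto-based MOEAs struggle with many objectives is that, as the number of objectives grows, almost every candidate becomes non-dominated, so dominance stops discriminating between solutions. The sketch below illustrates this with uniformly random points under minimization; the population size, objective counts, and seed are arbitrary assumptions, and it is not taken from the reviewed paper.

```python
import random


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fraction(num_points, num_objectives, rng):
    """Fraction of uniformly random points that no other point dominates."""
    points = [[rng.random() for _ in range(num_objectives)] for _ in range(num_points)]
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return len(front) / num_points


rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = {m: non_dominated_fraction(200, m, rng) for m in (2, 3, 5, 10)}

for m, fraction in fractions.items():
    print(f"{m} objectives: {fraction:.0%} of points are non-dominated")
```

When most of the population sits on the non-dominated front, selection based on dominance alone provides little pressure toward better solutions, which is part of the motivation for the decomposition-based and indicator-based approaches listed above.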