Introduction to Artificial Neural Network Model

In this Machine Learning tutorial, we will take you through the introduction of Artificial Neural network Model. First of all, we will discuss the multilayer Perceptron network next with the Radial Basis Function Network, they both are supervised learning model. At last, we will cover the Kohonen Model which follows Unsupervised learning and the difference between Multilayer Perceptron network and Radial Basis Function Network.

A Brief Overview of Outlier Detection Techniques

Outliers are extreme values that deviate from other observations on data , they may indicate a variability in a measurement, experimental errors or a novelty. In other words, an outlier is an observation that diverges from an overall pattern on a sample.

RStudio:addins part 4 – Unit testing coverage investigation and improvement, made easy

A developer always pays his technical debts! And we have a debt to pay to the gods of coding best practices, as we did not present many unit tests for our functions yet. Today we will show how to efficiently investigate and improve unit test coverage for our R code, with focus on functions governing our RStudio addins, which have their own specifics.

MiKTeX Behind a Windows Firewall

I´ve always had problems with MiKTeX on my work computer. I can install it just fine, or get IT to install it, but then the package manager doesn´t work because of our firewall. You can set up a local repository to get around this problem, and I will show you how. I´m just doing a basic setup here, just enough to compile the RStudio Rmd template to PDF. ‘Why do I need MiKTeX ‘, you might ask. Well, because if you want to create a PDF from an RMarkdown file in RStudio it is required. Otherwise you get this polite error message.

Whether to use a data frame in R?

In this post, I try to show you in which situations using a data frame is appropriate, and in which it´s not.

CVXR: A Direct Standardization Example

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints.

Divide and recombine (D&R) data science projects for deep analysis of big data and high computational complexity

The focus of data science is data analysis. This article begins with a categorization of the data science technical areas that play a direct role in data analysis. Next, big data are addressed, which create computational challenges due to the data size, as does the computational complexity of many analytic methods. Divide and recombine (D&R) is a statistical approach whose goal is to meet the challenges. In D&R, the data are divided into subsets, an analytic method is applied independently to each subset, and the outputs are recombined. This enables a large component of embarrassingly-parallel computation, the fastest parallel computation. DeltaRho open-source software implements D&R. At the front end, the analyst programs in R. The back end is the Hadoop distributed file system and parallel compute engine. The goals of D&R are the following: access to thousands of methods of machine learning, statistics, and data visualization; deep analysis of the data, which means analysis of the detailed data at their finest granularity; easy programming of analyses; and high computational performance. To succeed, D&R requires research in all of the technical areas of data science. Network cybersecurity and climate science are two subject-matter areas with big, complex data benefiting from D&R. We illustrate this by discussing two datasets, one from each area. The first is the measurements of 13 variables for each of 10,615,054,608 queries to the Spamhaus IP address blacklisting service. The second has 50,632 3-hourly satellite rainfall estimates at 576,000 locations.