It is often said that in machine learning (and more specifically deep learning), it’s not the person with the best algorithm who wins, but the one with the most data. We can always try to collect or generate more labelled data, but that is an expensive and time-consuming task. This is where the promise and potential of unsupervised deep learning algorithms come into the picture. They are designed to derive insights from the data without any supervision. For example, customers can be segmented into different groups based on their buying behaviour. This information can then be used to serve up better product recommendations.
Did you know that you can write R and Python code within your T-SQL statements? Machine Learning Services in SQL Server eliminates the need for data movement. Instead of transferring large and sensitive data over the network or losing accuracy with sample CSV files, you can have your R/Python code execute within your database. Easily deploy your R/Python code with SQL stored procedures, making it accessible in your ETL processes or to any application. Train and store machine learning models in your database, bringing intelligence to where your data lives.
Here I am focusing on outlier and anomaly detection. It is important to note that the terms outlier and anomaly are often used synonymously; there are a few differences between them, but I am not going into those nuances here.
Eighth post of our series on classification from scratch. The latest one was on the SVM, and today I want to get back to some very old stuff, here again with a linear separation of the space, using Fisher’s linear discriminant analysis.
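To make the idea of a linear separation concrete, here is a minimal sketch of Fisher's discriminant for two classes in two dimensions. It is not taken from the post: the function names (`fisher_direction`, `predict`), the toy points, and the choice of the midpoint of the projected class means as threshold are all assumptions made for illustration.

```python
# Sketch of Fisher's linear discriminant for two classes in 2D (pure Python).
# All names and data here are illustrative assumptions, not the post's code.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def scatter(points, mu):
    # 2x2 scatter matrix: sum of outer products of deviations from the mean.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    # Within-class scatter S_w = S_0 + S_1.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Fisher direction: w is proportional to S_w^{-1} (mu1 - mu0).
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed decision rule: cut halfway between the projected class means.
    c = 0.5 * (dot(w, mu0) + dot(w, mu1))
    return w, c

def predict(w, c, point):
    return 1 if dot(w, point) > c else 0

class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 5), (7, 7), (8, 6), (7, 4)]
w, c = fisher_direction(class0, class1)
predictions0 = [predict(w, c, p) for p in class0]
predictions1 = [predict(w, c, p) for p in class1]
```

The direction `w` is the one that maximises the distance between projected class means relative to the within-class spread; projecting every point onto it reduces the problem to a one-dimensional threshold.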
Artificial neural networks, especially deep neural networks and (deep) convolutional neural networks, have become increasingly popular in recent years, dominating most machine learning competitions since the early 2010s (for reviews of DNNs and (D)CNNs see LeCun, Bengio, & Hinton, 2015). In ecology, there are a large number of potential applications for these methods, for example image recognition, analysis of acoustic signals, or any other type of classification task for which large datasets are available.
In the exercises below, we will work with time series analysis and see how R can make your life easier when working with time series. This will be a series of exercises, and I urge you to take them in order. Please install the package and load the library before starting.
In an earlier post I took an in-depth look at CHAID (Chi-square automatic interaction detection). There are lots of tools that can help you predict an outcome, or classify, but CHAID is especially good at helping you explain to any audience how the model arrives at its prediction or classification. It’s also incredibly robust from a statistical perspective, making almost no assumptions about your data for distribution or normality. In this post I’ll focus on marrying CHAID with the awesome caret package to make our predicting easier and hopefully more accurate. Although not strictly necessary, you’re probably best served by reading the original post first.
Ninth post of our series on classification from scratch. Today, we’ll see the heuristics of the algorithm inside classification trees. And yes, I promised eight posts in that series, but clearly, that was not sufficient… sorry for the poor prediction.
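As a rough illustration of the kind of heuristic a classification tree relies on, the sketch below searches for the best threshold on a single numeric feature. Using Gini impurity as the criterion is an assumption here (the post may rely on a different one, such as entropy), and the names `gini` and `best_split` and the toy data are hypothetical.

```python
# Sketch of a greedy split search on one numeric feature.
# Gini impurity is an assumed criterion; names and data are illustrative.

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    # Returns (threshold, weighted impurity); threshold is None when no
    # split improves on the impurity of the unsplit node.
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
```

A full tree would apply the same search to every feature, keep the best split overall, and recurse on the two resulting subsets until a stopping rule is met.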
Understanding a scenario where your machine learning model can fail
The scatterplot matrix, known by the acronym SPLOM, is a relatively uncommon graphical tool that uses multiple scatterplots to show the correlation (if any) between each pair in a series of variables. These scatterplots are organized into a matrix, making it easy to look at all the potential correlations in one place. SPLOMs, invented by John Hartigan in 1975, allow data aficionados to quickly spot any interesting correlations between parameters in the data set. In this post, we’ll go over how to make SPLOMs in Plotly with Python. For extra insights, check out our SPLOM tutorial in Python and R.
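To show what a SPLOM is organising, the sketch below builds the grid of variable pairs and computes the Pearson correlation that each panel would let you see. It deliberately uses only the standard library and does not call Plotly; the variable names and values are made up for illustration.

```python
# Sketch of the structure behind a SPLOM: one panel per pair of variables.
# Pure Python, no plotting; the data and names are illustrative assumptions.
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # rises with a
    "c": [5, 4, 3, 2, 1],    # falls as a rises
}

# Each (row, column) key corresponds to one scatterplot panel in the matrix.
names = list(data)
panels = {
    (row, col): pearson(data[row], data[col])
    for row in names
    for col in names
}

for (row, col), r in panels.items():
    print(f"{row} vs {col}: r = {r:+.2f}")
```

With k variables the matrix has k × k panels; the diagonal pairs each variable with itself, which is why plotting libraries often replace those panels with a histogram or a label.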
Recently, I gave a talk at the O’Reilly AI conference in Beijing about some of the interesting lessons we’ve learned in the world of NLP. While there, I was lucky enough to attend a tutorial on Deep Reinforcement Learning (Deep RL) from scratch by Unity Technologies. I thought that the session, led by Arthur Juliani, was extremely informative and wanted to share some big takeaways below.