A Comprehensive Tutorial to Learn Data Science with Julia from Scratch

The above line tells a lot about why I chose to write this article. I came across Julia a while ago. Even though it was in its early stages, it was already creating ripples in the numerical computing space. Julia is a work straight out of MIT, a high-level language with a syntax as friendly as Python and performance as competitive as C. That is not all: it provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. But this article isn’t about praising Julia; it is about how you can use it in your workflow as a data scientist without the hours of confusion that usually come with a new language. Read more about Why Julia? here.

Microsoft R Open 3.4.2 now available

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.4.2 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R 3.4.2 and updates the bundled packages. MRO is 100% compatible with all R packages. MRO 3.4.2 points to a fixed CRAN snapshot taken on October 15, 2017, and you can see some highlights of new packages released since the prior version of MRO on the Spotlights page. As always, you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages). MRO 3.4.2 is based on R 3.4.2, a minor update to the R engine (you can see the detailed list of updates to R here). This update is backwards-compatible with R 3.4.1 (and MRO 3.4.1), so you shouldn’t encounter any new issues by upgrading. We hope you find Microsoft R Open useful, and if you have any comments or questions please visit the Microsoft R Open forum. You can follow the development of Microsoft R Open at the MRO Github repository. To download Microsoft R Open, simply follow the link below.

Demo Week: Time Series Machine Learning with h2o and timetk

We’re at the final day of Business Science Demo Week. Today we are demoing the h2o package for machine learning on time series data. What’s demo week? Every day this week we are demoing an R package: tidyquant (Monday), timetk (Tuesday), sweep (Wednesday), tibbletime (Thursday) and h2o (Friday)! That’s five packages in five days! We’ll give you intel on what you need to know about these packages to go from zero to hero. Today you’ll see how we can use timetk + h2o to get really accurate time series forecasts. Here we go!

Automated Machine Learning Drives Intelligent Business

Even as enterprises explore how artificial intelligence can help their organizations and people discuss the relationship between humans and software, automated machine learning tools continue to simplify deployment across all layers.

Monitoring and Machine Learning: How Close are We?

The wonders of automation have brought incredible efficiencies to standard IT monitoring practices, especially when it comes to the detection-prevention-analysis-response (DPAR) cycle. Automating detection and remediation steps via alerts has alleviated massive amounts of stress for IT teams and businesses alike, providing the data needed to understand how and why issues happen. But it raises the question: are we doing enough with that data, and how can we do more? Is machine learning a viable solution for modern monitoring practices? Why DPAR? First, some background: DPAR cycles stem from the three ultimate goals of the information security process that guide businesses on how to protect confidentiality, ensure integrity, and maintain availability. While these guidelines are fluid and should not be thought of as a cure-all, they also inform the process of building sophisticated monitoring, especially as it relates to leveraging the emerging area of machine learning.

7 Steps to Mastering Deep Learning with Keras

Are you interested in learning how to use Keras? Do you already have an understanding of how neural networks work? Check out this lean, fat-free 7 step plan for going from Keras newbie to master of its basics as quickly as is possible.
Step 1: Neural Networks Basics
Step 2: Keras Basics
Step 3: An Overview of Keras
Step 4: Baby Steps with Keras
Step 5: Implementing a Convolutional Neural Network
Step 6: Implementing a Recurrent Neural Network
Step 7: What Next?

Neural Networks, Step 1: Where to Begin with Neural Nets & Deep Learning

This is a short post for beginners learning neural networks, covering several essential neural networks concepts.

Practical Machine Learning with R and Python – Part 4

This is the fourth installment of my ‘Practical Machine Learning with R and Python’ series. In this part I discuss classification with Support Vector Machines (SVMs), using both a linear and a radial basis kernel, and Decision Trees. Further, a closer look is taken at some of the metrics associated with binary classification, namely accuracy vs precision and recall. I also touch upon validation curves, precision-recall curves, ROC curves and AUC, with equivalent code in R and Python.
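
To make the accuracy vs precision and recall distinction concrete, here is a minimal Python sketch. It is not taken from the series itself: the function name `binary_metrics` and the toy labels are assumptions chosen purely for illustration. It shows how a classifier that always predicts the majority class can score well on accuracy while finding none of the positives.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels.

    Hypothetical helper for illustration; precision and recall fall back
    to 0.0 when their denominators are zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up imbalanced data: 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A model that always predicts the majority (negative) class.
y_pred = [0] * 10

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, accuracy is high only because negatives dominate, while recall shows that both positive cases were missed.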

heatmaply: an R package for creating interactive cluster heatmaps for online publishing

heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or annotated labels. Because heatmaply builds on other R packages, the user gains refined control over the statistical and visual aspects of the heatmap layout.

Introducing Kaggle’s State of Data Science & Machine Learning Report, 2017

In 2017 we conducted our first ever extra-large, industry-wide survey to capture the state of data science and machine learning. As the data science field has boomed, so has our community. In 2017 we hit a new milestone of reaching over 1M registered data scientists from almost every country in the world. Representing many different backgrounds, skill levels, and professions, we were excited to ask our community a wide range of questions about themselves, their skills, and their path to data science. We asked them everything from “what’s your yearly salary?” to “what’s your favorite data science podcasts?” to “what barriers are faced at work?”, letting us piece together key insights about the people and the trends behind the machine learning models.

Top 6 errors novice machine learning engineers make

1. Taking the default loss function for granted
2. Using one algorithm/method for all problems
3. Ignoring outliers
4. Not properly dealing with cyclical features
5. L1/L2 Regularization without standardization
6. Interpreting coefficients from linear or logistic regression as feature importance
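
As an illustration of item 4, here is a minimal Python sketch of one common way to handle a cyclical feature such as hour of day: mapping it onto a circle with sine and cosine. This is an assumed technique chosen for illustration, not necessarily the one the original article recommends, and the helper names `encode_cyclical` and `distance` are hypothetical.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour 0-23, period 24) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


h23 = encode_cyclical(23, 24)
h0 = encode_cyclical(0, 24)
h12 = encode_cyclical(12, 24)

# As raw numbers, 23 and 0 are 23 units apart even though the hours are adjacent.
print("raw gap 23h vs 0h:", abs(23 - 0))
# After encoding, adjacent hours are close and opposite hours are far apart.
print("encoded gap 23h vs 0h:", round(distance(h23, h0), 3))
print("encoded gap 0h vs 12h:", round(distance(h0, h12), 3))
```

The design choice is that each value becomes two features instead of one, which lets a model see 23:00 and 00:00 as neighbors.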