Random Forest in Python
Random Forest is a machine learning algorithm used for classification, regression, and feature selection. It’s an ensemble technique, meaning it combines the outputs of many weaker models (decision trees) in order to get a stronger result.
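To make the ensemble idea concrete, here is a minimal sketch in plain Python. It is not a full Random Forest: it only bags one-split "decision stumps" on bootstrap samples and takes a majority vote, whereas a real Random Forest grows deeper trees and also samples features at each split. The toy data and all function names (fit_stump, fit_ensemble, predict_ensemble) are made up for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # Every sampled row is identical, so fall back to the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_ensemble(rows, labels, n_models=25, seed=0):
    """Train each weak model on its own bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_ensemble(models, row):
    """Combine the weak models by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy, well-separated two-class data (invented for this sketch).
X = [(0.0, 0.2), (0.5, 0.1), (1.0, 0.8), (0.3, 0.9),
     (3.0, 3.5), (3.5, 3.1), (4.0, 3.9), (3.2, 4.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_ensemble(X, y)
print(predict_ensemble(models, (0.2, 0.2)))
print(predict_ensemble(models, (3.8, 3.8)))
```

Each stump on its own is a weak model; the vote across bootstrap samples is what makes the combined prediction more stable. In practice you would likely reach for a library implementation rather than something like this.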
Top YouTube Videos on Machine Learning, Neural Network & Deep Learning
One of the best ways to get better at machine learning and deep learning is to watch a lecture from an expert and work your way along with it. If you do so, you get the best of both worlds: you learn from experts across the globe and also gain hands-on knowledge. In this article, I have provided a list of YouTube videos that you can use to improve your knowledge in these areas.
Best Practices When Starting And Working On A Data Science Project
Here is a summary of the best practices for working on enterprise analytics projects.
Soft Skills Best Practice
• Understand the business problem
• Understand the stakeholders
• Understand your STARS situation
• Explain what you are doing and why
• Explain the caveats in interpreting what you are doing
• Always focus on the business problem
• Continuously validate the above
o Budget and Plan
• Clearly set out your approach, milestones and deliverables
• Measure progress and adjust when going off track or moving in a new direction
Technical Skills Best Practice Using the 7 Guerrilla Analytics Principles
• Keep Everything
• Keep It Simple
• Maintain Data Provenance
• Version Control Data and Code
• Think like a developer
• Test data with the 5 Cs of Guerrilla Analytics Data Quality
• Test code
• Test models
Hadoop Ecosystem and its components
Let us understand the components of the Hadoop ecosystem in order to build the right solutions for a given business problem.
Data modeling with multi-model databases
In recent years, the idea of ‘polyglot persistence’ has emerged and become popular (see, for example, Martin Fowler’s excellent blog post). Fowler’s basic idea can be read as follows: it is beneficial to use a variety of appropriate data models for different parts of the persistence layer of larger software architectures. According to this, one would, for example, use a relational database to persist structured, tabular data; a document store for unstructured, object-like data; a key/value store for a hash table; and a graph database for highly linked referential data. Traditionally, this means that one has to use multiple databases in the same project, which leads to some operational friction (more complicated deployment, more frequent upgrades) as well as data consistency and duplication issues.
Like pretty much everyone, I’m obsessed with word embeddings such as word2vec or GloVe. Although most of machine learning is based on turning things into vectors, it got me thinking that we should probably be learning more fundamental representations for objects, rather than hand-tuning features. Here is my attempt at turning random things into vectors, starting with graphs.
Standardizing the World of Machine Learning Web Service APIs
We introduce the Protocols and Structures for Inference (PSI) API specification, which enables flexible delivery of machine learning by specifying how datasets, learning algorithms and predictors can be presented as web resources.
Deep Learning and the Triumph of Empiricism
Theoretical guarantees are clearly desirable. And yet many of today’s best-performing supervised learning algorithms offer none. What explains the gap between theoretical soundness and empirical success?
Time series outlier detection (a simple R function)
Imagine you have a lot of time series (possibly short ones) covering many different measures, and very little time to find outliers. You need something not too sophisticated to sort out the mess quickly. In short, this is the typical situation in which you can use the washer.AV() function in R. The linked document (washer) contains the function and an example of an actual application in R: a data.frame (dati) with temperature and rain (phen) measures (value) in 4 periods of time (time) and in 20 geographical zones (zone), for 20*4*2=160 arbitrary observations.
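For readers who want to try a quick screen on data laid out the same way (phen, time, zone, value), here is a rough Python sketch. To be clear, this is not the washer.AV() algorithm; it swaps in a generic robust check that flags values far from the median of their (phen, time) group, using the median absolute deviation (MAD). The function name flag_outliers, the 3.5 cutoff, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(records, threshold=3.5):
    """Flag records whose value is far from the median of their (phen, time) group.

    records: iterable of (phen, time, zone, value) tuples.
    Uses the modified z-score 0.6745 * (value - median) / MAD; the 3.5 cutoff
    is a common convention, not something taken from washer.AV().
    """
    groups = defaultdict(list)
    for rec in records:
        phen, time, _zone, _value = rec
        groups[(phen, time)].append(rec)

    flagged = []
    for recs in groups.values():
        values = [r[3] for r in recs]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this group, so nothing to compare against
        for r in recs:
            score = 0.6745 * (r[3] - med) / mad
            if abs(score) > threshold:
                flagged.append(r)
    return flagged


# Invented sample: one time period, six zones, two phenomena.
temperature = [20.1, 19.8, 20.5, 20.0, 19.9, 35.0]
rain = [5.0, 5.5, 4.8, 5.2, 5.1, 4.9]
records = [("temperature", 1, zone, v) for zone, v in enumerate(temperature, start=1)]
records += [("rain", 1, zone, v) for zone, v in enumerate(rain, start=1)]

flagged = flag_outliers(records)
print(flagged)
```

Comparing each zone against the other zones at the same time and for the same phenomenon keeps the check cheap, at the cost of ignoring each series' own history.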