Here’s a situation you may have gotten into: you are working on a classification problem, you have generated your set of hypotheses, created features, and discussed the importance of variables. Within an hour, stakeholders want to see the first cut of the model. What will you do? You have hundreds of thousands of data points and quite a few variables in your training data set. In such a situation, if I were in your place, I would use ‘Naive Bayes’, which can be extremely fast relative to other classification algorithms. It works on Bayes’ theorem of probability to predict the class of an unknown data point. In this article, I’ll explain the basics of this algorithm, so that the next time you come across a large data set, you can bring this algorithm into action. In addition, if you are a newbie in Python or R, the code examples included in this article should help you follow along without being overwhelmed.
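To make the speed claim concrete, here is a minimal sketch of training a Naive Bayes classifier with scikit-learn. The synthetic dataset below is illustrative (not from the article), and it assumes continuous features, for which the Gaussian variant is appropriate:

```python
# Hedged sketch: Gaussian Naive Bayes on a synthetic dataset.
# Training applies Bayes' theorem with a conditional-independence
# assumption, so fitting is a single fast pass over the data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative synthetic binary-classification data (not real data)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()          # assumes Gaussian-distributed features
model.fit(X_train, y_train)   # estimates per-class means and variances

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

For discrete counts (e.g., word frequencies in text classification), `MultinomialNB` would be the usual choice instead.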
Interacting with a machine via natural language is one of the requirements for general artificial intelligence. This field of AI is called dialogue systems, spoken dialogue systems, or chatbots. The machine needs to provide you with an informative answer, maintain the context of the dialogue, and, ideally, be indistinguishable from a human. In practice, the last requirement is not yet achievable, but luckily, humans are willing to talk with robots as long as they are helpful, sometimes funny, and interesting interlocutors.
When we open-sourced TensorFlow in 2015, it included TensorBoard, a suite of visualizations for inspecting and understanding your TensorFlow models and runs. TensorBoard included a small, predetermined set of visualizations that are generic and applicable to nearly all deep learning applications, such as observing how loss changes over time or exploring clusters in high-dimensional spaces. However, in the absence of reusable APIs, adding new visualizations to TensorBoard was prohibitively difficult for anyone outside of the TensorFlow team, leaving out a long tail of potentially creative, beautiful, and useful visualizations that could be built by the research community. To allow the creation of new and useful visualizations, we are announcing the release of a consistent set of APIs that lets developers add custom visualization plugins to TensorBoard. We hope that developers use this API to extend TensorBoard and ensure that it covers a wider variety of use cases.
Right now there are literally thousands of datasets on Kaggle, and more being added every day. It’s a fabulous resource, but with so many datasets it can sometimes be a little tricky to find a dataset on the exact topic you’re interested in. Luckily, I’ve learned some tips and tricks over the last couple months that might help you out!
Given enough GPUs, distributed machine learning systems (such as the one Facebook has published earlier this week) excel in recognizing and labeling images. These systems can quickly and accurately determine whether a dog is in the image, but struggle to answer relational questions. For example, a computer vision software cannot determine whether the dog in the picture is bigger than the ball it is playing with or the couch it is sitting on.
Meet Tyler McMullen and attend his session ‘Building a skyscraper with Legos: The anatomy of a distributed system’ at The O’Reilly Velocity Conference, taking place Oct. 1-4 in New York. Use code ORM20 to save 20% on your conference pass (Gold, Silver, and Bronze passes). This video was originally recorded in 2016 at the O’Reilly Velocity Conference in New York.
A practical introduction with R and ggplot2
I decided to write this post because youngsters occasionally approach me and ask where they should start their adventure in Data Science & Machine Learning. At other times, the ‘not-so-youngsters’ want to know what their next step should be after having completed some courses. This post traces my own travels through the domains of Data Science, Machine Learning, Deep Learning, and, soon, AI.