Five Hundred Deep Learning Papers, Graphviz and Python

I am currently writing my computer engineering master’s thesis about Deep Learning. When I came to the state-of-the-art part, I thought I could build something interesting by making a graph of the most relevant papers of the last twenty-five years. The only problem is that I don’t have much experience with GraphViz, so after I built a sort of JSON archive by scraping (sorry for that) Google Scholar, I made many attempts to figure out how GraphViz works and how I could visualize the results in a meaningful way. In this post I will show in brief how I obtained the paper data and how I came to the final result above.
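
As a rough illustration of the JSON-to-GraphViz step, here is a minimal Python sketch that turns paper records into DOT source. The record fields (`id`, `title`, `year`, `cites`) and the sample papers are hypothetical; the post's actual JSON schema and layout choices are not shown in this excerpt.

```python
import json

# Hypothetical records: the field names and sample papers are assumptions,
# not the schema of the scraped archive described in the post.
papers_json = """
[
  {"id": "p1", "title": "Paper A", "year": 1998, "cites": []},
  {"id": "p2", "title": "Paper B", "year": 2006, "cites": ["p1"]},
  {"id": "p3", "title": "Paper C", "year": 2012, "cites": ["p1", "p2"]}
]
"""


def to_dot(papers):
    """Build DOT source with one node per paper and one edge per citation."""
    lines = ["digraph papers {", "  rankdir=LR;"]
    for p in papers:
        # Escape double quotes so the label stays a valid DOT string.
        label = f'{p["title"]} ({p["year"]})'.replace('"', '\\"')
        lines.append(f'  "{p["id"]}" [label="{label}"];')
    for p in papers:
        for cited in p["cites"]:
            # Edge direction: citing paper -> cited paper.
            lines.append(f'  "{p["id"]}" -> "{cited}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(json.loads(papers_json))
print(dot_source)
```

Saving the printed text to a file such as `papers.dot` and running `dot -Tpng papers.dot -o papers.png` should render it, assuming GraphViz is installed.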


Building a Streaming Analytics Data Stack

At Jut, we built a streaming analytics data stack: one place to send, store, analyze, and visualize any combination of logs, events, and metrics.
This post lays out the blueprint for the pieces we used and how we put them together. We’ll cover:
• Ingest: how to bring in many different types of data streams.
• Index and querying: efficient storage and unified queries.
• Wiring it up: how data flows through the system.
• Optimization: making queries fast.
We hope that this will be useful and can serve as a high-level orientation for those who are getting started on the exciting (but often daunting, at first) task of building a similar system.


How to write the first for loop in R

In this tutorial we will have a look at how you can write a basic for loop in R. It is aimed at beginners, and if you’re not yet familiar with the basic syntax of the R language, we recommend that you first have a look at this introductory R tutorial.


Prior on df (normality) parameter in t distribution

Many models in DBDA use a t distribution for ‘robust’ estimation of central tendency parameters. The idea is that the heavy tails of a t distribution can accommodate data points that would be outliers relative to a normal distribution, and therefore the estimates of the mean and standard deviation are not distorted by the outliers. Thus the estimates of the mean and standard deviation in the t distribution are ‘robust against outliers’ relative to a normal distribution.
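
The robustness claim can be checked with a small Python sketch. It is not the DBDA model itself: the data are made up, the scale is fixed at 1, the df is fixed at 3, and the location is found by a simple grid search rather than Bayesian estimation. It only shows that the t likelihood's best-fitting location stays near the bulk of the data while the normal estimate (the sample mean) is pulled toward the outlier.

```python
import math


def t_logpdf(x, df, mu=0.0, sigma=1.0):
    """Log density of a location-scale Student t distribution."""
    z = (x - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - (df + 1) / 2 * math.log1p(z * z / df))


# Made-up data: five points centred on 0 plus one outlier at 20.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 20.0]

# Under a normal model, the maximum-likelihood mean is the sample mean.
normal_loc = sum(data) / len(data)

# Under a t model (df=3 and sigma=1 are fixed here as an assumption),
# pick the location with the highest log-likelihood on a grid.
candidates = [i / 100 for i in range(-500, 2501)]
t_loc = max(candidates,
            key=lambda mu: sum(t_logpdf(x, df=3, mu=mu) for x in data))

print("normal estimate of the mean:", normal_loc)
print("t (df=3) estimate of the location:", t_loc)
```

In a full analysis the df parameter would itself get a prior and be estimated, which is the subject of the post.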


18 Useful Mobile Apps for Data Scientists / Data Analysts

1. Elevate (Downloads – 1 million, Rating – 4.5, Size – 38.80 MB)
2. Lumosity (Downloads – 10 million, Rating – 4.1, Size – 49.71 MB)
3. Neuro Nation (Downloads – 5 million, Rating – 4.5, Size – 18.32 MB)
4. Math Workout (Downloads – 5 million, Rating – 4.2, Size – 2.71 MB)
5. Math Tricks (Downloads – 5 million, Rating – 4.4, Size – 6.90 MB)
6. QPython (Downloads – 500k, Rating – 4.4, Size – 13.17 MB)
7. Learn Python (Downloads – 10k, Rating – 4.7, Size – 4.77 MB)
8. R Programming (Downloads – 10k, Rating – 3.8, Size – 561 KB)
9. Excel Tutorial (Downloads – 100k, Rating – 4.2, Size – 9.05 MB)
10. Termux (Downloads – 10k, Rating – 4.8, Size – 134 KB)
11. Basic Statistics (Downloads – 50k, Rating – 4.1, Size – 2.72 MB)
12. Probability Distributions (Downloads – 50k, Rating – 4.5, Size – 3.21 MB)
13. Statistics and Sample Size (Downloads – 10k, Rating – 4.4, Size – 3.43 MB)
14. Khan Academy (Downloads – 100k, Rating – 4.5, Size – 21.48 MB)
15. Udacity (Downloads – 1 million, Rating – 4.2, Size – 5.40 MB)
16. Coursera (Downloads – 5 million, Rating – 4.3, Size – 16.80 MB)
17. edX (Downloads – 500k, Rating – 4.2, Size – 6.07 MB)
18. Udemy (Downloads – 5 million, Rating – 4.3, Size – 27.47 MB)


Tutorial – Getting Started with GraphLab For Machine Learning in Python

GraphLab came as an unexpected breakthrough in my learning plan. After all, ‘Good Things Happen When You Expect Them Least To Happen’. It all started with the end of the Black Friday Data Hack. Out of 1200 participants, we got our winners and their interesting solutions. I read and analyzed them. I realized that I had missed out on an incredible machine learning tool. A quick exploration told me that this tool has immense potential to reduce our machine learning modeling pains. So, I decided to explore it further. I have now dedicated a few days to understanding its science and logical methods of usage. To my surprise, it wasn’t difficult to understand.


Inter-relationships in a matrix

Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://…/, we can find such a dataset, with a lot of information on car accidents involving bodily injuries (in France, a police report is necessary, and all of them are reported in a big dataset… actually several datasets, with information on people involved, cars, locations, etc).


What was data science before it was called data science?

“Data Science” is obviously a trendy term making its way through the hype cycle. Either nobody is good enough to be a data scientist (unicorns) or everybody is too good to be a data scientist (or the truth is somewhere in the middle).


Real-Time Data Applications

There are a variety of useful applications for real-time data, including quick identification of general patterns and trends in data, performing sentiment analysis, crafting responses in real time, and, perhaps one of the most important uses, cases where having the analysis immediately will change the outcome of the situation. This Learning Path provides an in-depth tour of technologies used in processing and analyzing real-time data.


RStudio Essentials Webinar Series

The RStudio IDE is bursting with capabilities and features. Do you know how to use them all? Tomorrow, we begin an “RStudio Essentials” webinar series. This will be the perfect way to learn how to use the IDE to its fullest. The series is broken into six sections, always on a Wednesday at 11 a.m. EDT:
• Programming Part 1 (Writing code in RStudio) – December 2nd
• Programming Part 2 (Debugging code in RStudio) – December 9th
• Programming Part 3 (Package Writing in RStudio) – December 16th
• Managing Change Part 1 (Projects in RStudio) – January 6th
• Managing Change Part 2 (Github and RStudio) – January 20th
• Managing Change Part 3 (Package version with Packrat) – February 3rd
Each webinar will be 30 minutes long, which will make them easy to attend. If you miss a live webinar or want to review them, recorded versions will be available to registrants.


100 Data Science Interview Questions and Answers

In collaboration with data scientists, industry experts and top counsellors, we have put together a list of general data science interview questions and answers to help you prepare when applying for data science jobs. This first article in a series of data science interview questions and answers focuses only on general topics such as questions around data, probability, statistics and other data science concepts. It also includes a list of open-ended questions that interviewers ask to get a feel for how well and how quickly you can think on your feet. These kinds of questions also measure whether you have been successful in applying data science techniques to real-life problems.


A/B Testing, from scratch

If you work in a diligent web development business you probably know what an A/B test is. However, its fascinating statistical theory is usually left behind. Understanding the basics can help you avoid common pitfalls, better design your experiments, and ultimately do a better job in improving the effectiveness of your website. Please hold tight, and enjoy a pleasant statistical journey with the help of R and some maths. You will not be disappointed.
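
To give a flavour of the statistics involved, here is a minimal sketch of one common way to analyse an A/B test: a pooled two-proportion z-test. The post itself works in R and may take a different route; this version uses Python, and the visitor and conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 200 of 2000 visitors convert on A, 250 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

One pitfall worth keeping in mind: the p-value is only meaningful if the sample size is fixed before the test starts, since stopping as soon as the result looks significant inflates the false-positive rate.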


A Map of the Social Media Lands

Imagine you’re organizing a big tech conference, and you want to understand what people thought of your conference, so you can run it even better next year. A good resource to look at would be what people posted on social media about your conference because, well, it’s a tech event after all. But there’s a problem here: if your event is as popular as the Web Summit, you’re going to have to go through hundreds of thousands of tweets, which is simply not practical. One common solution would be to use social media monitoring and analysis tools, which try to aggregate all of these posts in one place and crunch them into more understandable metrics: total number of posts, most frequently used hashtags, average sentiment, etc. This is indeed what we did in our 2014 report of the Web Summit according to Twitter. But what if we wanted to go a level deeper, to understand in a little more depth what people thought and said about our conference? What if we wanted to build a map that shows all the topics that were discussed, the sentiment around them, and how these two things evolved with time? That’s what we’re going to explore in this series of blogs. Over a 5-day period, we collected about 77,000 tweets about the Web Summit 2015 using the Twitter Streaming API, and in this blog series we’re going to explore them and see what we can extract.