Top 20 Python Machine Learning Open Source Projects
We examine the top Python machine learning open source projects on GitHub, in terms of both contributors and commits, and identify the most popular and most active ones.
1. scikit-learn
2. Pylearn2
3. NuPIC
4. Nilearn
5. PyBrain
6. Pattern
7. Fuel
8. Bob
9. skdata
10. MILK
11. IEPY
12. Quepy
13. Hebel
14. mlxtend
15. nolearn
16. Ramp
17. Feature Forge
18. REP
19. Python Machine Learning Samples
20. Python-ELM

Cheat sheet: Data Visualisation in Python
It is said that ‘visually presented data speaks for itself’. Data served in the right visual form brings out hidden trends and insights and enables faster decision making. The importance of choosing the right visualization will only grow as data volumes increase. Python, popular for how easy it makes writing code, offers a strong set of libraries for creating visualizations. Beyond 2D plots, it can also produce jaw-dropping 3D visualisations and animations. Here is a cheat sheet of popular visualisation methods for representing data. You can keep it handy for your own use:

Report: EuroVis 2015
EuroVis 2015 took place last week in Cagliari, Sardinia.

Analysing the Twitter Mentions Network
We’ve seen that the tools available in R make acquiring and analysing the Twitter network an easily accessible task. In this post we’ve dipped our toe into the vast array of methods available in network analysis, and we’ve found that digging a little deeper than simply counting connections can lead to deeper insight into the network’s true function. With so much data available, it is important to ask the right questions. Whether your goal is to improve your social media presence or to better search for influential users, a little bit of the right network tools can go a long way.
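To illustrate why counting connections alone can mislead, here is a minimal Python sketch (the original post works in R). It compares degree with betweenness centrality on a tiny, made-up mentions graph. The graph, the node names, and the choice of betweenness as the "deeper" measure are assumptions for illustration, not taken from the post.

```python
from collections import deque


def degree(graph):
    """Count direct connections per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical mentions network: two tight groups joined by one account.
mentions = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "bridge"],
    "bridge": ["c", "d"],
    "d": ["bridge", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

deg = degree(mentions)
btw = betweenness(mentions)
most_central = max(btw, key=btw.get)
```

In this toy graph the "bridge" account has only two connections, fewer than "c" or "d", yet every shortest path between the two groups runs through it, so it has the highest betweenness.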

Out-of-core Learning and Model Persistence using scikit-learn
When we apply machine learning algorithms to real-world applications, our computer hardware often still constitutes the major bottleneck of the learning process. Of course, we all have access to supercomputers, Amazon EC2, Apache Spark, etc. However, out-of-core learning via stochastic gradient descent can still be attractive if we want to update our model on the fly (‘online learning’). In this notebook, I want to provide some examples of how we can implement an ‘out-of-core’ approach using scikit-learn. I compiled the following code examples for personal reference and don’t intend them to be a comprehensive reference for the underlying theory; nonetheless, I decided to share them since they may be useful to others.
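As a rough idea of what such an approach can look like, here is a minimal sketch that is not taken from the notebook itself. It feeds mini-batches to scikit-learn's `SGDClassifier` through `partial_fit`, then round-trips the model through `pickle`. The `stream_batches` helper and its synthetic data are hypothetical stand-ins for chunks read from disk.

```python
import pickle

import numpy as np
from sklearn.linear_model import SGDClassifier


def stream_batches(n_batches=20, batch_size=50, seed=0):
    """Yield synthetic (X, y) mini-batches (hypothetical stand-in for disk chunks)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels
        yield X, y


clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit needs every class label on the first call

# Only one mini-batch is held in memory at a time.
for X_batch, y_batch in stream_batches():
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Persist the fitted model to bytes and restore it again.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# Evaluate the restored model on a fresh batch drawn with a different seed.
X_test, y_test = next(stream_batches(n_batches=1, batch_size=200, seed=1))
accuracy = restored.score(X_test, y_test)
```

Because `partial_fit` can be called again later, the restored model could keep learning from new batches. Pickled models should only be loaded from trusted sources and with a compatible scikit-learn version.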

Top 30 Social Network Analysis and Visualization Tools
We review major tools and packages for Social Network Analysis and visualization, which have wide applications including biology, finance, sociology, network theory, and many other domains.

A comparison of high-performance computing techniques in R
When it comes to speeding up ‘embarrassingly parallel’ computations (like for loops with many iterations), the R language offers a number of options:
• An R looping operator, like mapply (which runs in a single thread)
• A parallelized version of a looping operator, like mcmapply (which can use multiple cores)
• Explicit parallelization, via the parallel package or the ParallelR suite (which can use multiple cores, or distribute the problem across nodes in a cluster)
• Translating the loop to C++ using Rcpp (which runs as compiled and optimized machine code)

Cluster analysis on earthquake data from USGS
In this experiment we will work through a very simple cluster analysis of seismic events downloaded from the USGS website. To complete this exercise you will need the following packages: sp, raster, plotrix, rgeos, rgdal and scatterplot3d.

4 Business Benefits of Data Visualization
Here are the top four benefits that data visualization provides to organizations & decision makers.
1. Absorb more information easily
2. Discover relationships & patterns between business & operational activities
3. Identify emerging trends faster
4. Directly interact with data