Beginners Tutorial for Regular Expressions in Python
Regular expressions are the default way of cleaning and wrangling data in most of these tools. Whether you are extracting specific parts of text from web pages, making sense of Twitter data, or preparing your data for text mining, regular expressions are your best bet. Given their wide applicability, it makes sense to know them and use them appropriately.
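As a rough illustration of the kind of cleaning described above, here is a minimal sketch using Python's standard `re` module. The sample tweet, the user name and the URL are made up for this example, and the patterns are deliberately simple; real Twitter data would likely need more careful rules.

```python
import re

# Hypothetical sample text, invented for illustration only.
tweet = ("Loving the #rstats and #Python talks at the conference! "
         "cc @some_user https://example.com/slides")

# Extract hashtags and mentions: '#' or '@' followed by word characters.
hashtags = re.findall(r"#(\w+)", tweet)
mentions = re.findall(r"@(\w+)", tweet)

# Prepare the text for mining: drop URLs, then collapse repeated whitespace.
cleaned = re.sub(r"https?://\S+", "", tweet)
cleaned = re.sub(r"\s+", " ", cleaned).strip()

print(hashtags)  # ['rstats', 'Python']
print(mentions)  # ['some_user']
print(cleaned)
```

`re.findall` returns only the captured group when the pattern contains one, which is why the leading `#` and `@` are absent from the results.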

Implementing Fisher’s LDA in Python
If you are not familiar with Fisher’s LDA, read my previous post, Fisher’s Linear Discriminant, first.

Materials for Learning Machine Learning
Lately, a friend and I have become rather interested in learning more about machine learning. I’ve been trying to start a collection of learning materials that I either found useful or mean to go through at some point, and thought I’d write a post about it. I’m hoping this can help people who are just starting out, and that those who know more than I do will comment and point me towards additional material I can link to and look into myself.

Journal of Machine Learning Research – Volume 16
• Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling. Xi Chen, Qihang Lin, Dengyong Zhou; 16(Jan):1−46, 2015.
• Links Between Multiplicity Automata, Observable Operator Models and Predictive State Representations – a Unified Learning Framework. Michael Thon, Herbert Jaeger; 16(Jan):103−147, 2015.
• Simultaneous Pursuit of Sparseness and Rank Structures for Matrix Decomposition. Qi Yan, Jieping Ye, Xiaotong Shen; 16(Jan):47−75, 2015.
• Statistical Topological Data Analysis using Persistence Landscapes. Peter Bubenik; 16(Jan):77−102, 2015.
• SAMOA: Scalable Advanced Massive Online Analysis. Gianmarco De Francisci Morales, Albert Bifet; 16(Jan):149−153, 2015.
• Online Learning via Sequential Complexities. Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari; 16(Feb):155−186, 2015.
• Learning Transformations for Clustering and Classification. Qiang Qiu, Guillermo Sapiro; 16(Feb):187−225, 2015.
• Multi-layered Gesture Recognition with Kinect. Feng Jiang, Shengping Zhang, Shen Wu, Yang Gao, Debin Zhao; 16(Feb):227−254, 2015.
• Multimodal Gesture Recognition via Multiple Hypotheses Rescoring. Vassilis Pitsikalis, Athanasios Katsamanis, Stavros Theodorakis, Petros Maragos; 16(Feb):255−284, 2015.
• An Asynchronous Parallel Stochastic Coordinate Descent Algorithm. Ji Liu, Stephen J. Wright, Christopher Ré, Victor Bittorf, Srikrishna Sridhar; 16(Feb):285−322, 2015.
• AD3: Alternating Directions Dual Decomposition for MAP Inference in Graphical Models. André F. T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, Eric P. Xing; 16(Mar):495−545, 2015.
• Composite Self-Concordant Minimization. Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher; 16(Mar):371−416, 2015.
• Geometric Intuition and Algorithms for Eν–SVM. Alvaro Barbero, Akiko Takeda, Jorge López; 16(Mar):323−369, 2015.
• Iterative and Active Graph Clustering Using Trace Norm Minimization Without Cluster Size Constraints. Nir Ailon, Yudong Chen, Huan Xu; 16(Mar):455−490, 2015.
• The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R. Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu; 16(Mar):553−557, 2015.
• Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit. Felix Weninger; 16(Mar):547−551, 2015.
• A Classification Module for Genetic Programming Algorithms in JCLEC. Alberto Cano, José María Luna, Amelia Zafra, Sebastián Ventura; 16(Mar):491−494, 2015.
• Generalized Hierarchical Kernel Learning. Pratik Jawanpuria, Jagarlapudi Saketha Nath, Ganesh Ramakrishnan; 16(Mar):617−652, 2015.
• Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Po-Ling Loh, Martin J. Wainwright; 16(Mar):559−616, 2015.
• Network Granger Causality with Inherent Grouping Structure. Sumanta Basu, Ali Shojaie, George Michailidis; 16(Mar):417−453, 2015.
• Discrete Restricted Boltzmann Machines. Guido Montúfar, Jason Morton; 16(Apr):653−672, 2015.
• Response-Based Approachability with Applications to Generalized No-Regret Problems. Andrey Bernstein, Nahum Shimkin; 16(Apr):747−773, 2015.
• A Compression Technique for Analyzing Disagreement-Based Active Learning. Yair Wiener, Steve Hanneke, Ran El-Yaniv; 16(Apr):713−745, 2015.
• Evolving GPU Machine Code. Cleomar Pereira da Silva, Douglas Mota Dias, Cristiana Bentes, Marco Aurélio Cavalcanti Pacheco, Leandro Fontoura Cupertino; 16(Apr):673−712, 2015.
• Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. Pascal Germain, Alexandre Lacasse, Francois Laviolette, Mario Marchand, Jean-Francis Roy; 16(Apr):787−860, 2015.
• A Statistical Perspective on Algorithmic Leveraging. Ping Ma, Michael W. Mahoney, Bin Yu; 16(Apr):861−911, 2015.
• Distributed Matrix Completion and Robust Factorization. Lester Mackey, Ameet Talwalkar, Michael I. Jordan; 16(Apr):913−960, 2015.
• Strong Consistency of the Prototype Based Clustering in Probabilistic Space. Vladimir Nikulin; 16(Apr):775−785, 2015.

Animated US Hexbin Map of the Avian Flu Outbreak
The recent announcement of the start of egg rationing in the U.S. made me curious enough about the avian flu outbreak to dig into the numbers a bit. I eventually stumbled upon a USDA site with an embedded HTML table of flock outbreak statistics by state, county and date (along with flock type and whether it was a commercial enterprise or a ‘backyard’ farm). Simply summing the flock sizes on that page shows that nearly 50 million birds have been affected since December 2014.

IPython Markdown Opportunities in IPython Notebooks and Rstudio
One of the reasons I started working on the Wrangling F1 Data With R book was to see what the Rmd (RMarkdown) workflow was like. Rmd allows you to combine markdown and R code in the same document, execute the code blocks, and display the results of that execution inline in the output document.

Who interacts on Twitter during a conference
Organised annually since 1970 by the French Society of Statistics (SFdS), the Journées de Statistique (JdS) are the most important scientific event of the French statistical community. More than 400 researchers, teachers and practitioners meet at each edition. In 2015, JdS took place in Lille, France. SFdS regularly tweets (from the account @Statfr), and for the first time a live-tweet was organized during JdS. The hashtag was #JDSLille. The aim of this post is a (brief) statistical analysis of the live-tweet.

Bringing Deep Learning to the Grocery Store
Have you ever found yourself in a grocery store curious to know more about the food you’re about to buy? For instance, what is the carbon footprint of your favorite brand of yogurt? Or does that yogurt go better with strawberries than with bananas? Wouldn’t it be neat if technology could provide auxiliary information about our food in real time so we could make more informed decisions about what we buy?