JMLR Volume 16

• Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling
• Simultaneous Pursuit of Sparseness and Rank Structures for Matrix Decomposition
• Statistical Topological Data Analysis using Persistence Landscapes
• Links Between Multiplicity Automata, Observable Operator Models and Predictive State Representations — a Unified Learning Framework
• SAMOA: Scalable Advanced Massive Online Analysis
• Online Learning via Sequential Complexities
• Learning Transformations for Clustering and Classification
• Multi-layered Gesture Recognition with Kinect
• Multimodal Gesture Recognition via Multiple Hypotheses Rescoring
• An Asynchronous Parallel Stochastic Coordinate Descent Algorithm
• Geometric Intuition and Algorithms for Eν-SVM
• Composite Self-Concordant Minimization
• Network Granger Causality with Inherent Grouping Structure
• Iterative and Active Graph Clustering Using Trace Norm Minimization Without Cluster Size Constraints
• A Classification Module for Genetic Programming Algorithms in JCLEC
• AD3: Alternating Directions Dual Decomposition for MAP Inference in Graphical Models
• Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit
• The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R
• Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima
• Generalized Hierarchical Kernel Learning
• Discrete Restricted Boltzmann Machines
• Evolving GPU Machine Code
• A Compression Technique for Analyzing Disagreement-Based Active Learning
• Response-Based Approachability with Applications to Generalized No-Regret Problems
• Strong Consistency of the Prototype Based Clustering in Probabilistic Space
• Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm
• A Statistical Perspective on Algorithmic Leveraging
• Distributed Matrix Completion and Robust Factorization
• Combined l1 and Greedy l0 Penalized Least Squares for Linear Model Selection
• Learning with the Maximum Correntropy Criterion Induced Losses for Regression
• Joint Estimation of Multiple Precision Matrices with Common Structures
• Lasso Screening Rules via Dual Polytope Projection
• Fast Cross-Validation via Sequential Testing
• Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
• Local Identification of Overcomplete Dictionaries
• Encog: Library of Interchangeable Machine Learning Models for Java and C#
• Perturbed Message Passing for Constraint Satisfaction Problems
• Learning Sparse Low-Threshold Linear Classifiers
• Learning Equilibria of Games via Payoff Queries
• Rationality, Optimism and Guarantees in General Reinforcement Learning
• The Algebraic Combinatorial Approach for Low-Rank Matrix Completion
• A Comprehensive Survey on Safe Reinforcement Learning
• Second-Order Non-Stationary Online Learning for Regression
• A Finite Sample Analysis of the Naive Bayes Classifier
• Flexible High-Dimensional Classification Machines and Their Asymptotic Properties
• RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research
• Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery
• Bayesian Nonparametric Crowdsourcing
• Approximate Modified Policy Iteration and its Application to the Game of Tetris
• Preface to this Special Issue
• V-Matrix Method of Solving Statistical Inference Problems
• Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization
• Optimal Estimation of Low Rank Density Matrices
• Fast Rates in Statistical and Online Learning
• On the Asymptotic Normality of an Estimate of a Regression Functional
• Sharp Oracle Bounds for Monotone and Convex Regression Through Aggregation
• Exceptional Rotations of Random Graphs: A VC Theory
• Semi-Supervised Interpolation in an Anticausal Learning Scenario
• Towards an Axiomatic Approach to Hierarchical Clustering of Measures
• Predicting a Switching Sequence of Graph Labelings
• Learning Using Privileged Information: Similarity Control and Knowledge Transfer
• Alexey Chervonenkis’s Bibliography: Introductory Comments
• Alexey Chervonenkis’s Bibliography

Building a Logistic Regression model from scratch

Do you understand how logistic regression works? If your answer is yes, I have a challenge for you to solve: here is an extremely simple logistic regression problem.
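A from-scratch solution under the usual assumptions (a sigmoid model, log-loss, batch gradient descent) might look like the sketch below. The toy one-dimensional dataset and the learning-rate/epoch settings are invented for illustration, not taken from the post.

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit w, b for P(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
            gw += (p - y) * x  # gradient of log-loss w.r.t. w
            gb += (p - y)      # gradient of log-loss w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Classify x by thresholding the predicted probability at 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Toy separable problem: class 0 for small x, class 1 for large x.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

After training, the decision boundary sits between the two clusters, so points well inside either cluster are classified correctly.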

Summarising with Box and Whisker plots

In the Northern Hemisphere it is the start of the school year, and thousands of eager students are beginning their study of statistics. I know this because this is the time of year when lots of people watch my video, Types of Data. On 23 August, the hits on the video bounced up out of their holiday slumber, just as they do every year. They gradually dwindle away until the end of January, when there is a second jump in popularity, which I suspect coincides with the start of the second semester.

Deep Learning Tutorial

This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, see them work for yourself, and learn how to apply and adapt these ideas to new problems. This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, and gradient descent). If you are not familiar with these ideas, we suggest you go to this Machine Learning course and complete sections II, III, and IV (up to Logistic Regression) first.

Unboxing the Random Forest Classifier: The Threshold Distributions

In the Trust and Safety team at Airbnb, we use the random forest classifier in many of our risk mitigation models. Despite our successes with it, the ensemble of trees along with the random selection of features at each node makes it difficult to succinctly describe how features are being split. In this post, we propose a method to aggregate and summarize those split values by generating weighted threshold distributions.

Correlation, Causation, and Confusion

Causation has long been something of a mystery, bedeviling philosophers and scientists down through the ages. What exactly is it? How can it be measured – that is, can we assess the strength of the relationship between a cause and its effect? What does an observed association between factors – a correlation – tell us about a possible causal relationship? How do multiple factors or causes jointly influence outcomes? And does causation even exist “in the world,” as it were, or is it merely a habit of our minds, a connection we draw between two events we have observed in succession many times, as Hume famously argued? The rich philosophical literature on causation is a testament to the struggle of thinkers throughout history to develop satisfactory answers to these questions. Likewise, scientists have long wrestled with problems of causation in the face of numerous practical and theoretical impediments.


• Generate a Basket of Stocks Using any Keyword
• Explore Hidden Connections between them

Probability, Paradox, and the Reasonable Person Principle

In this notebook, we cover the basics of probability theory, and show how to implement the theory in Python. (You should have a little background in probability and Python.) Then we show how to solve some particularly perplexing paradoxical probability problems.
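A minimal sketch in the notebook's spirit: for a finite sample space of equiprobable outcomes, the probability of an event is just the fraction of outcomes in the space that belong to the event, which exact `Fraction` arithmetic captures nicely.

```python
from fractions import Fraction

def P(event, space):
    """Probability of an event, given a sample space of equiprobable
    outcomes: the fraction of outcomes in the space that are in the event."""
    return Fraction(len(event & space), len(space))

# A fair six-sided die as a sample space, and the event "roll is even".
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
```

With these definitions, P(even, die) evaluates to exactly 1/2, with no floating-point fuzz to obscure the paradoxes the notebook goes on to dissect.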

A Hierarchical Neural Autoencoder for Paragraphs and Documents

Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent network models. In this paper, we explore an important step toward this generation task: training an LSTM (Long Short-Term Memory) autoencoder to preserve and reconstruct multi-sentence paragraphs.

Replicating NatGeo’s “Proper” Earthquake Map in R

… with ggplot2.

SparkR quick start that works

If you’re following along with the SparkR Quick Start, you’ll notice that the instructions are inconsistent with more recent builds of Spark. Here are instructions that work for SparkR version 1.4.1 on Linux; YMMV on Spark 1.5.

Imputing missing data with R; MICE package

Missing data can be a non-trivial problem when analysing a dataset, and accounting for it is usually not straightforward either. If the amount of missing data is very small relative to the size of the dataset, leaving out the few samples with missing features may be the best strategy to avoid biasing the analysis. However, discarding available data points deprives the analysis of some information, so depending on the situation you may want to look for other fixes before wiping out potentially useful data points from your dataset.
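MICE itself runs chained regressions over every incomplete column and draws multiple imputations with added noise; the single regression-imputation pass below is only a plain-Python sketch of the core idea, on made-up toy data where one column is predicted from the other.

```python
import statistics

# Toy data: y roughly follows 2*x + 1, with two y values missing (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.9, None, 9.2, None, 13.0]

def regression_impute(x, y):
    """Fill missing y values with predictions from a least-squares line
    fitted on the complete (x, y) pairs: one pass of the chained-equations
    idea, without MICE's noise draws or multiple imputations."""
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = statistics.fmean(a for a, _ in obs)
    my = statistics.fmean(b for _, b in obs)
    slope = (sum((a - mx) * (b - my) for a, b in obs)
             / sum((a - mx) ** 2 for a, _ in obs))
    intercept = my - slope * mx
    return [b if b is not None else intercept + slope * a
            for a, b in zip(x, y)]

y_filled = regression_impute(x, y)
```

Observed values pass through untouched, while the two gaps are filled from the fitted line, landing close to the 2*x + 1 trend the rest of the column follows.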