Interactive Document for Working with Association Rule Mining Analysis (ASSOCShiny)
An interactive document on the topic of association rule mining analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the p …
Poisson Lognormal Models (PLNmodels)
The Poisson-lognormal model and variants can be used for a variety of multivariate problems when count data are at play, including principal component …
Generate tables and plots to get summaries of data (glancedata)
Generate data frames for all the variables with some summaries and also some plots for numerical variables. Several functions from the ‘tidyverse’ and …
Cartography for Statistical Analysis (oceanis)
Creating maps for statistical analysis such as proportional circles, choropleth, typology and flows. Some functions use ‘shiny’ or ‘leaflet’ technolog …
Log Rotation and Conditional Backups (rotor)
Conditionally rotate or back-up files based on their size or the date of the last backup; inspired by the ‘Linux’ utility ‘logrotate’.
Easily Visualize Data from ‘ERDDAP’ Servers via the ‘rerddap’ Package (plotdap)
Easily visualize and animate ‘tabledap’ and ‘griddap’ objects obtained via the ‘rerddap’ package in a simple one-line command, using either base graphi …
Understanding Neural Networks via Feature Visualization: A survey
A neuroscience approach to understanding the brain is to find and study the preferred stimuli that highly activate an individual cell or groups of cells. Recent advances in machine learning enable a family of methods to synthesize preferred stimuli that cause a neuron in an artificial or biological brain to fire strongly. Those methods are known as Activation Maximization (AM) or Feature Visualization via Optimization. In this chapter, we (1) review existing AM techniques in the literature; (2) discuss a probabilistic interpretation for AM; and (3) review the applications of AM in debugging and explaining networks.
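The core of Activation Maximization is gradient ascent on the input itself, holding the network fixed. A minimal sketch of that loop, assuming a tiny invented two-layer network (the weights, step size, and norm constraint are all illustrative, not any specific method from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed "network": one hidden ReLU layer, one output unit.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16,))

def activation(x):
    """Activation of the output unit for input x."""
    h = np.maximum(W1.T @ x, 0.0)        # hidden ReLU layer
    return W2 @ h

def activation_grad(x):
    """Analytic gradient of the output unit w.r.t. the input."""
    mask = (W1.T @ x > 0).astype(float)  # ReLU gate
    return W1 @ (mask * W2)

# Activation maximization: ascend the gradient in input space,
# with an L2 norm cap as a crude regularizer on the stimulus.
x = rng.normal(size=8) * 0.01
x0 = x.copy()
for _ in range(200):
    x = x + 0.1 * activation_grad(x)
    x = x / max(1.0, np.linalg.norm(x) / 4.0)  # keep ||x|| <= 4

print(activation(x0), activation(x))
```

Real AM methods differ mainly in the regularizers and priors placed on `x` (e.g. image priors), not in this basic ascent loop.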
“In Management, Stupidity Is An Advantage.” Justin Locke ( January 30, 2015 )
Rule #1: Don’t be afraid to launch a product without machine learning.
Rule #2: First, design and implement metrics.
Rule #3: Choose machine learning over a complex heuristic.
Rule #4: Keep the first model simple and get the infrastructure right.
Rule #5: Test the infrastructure independently from the machine learning.
Rule #6: Be careful about dropped data when copying pipelines.
Rule #7: Turn heuristics into features, or handle them externally.
Rule #8: Know the freshness requirements of your system.
Rule #9: Detect problems before exporting models.
Rule #10: Watch for silent failures.
Rule #11: Give feature columns owners and documentation.
Rule #12: Don’t overthink which objective you choose to directly optimize.
Rule #13: Choose a simple, observable and attributable metric for your first objective.
Rule #14: Starting with an interpretable model makes debugging easier.
Rule #15: Separate Spam Filtering and Quality Ranking in a Policy Layer.
Rule #16: Plan to launch and iterate.
Rule #17: Start with directly observed and reported features as opposed to learned features.
Rule #18: Explore with features of content that generalize across contexts.
Rule #19: Use very specific features when you can.
Rule #20: Combine and modify existing features to create new features in human-understandable ways.
Rule #21: The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.
Rule #22: Clean up features you are no longer using.
Rule #23: You are not a typical end user.
Rule #24: Measure the delta between models.
Rule #25: When choosing models, utilitarian performance trumps predictive power.
Rule #26: Look for patterns in the measured errors, and create new features.
Rule #27: Try to quantify observed undesirable behavior.
Rule #28: Be aware that identical short-term behavior does not imply identical long-term behavior.
Rule #29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
Rule #30: Importance-weight sampled data, don’t arbitrarily drop it!
Rule #31: Beware that if you join data from a table at training and serving time, the data in the table may change.
Rule #32: Re-use code between your training pipeline and your serving pipeline whenever possible.
Rule #33: If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.
Rule #34: In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.
Rule #35: Beware of the inherent skew in ranking problems.
Rule #36: Avoid feedback loops with positional features.
Rule #37: Measure Training/Serving Skew.
Rule #38: Don’t waste time on new features if unaligned objectives have become the issue.
Rule #39: Launch decisions are a proxy for long-term product goals.
Rule #40: Keep ensembles simple.
Rule #41: When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
Rule #42: Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
Rule #43: Your friends tend to be the same across different products. Your interests tend not to be.
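Rule #29 above is easy to prototype: persist the exact feature vector the serving path computed, then train from that log instead of recomputing features. A minimal sketch, where the feature names and the JSON-lines log format are illustrative assumptions, not part of the rules:

```python
import io
import json

def compute_features(request):
    """Serving-time feature computation (illustrative features)."""
    return {
        "query_len": len(request["query"]),
        "hour": request["hour"],
    }

def serve(request, log):
    feats = compute_features(request)
    # Rule #29: log the features actually used at serving time.
    log.write(json.dumps({"features": feats, "label": None}) + "\n")
    return feats  # downstream: score the request with these features

def training_examples(log_text):
    """Training reads the logged features verbatim -- no recomputation."""
    return [json.loads(line)["features"] for line in log_text.splitlines()]

log = io.StringIO()
served = serve({"query": "cat videos", "hour": 14}, log)
logged = training_examples(log.getvalue())
print(logged[0] == served)
```

Because training consumes the serving log verbatim, any change to `compute_features` affects both paths at once, which is exactly the training/serving skew Rule #37 asks you to measure.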
Kernel Convolution (kervolution)
Convolutional neural networks (CNNs) have enabled the state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly rely on the activation layers, which can only provide point-wise non-linearity. To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems by leveraging the kernel trick. It generalizes convolution, enhances the model capacity, and captures higher order interactions of features, via patch-wise kernel functions, but without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNN) achieve higher accuracy and faster convergence than baseline CNNs. …
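The idea can be sketched in one dimension: ordinary convolution takes the inner product of each patch with the filter, while kervolution replaces that inner product with a kernel evaluation such as a polynomial kernel, adding non-linearity per patch without new parameters. A minimal single-channel sketch (the abstract describes 2-D CNNs; this 1-D toy is an illustration only):

```python
import numpy as np

def kervolve1d(x, w, kernel):
    """1-D 'valid' kervolution: apply kernel(patch, filter) at each position."""
    n, k = len(x), len(w)
    return np.array([kernel(x[i:i + k], w) for i in range(n - k + 1)])

# Polynomial kernel (p.w + c)^d: non-linear patch response, no extra weights.
def poly_kernel(c=1.0, d=2):
    return lambda p, w: (p @ w + c) ** d

# A linear kernel recovers ordinary convolution (cross-correlation form).
linear = lambda p, w: p @ w

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
w = np.array([0.5, -1.0, 0.25])

print(kervolve1d(x, w, linear))          # ordinary convolution
print(kervolve1d(x, w, poly_kernel()))   # degree-2 polynomial kervolution
```

With the linear kernel the output matches `np.correlate(x, w, mode="valid")` exactly, which is the sense in which kervolution generalizes convolution.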
Hyperbolic Attention Network
We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact. …
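One ingredient of hyperbolic attention can be illustrated in isolation: computing attention weights from geodesic distances on the hyperboloid model rather than from Euclidean dot products. This sketch covers only that distance-based weighting, not the paper's full Einstein-midpoint aggregation; the lifting map and the exp(-distance) weighting are illustrative choices:

```python
import numpy as np

def lift(y):
    """Lift a Euclidean vector onto the hyperboloid x0^2 - |y|^2 = 1, x0 > 0."""
    y = np.asarray(y, dtype=float)
    return np.concatenate([[np.sqrt(1.0 + y @ y)], y])

def hyp_dist(u, v):
    """Geodesic distance on the hyperboloid: arcosh(-<u, v>_L)."""
    mink = -u[0] * v[0] + u[1:] @ v[1:]      # Minkowski inner product
    return np.arccosh(np.clip(-mink, 1.0, None))

def hyp_attention_weights(query, keys):
    """Softmax-style weights from negative hyperbolic distances."""
    d = np.array([hyp_dist(query, k) for k in keys])
    e = np.exp(-d)
    return e / e.sum()

q = lift([0.1, 0.0])
keys = [lift([0.2, 0.0]), lift([2.0, 2.0])]
print(hyp_attention_weights(q, keys))   # the nearer key gets the larger weight
```

Because hyperbolic distances grow roughly like tree depth, nodes deep in a hierarchy can be kept far apart while the representation stays low-dimensional.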
Recurrent Predictive State Policy Network (RPSP)
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling the predictive state: a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behaviour. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method. …
‘ggplot2’ Based Tool to Facilitate Diagnostic Plots for NLME Models (ggPMX)
At Novartis, we aimed at standardizing the set of diagnostic plots used for modeling activities in order to reduce the overall effort required for gene …
A Tidy Wrapper Around ‘gtrendsR’ (trendyy)
Access Google Trends information. This package provides a tidy wrapper to the ‘gtrendsR’ package. Use four spaces when indenting paragraphs within the …
Plotting, Smoothing and Growth Trait Extraction for Longitudinal Data (growthPheno)
Assists in producing longitudinal or profile plots of measured traits. These allow checks to be made for anomalous data and growth patterns in the data …
Bayesian Inference for Multinomial Models with Inequality Constraints (multinomineq)
Implements Gibbs sampling and Bayes factors for multinomial models with linear inequality constraints on the vector of probability parameters. As speci …
Regression Models and Utilities for Repeated Measures and Panel Data (panelr)
Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take …
Simulating Pollen Curves from Virtual Taxa with Different Life and Niche Traits (virtualPollen)
Tools to generate virtual environmental drivers with a given temporal autocorrelation, and to simulate pollen curves at annual resolution over millenni …
Model Selection Techniques — An Overview
In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There is a long history of model selection techniques arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.
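A concrete instance of the selection problem this overview surveys is choosing a polynomial degree by an information criterion. A minimal sketch using AIC on synthetic data, where the quadratic data-generating process, sample size, and noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a quadratic truth with small noise.
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=n)

def aic_for_degree(d):
    """Fit a degree-d polynomial by least squares; return its AIC."""
    coeffs = np.polyfit(x, y, d)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = d + 2  # d+1 coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * k

degrees = range(6)
aics = [aic_for_degree(d) for d in degrees]
best = min(degrees, key=lambda d: aics[d])
print(best, aics)
```

Underfit degrees pay heavily through the residual term, while overfit degrees pay through the 2k penalty; swapping in BIC (`np.log(n) * k` in place of `2 * k`) trades some predictive optimality for stronger consistency, one of the trade-offs the article discusses.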