APES google
Assisted by neural networks, reinforcement learning agents have been able to solve increasingly complex tasks over the last years. The simulation environment in which the agents interact is an essential component in any reinforcement learning problem. The environment simulates the dynamics of the agents’ world and hence provides feedback to their actions in terms of state observations and external rewards. To ease the design and simulation of such environments this work introduces $\texttt{APES}$, a highly customizable and open source package in Python to create 2D grid-world environments for reinforcement learning problems. $\texttt{APES}$ equips agents with algorithms to simulate any field of vision, it allows the creation and positioning of items and rewards according to user-defined rules, and supports the interaction of multiple agents. …

Counterfactual Regret Minimization (CRM) google
CFR is a self-play algorithm: it learns to play a game by repeatedly playing against itself. The program starts off with a strategy that is uniformly random, where it will play every action at every decision point with an equal probability. It then simulates playing games against itself. After every game, it revisits its decisions, and finds ways to improve its strategy. It repeats this process for billions of games, improving its strategy each time. As it plays, it gets closer and closer towards an optimal strategy for the game: a strategy that can do no worse than tie against any opponent. The way it improves over time is by summing the total amount of regret it has for each action at each decision point, where regret means: how much better would I have done over all the games so far if I had just always played this one action at this decision, instead of choosing whatever mixture over actions that my strategy said I should use? Positive regret means that we would have done better if we had taken that action more often. Negative regret means that we would have done better by not taking that action at all. After each game that the program plays against itself, it computes and adds in the new regret values for all of its decisions it just made. It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret. If an action would have been good in the past, then it will choose it more often in the future. …

Highly Missing Data Lasso (HMLasso) google
Sparse regression such as Lasso has achieved great success in dealing with high dimensional data for several decades. However, there are few methods applicable to missing data, which often occurs in high dimensional data. Recently, CoCoLasso was proposed to deal with high dimensional missing data, but it still suffers from highly missing data. In this paper, we propose a novel Lasso-type regression technique for Highly Missing data, called `HMLasso’. We use the mean imputed covariance matrix, which is notorious in general due to its estimation bias for missing data. However, we effectively incorporate it into Lasso, by using a useful connection with the pairwise covariance matrix. The resulting optimization problem can be seen as a weighted modification of CoCoLasso with the missing ratios, and is quite effective for highly missing data. To the best of our knowledge, this is the first method that can efficiently deal with both high dimensional and highly missing data. We show that the proposed method is beneficial with regards to non-asymptotic properties of the covariance matrix. Numerical experiments show that the proposed method is highly advantageous in terms of estimation error and generalization error. …