Equilibrium-Independent Passivity-Short System (EIPS)
Maximal equilibrium-independent passivity (MEIP) is a recently introduced system property that has attracted particular attention in the study of networked dynamical systems. MEIP requires a system to be passive with respect to any forced equilibrium configuration and its associated steady-state input-output map to be maximally monotone. In practice, however, most systems are not so well behaved and exhibit a shortage of passivity, or non-passiveness, in their operation. In this paper, we consider a class of passivity-short systems, namely equilibrium-independent passivity-short (EIPS) systems, and present a generalized passivation approach, based on an input-output transformation, that ensures their MEIP properties. We characterize the steady-state input-output relations of the EIPS systems and establish their connection with those of the transformed MEIP systems. We further study the diffusively coupled networked interactions of such EIPS systems and explore their connection to a pair of dual network optimization problems under the proposed matrix transformation. A simulation example is given to illustrate the theoretical results. …
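
For orientation, the passivity notions named in this abstract are commonly stated as storage-function inequalities. The block below is a sketch in standard textbook notation, not necessarily the paper's own: the symbols $S$ (storage function), $(\bar{u}, \bar{y})$ (a forced equilibrium input-output pair), and $\rho$ (passivity index) are assumptions introduced here.

```latex
% Equilibrium-independent passivity (assumed standard form):
% for every forced equilibrium (\bar{u}, \bar{y}) there is a storage function S \ge 0 with
\dot{S} \;\le\; (u - \bar{u})^{\top} (y - \bar{y}).

% Output passivity-short variant (assumed standard form):
% the same inequality with an index \rho that is allowed to be negative,
\dot{S} \;\le\; (u - \bar{u})^{\top} (y - \bar{y}) \;-\; \rho \, \lVert y - \bar{y} \rVert^{2},
\qquad \rho < 0 \ \text{indicating a shortage of passivity.}
```

With $\rho \ge 0$ the second inequality implies the first, so the passivity-short case is the one in which the extra term can add to the right-hand side instead of subtracting from it.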

Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature into their estimation, making it less prone to estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state-of-the-art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data. …
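
One common balancing method from causal inference is inverse propensity weighting. The sketch below shows how such weights could enter the per-arm ridge regression of a linear bandit. It is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `balanced_ridge`, the clipping threshold `min_propensity`, and the choice of inverse propensity weights are all hypothetical here.

```python
import numpy as np


def balanced_ridge(X, r, propensity, lam=1.0, min_propensity=1e-3):
    """Ridge regression for one arm with inverse-propensity weights.

    X          : (n, d) contexts observed when this arm was pulled
    r          : (n,)   rewards observed for those pulls
    propensity : (n,)   probability with which the arm was chosen each time
    lam        : ridge penalty (assumed hyperparameter)
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    # Clip propensities so that rare pulls cannot produce unbounded weights.
    w = 1.0 / np.clip(np.asarray(propensity, dtype=float), min_propensity, 1.0)
    # Weighted normal equations: (X^T W X + lam I) theta = X^T W r
    A = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    b = X.T @ (w * r)
    return np.linalg.solve(A, b)
```

An exploration rule such as Thompson sampling or an upper confidence bound would then use the returned coefficient vector in place of the unweighted ridge estimate.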

Curiosity-Driven Prioritization
In Reinforcement Learning (RL), an agent explores the environment and collects trajectories into the memory buffer for later learning. However, the collected trajectories can easily be imbalanced with respect to the achieved goal states. Learning from imbalanced data is a well-known problem in supervised learning, but it has not yet been thoroughly researched in RL. To address this problem, we propose a novel Curiosity-Driven Prioritization (CDP) framework that encourages the agent to over-sample those trajectories that have rare achieved goal states. The CDP framework mimics the human learning process and focuses more on relatively uncommon events. We evaluate our methods using the robotic environment provided by OpenAI Gym. The environment contains six robot manipulation tasks. In our experiments, we combined CDP with Deep Deterministic Policy Gradient (DDPG) with or without Hindsight Experience Replay (HER). The experimental results show that CDP improves both performance and sample-efficiency of reinforcement learning agents, compared to state-of-the-art methods. …
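
The idea of over-sampling trajectories with rare achieved goal states can be sketched as follows. This is a simplified stand-in, not the CDP method itself: it assumes achieved goals have already been discretized into hashable keys, and it uses inverse-frequency weights as the rarity measure. The names `rarity_priorities` and `sample_trajectories` are hypothetical.

```python
import random
from collections import Counter


def rarity_priorities(goal_keys):
    """Return one sampling probability per trajectory, favoring rare goal keys.

    goal_keys: list of hashable labels, one per stored trajectory, identifying
    the (discretized) achieved goal state. Discretization is an assumption here.
    """
    counts = Counter(goal_keys)
    # Inverse-frequency weight: a goal seen once outweighs a goal seen often.
    weights = [1.0 / counts[key] for key in goal_keys]
    total = sum(weights)
    return [w / total for w in weights]


def sample_trajectories(goal_keys, batch_size, rng=random):
    """Draw trajectory indices (with replacement) according to rarity."""
    probs = rarity_priorities(goal_keys)
    return rng.choices(range(len(goal_keys)), weights=probs, k=batch_size)
```

With inverse-frequency weights, each distinct goal key receives the same total sampling mass, so a buffer dominated by one outcome no longer dominates the replayed batches.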

Inducibility
The quantity that captures the asymptotic value of the maximum number of appearances of a given topological tree (a rooted tree with no vertices of outdegree $1$) $S$ with $k$ leaves in an arbitrary tree with a sufficiently large number of leaves is called the inducibility of $S$. …
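
As a hedged illustration, this definition is often written as a limiting density. The notation below is an assumption introduced here, not taken from the text above: $c(S, T)$ denotes the number of $k$-subsets of leaves of $T$ that induce a copy of $S$, $|T|$ the number of leaves of $T$, and the normalization by $\binom{n}{k}$ is the usual way to turn a maximum count into an asymptotic value.

```latex
% Inducibility of a topological tree S with k leaves (assumed normalization):
I(S) \;=\; \limsup_{n \to \infty} \; \max_{\substack{T \\ |T| = n}} \; \frac{c(S, T)}{\binom{n}{k}}
```

Since $c(S, T) \le \binom{n}{k}$ for every tree $T$ with $n$ leaves, this quantity always lies between $0$ and $1$.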