Semantic View Selection
An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot’s performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally we use this dataset to propose a first solution to the problem by training a neural network to predict a ‘semantic score’ from a top view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views. …

Missingness-Aware Temporal Convolutional Hitting-time Network (MATCH-Net)
Accurate prediction of disease trajectories is critical for early identification and timely treatment of patients at risk. Conventional methods in survival analysis are often constrained by strong parametric assumptions and limited in their ability to learn from high-dimensional data, while existing neural network models are not readily-adapted to the longitudinal setting. This paper develops a novel convolutional approach that addresses these drawbacks. We present MATCH-Net: a Missingness-Aware Temporal Convolutional Hitting-time Network, designed to capture temporal dependencies and heterogeneous interactions in covariate trajectories and patterns of missingness. To the best of our knowledge, this is the first investigation of temporal convolutions in the context of dynamic prediction for personalized risk prognosis. Using real-world data from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrate state-of-the-art performance without making any assumptions regarding underlying longitudinal or time-to-event processes attesting to the model’s potential utility in clinical decision support. …

Windowed Fourier Filtering
Interferometric phase (InPhase) imaging is an important part of many present-day coherent imaging technologies. Often in such imaging techniques, the acquired images, known as interferograms, suffer from two major degradations: 1) phase wrapping caused by the fact that the sensing mechanism can only measure sinusoidal $2\pi$-periodic functions of the actual phase, and 2) noise introduced by the acquisition process or the system. This work focusses on InPhase denoising which is a fundamental restoration step to many posterior applications of InPhase, namely to phase unwrapping. The presence of sharp fringes that arises from phase wrapping makes InPhase denoising a hard-inverse problem. Motivated by the fact that the InPhase images are often locally sparse in Fourier domain, we propose a multi-resolution windowed Fourier filtering (WFF) analysis that fuses WFF estimates with different resolutions, thus overcoming the WFF fixed resolution limitation. The proposed fusion relies on an unbiased estimate of the mean square error derived using the Stein’s lemma adapted to complex-valued signals. This estimate, known as SURE, is minimized using an optimization framework to obtain the fusion weights. Strong experimental evidence, using synthetic and real (InSAR & MRI) data, that the developed algorithm, termed as SURE-fuse WFF, outperforms the best hand-tuned fixed resolution WFF as well as other state-of-the-art InPhase denoising algorithms, is provided. …

Reinforced Encoder-Decoder (RED)
Action anticipation aims to detect an action before it happens. Many real world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations to actions. However, anticipation is based on a single past frame’s representation, which ignores the history trend. Besides, it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets. …