Temporal Dependency Network (TDN) google
While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies. In addition, the semantic structures within given sequential data can be interpreted by visualizing temporal dependencies learned from the method. The proposed method, called Temporal Dependency Network (TDN), represents a video as a temporal graph whose node represents a frame of the video and whose edge represents the temporal dependency between two frames of a variable distance. The temporal dependency structure of semantic is discovered by learning parameterized kernels of graph convolutional methods. We evaluate the proposed method on the large-scale video dataset, Youtube-8M. By visualizing the temporal dependency structures as experimental results, we show that the suggested method can find the temporal dependency structures of video semantic. …

Table2Answer google
Semantic parsing is the task of mapping natural language to logic form. In question answering, semantic parsing can be used to map the question to logic form and execute the logic form to get the answer. One key problem for semantic parsing is the hard label work. We study this problem in another way: we do not use the logic form any more. Instead we only use the schema and answer info. We think that the logic form step can be injected into the deep model. The reason why we think removing the logic form step is possible is that human can do the task without explicit logic form. We use BERT-based model and do the experiment in the WikiSQL dataset, which is a large natural language to SQL dataset. Our experimental evaluations that show that our model can achieves the baseline results in WikiSQL dataset. …

Multiple Instance Spatial Transformer Network (MIST) google
We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts. The network learns to extract the most significant top-K patches, and feeds these patches to a task-specific network — e.g., auto-encoder or classifier — to solve a domain specific problem. The challenge in training such a network is the non-differentiable top-K selection process. To address this issue, we lift the training optimization problem by treating the result of top-K selection as a slack variable, resulting in a simple, yet effective, multi-stage training. Our method is able to learn to detect recurrent structures in the training dataset by learning to reconstruct images. It can also learn to localize structures when only knowledge on the occurrence of the object is provided, and in doing so it outperforms the state-of-the-art. …

Advertisements