With the recent substantial growth of media such as YouTube, a considerable number of instructional videos covering a wide variety of tasks are available online. Therefore, online instructional videos have become a rich resource for humans to learn everyday skills. In order to improve the effectiveness of the learning with instructional video, observation and evaluation of the activity are required. However, it is difficult to observe and evaluate every activity steps by expert. In this study, a novel deep learning framework which targets human activity evaluation for learning from instructional video has been proposed. In order to deal with the inherent variability of activities, we propose to model activity as a structured process. First, action units are encoded from dense trajectories with LSTM network. The variable-length action unit features are then evaluated by a Siamese LSTM network. By the comparative experiments on public dataset, the effectiveness of the proposed method has been demonstrated. EvaluationNet: Can Human Skill be Evaluated by Deep Networks?