Evolutionary Stochastic Gradient Descent (ESGD) google
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures. …

Interaction-Aware Factorization Machine (IFM) google
Factorization Machine (FM) is a widely used supervised learning approach by effectively modeling of feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction fairly may degrade the performance. For example, the interactions of a useless feature may introduce noises; the importance of a feature may also differ when interacting with different features. In this work, we propose a novel model named \emph{Interaction-aware Factorization Machine} (IFM) by introducing Interaction-Aware Mechanism (IAM), which comprises the \emph{feature aspect} and the \emph{field aspect}, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network while the field aspect learns the feature interaction effect as a parametric similarity of the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows for more leverage in tweaking the interactions on both feature-wise and field-wise levels. Besides, we give a more generalized architecture and propose Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on the field aspect importance. The experimental results from two well-known datasets show the superiority of the proposed models over the state-of-the-art methods. …

Self-Taught Learning google
Self-taught learning is a new paradigm in machine learning introduced by Stanford researchers in 2007. ‘Self-taught Learning: Transfer Learning from Unlabeled Data’. Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR.</ref>. The full paper can be found here. It builds on ideas from existing supervised, semi-supervised and transfer learning algorithms. The differences between these methods depend on the usage of labeled and unlabeled data (Figure 1):
• Supervised Learning – All data is labeled and of the same type (shares the same class labels).
• Semi-supervised learning – Only some of the data is labeled but it is all of the same class. One drawback is that acquiring unlabeled data of the same class is often difficult and/or expensive.
• Transfer learning – All data is labeled but some is of another type (i.e. has class labels that do not apply to data set that we wish to classify).
Self-taught learning combines the latter two ideas. It uses labeled data belonging to the desired classes and unlabeled data from other, somehow similar, classes. It is important to emphasize that the unlabeled data need not belong to the class labels we wish to assign, as long as it is related. This fact distinguishes it from semi-supervised learning. Since it uses unlabeled data from new classes, it can be thought of as semi-supervised transfer learning.
Self-taught learning is a technique that uses a large number of unlabeled data as source samples to improve the task performance on target samples. Compared with other transfer learning techniques, self-taught learning can be applied to a broader set of scenarios due to the loose restrictions on source data. However, knowledge transferred from source samples that are not sufficiently related to the target domain may negatively influence the target learner, which is referred to as negative transfer.
Autoencoder Based Sample Selection for Self-Taught Learning

Collaborative Human-AI (CHAI) google
Automated dermoscopic image analysis has witnessed rapid growth in diagnostic performance. Yet adoption faces resistance, in part, because no evidence is provided to support decisions. In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors, as well as localized image regions most relevant to measuring distance between query and neighbors. To ensure that results are relevant in terms of both label accuracy and human visual similarity for any skill level, a novel hierarchical triplet logic is implemented to jointly learn an embedding according to disease labels and non-expert similarity. Results are improved over baselines trained on disease labels alone, as well as standard multiclass loss. Quantitative relevance of results, according to non-expert similarity, as well as localized image regions, are also significantly improved. …