Neural Net Topology Profiler (NTP)
The performance of an end-to-end neural network on a given hardware platform is a function of its compute and memory signature, which in turn is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, latency requirements, and precision. Current benchmarking tools suffer from limitations: a) they are too granular, like DeepBench; b) they mandate a working implementation that is either framework-specific or hardware-architecture-specific; or c) they provide only high-level benchmark metrics. In this paper, we present NTP (Neural Net Topology Profiler), a sophisticated benchmarking framework that effectively identifies the memory and compute signature of an end-to-end topology on multiple hardware architectures, without the need to actually implement the topology in a framework. NTP is tightly integrated with hardware-specific benchmark tools to enable exhaustive data collection and analysis. Using NTP, a deep learning researcher can quickly establish the baselines needed to understand the performance of an end-to-end neural network topology and make high-level architectural decisions based on optimization techniques like layer sizing, quantization, and pruning. Further, integration of NTP with frameworks like TensorFlow, PyTorch, and Intel OpenVINO allows for performance comparison along several vectors: a) comparison of different frameworks on a given hardware platform; b) comparison of different hardware platforms using a given framework; and c) comparison across different heterogeneous hardware configurations for a given framework. These capabilities empower a researcher to effortlessly make the architectural decisions needed to achieve optimized performance on any hardware platform. The paper documents the architectural approach of NTP and demonstrates the capabilities of the tool by benchmarking Mozilla DeepSpeech, a popular speech recognition topology. …
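As a rough illustration of profiling a topology without implementing it in a framework, the sketch below estimates a compute and memory signature from a declarative layer description alone. This is hypothetical and not NTP's actual API; the `Conv2D` record, `profile` function, and the FLOP/parameter formulas are assumptions for illustration.

```python
# Hypothetical sketch (not NTP's API): estimate the compute/memory signature
# of a topology from a declarative description, with no framework involved.

from dataclasses import dataclass

@dataclass
class Conv2D:
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int

    def flops(self, batch):
        # 2 ops (multiply + add) per MAC, MACs per output element * elements
        return (2 * batch * self.out_ch * self.out_h * self.out_w
                * self.in_ch * self.kernel ** 2)

    def params(self):
        return self.out_ch * self.in_ch * self.kernel ** 2

def profile(layers, batch=1, bytes_per_param=4):
    """Aggregate compute (FLOPs) and weight memory (bytes) for a topology."""
    total_flops = sum(l.flops(batch) for l in layers)
    total_mem = sum(l.params() for l in layers) * bytes_per_param
    return total_flops, total_mem

if __name__ == "__main__":
    topology = [Conv2D(3, 64, 3, 224, 224), Conv2D(64, 128, 3, 112, 112)]
    flops, mem = profile(topology, batch=8)
    print(f"compute: {flops / 1e9:.2f} GFLOPs, weights: {mem / 1e6:.2f} MB")
```

Such a signature is what makes it possible to compare batching strategies or precisions (e.g., halving `bytes_per_param` for FP16) before any framework code is written.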

Deep Asymmetric Multitask Feature Learning (Deep-AMTFL)
We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL), which can learn deep representations shared across multiple tasks while effectively preventing the negative transfer that may happen in the feature-sharing process. Specifically, we introduce an asymmetric autoencoder term that allows predictors for confident tasks to contribute strongly to the feature learning while suppressing the influence of less confident task predictors. This yields less noisy representations and allows weak predictors to exploit knowledge from the strong predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models by effectively preventing negative transfer in deep feature learning. …
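The sketch below illustrates the asymmetric-feedback idea in PyTorch: each task head's output feeds back into the shared features through an autoencoder term, with a confidence weight that shrinks for tasks with high loss. This is a minimal interpretation of the mechanism, not the authors' exact objective; the class name, the softmax confidence weighting, and the squared-error losses are assumptions for illustration.

```python
# Illustrative sketch of asymmetric transfer (not the paper's exact loss):
# confident (low-loss) task predictors contribute more to the shared
# representation; unreliable ones are down-weighted.

import torch
import torch.nn as nn

class AMTFLSketch(nn.Module):
    def __init__(self, in_dim, latent_dim, n_tasks):
        super().__init__()
        self.encoder = nn.Linear(in_dim, latent_dim)   # shared features
        self.heads = nn.ModuleList(
            nn.Linear(latent_dim, 1) for _ in range(n_tasks)
        )
        self.decoder = nn.Linear(n_tasks, latent_dim)  # autoencoder term

    def forward(self, x, y):
        z = torch.relu(self.encoder(x))
        preds = torch.cat([h(z) for h in self.heads], dim=1)  # (B, T)
        task_losses = ((preds - y) ** 2).mean(dim=0)          # per-task loss
        # confidence weights: lower task loss -> higher contribution
        w = torch.softmax(-task_losses.detach(), dim=0)
        recon = self.decoder(preds * w)                       # asymmetric feedback
        return task_losses.sum() + ((recon - z) ** 2).mean()
```

The asymmetry comes from `w`: a noisy task still benefits from the shared features `z`, but its predictor has little influence on how `z` is reconstructed and hence learned.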

Upper Edge Cover
Optimization problems consist of either maximizing or minimizing an objective function. Instead of looking for a maximum solution (resp. minimum solution), one can search for a minimum maximal solution (resp. maximum minimal solution). Such 'flipping' of the objective function has been done for many classical optimization problems: for example, Minimum Vertex Cover becomes Maximum Minimal Vertex Cover, Maximum Independent Set becomes Minimum Maximal Independent Set, and so on. In this paper, we propose to study the weighted version of Maximum Minimal Edge Cover, called Upper Edge Cover, a problem with applications in genomic sequence alignment. It is well known that Minimum Edge Cover is polynomial-time solvable and that the 'flipped' version is NP-hard but approximable within a constant factor. We show that weighted Upper Edge Cover is much more difficult than Upper Edge Cover: it is neither $O(\frac{1}{n^{1/2-\varepsilon}})$-approximable nor $O(\frac{1}{\Delta^{1-\varepsilon}})$-approximable in edge-weighted graphs of size $n$ and maximum degree $\Delta$, respectively. Indeed, we give hardness-of-approximation results for some special restricted graph classes such as bipartite graphs, split graphs and $k$-trees. We counterbalance these negative results with positive approximation results in specific graph classes. …
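To make the flipped objective concrete, the brute-force sketch below enumerates all minimal edge covers of a tiny weighted graph and returns the heaviest one. The exponential enumeration is purely didactic (the abstract's hardness results rule out anything this naive scaling up); the function names and the example graph are made up for illustration.

```python
# Brute-force sketch of (weighted) Upper Edge Cover: among all *minimal*
# edge covers, pick the one of maximum total weight.

from itertools import combinations

def is_edge_cover(vertices, edges):
    # every vertex must be an endpoint of some chosen edge
    covered = {v for e in edges for v in e[:2]}
    return covered == set(vertices)

def is_minimal(vertices, edges):
    # minimal: removing any single edge destroys the cover property
    return all(not is_edge_cover(vertices, [f for f in edges if f != e])
               for e in edges)

def upper_edge_cover(vertices, weighted_edges):
    best, best_w = None, float("-inf")
    for k in range(1, len(weighted_edges) + 1):
        for subset in combinations(weighted_edges, k):
            if is_edge_cover(vertices, subset) and is_minimal(vertices, subset):
                w = sum(e[2] for e in subset)
                if w > best_w:
                    best, best_w = subset, w
    return best, best_w

if __name__ == "__main__":
    V = [1, 2, 3, 4]
    E = [(1, 2, 5), (2, 3, 1), (3, 4, 5), (1, 4, 2)]  # (u, v, weight)
    print(upper_edge_cover(V, E))  # picks {(1,2), (3,4)} with weight 10
```

Note how minimality matters: adding edge (2, 3, 1) to the winning cover would increase its weight, but the result would no longer be minimal, so it is not a feasible solution.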

Variable Size based Fixed Passes Combined-counting (VFPC)
Many techniques have been proposed to implement the Apriori algorithm on the MapReduce framework, but only a few have focused on performance improvement. The FPC (Fixed Passes Combined-counting) and DPC (Dynamic Passes Combined-counting) algorithms combine multiple passes of Apriori in a single MapReduce phase to reduce the execution time. In this paper, we propose improved MapReduce-based Apriori algorithms, VFPC (Variable Size based Fixed Passes Combined-counting) and ETDPC (Elapsed Time based Dynamic Passes Combined-counting), over FPC and DPC. Further, we optimize the multi-pass phases of these algorithms by skipping the pruning step in some passes, and propose the Optimized-VFPC and Optimized-ETDPC algorithms. Quantitative analysis reveals that the cost of counting the additional unpruned candidates produced by skipping pruning is less significant than the resulting reduction in computation cost. Experimental results show that VFPC and ETDPC are more robust and flexible than FPC and DPC, whereas their optimized versions are more efficient in terms of execution time. …
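The sketch below shows the core pass-combining idea in a single-machine stand-in for one MapReduce phase: candidates for several consecutive Apriori pass lengths are generated up front and counted in one scan of the transactions instead of one scan per pass. It is a hypothetical illustration, not the VFPC implementation; candidate generation here naively takes all combinations of frequent items (i.e., it skips the prune step, in the spirit of the Optimized variants), and the function and parameter names are assumptions.

```python
# Conceptual sketch of combined-counting: one data scan covers candidate
# itemsets of several sizes, mimicking a multi-pass MapReduce phase.

from itertools import combinations
from collections import Counter

def combined_pass(transactions, frequent_prev, sizes, min_support):
    """Count candidates of all lengths in `sizes` in a single scan."""
    items = sorted({i for s in frequent_prev for i in s})
    # no prune step: all k-combinations of frequent items are candidates
    candidates = {k: [frozenset(c) for c in combinations(items, k)]
                  for k in sizes}
    counts = Counter()
    for t in transactions:                 # one scan serves every pass
        tset = set(t)
        for k, cands in candidates.items():
            for c in cands:
                if c <= tset:
                    counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

if __name__ == "__main__":
    txns = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c", "d")]
    freq1 = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]
    print(combined_pass(txns, freq1, sizes=(2, 3), min_support=2))
```

The trade-off the abstract quantifies is visible here: skipping pruning inflates `candidates`, but counting the extra itemsets during the shared scan is cheaper than running the pruning computation in every pass.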