N2Net  We present N2Net, a system that implements binary neural networks using commodity switching chips deployed in network switches and routers. Our system shows that these devices can run simple neural network models, whose input is encoded in the network packets’ header, at packet processing speeds (billions of packets per second). Furthermore, our experience highlights that switching chips could support even more complex models, provided that some minor and cheap modifications to the chip’s design are applied. We believe N2Net provides an interesting building block for future end-to-end networked systems.
NAIL  Interactive Fiction (IF) games are complex textual decision-making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL’s performance.
Naive Bayes Classifier  A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model”. An overview of statistical classifiers is given in the article on pattern recognition. 
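The independence assumption can be illustrated with a minimal sketch of a naive Bayes text classifier. The toy data, the function names (`train_naive_bayes`, `predict`) and the choice of add-one smoothing are assumptions made for illustration only; they do not come from any particular library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count labels and per-label tokens from (tokens, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        for tok in tokens:
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # log P(label) + sum of log P(token | label): the "naive" step
        # treats every token as independent given the label.
        score = math.log(n / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Add-one smoothing (an assumption) avoids log(0) for unseen tokens.
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
toy = [
    (["cheap", "pills", "offer"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["cheap", "offer", "today"], "spam"),
    (["project", "meeting", "today"], "ham"),
]
model = train_naive_bayes(toy)
print(predict(model, ["cheap", "offer"]))
```

Working in log space keeps the product of many small probabilities from underflowing.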
Naive Discriminative Learning  Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations. ndl
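As a rough illustration of the Rescorla-Wagner update rule mentioned above, the sketch below adjusts the association weight of every cue present in a trial by a fraction of the prediction error. The function name `rescorla_wagner`, the single combined learning rate `rate` and the toy trials are assumptions; the sketch does not reproduce the ndl package or its equilibrium equations.

```python
def rescorla_wagner(trials, cues, rate=0.1, lam=1.0):
    """Iteratively update cue-outcome association weights.

    trials: iterable of (present_cues, outcome_occurred) pairs.
    rate:   a single learning rate standing in for the product of the
            cue and outcome salience parameters (an assumption).
    lam:    the maximum association strength when the outcome occurs.
    """
    weights = {cue: 0.0 for cue in cues}
    for present, outcome in trials:
        # Summed prediction from all cues present in this trial.
        total = sum(weights[cue] for cue in present)
        target = lam if outcome else 0.0
        delta = rate * (target - total)
        for cue in present:
            weights[cue] += delta
    return weights


# Hypothetical example: "light" is always followed by the outcome,
# "tone" is never presented.
weights = rescorla_wagner([({"light"}, True)] * 50, ["light", "tone"])
print(weights)
```

Because the error term uses the summed prediction of all present cues, cues compete for association strength, which is the property the discriminative learning models rely on.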
Naive Probability  We describe a rational, but low resolution model of probability. 
Named Entity Extraction  ➘ “Named Entity Recognition” 
Named Entity Recognition (NER) 
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as “Jim bought 300 shares of Acme Corp. in 2006.”, and producing an annotated block of text that highlights the names of entities, such as “[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.” In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%. http://…/aijwikiner.pdf
Named Entity Recognition and Classification (NERC) 
The term ‘Named Entity’, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information about company activities and defense-related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important subtasks of IE and was called ‘Named Entity Recognition and Classification (NERC)’.
Named Entity Recognizer (NER) 
Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance but the models require considerably more memory. Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) or Sutton and McCallum (2010) for more comprehensible introductions.)
Named Entity Sequence Classification (NESC) 
Named Entity Recognition (NER) aims at locating and classifying named entities in text. In some use cases of NER, including cases where detected named entities are used in creating content recommendations, it is crucial to have a reliable confidence level for the detected named entities. In this work we study the problem of finding confidence levels for detected named entities. We refer to this problem as Named Entity Sequence Classification (NESC). We frame NESC as a binary classification problem and we use NER as well as recurrent neural networks to estimate the probability that a candidate named entity is a real named entity. We apply this approach to Tweet texts and we show how we could find named entities with high confidence levels from Tweets.
Named-Entity Linking (NEL) 
In the legal domain it is important to differentiate between words in general, and afterwards to link the occurrences of the same entities. The task that addresses these challenges is called Named-Entity Linking (NEL). Current supervised neural networks designed for NEL use publicly available datasets for training and testing. However, this paper focuses especially on applying a transfer learning approach, using networks trained for NEL, to legal documents. Experiments show consistent improvement on the legal datasets that were created from European Union law in the scope of this research. Using the transfer learning approach, we reached F1-scores of 98.90% and 98.01% on the small and large legal test datasets, respectively.
NAMSG  We introduce NAMSG, an adaptive first-order algorithm for training neural networks. The method is efficient in computation and memory, and straightforward to implement. It computes the gradients at configurable remote observation points, in order to expedite the convergence by adjusting the step size for directions with different curvatures, in the stochastic setting. It also scales the updating vector element-wise by a non-increasing preconditioner, to take advantage of AMSGRAD. We analyze the convergence properties for both convex and nonconvex problems, by modeling the training process as a dynamic system, and provide a guideline to select the observation distance without grid search. We also propose a data-dependent regret bound, which guarantees the convergence in the convex setting. Experiments demonstrate that NAMSG works well in practice and compares favorably to popular adaptive methods, such as ADAM, NADAM, and AMSGRAD.
NAS-Bench-101  Recent advances in neural architecture search (NAS) demand tremendous computational resources. This makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset. All together, NAS-Bench-101 contains the metrics of over 5 million models, the largest dataset of its kind thus far. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the precomputed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.
NAS-FPN  Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy with less computation time.
Nash Averaging  Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation, since there is no harm (computational cost aside) from including all available tasks and agents.
NATTACK  Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques. In this paper, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. Instead of searching for an ‘optimal’ adversarial example for a benign input to a targeted DNN, our algorithm finds a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely an adversarial example, without the need of accessing the DNN’s internal layers or weights. Our approach is universal as it can successfully attack different neural networks by a single algorithm. It is also strong; according to the testing against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art black-box or white-box attack methods for most test cases. Additionally, our results reveal that adversarial training remains one of the best defense techniques, and that adversarial examples are not as transferable across defended DNNs as they are across vanilla DNNs.
Natural Language Aggregate Query (NLAQ) 
Natural language question answering over RDF data has received widespread attention. Although there have been several studies that have dealt with a small number of aggregate queries, they have many restrictions (i.e., interactive information, controlled question or query template). Thus far, there has been no natural language querying mechanism that can process general aggregate queries over RDF data. Therefore, we propose a framework called NLAQ (Natural Language Aggregate Query). First, we propose a novel algorithm to automatically understand a user’s query intention, which mainly contains semantic relations and aggregations. Second, to build a better bridge between the query intention and RDF data, we propose an extended paraphrase dictionary ED to obtain more candidate mappings for semantic relations, and we introduce a predicate-type adjacent set PT to filter out inappropriate candidate mapping combinations in semantic relations and basic graph patterns. Third, we design a suitable translation plan for each aggregate category and effectively distinguish whether an aggregate item is numeric or not, which will greatly affect the aggregate result. Finally, we conduct extensive experiments over real datasets (QALD benchmark and DBpedia), and the experimental results demonstrate that our solution is effective.
Natural Language Generation  Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations. It could be said an NLG system is like a translator that converts a computer-based representation into a natural language representation. However, the methods to produce the final language are different from those of a compiler due to the inherent expressivity of natural languages. NLG may be viewed as the opposite of natural language understanding: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. Simple examples are systems that generate form letters. These do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit was reached. More complex NLG systems dynamically create texts to meet a communicative goal. As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analysing human-written texts.
Natural Language Inference (NLI) 
Inference has been a central topic in artificial intelligence from the start, but while automatic methods for formal deduction have advanced tremendously, comparatively little progress has been made on the problem of natural language inference (NLI), that is, determining whether a natural language hypothesis h can justifiably be inferred from a natural language premise p. The challenges of NLI are quite different from those encountered in formal deduction: the emphasis is on informal reasoning, lexical semantic knowledge, and variability of linguistic expression. 
Natural Language Interaction (NLI) 
Natural Language Interaction (NLI) enables people to interact with any connected device using normal, everyday language. It understands the meaning of conversational input, and reacts accordingly, creating value and enhancing the user experience. 
Natural Language Interfaces for Databases (NLIDBs) 
The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide nontechnical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases with the potential to enable nonexpert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support the users in phrasing questions without knowing the database schema and the query features, DBPal provides a learned autocompletion model that suggests partial query extensions to users during query formulation and thus helps to write complex queries. 
Natural Language Processing (NLP) 
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. NLP, openNLP
Natural Language Query  A natural language query consists only of normal terms in the user’s language, without any special syntax or format. 
Natural Language Toolkit (NLTK) 
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. http://www.nltk.org 
Natural Language Understanding (NLU) 
Natural language understanding (NLU) is a subtopic of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem. The process of disassembling and parsing input is more complex than the reverse process of assembling output in natural language generation because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are predetermined when outputting language. There is considerable commercial interest in the field because of its application to news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis.
Natural Parameter Networks (NPN) 
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.
Nature Language Inference (NLI) 
The nature language inference (NLI) task is a predictive task of determining the inference relationship between a pair of natural language sentences. With the increasing popularity of NLI, many state-of-the-art predictive models have been proposed with impressive performance. However, several works have noticed statistical irregularities in the collected NLI data sets that may lead to overestimated performance of these models, and have proposed remedies.
Nauta  Nauta provides a multi-user, distributed computing environment for running DL model training experiments on Intel® Xeon® Scalable processor-based systems. Results can be viewed and monitored using a command line interface, web UI and/or TensorBoard. Developers can use existing data sets, proprietary data, or downloaded data from online sources, and create public or private folders to make collaboration among teams easier. For scalability and ease of management, Nauta uses components from the industry-leading Kubernetes orchestration system, leveraging Kubeflow and Docker for containerized machine learning at scale. DL model templates are available (and customizable) on the platform, removing complexities associated with creating and running single and multi-node deep learning training experiments. For model testing, Nauta also supports both batch and streaming inference, all in a single platform.
Navigation Network (NavNet) 
We propose to take a novel approach to robot system design where each building block of a larger system is represented as a differentiable program, i.e. a deep neural network. This representation allows for integrating algorithmic planning and deep learning in a principled manner, and thus combines the benefits of model-free and model-based methods. We apply the proposed approach to a challenging partially observable robot navigation task. The robot must navigate to a goal in a previously unseen 3D environment without knowing its initial location, relying instead on a 2D floor map and visual observations from an onboard camera. We introduce the Navigation Networks (NavNets) that encode state estimation, planning and acting in a single, end-to-end trainable recurrent neural network. In preliminary simulation experiments we successfully trained navigation networks to solve the challenging partially observable navigation task.
Navigator-Teacher-Scrutinizer Network (NTS-Net) 
Fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward. To handle this circumstance, we propose a novel self-supervision mechanism to effectively localize informative regions without the need of bounding-box/part annotations. Our model, termed NTS-Net for Navigator-Teacher-Scrutinizer Network, consists of a Navigator agent, a Teacher agent and a Scrutinizer agent. In consideration of intrinsic consistency between informativeness of the regions and their probability being ground-truth class, we design a novel training paradigm, which enables Navigator to detect most informative regions under the guidance from Teacher. After that, the Scrutinizer scrutinizes the proposed regions from Navigator and makes predictions. Our model can be viewed as a multi-agent cooperation, wherein agents benefit from each other, and make progress together. NTS-Net can be trained end-to-end, while providing accurate fine-grained classification predictions as well as highly informative regions during inference. We achieve state-of-the-art performance in extensive benchmark datasets.
N-BaIoT  The proliferation of IoT devices, which can be more easily compromised than desktop computers, has led to an increase in the occurrence of IoT-based botnet attacks. In order to mitigate this new threat there is a need to develop new methods for detecting attacks launched from compromised IoT devices and for differentiating between hour-long and millisecond-long IoT-based attacks. In this paper we propose and empirically evaluate a novel network-based anomaly detection method which extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic emanating from compromised IoT devices. To evaluate our method, we infected nine commercial IoT devices in our lab with two of the most widely known IoT-based botnets, Mirai and BASHLITE. Our evaluation results demonstrated our proposed method’s ability to accurately and instantly detect the attacks as they were being launched from the compromised IoT devices which were part of a botnet.
N-Body Network  We describe N-body networks, a neural network architecture for learning the behavior and properties of complex many-body physical systems. Our specific application is to learn atomic potential energy surfaces for use in molecular dynamics simulations. Our architecture is novel in that (a) it is based on a hierarchical decomposition of the many-body system into subsystems, (b) the activations of the network correspond to the internal state of each subsystem, (c) the ‘neurons’ in the network are constructed explicitly so as to guarantee that each of the activations is covariant to rotations, (d) the neurons operate entirely in Fourier space, and the nonlinearities are realized by tensor products followed by Clebsch-Gordan decompositions. As part of the description of our network, we give a characterization of how the weights of the network may interact with the activations so as to ensure that the covariance property is maintained.
NCRF++  This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building the custom model structure through a configuration file with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with the acceleration of GPU. It also includes the implementations of most state-of-the-art neural sequence labeling models such as LSTM-CRF, facilitating reproducing and refinement on those methods.
ND4J  ND4J is a scientific computing library for the JVM. It is meant to be used in production environments rather than as a research tool, which means routines are designed to run fast with minimum RAM requirements. 
NearBucket Locality Sensitive Hashing (NearBucket-LSH) 
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than 50%.
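The idea of searching the query's bucket plus nearby buckets can be sketched on a single machine as follows. This is only an illustration under stated assumptions: it uses random-hyperplane (sign) hashing, treats a "near bucket" as one whose key differs from the query's key in exactly one bit, and ignores the peer-to-peer overlay entirely. All function names are hypothetical and not taken from the paper.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random hyperplane per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def bucket_key(vector, hyperplanes):
    """Hash a vector to a tuple of bits: the side of each hyperplane it falls on."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0.0)
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item names by the bucket key of their vectors."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(bucket_key(vec, hyperplanes), []).append(name)
    return index


def near_keys(key):
    """Yield every key that differs from `key` in exactly one bit (assumed notion of 'near')."""
    for i in range(len(key)):
        yield key[:i] + (1 - key[i],) + key[i + 1:]


def query(index, hyperplanes, vector, use_near_buckets=True):
    """Collect candidates from the query's bucket and, optionally, its near buckets."""
    key = bucket_key(vector, hyperplanes)
    candidates = list(index.get(key, []))
    if use_near_buckets:
        for nk in near_keys(key):
            candidates.extend(index.get(nk, []))
    return candidates


# Hypothetical toy data.
planes = make_hyperplanes(num_bits=4, dim=3, seed=1)
items = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [-1.0, 0.0, 0.0]}
index = build_index(items, planes)
print(query(index, planes, [1.0, 0.0, 0.0]))
```

Probing near buckets enlarges the candidate set, which is why search quality improves at the price of contacting more buckets; in a distributed setting each extra bucket may mean an extra network message.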
Nearest Descent (ND) 

Nearest Neighbor Descent (NND) 

Near-Far Matching  Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. nearfar
Necessary Condition Analysis (NCA) 
Theoretical ‘necessary but not sufficient’ statements are common in the organizational sciences. Traditional data analysis approaches (e.g., correlation or multiple regression) are not appropriate for testing or inducing such statements. This paper proposes Necessary Condition Analysis (NCA) as a general and straightforward methodology for identifying necessary conditions in datasets. The paper presents the logic and methodology of necessary but not sufficient contributions of organizational determinants (e.g., events, characteristics, resources, efforts) to a desired outcome (e.g., good performance). A necessary determinant must be present for achieving an outcome, but its presence is not sufficient to obtain that outcome. Without the necessary condition, there is guaranteed failure, which cannot be compensated by other determinants of the outcome. This logic and its related methodology are fundamentally different from the traditional sufficiency-based logic and methodology. Practical recommendations and free software are offered to help researchers apply NCA. NCA
Needle-Haystack (NH) 
Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity, adding new complexity to search tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their search algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the results of a search query should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. We propose a new framework for image retrieval that models object-level regions using image keypoints retrieved from an image index, which are then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method Needle-Haystack (NH) scoring, and it is optimized for fast matrix operations on CPUs. We show that this method not only performs comparably to state-of-the-art methods in classic CBIR problems, but also outperforms them in fine-grained object and instance-level retrieval on the Oxford 5K, Paris 6K, Google-Landmarks, and NIST MFC2018 datasets, as well as meme-style imagery from Reddit.
Negative Binomial Regression (NBR) 
Negative binomial regression is for modeling count variables, usually for overdispersed count outcome variables. NegBinBetaBinreg 
Negative Type Diversity  Diversities are a generalization of metric spaces in which a nonnegative value is assigned to all finite subsets of a set, rather than just to pairs of points. Here we provide an analogue of the theory of negative type metrics for diversities. We introduce negative type diversities, and show that, as in the metric space case, they are a generalization of $L_1$-embeddable diversities. We provide a number of characterizations of negative type diversities, including a geometric characterisation. Much of the recent interest in negative type metrics stems from the connections between metric embeddings and approximation algorithms. We extend some of this work into the diversity setting, showing that lower bounds for embeddings of negative type metrics into $L_1$ can be extended to diversities by using recently established extremal results on hypergraphs.
Neighbor-Encoder  Since its introduction, unsupervised representation learning has attracted a lot of attention from the research community, as it is demonstrated to be highly effective and easy-to-apply in tasks such as dimension reduction, clustering, visualization, information retrieval, and semi-supervised learning. In this work, we propose a novel unsupervised representation learning framework called neighbor-encoder, in which domain knowledge can be easily incorporated into the learning process without modifying the general encoder-decoder architecture of the classic autoencoder. In contrast to autoencoder, which reconstructs the input data itself, neighbor-encoder reconstructs the input data’s neighbors. As the proposed representation learning problem is essentially a neighbor reconstruction problem, domain knowledge can be easily incorporated in the form of an appropriate definition of similarity between objects. Based on that observation, our framework can leverage any off-the-shelf similarity search algorithms or side information to find the neighbor of an input object. Applications of other algorithms (e.g., association rule mining) in our framework are also possible, given that the appropriate definition of neighbor can vary in different contexts. We have demonstrated the effectiveness of our framework in many diverse domains, including images, text, and time series, and for various data mining tasks including classification, clustering, and visualization. Experimental results show that neighbor-encoder not only outperforms autoencoder in most of the scenarios we consider, but also achieves the state-of-the-art performance on text document clustering.
NeighborRegularized and ContextAware NonNegative Tensor Factorization Model (NRcNTF) 
Recent years have witnessed the worldwide emergence of mega-metropolises with very large populations. Understanding residents’ mobility patterns, or urban dynamics, thus becomes crucial for building modern smart cities. In this paper, we propose a Neighbor-Regularized and context-aware Non-negative Tensor Factorization model (NRcNTF) to discover interpretable urban dynamics from heterogeneous urban data. Unlike many existing studies concerned with prediction tasks via tensor completion, NRcNTF focuses on gaining urban managerial insights from spatial, temporal, and spatio-temporal patterns. This is enabled by high-quality Tucker factorizations regularized by both POI-based urban contexts and geographically neighboring relations. NRcNTF is also capable of unveiling long-term evolutions of urban dynamics via a pipeline initialization approach. We apply NRcNTF to a real-life data set containing rich taxi GPS trajectories and POI records of Beijing. The results indicate: 1) NRcNTF accurately captures four kinds of city rhythms and seventeen spatial communities; 2) the rapid development of Beijing, epitomized by the CBD area, indeed intensifies the job-housing imbalance; 3) the southern areas with recent government investments have shown a healthier development trend. Finally, NRcNTF is compared with some baselines on traffic prediction, which further justifies the importance of urban context awareness and neighboring regularization. 
Nelder-Mead Method  The Nelder-Mead method, also called the downhill simplex method or amoeba method, is a commonly applied numerical method used to find the minimum or maximum of an objective function in a many-dimensional space. It is applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder-Mead technique is a heuristic search method that can converge to non-stationary points, even on problems that can be solved by alternative methods. The Nelder-Mead technique was proposed by John Nelder and Roger Mead (1965). 
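The simplex moves behind the method can be sketched in a few lines of Python. This is a simplified variant under stated assumptions, not a reference implementation: it uses the common coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), performs only an inside contraction, and the function name `nelder_mead` and its parameters are illustrative choices.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f near x0 with reflection, expansion, contraction and shrink moves."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            # Reflection is the new best point: try going further.
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)  # halfway between centroid and worst
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Usage: a smooth bowl with its minimum at (1, -2); no derivatives are supplied.
bowl = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
print(nelder_mead(bowl, [0.0, 0.0]))
```

Only function values are compared, which is why the method applies when derivatives are unavailable.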
NELL  Can computers learn to read? We think so. ‘Read the Web’ is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (NeverEnding Language Learner) has been running continuously, attempting to perform two tasks each day: • First, it attempts to ‘read,’ or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)). • Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately. So far, NELL has accumulated over 50 million candidate beliefs by reading the web, and it is considering these at different levels of confidence. NELL has high confidence in 2,810,379 of these beliefs – these are displayed on this website. It is not perfect, but NELL is learning. You can track NELL’s progress below or @cmunell on Twitter, browse and download its knowledge base, read more about our technical approach, or join the discussion group. 
Neo4j  Neo4j is an open-source graph database, implemented in Java. The developers describe Neo4j as an ‘embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables’. Neo4j is the most popular graph database. Neo4j version 1.0 was released in February 2010. The community edition of the database is licensed under the free GNU General Public License (GPL) v3. The additional modules, such as online backup and high availability, are licensed under the free Affero General Public License (AGPL) v3. The database, with the additional modules, is also available under a commercial license, in a dual license model. Neo4j version 2.0 was released in December 2013. Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden. RNeo4j 
NestDNN  Mobile vision systems such as smartphones, drones, and augmented-reality headsets are revolutionizing our lives. These systems usually run multiple applications concurrently and their available resources at runtime are dynamic due to events such as starting new applications, closing existing applications, and application priority changes. In this paper, we present NestDNN, a framework that takes the dynamics of runtime resources into account to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. NestDNN enables each deep learning model to offer flexible resource-accuracy trade-offs. At runtime, it dynamically selects the optimal resource-accuracy trade-off for each deep learning model to fit the model’s resource demand to the system’s available runtime resources. In doing so, NestDNN efficiently utilizes the limited resources in mobile vision systems to jointly maximize the performance of all the concurrently running applications. Our experiments show that compared to the resource-agnostic status quo approach, NestDNN achieves as much as a 4.2% increase in inference accuracy, a 2.0x increase in video frame processing rate and a 1.7x reduction in energy consumption. 
Nested Association Mapping (NAM) 
Nested association mapping (NAM) is a technique designed by the labs of Edward Buckler, James Holland, and Michael McMullen for identifying and dissecting the genetic architecture of complex traits in corn (Zea mays). It is important to note that nested association mapping (unlike Association mapping) is a specific technique that cannot be performed outside of a specifically designed population such as the Maize NAM population. NAM 
Nested Averaged Stochastic Approximation (NASA) 
We study constrained nested stochastic optimization problems in which the objective function is a composition of two smooth functions whose exact values and derivatives are not available. We propose a single time-scale stochastic approximation algorithm, which we call the Nested Averaged Stochastic Approximation (NASA), to find an approximate stationary point of the problem. The algorithm has two auxiliary averaged sequences (filters) which estimate the gradient of the composite objective function and the inner function value. By using a special Lyapunov function, we show that NASA achieves the sample complexity of ${\cal O}(1/\epsilon^{2})$ for finding an $\epsilon$-approximate stationary point, thus outperforming all extant methods for nested stochastic approximation. Our method and its analysis are the same for both unconstrained and constrained problems, without any need of batch samples for constrained nonconvex stochastic optimization. We also present a simplified variant of the NASA method for solving constrained single-level stochastic optimization problems, and we prove the same complexity result for both unconstrained and constrained problems. 
Nested Chinese Restaurant Process (NCRP) 
The nested Chinese restaurant process (nCRP) is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. 
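A minimal Python sketch of how paths through such a tree can be drawn. Two assumptions are made for illustration: the tree is truncated to a finite `depth` (the process itself is infinitely deep), and at each node the usual Chinese restaurant rule is used, where an existing child is chosen with probability proportional to its visit count and a new child with probability proportional to a concentration parameter `gamma`. The names `draw_path`, `gamma` and the dictionary layout are hypothetical.

```python
import random


def draw_path(root, depth, gamma, rng):
    """Draw one root-to-level-`depth` path and update the visit counts in place."""
    node, path = root, []
    for _ in range(depth):
        counts = [child["n"] for child in node["children"]]
        r = rng.random() * (sum(counts) + gamma)
        idx = len(counts)  # default: open a new branch
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                idx = k  # follow an existing branch, rich-get-richer
                break
        if idx == len(counts):
            node["children"].append({"n": 0, "children": []})
        node = node["children"][idx]
        node["n"] += 1
        path.append(idx)
    return path


# Usage: 50 draws through a tree truncated at depth 3.
rng = random.Random(0)
tree = {"n": 0, "children": []}
paths = [draw_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(50)]
print(paths[0], len(tree["children"]))
```

Each draw shares prefixes with earlier draws in proportion to their popularity, which is what lets the process place a distribution over trees rather than over flat partitions.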
Nested Dirichlet Process Mixture of Products of Multinomial Distributions (NDPMPM) 
We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials for this article are available online. NestedCategBayesImpute 
Nested Error Regression Model  This paper suggests the nested error regression model, with use of uncertain random effects, which means that the random effects in each area are expressed as a mixture of a normal distribution and a positive mass at 0. For estimation of model parameters and prediction of random effects, we consider Bayesian yet objective inference by setting improper prior distributions on the model parameters. We show under the mild sufficient condition that the posterior distribution is proper and the posterior variances are finite to confirm validity of posterior inference. To generate samples from the posterior distribution, we provide the Gibbs sampling method. The full conditional distributions of the posterior distribution are all familiar forms such that the proposed methodology is easy to implement. This paper also addresses the problem of prediction of finite population means and we provide a sampling based method to tackle this issue. We compare the proposed model with the conventional nested error regression model through simulation and empirical studies. 
Nested LSTM (NLSTM) 
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM. 
Nested MultiInstance Classification  ➘ “Nested MultiInstance Deep Network” 
Nested MultiInstance Deep Network  There are classification tasks that take as inputs groups of images rather than single images. In order to address such situations, we introduce a nested multiinstance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. This primarily differs from other previous works in that we organize instances into relevant groups that are treated differently. We also introduce a method to replace instances that are missing which successfully creates neutral input instances and consistently outperforms standard fillin methods in real world use cases. In addition, we propose a method for manual dropout when a whole group of instances is missing that allows us to use richer training data and obtain higher accuracy at the end of training. With specific pretraining, we find that the model works to great effect on our real world and public datasets in comparison to baseline methods, justifying the different treatment among groups of instances. 
Nested Polyhedral Model  Hardware architectures and machine learning (ML) libraries evolve rapidly. Traditional compilers often fail to generate highperformance code across the spectrum of new hardware offerings. To mitigate, engineers develop handtuned kernels for each ML library update and hardware upgrade. Unfortunately, this approach requires excessive engineering effort to scale or maintain with any degree of stateoftheart performance. Here we present a Nested Polyhedral Model for representing highly parallelizable computations with limited dependencies between iterations. This model provides an underlying framework for an intermediate representation (IR) called Stripe, amenable to standard compiler techniques while naturally modeling key aspects of modern ML computing. Stripe represents parallelism, efficient memory layout, and multiple compute units at a level of abstraction amenable to automatic optimization. We describe how Stripe enables a compiler for ML in the style of LLVM that allows independent development of algorithms, optimizations, and hardware accelerators. We also discuss the design exploration advantages of Stripe over kernel libraries and schedulebased or schedulespacebased code generation. 
Nested Sampling Algorithm  The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling. 
Nesterov’s Accelerated Gradient (NAG) 
Nesterov’s Accelerated Gradient Descent performs a simple step of gradient descent to go from x_s to y_{s+1}, and then it ‘slides’ a little bit further than y_{s+1} in the direction given by the previous point y_s. The intuition behind the algorithm is quite difficult to grasp, and unfortunately the analysis will not be very enlightening either. Nonetheless, Nesterov’s Accelerated Gradient is an optimal method (in terms of oracle complexity) for smooth convex optimization. 
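The two-step update can be sketched as follows. This is an illustration under assumptions: the objective is taken to be beta-smooth with a known constant `beta`, the gradient step uses step size 1/beta, and the sliding weight follows one commonly used schedule (lambda_0 = 0, lambda_s = (1 + sqrt(1 + 4 lambda_{s-1}^2)) / 2, gamma_s = (1 - lambda_s) / lambda_{s+1}). The function name and the test problem are hypothetical.

```python
import math


def nesterov_accelerated_gradient(grad, x0, beta, steps):
    """Run `steps` iterations and return the last gradient-step point y."""
    x = list(x0)
    y_prev = list(x0)
    lam = 1.0  # lambda_1, which follows from lambda_0 = 0
    for _ in range(steps):
        g = grad(x)
        # Plain gradient step from x_s to y_{s+1}.
        y = [xi - gi / beta for xi, gi in zip(x, g)]
        lam_next = (1 + math.sqrt(1 + 4 * lam * lam)) / 2
        gamma = (1 - lam) / lam_next  # zero at first, negative afterwards
        # Slide past y_{s+1}, away from the previous point y_s.
        x = [(1 - gamma) * yi + gamma * ypi for yi, ypi in zip(y, y_prev)]
        y_prev, lam = y, lam_next
    return y_prev


# Usage: f(x) = 0.5 * (x0^2 + 10 * x1^2), whose gradient is 10-Lipschitz.
f = lambda p: 0.5 * (p[0] ** 2 + 10 * p[1] ** 2)
grad_f = lambda p: [p[0], 10 * p[1]]
y = nesterov_accelerated_gradient(grad_f, [5.0, 5.0], beta=10.0, steps=200)
print(f(y))
```

Because gamma is negative after the first step, x_{s+1} = y_{s+1} + |gamma| (y_{s+1} - y_s), which is the ‘slide’ described in the entry.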
Net Reclassification Improvement (NRI) 
Net Reclassification Improvement (NRI) described in the paper: Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca 
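For orientation, a sketch of the widely used two-group form of the statistic: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. This is an assumption about which form is useful to illustrate; the cited paper may define a more general version, and the function name and argument layout here are hypothetical.

```python
def net_reclassification_improvement(old_cat, new_cat, event):
    """old_cat/new_cat: ordinal risk categories per subject; event: truthy if the event occurred."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, is_event in zip(old_cat, new_cat, event):
        if is_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    # Upward moves are good for events, downward moves are good for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Usage: three events followed by three non-events.
old = [0, 0, 1, 1, 0, 1]
new = [1, 0, 1, 0, 0, 0]
evt = [1, 1, 1, 0, 0, 0]
print(net_reclassification_improvement(old, new, evt))
```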
Net#  Neural networks are one of the most popular machine learning algorithms today. One of the challenges when using neural networks is how to define a network topology given the variety of possible layer types, connections among them, and activation functions. Net# solves this problem by providing a succinct way to define almost any neural network architecture in a descriptive, easytoread format. This post provides a short tutorial for building a neural network using the Net# language to classify images of handwritten numeric digits in Microsoft Azure Machine Learning. 
Net2Vec  In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts to individual neural network filter responses, where interesting correlations are often found, largely by focusing on extremal filter responses. In this paper, we show that this approach can favor easytointerpret cases that are not necessarily representative of the average behavior of a representation. A more realistic but hardertostudy hypothesis is that semantic representations are distributed, and thus filters must be studied in conjunction. In order to investigate this idea while enabling systematic visualization and quantification of multiple filter responses, we introduce the Net2Vec framework, in which semantic concepts are mapped to vectorial embeddings based on corresponding filter responses. By studying such embeddings, we are able to show that 1., in most cases, multiple filters are required to code for a concept, that 2., often filters are not concept specific and help encode multiple concepts, and that 3., compared to single filter activations, filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts. 
Net2Vis  To properly convey neural network architectures in publications, appropriate visualization techniques are of great importance. While most current deep learning papers contain such visualizations, these are usually handcrafted, which results in a lack of a common visual grammar, as well as a significant time investment. Since these visualizations are often crafted just before publication, they are also prone to contain errors, might deviate from the actual architecture, and are sometimes ambiguous to interpret. Current automatic network visualization toolkits focus on debugging the network itself, and are therefore not ideal for generating publication-ready visualizations, as they cater to a different level of detail. Therefore, we present an approach to automate this process by translating network architectures specified in Python into publication-ready network visualizations that can directly be embedded into any publication. To improve the readability of these visualizations, and in order to make them comparable, the generated visualizations obey a visual grammar, which we have derived based on the analysis of existing network visualizations. Besides carefully crafted visual encodings, our grammar also incorporates abstraction through layer accumulation, as is often done to reduce the complexity of the network architecture to be communicated. Thus, our approach not only reduces the time needed to generate publication-ready network visualizations, but also enables a unified and unambiguous visualization design. 
netinf  Given a set of events that spread between a set of nodes the algorithm infers the most likely stable diffusion network that is underlying the diffusion process. NetworkInference 
NetMix  Complex networks have become powerful mechanisms for studying a variety of real-world systems. Consequently, many human-designed network models have been proposed that reproduce nontrivial properties of complex networks, such as a long-tail degree distribution or a high clustering coefficient. We may therefore use network models to generate graphs similar to desired networks. However, a desired network structure may deviate from the emerging structure of any generative model, because no single model may support all the needed properties of the target graph; instead, each network model reflects a subset of the required features. In contrast to the classical approach of network modeling, an appropriate modern network model should adapt to the desired features of the target network. In this paper, we propose an automatic approach for constructing network models that are adapted to the desired network features. We employ Genetic Algorithms in order to evolve network models based on the characteristics of the target networks. The experimental evaluations show that our proposed framework, called NetMix, yields network models that outperform baseline models in terms of compliance with the desired features of the target networks. 
NetSim  Networks are everywhere and their many types, including social networks, the Internet, food webs etc., have been studied for the last few decades. However, among real-world networks it is hard to find examples that can be easily compared, i.e. that have the same density or even the same number of nodes and edges. We propose a flexible and extensible NetSim framework to understand how properties in different types of networks change with varying numbers of edges and vertices. Our approach makes it possible to simulate three classical network models (random, small-world and scale-free) with easily adjustable model parameters and network size. To be able to compare different networks, for a single experimental setup we kept the number of edges and vertices fixed across the models. To understand how they change depending on the number of nodes and edges, we ran over 30,000 simulations and analysed different network characteristics that cannot be derived analytically. Two of the main findings from the analysis are that the average shortest path does not change with the density of the scale-free network but does change for small-world and random networks, and that the mean betweenness centrality of the scale-free network differs clearly from that of random and small-world networks. 
Net-Trim  We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can recreate the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s\log(N/s))$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently developed tools (namely recent results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression. 
Network Analysis  Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. ➘ “Network Theory” ➘ “Social Network Analysis” A Short Course on Network Analysis Network Analysis for Wikipedia 
Network Based Diffusion Analysis (NBDA) 
Social learning has been documented in a wide diversity of animals. In freeliving animals, however, it has been difficult to discern whether animals learn socially by observing other group members or asocially by acquiring a new behaviour independently. We addressed this challenge by developing networkbased diffusion analysis (NBDA), which analyses the spread of traits through animal groups and takes into account that social network structure directs social learning opportunities. NBDA fits agentbased models of social and asocial learning to the observed data using maximumlikelihood estimation. The underlying learning mechanism can then be identified using model selection based on the Akaike information criterion. spatialnbda 
Network Embedding  Network embedding assigns nodes in a network to lowdimensional representations and effectively preserves the network structure. A Tutorial on Network Embeddings 
Network Estimation Across Unequal Sample Sizes (NExUS) 
Networkbased analyses of highthroughput genomics data provide a holistic, systemslevel understanding of various biological mechanisms for a common population. However, when estimating multiple networks across heterogeneous subpopulations, varying sample sizes pose a challenge in the estimation and inference, as network differences may be driven by differences in power. We are particularly interested in addressing this challenge in the context of proteomic networks for related cancers, as the number of subjects available for rare cancer (sub)types is often limited. We develop NExUS (Network Estimation across Unequal Sample sizes), a Bayesian method that enables joint learning of multiple networks while avoiding artefactual relationship between sample size and network sparsity. We demonstrate through simulations that NExUS outperforms existing network estimation methods in this context, and apply it to learn network similarity and shared pathway activity for groups of cancers with related origins represented in The Cancer Genome Atlas (TCGA) proteomic data. 
Network Flow Motif  Many realworld phenomena are best represented as interaction networks with dynamic structures (e.g., transaction networks, social networks, traffic networks). Interaction networks capture flow of data which is transferred between their vertices along a timeline. Analyzing such networks is crucial toward comprehending processes in them. A typical analysis task is the finding of motifs, which are small subgraph patterns that repeat themselves in the network. In this paper, we introduce network flow motifs, a novel type of motifs that model significant flow transfer among a set of vertices within a constrained time window. We design an algorithm for identifying flow motif instances in a large graph. Our algorithm can be easily adapted to find the topk instances of maximal flow. In addition, we design a dynamic programming module that finds the instance with the maximum flow. We evaluate the performance of the algorithm on three real datasets and identify flow motifs which are significant for these graphs. Our results show that our algorithm is scalable and that the real networks indeed include interesting motifs, which appear much more frequently than in randomly generated networks having similar characteristics. 
Network for Adversary Generation (NAG) 
Adversarial perturbations can pose a serious threat for deploying machine learning systems. Recent works have shown existence of imageagnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they generate one perturbation at a time, which is a single instance from the manifold of adversarial perturbations. Also, in order to build robust models, it is essential to explore the manifold of adversarial perturbations. In this paper, we propose for the first time, a generative approach to model the distribution of adversarial perturbations. The architecture of the proposed model is inspired from that of GANs and is trained using fooling and diversity objectives. Our trained generator network attempts to capture the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. Our experimental evaluation demonstrates that perturbations crafted by our model (i) achieve stateoftheart fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross model generalizability. Our work can be deemed as an important step in the process of inferring about the complex manifolds of adversarial perturbations. 
Network In Network (NIN) 
We propose a novel deep network structure called ‘Network In Network’ (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as a CNN; they are then fed into the next layer. A deep NIN can be implemented by stacking multiple instances of the structure described above. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated state-of-the-art classification performance with NIN on CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST datasets. GitXiv 
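The global average pooling step mentioned above can be illustrated in a few lines of plain Python. This is a toy sketch, assuming feature maps are stored as nested lists indexed as [channel][row][column]; the function name is hypothetical.

```python
def global_average_pooling(feature_maps):
    """Collapse each H x W feature map to one number, giving one value per channel."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Usage: two 2x2 feature maps, e.g. one per class in the final layer.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
print(global_average_pooling(maps))  # → [2.5, 0.0]
```

The pooling itself has no trainable parameters, which is one reason it is less prone to overfitting than a fully connected classification layer.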
Network Laplacian Spectral Descriptor (NetLSD) 
Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose NetLSD (Network Laplacian Spectral Descriptor), a permutation- and size-invariant, scale-adaptive, and scalably computable graph representation method that allows for straightforward comparisons. NetLSD hears the shape of a graph by extracting a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel. To our knowledge, NetLSD is the first expressive graph representation that allows for efficient comparisons of large graphs; our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency. 
Network Lasso (nLasso) 
The network Lasso (nLasso) has been proposed recently as an efficient learning algorithm for massive networked data sets (big data over networks). It extends the well-known least absolute shrinkage and selection operator (Lasso) from learning sparse (generalized) linear models to network models. Efficient implementations of the nLasso have been obtained using convex optimization methods. These implementations lend themselves naturally to highly scalable message passing methods. In this paper, we analyze the performance of nLasso when applied to localized linear regression problems involving networked data. Our main result is a sufficient condition on the network structure and available label information such that nLasso accurately learns a localized linear regression model from few labeled data points. 
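For reference, the network Lasso is usually stated as a convex problem over a graph with node set $V$ and edge set $E$. The notation below ($x_i$ for the local model at node $i$, $f_i$ for its local loss, $w_{jk}$ for edge weights, $\lambda$ for the regularization strength) is assumed for illustration and may differ from the notation of the paper summarized above:

$$\min_{\{x_i\}_{i \in V}} \; \sum_{i \in V} f_i(x_i) \; + \; \lambda \sum_{(j,k) \in E} w_{jk} \, \lVert x_j - x_k \rVert_2$$

The second term penalizes differences between models at connected nodes. Because the norm is not squared, it tends to make neighboring models exactly equal, in the same way the ordinary Lasso penalty drives coefficients exactly to zero.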
Network Mapping  Network mapping is the study of the physical connectivity of networks; Internet mapping is the study of the physical connectivity of the Internet. Network mapping discovers the devices on a network and how they are connected. It is not to be confused with network discovery or network enumeration, which discovers devices on the network and their characteristics (operating system, open ports, listening network services, etc.). The field of automated network mapping has taken on greater importance as networks have become more dynamic and complex. 
Network Maximal Correlation (NMC) 
We introduce Network Maximal Correlation (NMC) as a multivariate measure of nonlinear association among random variables. NMC is defined via an optimization that infers (nontrivial) transformations of variables by maximizing aggregate inner products between transformed variables. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for finite discrete and jointly Gaussian random variables. For finite discrete variables, we propose an algorithm based on alternating conditional expectation to determine NMC. We also show that empirically computed NMC converges to NMC exponentially fast in sample size. For jointly Gaussian variables, we show that under some conditions the NMC optimization is an instance of the MaxCut problem. We then illustrate an application of NMC and multiple MC in inference of graphical model for bijective, possibly nonmonotone, functions of jointly Gaussian variables generalizing the copula setup developed by Liu et al. Finally, we illustrate NMC’s utility in a real data application of learning nonlinear dependencies among genes in a cancer dataset. 
Network Meta-Analysis  I present methods for assessing the relative effectiveness of two treatments when they have not been compared directly in a randomized trial but have each been compared to other treatments. These network meta-analysis techniques allow estimation of both heterogeneity in the effect of any given treatment and inconsistency (‘incoherence’) in the evidence from different pairs of treatments. 
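The simplest building block of such an analysis is an adjusted indirect comparison through a common comparator. The sketch below assumes effects are on an additive scale (for example log odds ratios), that the two trials are independent, and that `d_ab` means the effect of B relative to A; the function names are hypothetical and this is not the full method of the entry above.

```python
import math


def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Estimate the effect of C relative to B from trials of A vs B and A vs C."""
    d_bc = d_ac - d_ab
    # Variances of independent estimates add.
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc


def inconsistency(d_direct, d_indirect):
    """Gap between direct and indirect evidence for the same pair of treatments."""
    return d_direct - d_indirect


# Usage: B vs A is -0.5 (SE 0.3), C vs A is -0.2 (SE 0.4).
d_bc, se_bc = indirect_comparison(-0.5, 0.3, -0.2, 0.4)
print(round(d_bc, 3), round(se_bc, 3))  # → 0.3 0.5
```

When a direct B vs C trial also exists, a large value of `inconsistency` relative to its standard error signals the ‘incoherence’ the entry refers to.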
Network Scale Up Method (NSUM) 
The network scale-up method was developed by a team of researchers under grants from the U.S. National Science Foundation to H. Russell Bernard and Christopher McCarty at the University of Florida. The method can now be applied to estimate the size of hard-to-count (or impossible-to-count) populations, but it remains a work in progress. Each new application provides data for improving the validity and accuracy of the estimates. As with the development of the model, these improvements require the efforts of survey researchers, mathematicians, and ethnographers. The network scale-up method was developed in conjunction with our team’s research on the rules governing who people know and how they know them. The particular list of people who people come to know in a lifetime may appear random, but the rules governing who we come to know are surely not random. One basic component of social structure is the number of people whom people know. NSUM 
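The basic scale-up estimator can be sketched as follows, assuming each respondent reports how many members of the hidden population they know and that each respondent's personal network size has already been estimated. The function and variable names are hypothetical, and refinements used in practice (for example corrections for transmission or barrier effects) are left out.

```python
def scale_up_estimate(known_hidden, network_sizes, total_population):
    """Scale the share of hidden-population members in respondents' networks up to the population."""
    share = sum(known_hidden) / sum(network_sizes)
    return total_population * share


# Usage: four respondents in a population of one million.
known = [1, 0, 2, 1]            # hidden-population members each respondent knows
sizes = [200, 150, 300, 350]    # estimated personal network sizes
print(scale_up_estimate(known, sizes, 1_000_000))  # → 4000.0
```

The estimate depends directly on the personal network sizes, which is why the number of people whom people know is treated as a basic quantity in this line of research.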
Network Science  Network science is an interdisciplinary academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as ‘the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena.’ 
Network Sketching  Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique for pursuing binary-weight CNNs, targeting more faithful inference and a better trade-off for practical applications. Our basic idea is to exploit binary structure directly in pre-trained filter banks and produce binary-weight models via tensor expansion. The whole process can be treated as a coarse-to-fine model approximation, akin to the pencil drawing steps of outlining and shading. To further speed up the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binary-weight models by large margins on the ImageNet large scale classification task, while the memory committed to network parameters is only slightly larger. 
Network State Prediction (NSP) 
The goal in network state prediction (NSP) is to classify the global state (label) associated with features embedded in a graph. This graph structure encoding feature relationships is the key distinctive aspect of NSP compared to classical supervised learning. NSP arises in various applications: gene expression samples embedded in a proteinprotein interaction (PPI) network, temporal snapshots of infrastructure or sensor networks, and fMRI coherence network samples from multiple subjects to name a few. Instances from these domains are typically “wide” (more features than samples), and thus, feature subselection is required for robust and generalizable prediction. How to best employ the network structure in order to learn succinct connected subgraphs encompassing the most discriminative features becomes a central challenge in NSP. Prior work employs connected subgraph sampling or graph smoothing within optimization frameworks, resulting in either large variance of quality or weak control over the connectivity of selected subgraphs. In this work we propose an optimization framework for discriminative subgraph learning (DSL) which simultaneously enforces (i) sparsity, (ii) connectivity and (iii) high discriminative power of the resulting subgraphs of features. Our optimization algorithm is a singlestep solution for the NSP and the associated feature selection problem. It is rooted in the rich literature on maximalmargin optimization, spectral graph methods and sparse subspace selfrepresentation. DSL simultaneously ensures solution interpretability and superior predictive power (up to 16% improvement in challenging instances compared to baselines), with execution times up to an hour for large instances. 
Network Theory  In computer and network science, network theory is the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects. Network theory is a part of graph theory. It has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, and sociology. Applications of network theory include logistical networks, the World Wide Web, Internet, gene regulatory networks, metabolic networks, social networks, epistemological networks, etc; see List of network theory topics for more examples. Euler’s solution of the Seven Bridges of Königsberg problem is considered to be the first true proof in the theory of networks. 
Network Tikhonov  Recovering a function or high-dimensional parameter vector from indirect measurements is a central task in various scientific areas. Several methods for solving such inverse problems are well developed and well understood. Recently, novel algorithms using deep learning and neural networks for inverse problems appeared. While still in their infancy, these techniques show astonishing performance for applications like low-dose CT or various sparse data problems. However, theoretical results for deep learning in inverse problems are missing so far. In this paper, we establish such a convergence analysis for the proposed NETT (Network Tikhonov) approach to inverse problems. NETT considers regularized solutions having a small value of a regularizer defined by a trained neural network. In contrast to existing deep learning approaches, our regularization scheme enforces data consistency also for the actual unknown to be recovered. This is beneficial in case the unknown to be recovered is not sufficiently similar to available training data. We present a complete convergence analysis for NETT, where we derive well-posedness results and quantitative error estimates, and propose a possible strategy for training the regularizer. Numerical results are presented for a tomographic sparse data problem using the $\ell^q$-norm of an autoencoder as trained regularizer, which demonstrate good performance of NETT even for unknowns of a different type from the training data. 
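In generic notation (the symbols below are assumptions chosen for illustration, not taken from the source), a regularized solution of this kind can be written as a minimizer of a Tikhonov-type functional whose penalty is evaluated through a trained network:

```latex
x_\alpha \in \arg\min_{x} \; \| F(x) - y_\delta \|^2 + \alpha \, \mathcal{R}\bigl(\Phi_\theta(x)\bigr)
```

Here $F$ stands for the forward operator, $y_\delta$ for the noisy data, $\Phi_\theta$ for the trained network, $\mathcal{R}$ for a scalar penalty such as an $\ell^q$-norm, and $\alpha > 0$ for the regularization parameter. The data-fit term is evaluated at the unknown $x$ itself, which is the data-consistency property described above.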
Network Transplanting  This paper focuses on a novel problem, i.e., transplanting a categoryandtaskspecific neural network to a generic, distributed network without strong supervision. Like playing LEGO blocks, incrementally constructing a generic network by asynchronously merging specific neural networks is a crucial bottleneck for deep learning. Suppose that the pretrained specific network contains a module $f$ to extract features of the target category, and the generic network has a module $g$ for a target task, which is trained using other categories except for the target category. Instead of using numerous training samples to teach the generic network a new category, we aim to learn a small adapter module to connect $f$ and $g$ to accomplish the task on a target category in a weaklysupervised manner. The core challenge is to efficiently learn feature projections between the two connected modules. We propose a new distillation algorithm, which exhibited superior performance. Our method without training samples even significantly outperformed the baseline with 100 training samples. 
Network Vector  We propose a neural embedding algorithm called Network Vector, which learns distributed representations of nodes and the entire networks simultaneously. By embedding networks in a lowdimensional space, the algorithm allows us to compare networks in terms of structural similarity and to solve outstanding predictive problems. Unlike alternative approaches that focus on node level features, we learn a continuous global vector that captures each node’s global context by maximizing the predictive likelihood of random walk paths in the network. Our algorithm is scalable to real world graphs with many nodes. We evaluate our algorithm on datasets from diverse domains, and compare it with stateoftheart techniques in node classification, role discovery and concept analogy tasks. The empirical results show the effectiveness and the efficiency of our algorithm. 
NetworkClustered MultiModal Bug Localization (NetML) 
Developers often spend much effort and resources to debug a program. To help the developers debug, numerous information retrieval (IR)based and spectrumbased bug localization techniques have been devised. IRbased techniques process textual information in bug reports, while spectrumbased techniques process program spectra (i.e., a record of which program elements are executed for each test case). While both techniques ultimately generate a ranked list of program elements that likely contain a bug, they only consider one source of information–either bug reports or program spectra–which is not optimal. In light of this deficiency, this paper presents a new approach dubbed Networkclustered Multimodal Bug Localization (NetML), which utilizes multimodal information from both bug reports and program spectra to localize bugs. NetML facilitates an effective bug localization by carrying out a joint optimization of bug localization error and clustering of both bug reports and program elements (i.e., methods). The clustering is achieved through the incorporation of network Lasso regularization, which incentivizes the model parameters of similar bug reports and similar program elements to be close together. To estimate the model parameters of both bug reports and methods, NetML employs an adaptive learning procedure based on Newton method that updates the parameters on a perfeature basis. Extensive experiments on 355 real bugs from seven software systems have been conducted to benchmark NetML against various stateoftheart localization methods. The results show that NetML surpasses the bestperforming baseline by 31.82%, 22.35%, 19.72%, and 19.24%, in terms of the number of bugs successfully localized when a developer inspects the top 1, 5, and 10 methods and Mean Average Precision (MAP), respectively. 
Neumann Network  Many challenging image processing tasks can be described by an illposed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Traditional inverse problem solvers minimize a cost function consisting of a datafit term, which measures how well an image matches the observations, and a regularizer, which reflects prior knowledge and promotes images with desirable properties like smoothness. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional regularizers. We present an endtoend, datadriven method of solving inverse problems inspired by the Neumann series, which we call a Neumann network. Rather than unroll an iterative optimization algorithm, we truncate a Neumann series which directly solves the linear inverse problem with a datadriven nonlinear regularizer. The Neumann network architecture outperforms traditional inverse problem solution methods, modelfree deep learning approaches, and stateoftheart unrolled iterative methods on standard datasets. Finally, when the images belong to a union of subspaces and under appropriate assumptions on the forward model, we prove there exists a Neumann network configuration that wellapproximates the optimal oracle estimator for the inverse problem and demonstrate empirically that the trained Neumann network has the form predicted by theory. 
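The series idea can be illustrated on a purely linear problem. The sketch below approximates the solution of a small system by truncating the Neumann series for the inverse; the 2x2 matrix is a hypothetical stand-in for a regularized normal-equations operator, and the learned nonlinear regularizer of the Neumann network is replaced here by a fixed linear term, so this is not the architecture itself.

```python
def matvec(m, v):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def truncated_neumann_solve(g, b, eta, n_terms):
    """Approximate the solution of g x = b with a truncated Neumann series.

    Uses x ~= sum_{j=0}^{n_terms-1} (I - eta*g)^j (eta*b), which converges
    when the spectral radius of (I - eta*g) is below 1.
    """
    term = [eta * bi for bi in b]          # j = 0 term: eta * b
    x = list(term)
    for _ in range(n_terms - 1):
        g_term = matvec(g, term)
        # Next term: (I - eta*g) applied to the previous term.
        term = [t - eta * gt for t, gt in zip(term, g_term)]
        x = [xi + ti for xi, ti in zip(x, term)]
    return x

# Hypothetical 2x2 system standing in for (A^T A + lambda*I) x = A^T y.
g = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 2.0]
x = truncated_neumann_solve(g, b, eta=0.4, n_terms=200)
```

With few terms the sum is only a rough approximation; the step size `eta` controls how quickly the remaining terms shrink.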
Neumann Optimizer  Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others: only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer. 
NeuNetS  Application of neural networks to a vast variety of practical applications is transforming the way AI is applied in practice. Pretrained neural network models available through APIs or capability to custom train prebuilt neural network architectures with customer data has made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While prebuilt network models exist for certain scenarios, to try and meet the constraints that are unique to each application, AI teams need to think about developing custom neural network architectures that can meet the tradeoff between accuracy and memory footprint to achieve the tight constraints of their unique usecases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS : An automated Neural Network Synthesis engine for custom neural network design that is available as part of IBM’s AI OpenScale’s product. NeuNetS is available for both Text and Image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of humandesigned AI models. 
Neural Architecture Optimization (NAO) 
Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction of computational resources. Specifically we obtain a $2.07\%$ test set error rate for the CIFAR-10 image classification task and a test set perplexity of $55.9$ for the PTB language modeling task. The best discovered architectures on both tasks are successfully transferred to other tasks such as CIFAR-100 and WikiText-2. 
Neural Architecture Search (NAS) 
Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS mainly targets improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR-10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For the Google Speech Commands Dataset, RENA achieves state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints. 
Neural Attentive Interpretable Recommendation System (NAIRS) 
In this paper, we develop a neural attentive interpretable recommendation system, named NAIRS. A self-attention network, as a key component of the system, is designed to assign attention weights to interacted items of a user. This attention mechanism can distinguish the importance of the various interacted items in contributing to a user profile. Based on the user profiles obtained by the self-attention network, NAIRS offers personalized high-quality recommendation. Moreover, it develops visual cues to interpret recommendations. This demo application with the implementation of NAIRS enables users to interact with a recommendation system, and it persistently collects training data to improve the system. The demonstration and experimental results show the effectiveness of NAIRS. 
Neural Attentive Item Similarity Model (NAIS) 
Itemtoitem collaborative filtering (aka. itembased CF) has been long used for building recommender systems in industrial settings, owing to its interpretability and efficiency in realtime personalization. It builds a user’s profile as her historically interacted items, recommending new items that are similar to the user’s profile. As such, the key to an itembased CF method is in the estimation of item similarities. Early approaches use statistical measures such as cosine similarity and Pearson coefficient to estimate item similarities, which are less accurate since they lack tailored optimization for the recommendation task. In recent years, several works attempt to learn item similarities from data, by expressing the similarity as an underlying model and estimating model parameters by optimizing a recommendationaware objective function. While extensive efforts have been made to use shallow linear models for learning item similarities, there has been relatively less work exploring nonlinear neural network models for itembased CF. In this work, we propose a neural network model named Neural Attentive Item Similarity model (NAIS) for itembased CF. The key to our design of NAIS is an attention network, which is capable of distinguishing which historical items in a user profile are more important for a prediction. Compared to the stateoftheart itembased CF method Factored Item Similarity Model (FISM), our NAIS has stronger representation power with only a few additional parameters brought by the attention network. Extensive experiments on two public benchmarks demonstrate the effectiveness of NAIS. This work is the first attempt that designs neural network models for itembased CF, opening up new research possibilities for future developments of neural recommender systems. 
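The role of the attention network can be sketched as an attention-weighted sum of item similarities. The code below is a simplified illustration: it uses a plain softmax over hand-supplied relevance scores, whereas the published model learns these scores and may normalize them differently, and all names and vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attentive_item_score(target_vec, history_vecs, attention_scores):
    """Score a target item as an attention-weighted sum of item similarities.

    attention_scores[j] is an unnormalized relevance of history item j to the
    target; in a trained model it would come from a small attention network.
    """
    # Softmax turns raw scores into weights that sum to 1.
    m = max(attention_scores)
    exps = [math.exp(s - m) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The inner product between embeddings plays the role of item similarity.
    sims = [dot(target_vec, h) for h in history_vecs]
    return sum(w * s for w, s in zip(weights, sims)), weights

# Hypothetical embeddings: one target item and two items from the user's history.
score, weights = attentive_item_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Raising the score of one history item shifts weight toward it, which is how the model can treat some past interactions as more important for a given prediction.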
Neural Attentive KnowledgeBased Model (NAK) 
Grade prediction for future courses not yet taken by students is important as it can help them and their advisers during the process of course selection as well as for designing personalized degree plans and modifying them based on their performance. One of the successful approaches for accurately predicting a student’s grades in future courses is Cumulative Knowledgebased Regression Models (CKRM). CKRM learns shallow linear models that predict a student’s grades as the similarity between his/her knowledge state and the target course. A student’s knowledge state is built by linearly accumulating the learned provided knowledge components of the courses he/she has taken in the past, weighted by his/her grades in them. However, not all the prior courses contribute equally to the target course. In this paper, we propose a novel Neural Attentive Knowledgebased model (NAK) that learns the importance of each historical course in predicting the grade of a target course. Compared to CKRM and other competing approaches, our experiments on a large realworld dataset consisting of $\sim$1.5 grades show the effectiveness of the proposed NAK model in accurately predicting the students’ grades. Moreover, the attention weights learned by the model can be helpful in better designing their degree plans. 
Neural Autoregressive Flow  Normalizing flows and autoregressive models have been successfully combined to produce stateoftheart results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate stateoftheart WaveNetbased speech synthesis to 20x faster than realtime, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields stateoftheart performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. Block Neural Autoregressive Flow 
Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS) 
We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on the well-known M4 competition dataset containing 100k time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year’s winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on the M4 dataset strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without loss in accuracy. 
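The backward and forward residual links can be sketched with toy blocks. In the sketch below each block returns a backcast (its explanation of the input window) and a partial forecast; the two hand-written blocks are hypothetical stand-ins for the learned fully-connected stacks and are only there to make the data flow visible.

```python
def run_stack(history, blocks, horizon):
    """Combine blocks with backward (backcast) and forward (forecast) residual links.

    Each block maps its input window to (backcast, forecast). The backcast is
    subtracted from the input passed to the next block; forecasts are summed.
    """
    residual = list(history)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual, horizon)
        residual = [r - b for r, b in zip(residual, backcast)]   # backward link
        forecast = [f + p for f, p in zip(forecast, partial)]    # forward link
    return forecast, residual

def mean_block(window, horizon):
    """Toy stand-in for a learned block: explains the window by its mean."""
    level = sum(window) / len(window)
    return [level] * len(window), [level] * horizon

def last_value_block(window, horizon):
    """Toy stand-in: explains what is left by the most recent value."""
    last = window[-1]
    return [last] * len(window), [last] * horizon

forecast, residual = run_stack([1.0, 2.0, 3.0], [mean_block, last_value_block], horizon=2)
```

Because every block contributes its own partial forecast, the final output can be read as a sum of components, which is one route to interpretability.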
Neural Block Sampling  Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler’s ability to escape local modes yields higher final F1 scores than single-site Gibbs. 
Neural Collaborative Filtering (NCF) 
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, these efforts primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in collaborative filtering, the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance. 
Neural Collaborative Subspace Clustering  We introduce the Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of lowdimensional subspaces. In contrast to previous attempts, our model runs without the aid of spectral clustering. This makes our algorithm one of the kinds that can gracefully scale to large datasets. At its heart, our neural model benefits from a classifier which determines whether a pair of points lies on the same subspace or not. Essential to our model is the construction of two affinity matrices, one from the classifier and the other from a notion of subspace selfexpressiveness, to supervise training in a collaborative scheme. We thoroughly assess and contrast the performance of our model against various stateoftheart clustering algorithms including deep subspacebased ones. 
Neural Collective Entity Linking (NCEL) 
Entity Linking aims to link entity mentions in texts to knowledge bases, and neural models have achieved recent success in this task. However, most existing methods rely on local contexts to resolve entities independently, which may usually fail due to the data sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named as NCEL. NCEL applies Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve the computation efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify the linking performance as well as generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrate the effectiveness and efficiency of our proposed method. 
Neural Component Analysis (NCA) 
Principal component analysis (PCA) is widely adopted for chemical process monitoring and numerous PCA-based systems have been developed to solve various fault detection and diagnosis problems. Since PCA-based methods assume that the monitored process is linear, nonlinear PCA models, such as autoencoder models and kernel principal component analysis (KPCA), have been proposed and applied to nonlinear process monitoring. However, KPCA-based methods need to perform eigendecomposition (ED) on the kernel Gram matrix whose dimensions depend on the number of training data. Moreover, prefixed kernel parameters cannot be most effective for different faults which may need different parameters to maximize their respective detection performances. Autoencoder models lack the consideration of orthogonal constraints, which are crucial for PCA-based algorithms. To address these problems, this paper proposes a novel nonlinear method, called neural component analysis (NCA), which intends to train a feedforward neural network with orthogonal constraints such as those used in PCA. NCA can adaptively learn its parameters through backpropagation and the dimensionality of the nonlinear features has no relationship with the number of training samples. Extensive experimental results on the Tennessee Eastman (TE) benchmark process show the superiority of NCA in terms of missed detection rate (MDR) and false alarm rate (FAR). The source code of NCA can be found in https://…/NeuralComponentAnalysis.git. 
Neural Decision Trees  In this paper we propose a synergistic fusion of neural networks and decision trees into a deep hashing neural network (HNN) having a modeling capability exponential with respect to its number of neurons. We first derive a soft decision tree named neural decision tree allowing the optimization of an arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known softmax layer. This HL can easily replace the standard last layer of an ANN in any known network topology and thus can be used after a convolutional or recurrent neural network, for example. We present the modeling capacity of this deep hashing function on small datasets where one can reach at least equally good results as standard neural networks by diminishing the number of output neurons. We also show that for the case where the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning for each difficult class a collection of not necessarily connected subregions of the space, leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality sensitive hashing function which can be trained in a supervised or unsupervised setting, as we will demonstrate for classification and regression problems. 
Neural Decomposition (ND) 
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model. 
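The functional form described here, a sum of sinusoids plus nonperiodic terms, can be written down directly. The sketch below evaluates such a model with hand-picked parameters; in ND the amplitudes, frequencies, phases and trend would be learned as network weights, and the function name and values are hypothetical.

```python
import math

def nd_model(t, sinusoids, slope, intercept):
    """Evaluate a sum of sinusoids plus a linear trend at time t.

    sinusoids: list of (amplitude, frequency, phase) tuples.
    """
    periodic = sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in sinusoids)
    trend = slope * t + intercept            # nonperiodic component
    return periodic + trend

# Hypothetical parameters: two sinusoids on top of a rising trend.
params = [(1.0, 1.0, 0.0), (0.5, 2.0, 0.0)]
value = nd_model(0.25, params, slope=2.0, intercept=1.0)

# The same function can be evaluated beyond the training range to extrapolate.
future = [nd_model(t, params, slope=2.0, intercept=1.0) for t in (10.0, 10.5, 11.0)]
```

Since the model is a closed-form function of time, it can also be evaluated at unevenly spaced time points.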
Neural Differential Equations  DiffEqFlux.jl is a library for fusing neural networks and differential equations. In this work we describe differential equations from the viewpoint of data science and discuss the complementary nature between machine learning models and differential equations. We demonstrate the ability to incorporate DifferentialEquations.jl-defined differential equation problems into a Flux-defined neural network, and vice versa. The advantages of being able to use the entire DifferentialEquations.jl suite for this purpose are demonstrated by counterexamples where simple integration strategies fail, but the sophisticated integration strategies provided by the DifferentialEquations.jl library succeed. This is followed by a demonstration of delay differential equations and stochastic differential equations inside of neural networks. We show high-level functionality for defining neural ordinary differential equations (neural networks embedded into the differential equation) and describe the extra models in the Flux model zoo, which include neural stochastic differential equations. We conclude by discussing the various adjoint methods used for backpropagation through the differential equation solvers. DiffEqFlux.jl is an important contribution to the area, as it allows the full weight of the differential equation solvers developed from decades of research in the scientific computing field to be readily applied to the challenges posed by machine learning and data science. 
Neural Educational Recommendation Engine (NERE) 
Quizlet is the most popular online learning tool in the United States, and is used by over 2/3 of high school students and 1/2 of college students. With more than 95% of Quizlet users reporting improved grades as a result, the platform has become the de facto tool used in millions of classrooms. In this paper, we explore the task of recommending suitable content for a student to study, given their prior interests, as well as what their peers are studying. We propose a novel approach, the Neural Educational Recommendation Engine (NERE), to recommend educational content by leveraging student behaviors rather than ratings. We have found that this approach better captures social factors that are more aligned with learning. NERE is based on a recurrent neural network that includes collaborative and content-based approaches for recommendation, and takes into account any particular student’s speed, mastery, and experience to recommend the appropriate task. We train NERE by jointly learning the user embeddings and content embeddings, and attempt to predict the content embedding for the final timestamp. We also develop a confidence estimator for our neural network, which is a crucial requirement for productionizing this model. We apply NERE to Quizlet’s proprietary dataset, and present our results. We achieved an R^2 score of 0.81 in the content embedding space, and a recall score of 54% on our 100 nearest neighbors. This vastly exceeds the recall@100 score of 12% that a standard matrix-factorization approach provides. We conclude with a discussion on how NERE will be deployed, and position our work as one of the first educational recommender systems for the K-12 space. 
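The recall figures above are measured over nearest neighbors in the content embedding space. The sketch below shows one common way to compute such a recall@k; it assumes Euclidean distance and a single correct item per prediction, neither of which is stated in the abstract, and the function name `recall_at_k` is hypothetical.

```python
import numpy as np

def recall_at_k(predicted_emb, true_ids, catalog_emb, k):
    """Fraction of predictions whose true item is among the k nearest catalog embeddings."""
    predicted_emb = np.asarray(predicted_emb, dtype=float)
    catalog_emb = np.asarray(catalog_emb, dtype=float)
    # Squared Euclidean distances, shape (n_predictions, n_items). Distance metric is an assumption.
    diffs = predicted_emb[:, None, :] - catalog_emb[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    top_k = np.argsort(dists, axis=1)[:, :k]
    hits = (top_k == np.asarray(true_ids)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: three catalog items, two predicted embeddings.
catalog = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
predicted = [[0.1, 0.0], [4.0, 4.0]]
true_items = [1, 2]
r_at_1 = recall_at_k(predicted, true_items, catalog, k=1)
r_at_2 = recall_at_k(predicted, true_items, catalog, k=2)
```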
Neural Empirical Bayes  We formulate a novel framework that unifies kernel density estimation and empirical Bayes, where we address a broad set of problems in unsupervised learning with a geometric interpretation rooted in the concentration of measure phenomenon. We start with energy estimation based on a denoising objective that recovers the original/clean data X from its measured/noisy version Y with the empirical Bayes least-squares estimator. The setup is rooted in kernel density estimation, but the log-pdf in Y is parametrized with a neural network, and crucially, the learning objective is derived for any level of noise/kernel bandwidth. Learning is efficient with double backpropagation and stochastic gradient descent. An elegant physical picture emerges of an interacting system of high-dimensional spheres around each data point, together with a globally defined probability flow field. The picture is powerful: it leads to a novel sampling algorithm, a new notion of associative memory, and it is instrumental in designing experiments. We start with extreme denoising experiments. Walk-jump sampling is defined by Langevin MCMC walks in Y, along with asynchronous empirical Bayes jumps to X. Robbins associative memory is defined by a deterministic flow to attractors of the learned probability flow field. Finally, we observed the emergence of remarkably rich creative modes in the regime of highly overlapping spheres. 
Neural Error Correcting and Source Trimming (NECST) 
For reliable transmission across a noisy communication channel, classical results from information theory show that it is asymptotically optimal to separate out the source and channel coding processes. However, this decomposition can fall short in the finite bit-length regime, as it requires non-trivial tuning of hand-crafted codes and assumes infinite computational power for decoding. In this work, we propose Neural Error Correcting and Source Trimming (NECST) codes to jointly learn the encoding and decoding processes in an end-to-end fashion. By adding noise into the latent codes to simulate the channel during training, we learn to both compress and error-correct given a fixed bit-length and computational budget. We obtain codes that are not only competitive against several capacity-approaching channel codes, but also learn useful robust representations of the data for downstream tasks such as classification. Finally, we learn an extremely fast neural decoder, yielding almost an order of magnitude in speedup compared to standard decoding methods based on iterative belief propagation. 
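The step of adding noise to the latent codes to simulate the channel can be sketched as follows. The abstract does not name the channel model, so this example assumes a binary symmetric channel that flips each latent bit independently; `binary_symmetric_channel` and `flip_prob` are hypothetical names.

```python
import numpy as np

def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (assumed channel model)."""
    bits = np.asarray(bits, dtype=np.uint8)
    flips = rng.random(bits.shape) < flip_prob   # True where the channel corrupts a bit
    return np.bitwise_xor(bits, flips.astype(np.uint8))

rng = np.random.default_rng(0)
latent_code = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
clean = binary_symmetric_channel(latent_code, 0.0, rng)     # noiseless channel
inverted = binary_symmetric_channel(latent_code, 1.0, rng)  # every bit flipped
noisy = binary_symmetric_channel(latent_code, 0.1, rng)     # typical training-time corruption
```

During training, a decoder would receive `noisy` rather than `latent_code`, which pushes the encoder toward codes that remain decodable after corruption.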
Neural Exploration-Exploitation Tree (NEXT) 
Sampling-based algorithms such as RRT and its variants are powerful tools for path planning problems in high-dimensional continuous state and action spaces. While these algorithms perform systematic exploration of the state space, they do not fully exploit past planning experiences from similar environments. In this paper, we design a meta path planning algorithm, called Neural Exploration-Exploitation Trees (NEXT), which can exploit past experience to drastically reduce the sample requirement for solving new path planning problems. More specifically, NEXT contains a novel neural architecture which can learn from experiences the dependency between task structures and promising path search directions. Then this learned prior is integrated with a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem. Empirically, we show that NEXT can complete the planning tasks with very small search trees and significantly outperforms previous state-of-the-art methods on several benchmark problems. 
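The balance between exploration and exploitation in a UCB-type rule can be sketched with the classical UCB1-style score: an estimated value plus a bonus that shrinks as a candidate is visited more often. This is a generic illustration, not the NEXT algorithm; in NEXT the value estimate would come from the learned prior. The names `ucb_scores`, `select_branch`, and `exploration` are hypothetical.

```python
import math

def ucb_scores(mean_values, visit_counts, exploration=1.0):
    """Score = estimated value + exploration bonus; unvisited candidates get infinite priority."""
    total = sum(visit_counts)
    scores = []
    for mean, n in zip(mean_values, visit_counts):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(mean + exploration * math.sqrt(math.log(total) / n))
    return scores

def select_branch(mean_values, visit_counts, exploration=1.0):
    """Pick the index of the candidate with the highest UCB score."""
    scores = ucb_scores(mean_values, visit_counts, exploration)
    return max(range(len(scores)), key=scores.__getitem__)

unvisited_first = select_branch([0.5, 0.4], [10, 0])                   # the unvisited candidate wins
greedy_choice = select_branch([0.9, 0.1], [5, 5], exploration=0.0)     # pure exploitation
```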
Neural Graph Collaborative Filtering (NGCF) 
Learning vector representations (a.k.a. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user’s (or an item’s) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions (more specifically, the bipartite graph structure) into the embedding process. We develop a new recommendation framework, Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in the user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://…/neural_graph_collaborative_filtering. 
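Propagating embeddings over the user-item bipartite graph can be sketched as one round of neighborhood averaging with a symmetrically normalized adjacency matrix. This is a simplified illustration under stated assumptions: it omits the learned weight matrices and nonlinearities that a full model would use, and the function name `propagate` is hypothetical.

```python
import numpy as np

def propagate(interactions, user_emb, item_emb):
    """One propagation step: each node aggregates degree-normalized embeddings of its neighbors."""
    R = np.asarray(interactions, dtype=float)          # shape (n_users, n_items), 1 = interaction
    n_users, n_items = R.shape
    A = np.zeros((n_users + n_items, n_users + n_items))
    A[:n_users, n_users:] = R                          # user -> item edges
    A[n_users:, :n_users] = R.T                        # item -> user edges
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    A_hat = inv_sqrt[:, None] * A * inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    E = np.vstack([np.asarray(user_emb, dtype=float), np.asarray(item_emb, dtype=float)])
    E_next = A_hat @ E
    return E_next[:n_users], E_next[n_users:]

# Toy graph: user 0 interacted with item 0; user 1 interacted with items 0 and 1.
R = [[1, 0], [1, 1]]
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[2.0, 0.0], [0.0, 2.0]]
new_users, new_items = propagate(R, users, items)
```

Stacking several such steps lets a user's embedding absorb information from items reached through other users, which is one way to read the "high-order connectivity" mentioned above.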
Neural Hawkes Process  Many events occur in the world. Some event types are stochastically excited or inhibited (in the sense of having their probabilities elevated or decreased) by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. Learning such structure should benefit various applications, including medical prognosis, consumer behavior, and social media activity prediction. We propose to model streams of discrete events in continuous time, by constructing a neurally self-modulating multivariate point process. This generative model allows past events to influence the future in complex ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. We evaluate our model on multiple datasets and show that it significantly outperforms other strong baselines. 
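For context, the notion of past events exciting future ones can be seen in the classical (non-neural) Hawkes process, whose intensity is a base rate plus an exponentially decaying contribution from each earlier event. The sketch below shows only this classical form; the neural model described above replaces the fixed kernel with the hidden state of a recurrent network. The parameter names `mu`, `alpha`, and `beta` are conventional but are assumptions here.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Classical univariate Hawkes intensity with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + excitation

baseline = hawkes_intensity(1.0, [], mu=0.5, alpha=1.0, beta=1.0)        # no history: base rate only
after_event = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=1.0, beta=1.0)  # raised by one past event
```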
Neural Inference Network (NIN) 
Neural networks have been learning complex multi-hop reasoning in various domains. One such formal setting for reasoning, logic, provides a challenging case for neural networks. In this article, we propose a Neural Inference Network (NIN) for learning logical inference over classes of logic programs. Trained in an end-to-end fashion, NIN learns representations of normal logic programs by processing them at a character level, as well as the reasoning algorithm for checking whether a logic program entails a given query. We define 12 classes of logic programs that exemplify increasing levels of complexity of the inference process (multi-hop and default reasoning) and show that our NIN passes 10 out of the 12 tasks. We also analyse the learnt representations of logic programs that NIN uses to perform the logical inference. 
Neural Information Processing  Neural information processing is an interdisciplinary subject, and the merging interaction between neuroscience and mathematics, physics, as well as information science plays a key role in the development of this field. 
Neural Jump Stochastic Differential Equation  Many time series can be effectively modeled with a combination of continuous flows along with random jumps sparked by discrete events. However, we usually do not have the equation of motion describing the flows, or how they are affected by jumps. To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump. Our approach extends the framework of Neural Ordinary Differential Equations with a stochastic process term that models discrete events. We then model temporal point processes with a piecewise-continuous latent trajectory, where stochastic events cause an abrupt change in the latent variables. We demonstrate the predictive capabilities of our model on a range of synthetic and real-world marked point process datasets, including classical point processes such as Hawkes processes, medical records, awards on Stack Overflow, and earthquake monitoring. 
Neural Lattice Decoder  Lattice decoders constructed with neural networks are presented. Firstly, we show how the fundamental parallelotope is used as a compact set for the approximation by a neural lattice decoder. Secondly, we introduce the notion of Voronoi-reduced lattice basis. As a consequence, a first optimal neural lattice decoder is built from Boolean equations and the facets of the Voronoi region. This decoder needs no learning. Finally, we present two neural decoders with learning. It is shown that L1 regularization and a priori information about the lattice structure lead to a simplification of the model. 
Neural Lattice Language Model  In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions, including polysemy and the existence of multi-word lexical items, into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline. 
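Marginalizing across a lattice of paths can be done with a forward dynamic program rather than by enumerating every segmentation. The sketch below is a toy version: it assumes each segment's probability depends only on its start and end positions, whereas the model described above would condition on context through a neural network. The names `lattice_marginal`, `segment_prob`, and `max_len` are hypothetical.

```python
def lattice_marginal(n_tokens, segment_prob, max_len=2):
    """Sum the probability of every segmentation of n_tokens into chunks of length <= max_len.

    alpha[j] holds the total probability of all paths through the lattice that end at position j.
    """
    alpha = [0.0] * (n_tokens + 1)
    alpha[0] = 1.0  # empty prefix
    for j in range(1, n_tokens + 1):
        for length in range(1, min(max_len, j) + 1):
            i = j - length
            alpha[j] += alpha[i] * segment_prob(i, j)  # extend paths ending at i by chunk [i, j)
    return alpha[n_tokens]

# Toy scorer: every chunk, single-token or two-token, has probability 0.5.
def constant_half(i, j):
    return 0.5

two_token_total = lattice_marginal(2, constant_half)    # paths: (1,1) and (2)
three_token_total = lattice_marginal(3, constant_half)  # paths: (1,1,1), (1,2), (2,1)
```

The cost is linear in sentence length for a fixed maximum chunk size, even though the number of paths grows exponentially.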
Neural Logic Machine (NLM) 
We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. NLMs exploit the power of both neural networks, as function approximators, and logic programming, as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. After being trained on small-scale tasks (such as sorting short arrays), NLMs can recover lifted rules, and generalize to large-scale tasks (such as sorting longer arrays). In our experiments, NLMs achieve perfect generalization in a number of tasks, from relational reasoning tasks on the family tree and general graphs, to decision making tasks including sorting arrays, finding shortest paths, and playing the blocks world. Most of these tasks are hard to accomplish for neural networks or inductive logic programming alone. 
Neural Logic Reinforcement Learning (NLRL) 
Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms struggle to generalize the learned policy, so that learning performance is strongly affected even by minor modifications of the training environment. In addition, the use of deep neural networks makes the learned policies hard to interpret. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent the policies in reinforcement learning by first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming, which have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments conducted on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while demonstrating good generalisability to environments of different initial states and problem sizes. 
Neural Machine Translation (NMT) 
Neural machine translation (NMT) is the approach to machine translation in which a large neural network is trained to maximize translation performance. It is a radical departure from the phrase-based statistical translation approaches, in which a translation system consists of subcomponents that are separately optimized. The artificial neural network (ANN) is a model inspired by the functional aspects and structure of the brain’s biological neural networks. With the use of ANNs, it is possible to execute a number of tasks, such as classification, clustering, and prediction, using machine learning techniques like supervised or reinforcement learning to learn or adjust net connections. A bidirectional recurrent neural network (RNN), known as an encoder, is used by the neural network to encode a source sentence for a second RNN, known as a decoder, that is used to predict words in the target language. NMT models are inspired by deep representation learning. They require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, each and every component of the neural translation model is trained jointly to maximize the translation performance. When a new neural network is created, it is trained for certain domains or applications. Once an automatic learning mechanism is established, the network practices. With time it starts operating according to its own judgment, turning into an ‘expert’. 
Neural Multi-Task Recommendation (NMTR) 
Most existing recommender systems leverage the data of one type of user behavior only, such as the purchase behavior in E-commerce that is directly related to the business KPI (Key Performance Indicator) of conversion rate. Besides the key behavioral data, we argue that other forms of user behavior also provide valuable signals about a user’s preference, such as views, clicks, adding a product to shopping carts and so on. They should be taken into account properly to provide quality recommendations for users. In this work, we contribute a novel solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from multiple types of user behaviors. We develop a neural network model to capture the complicated and multi-type interactions between users and items. In particular, our model accounts for the cascading relationship among behaviors (e.g., a user must click on a product before purchasing it). To fully exploit the signal in the data of multiple types of behaviors, we perform a joint optimization based on the multi-task learning framework, where the optimization on a behavior is treated as a task. Extensive experiments on two real-world datasets demonstrate that NMTR significantly outperforms state-of-the-art recommender systems that are designed to learn from both single-behavior data and multi-behavior data. Further analysis shows that modeling multiple behaviors is particularly useful for providing recommendations for sparse users that have very few interactions. 
Neural Nearest Neighbors Block (N3 Block) 
Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. Existing approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed feature space. The main hurdle in optimizing this feature space w.r.t. application performance is the non-differentiability of the KNN selection rule. To overcome this, we propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. pairwise distances, but retains the original KNN as the limit of a temperature parameter approaching zero. To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as a building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen feature spaces. 
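The idea of a temperature-controlled relaxation can be sketched for the simplest case, selecting a single nearest neighbor: a softmax over negative distances yields differentiable weights that approach a hard one-hot selection as the temperature goes to zero. This is a minimal illustration under that assumption; the construction for k greater than one used by the N3 block is not reproduced here, and `soft_nearest_weights` is a hypothetical name.

```python
import numpy as np

def soft_nearest_weights(distances, temperature):
    """Softmax over negative distances: a differentiable stand-in for picking the nearest neighbor."""
    d = np.asarray(distances, dtype=float)
    logits = -d / temperature
    logits = logits - logits.max()   # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

distances = [1.0, 2.0, 3.0]
nearly_hard = soft_nearest_weights(distances, temperature=1e-3)   # close to one-hot on index 0
nearly_uniform = soft_nearest_weights(distances, temperature=1e6) # close to equal weights
```

A weighted average of neighbor features using these weights is smooth in the distances, so gradients can flow back into the feature space that produced them.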
Neural Nearest Neighbors Network  Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. Existing approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed feature space. The main hurdle in optimizing this feature space w.r.t. application performance is the non-differentiability of the KNN selection rule. To overcome this, we propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. pairwise distances, but retains the original KNN as the limit of a temperature parameter approaching zero. To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as a building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen feature spaces. 
Neural Net Topology Profiler (NTP) 
Performance of end-to-end neural networks on a given hardware platform is a function of its compute and memory signature, which, in turn, is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, latency requirements, precision, etc. Current benchmarking tools suffer from limitations such as (a) being too granular, like DeepBench, (b) mandating a working implementation that is either framework-specific or hardware-architecture-specific, or (c) providing only high-level benchmark metrics. In this paper, we present NTP (Neural Net Topology Profiler), a sophisticated benchmarking framework, to effectively identify the memory and compute signature of an end-to-end topology on multiple hardware architectures, without the need to actually implement the topology in a framework. NTP is tightly integrated with hardware-specific benchmark tools to enable exhaustive data collection and analysis. Using NTP, a deep learning researcher can quickly establish baselines needed to understand the performance of an end-to-end neural network topology and make high-level architectural decisions based on optimization techniques like layer sizing, quantization, pruning, etc. Further, integration of NTP with frameworks like TensorFlow, PyTorch, Intel OpenVINO, etc. allows for performance comparison along several vectors, such as (a) comparison of different frameworks on a given hardware, (b) comparison of different hardware using a given framework, and (c) comparison across different heterogeneous hardware configurations for a given framework. These capabilities empower a researcher to effortlessly make architectural decisions needed for achieving optimized performance on any hardware platform. The paper documents the architectural approach of NTP and demonstrates the capabilities of the tool by benchmarking Mozilla DeepSpeech, a popular speech recognition topology. 
Neural Network based Collaborative Ranking (NCR) 
Recommender systems are aimed at generating a personalized ranked list of items that an end user might be interested in. With the unprecedented success of deep learning in computer vision and speech recognition, bridging the gap between recommender systems and deep neural networks has recently become a hot topic, and deep learning methods have been shown to achieve state-of-the-art results on many recommendation tasks. For example, a recent model, NeuMF, first projects users and items into some shared low-dimensional latent feature space, and then employs neural nets to model the interaction between the user and item latent features to obtain state-of-the-art performance on the recommendation tasks. NeuMF assumes that the non-interacted items are inherently negative and uses negative sampling to relax this assumption. In this paper, we examine an alternative approach which does not assume that the non-interacted items are necessarily negative, just that they are less preferred than interacted items. Specifically, we develop a new classification strategy based on the widely used pairwise ranking assumption. We combine our classification strategy with the recently proposed neural collaborative filtering framework, and propose a general collaborative ranking framework called Neural Network based Collaborative Ranking (NCR). We resort to a neural network architecture to model a user’s pairwise preference between items, with the belief that the neural network will effectively capture the latent structure of latent factors. The experimental results on two real-world datasets show the superior performance of our models in comparison with several state-of-the-art approaches. 
Neural Network Encapsulation  A capsule is a collection of neurons which represents different variants of a pattern in the network. The routing scheme ensures that only those capsules in the higher layer that resemble their lower counterparts are activated. However, the computational complexity becomes a bottleneck for scaling up to larger networks, as lower capsules need to correspond to each and every higher capsule. To resolve this limitation, we approximate the routing process with two branches: a master branch which collects primary information from its direct contact in the lower layer and an aide branch that replenishes the master based on pattern variants encoded in other lower capsules. Compared with the previous iterative and unsupervised routing scheme, these two branches communicate in a fast, supervised and one-time pass fashion. The complexity and runtime of the model are therefore decreased by a large margin. Motivated by the routing that makes higher capsules agree with lower capsules, we extend the mechanism as a compensation for the rapid loss of information in nearby layers. We devise a feedback agreement unit to send back higher capsules as feedback. It could be regarded as an additional regularization of the network. The feedback agreement is achieved by comparing the optimal transport divergence between two distributions (lower and higher capsules). Such an add-on yields a consistent gain in both capsule and vanilla networks. Our proposed EncapNet performs favorably against previous state-of-the-art methods on CIFAR-10/100, SVHN and a subset of ImageNet. 
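The pairwise ranking assumption (an interacted item should score higher than a non-interacted one, without labeling the latter as negative) is commonly trained with a logistic loss on score differences. The sketch below shows that generic loss; it is an assumption that NCR's classification strategy resembles it, since the abstract does not give the exact objective. `pairwise_ranking_loss` is a hypothetical name.

```python
import numpy as np

def pairwise_ranking_loss(score_interacted, score_other):
    """Mean of -log(sigmoid(s_interacted - s_other)) over item pairs.

    The loss is small when interacted items outscore non-interacted ones.
    """
    diff = np.asarray(score_interacted, dtype=float) - np.asarray(score_other, dtype=float)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); logaddexp evaluates it stably.
    return float(np.mean(np.logaddexp(0.0, -diff)))

tie_loss = pairwise_ranking_loss([1.0], [1.0])     # no preference expressed
correct_order = pairwise_ranking_loss([2.0], [0.0])
wrong_order = pairwise_ranking_loss([0.0], [2.0])
```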
Neural Network Exchange Format (NNEF) 
NNEF reduces machine learning deployment fragmentation by enabling a rich mix of neural network training tools and inference engines to be used by applications across a diverse range of devices and platforms. The goal of NNEF is to enable data scientists and engineers to easily transfer trained networks from their chosen training framework into a wide variety of inference engines. A stable, flexible and extensible standard that equipment manufacturers can rely on is critical for the widespread deployment of neural networks onto edge devices, and so NNEF encapsulates a complete description of the structure, operations and parameters of a trained neural network, independent of the training tools used to produce it and the inference engine used to execute it. 
Neural Network Quantization  Defensive Quantization: When Efficiency Meets Robustness 
Neural Network Quine  Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection. 
Neural Network Synthesis Tool (NeST) 
Neural networks (NNs) have begun to have a pervasive impact on various applications of machine learning. However, the problem of finding an optimal NN architecture for large applications has remained open for several decades. Conventional approaches search for the optimal NN architecture through extensive trial-and-error. Such a procedure is quite inefficient. In addition, the generated NN architectures incur substantial redundancy. To address these problems, we propose an NN synthesis tool (NeST) that automatically generates very compact architectures for a given dataset. NeST starts with a seed NN architecture. It iteratively tunes the architecture with gradient-based growth and magnitude-based pruning of neurons and connections. Our experimental results show that NeST yields accurate yet very compact NNs with a wide range of seed architecture selection. For example, for the LeNet-300-100 (LeNet-5) NN architecture derived from the MNIST dataset, we reduce network parameters by 34.1x (74.3x) and floating-point operations (FLOPs) by 35.8x (43.7x). For the AlexNet NN architecture derived from the ImageNet dataset, we reduce network parameters by 15.7x and FLOPs by 4.6x. All these results are the current state-of-the-art for these architectures. 
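Magnitude-based pruning, one of the two operations named above, removes the connections whose weights are smallest in absolute value. The following sketch shows a generic one-shot version under stated assumptions; it does not reproduce NeST's iterative schedule or its gradient-based growth step, and `magnitude_prune` and `sparsity` are hypothetical names.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the given fraction of weights with the smallest absolute values.

    Returns the pruned weights and a boolean mask of the connections that were kept.
    Ties at the threshold are all pruned, so slightly more than the requested fraction may be removed.
    """
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))            # number of connections to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask, mask

layer = np.array([[0.1, -2.0], [0.05, 3.0]])
pruned, kept = magnitude_prune(layer, sparsity=0.5)
compression = layer.size / kept.sum()            # parameter reduction factor for this toy layer
```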
Neural Networks / Artificial Neural Networks (ANN) 
In computer science and related fields, artificial neural networks (ANNs) are computational models inspired by animals’ central nervous systems (in particular the brain) that are capable of machine learning as well as pattern recognition. Artificial neural networks are generally presented as systems of interconnected “neurons” which can compute values from inputs. neural, neuralnet 
Neural Optimizer (Neo) 
Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpasses them. 
Neural Ordinary Differential Equation  We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models. 
Neural Persistence  While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. To demonstrate the usefulness of our approach, we show that neural persistence reflects best practices developed in the deep learning community such as dropout and batch normalization. Moreover, we derive a neural persistence-based stopping criterion that shortens the training process while achieving accuracies comparable to early stopping based on validation loss. 
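The core idea, treating the hidden state as the solution of dh/dt = f(h, t) and obtaining the output by integrating from a start time to an end time, can be sketched with a fixed-step Euler integrator. This is a deliberately simple stand-in for the black-box (typically adaptive) solver mentioned above, and the tanh dynamics with weight matrix `W` are a hypothetical example of a parameterized derivative.

```python
import numpy as np

def odeint_euler(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step forward Euler."""
    h = np.asarray(h0, dtype=float).copy()
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)   # with dt = 1 this looks like a residual update h + f(h)
        t += dt
    return h

# Hypothetical learned dynamics: a single tanh layer acting as the derivative of the hidden state.
W = np.array([[0.0, 1.0], [-1.0, 0.0]])

def dynamics(h, t):
    return np.tanh(W @ h)

output = odeint_euler(dynamics, h0=[1.0, 0.0], t0=0.0, t1=1.0, n_steps=100)

# Sanity check on a problem with a known solution: dh/dt = -h gives h(1) = exp(-1) * h(0).
decay = odeint_euler(lambda h, t: -h, h0=[1.0], t0=0.0, t1=1.0, n_steps=10000)
```

Increasing `n_steps` trades speed for accuracy, which mirrors the precision-for-speed trade-off described in the entry.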
Neural Predictive Coding (NPC) 
Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker, and thus a common representation that can encode the commonalities of both segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce ‘speaker embeddings’ by optimizing a loss function that increases between-speaker variability and decreases within-speaker variability. The trained NPC model can produce these embeddings by projecting any test audio stream into a high-dimensional manifold where speech frames of the same speaker come closer than they do in the raw feature space. Results in the frame-level speaker classification experiment, along with the visualization of the embeddings, manifest the distinctive ability of the NPC model to learn short-term speaker-specific features as compared to raw MFCC features and i-vectors. The utterance-level speaker classification experiments show that concatenating simple statistics of the short-term NPC embeddings over the whole utterance with the utterance-level i-vectors can give useful complementary information to the i-vectors and boost the classification accuracy. The results also show the efficacy of this technique in learning those characteristics from a large unlabeled training set which has no prior information about the environment of the test set. 
Neural Process (NP) 
Meta-learning methods leverage past experience to learn data-driven inductive biases from related problems, increasing learning efficiency on new tasks. This ability renders them particularly suitable for sequential decision making with limited experience. Within this problem family, we argue for the use of such approaches in the study of model-based approaches to Bayesian Optimisation, contextual bandits and Reinforcement Learning. We approach the problem by learning distributions over functions using Neural Processes (NPs), a recently introduced probabilistic meta-learning method. This allows the treatment of model uncertainty to tackle the exploration/exploitation dilemma. We show that NPs are suitable for sequential decision making on a diverse set of domains, including adversarial task search, recommender systems and model-based reinforcement learning. Meta-Learning surrogate models for sequential decision making 
Neural Programmer  Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence-to-sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is their inability to learn even simple arithmetic and logic operations. For example, it has been shown that neural networks fail to learn to add two binary numbers reliably. In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations. Neural Programmer can call these augmented operations over several steps, thereby inducing compositional programs that are more complex than the built-in operations. The model learns from a weak supervision signal which is the result of execution of the correct program, hence it does not require expensive annotation of the correct program itself. The decisions of what operations to call, and what data segments to apply them to, are inferred by Neural Programmer. Such decisions, during training, are made in a differentiable fashion so that the entire network can be trained jointly by gradient descent. We find that training the model is difficult, but it can be greatly improved by adding random noise to the gradient. On a fairly complex synthetic table-comprehension dataset, traditional recurrent networks and attentional models perform poorly while Neural Programmer typically obtains nearly perfect accuracy. 
Neural Reasoner  We propose Neural Reasoner, a framework for neural network-based reasoning over natural language sentences. Given a question, Neural Reasoner can infer over multiple supporting facts and find an answer to the question in specific forms. Neural Reasoner has 1) a specific interaction-pooling mechanism, allowing it to examine multiple facts, and 2) a deep architecture, allowing it to model the complicated logical relations in reasoning tasks. Assuming no particular structure exists in the question and facts, Neural Reasoner is able to accommodate different types of reasoning and different forms of language expressions. Despite the model complexity, Neural Reasoner can still be trained effectively in an end-to-end manner. Our empirical studies show that Neural Reasoner can outperform existing neural reasoning systems with remarkable margins on two difficult artificial tasks (Positional Reasoning and Path Finding) proposed in. For example, it improves the accuracy on Path Finding (10K) from 33.4% to over 98%. 
Neural Regression Tree  Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree. We empirically show the validity of our model by testing it on two challenging regression tasks where we establish the state of the art. 
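To make the Regression-via-Classification idea concrete, here is a minimal sketch of the fixed-threshold baseline that the entry describes as ad hoc. It is not the neural regression tree itself, which learns its thresholds jointly with the features. The threshold values, the function name `fit_bin_means` and the toy targets are all assumptions made for illustration.

```python
from bisect import bisect_right

def fit_bin_means(targets, thresholds):
    """Assign each target to a bin and return the mean target per bin.

    A bin with no training targets gets None.
    """
    sums = [0.0] * (len(thresholds) + 1)
    counts = [0] * (len(thresholds) + 1)
    for y in targets:
        b = bisect_right(thresholds, y)  # class label = index of the bin
        sums[b] += y
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hand-picked (ad hoc) thresholds split the target range into 3 classes.
thresholds = [10.0, 20.0]
targets = [3.0, 7.0, 12.0, 18.0, 25.0]

# Regression targets become class labels for any off-the-shelf classifier.
labels = [bisect_right(thresholds, y) for y in targets]

# A predicted class is mapped back to a number via the bin mean.
bin_means = fit_bin_means(targets, thresholds)
```

A classifier trained on `labels` would predict a bin index, and `bin_means[index]` turns that back into a regression estimate. How good the estimate is depends heavily on where the thresholds sit, which is the motivation for learning them.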
Neural Relational Inference (NRI) 
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational autoencoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data. 
Neural Rendering Model (NRM) 
Unsupervised and semi-supervised learning are important problems that are especially challenging with complex data like natural images. Progress on these problems would accelerate if we had access to appropriate generative models under which to pose the associated inference tasks. Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Neural Rendering Model (NRM), a new probabilistic generative model whose inference calculations correspond to those in a given CNN architecture. The NRM uses the given CNN to design the prior distribution in the probabilistic model. Furthermore, the NRM generates images from coarse to finer scales. It introduces a small set of latent variables at each level, and enforces dependencies among all the latent variables via a conjugate prior distribution. This conjugate prior yields a new regularizer for training CNNs based on paths rendered in the generative model: the Rendering Path Normalization (RPN). We demonstrate that this regularizer improves generalization, both in theory and in practice. In addition, likelihood estimation in the NRM yields training losses for CNNs, and inspired by this, we design a new loss termed the Max-Min cross entropy, which outperforms the traditional cross-entropy loss for object classification. The Max-Min cross entropy suggests a new deep network architecture, namely the Max-Min network, which can learn from less labeled data while maintaining good prediction performance. Our experiments demonstrate that the NRM with the RPN and the Max-Min architecture exceeds or matches the state of the art on benchmarks including SVHN, CIFAR-10, and CIFAR-100 for semi-supervised and supervised learning tasks. 
Neural Response Generation  
Neural Rule Engine (NRE) 
Neural-symbolic learning aims to take advantage of both neural networks and symbolic knowledge to build better intelligent systems. As neural networks have dominated the state-of-the-art results in a wide range of NLP tasks, improving the performance of neural models by integrating symbolic knowledge has attracted considerable attention. Different from existing works, this paper investigates the combination of these two powerful paradigms from the knowledge-driven side. We propose Neural Rule Engine (NRE), which can learn knowledge explicitly from logic rules and then generalize it implicitly with neural networks. NRE is implemented with neural module networks in which each module represents an action of the logic rule. The experiments show that NRE can greatly improve the generalization abilities of logic rules with a significant increase in recall. Meanwhile, the precision is still maintained at a high level. 
Neural Semantic Embedding for Entity Normalization (NSEEN) 
Much of human knowledge is encoded in text, such as scientific publications, books, and the web. Given the rapid growth of these resources, we need automated methods to extract such knowledge into formal, machine-processable structures, such as knowledge graphs. An important task in this process is entity normalization (also called entity grounding, or resolution), which consists of mapping entity mentions in text to canonical entities in well-known reference sets. However, entity resolution is a challenging problem, since there often are many textual forms for a canonical entity. The problem is particularly acute in scientific domains such as biology. For example, a protein may have many different names and syntactic variations on these names. To address this problem, we have developed a general, scalable solution based on a deep Siamese neural network model to embed the semantic information about the entities, as well as their syntactic variations. We use these embeddings for fast mapping of new entities to large reference sets, and empirically show the effectiveness of our framework in challenging bio-entity normalization datasets. 
Neural Semantic Encoders (NSE) 
We present a memory-augmented neural network for natural language understanding: Neural Semantic Encoders (NSE). NSE has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can access multiple and shared memories depending on the complexity of a task. We demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU. 
Neural SLAM  We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking those of traditional Simultaneous Localization and Mapping (SLAM) into the soft-attention-based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments. 
Neural Sobolev Descent  We introduce Regularized Kernel and Neural Sobolev Descent for transporting a source distribution to a target distribution along smooth paths of minimum kinetic energy (defined by the Sobolev discrepancy), related to dynamic optimal transport. In the kernel version, we give a simple algorithm to perform the descent along gradients of the Sobolev critic, and show that it converges asymptotically to the target distribution in the MMD sense. In the neural version, we parametrize the Sobolev critic with a neural network with input gradient norm constrained in expectation. We show in theory and experiments that regularization has an important role in favoring smooth transitions between distributions, avoiding large discrete jumps. Our analysis could provide a new perspective on the impact of critic updates (early stopping) on the paths to equilibrium in the GAN setting. 
Neural SPARQL Machine  In recent years, the Linked Data Cloud has achieved a size of more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has been significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, Deep Learning architectures based on Neural Networks called seq2seq have been shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures to translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted to selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition. 
Neural Style Transfer  The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating artistic imagery by separating and recombining image content and style. This process of using a CNN to render the semantic content of one image in different styles is referred to as Neural Style Transfer. 
Neural Tangent Kernel (NTK) 
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function f_θ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describing the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function f_θ follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit. On Exact Computation with an Infinitely Wide Neural Net 
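The kernel in the NTK entry above can be illustrated numerically with the usual finite-width ("empirical") form, K(x, x') = ∇_θ f_θ(x) · ∇_θ f_θ(x'). The sketch below assumes a toy one-hidden-layer network on scalar inputs, f(x) = (1/√m) Σ_j a_j tanh(w_j x), with hand-derived gradients. The architecture, the names `init_params`, `grad_f` and `empirical_ntk`, and the width are illustrative assumptions, not taken from the source.

```python
import math
import random

def init_params(width, seed=0):
    """Draw hidden weights w and output weights a from a standard normal."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a

def grad_f(x, w, a):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x) w.r.t. (w, a)."""
    scale = 1.0 / math.sqrt(len(w))
    # d f / d w_j = scale * a_j * (1 - tanh(w_j x)^2) * x
    grad_w = [scale * a_j * (1.0 - math.tanh(w_j * x) ** 2) * x
              for w_j, a_j in zip(w, a)]
    # d f / d a_j = scale * tanh(w_j x)
    grad_a = [scale * math.tanh(w_j * x) for w_j in w]
    return grad_w + grad_a

def empirical_ntk(x1, x2, w, a):
    """Inner product of the parameter gradients at two inputs."""
    g1, g2 = grad_f(x1, w, a), grad_f(x2, w, a)
    return sum(p * q for p, q in zip(g1, g2))

w, a = init_params(width=512)
k_ab = empirical_ntk(0.5, -1.0, w, a)
k_ba = empirical_ntk(-1.0, 0.5, w, a)
k_aa = empirical_ntk(0.5, 0.5, w, a)
k_bb = empirical_ntk(-1.0, -1.0, w, a)
```

Because the kernel is a Gram matrix of gradient vectors, it is symmetric and positive semi-definite for any finite width. The entry's statement concerns what happens to this quantity as the width grows.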
Neural Task Programming (NTP) 
In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, where bottom-level programs are callable subroutines that interact with the environment. We validate our method in three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchical and compositional structures. The experimental results show that NTP learns to generalize well towards unseen tasks with increasing lengths, variable topologies, and changing objectives. 
Neural Tensor Factorization (NTF) 
Neural collaborative filtering (NCF) and recurrent recommender systems (RRN) have been successful in modeling user-item relational data. However, they are limited by their assumption of static or sequential modeling of relational data: they account neither for users’ evolving preferences over time nor for changes in the underlying factors that drive the user-item relationship over time. We address these limitations by proposing a Neural Tensor Factorization (NTF) model for predictive tasks on dynamic relational data. The NTF model generalizes conventional tensor factorization from two perspectives: First, it leverages the long short-term memory architecture to characterize the multi-dimensional temporal interactions on relational data. Second, it incorporates the multi-layer perceptron structure for learning the non-linearities between different latent factors. Our extensive experiments demonstrate the significant improvement in rating prediction and link prediction on dynamic relational data by our NTF model over both neural network based factorization models and other traditional methods. 
Neural Tensor Network (NTN) 
The Neural Tensor Network (NTN) replaces a standard linear neural network layer with a bilinear tensor layer that directly relates two entity vectors across multiple dimensions. The model computes a score of how likely it is that two entities are in a certain relationship. 
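As a rough illustration of the bilinear tensor layer described in the Neural Tensor Network entry, the sketch below computes a relation score in the commonly cited form u · tanh(e1ᵀ W[1..k] e2 + V [e1; e2] + b). The function name `ntn_score`, the choice of tanh, and all parameter values are assumptions for illustration. In practice the parameters are learned per relation.

```python
import math

def ntn_score(e1, e2, W, V, b, u):
    """Score how likely entities e1 and e2 are to stand in one relation.

    W: k slices, each a d x d matrix (the bilinear tensor)
    V: k rows of length 2d (standard linear layer on the concatenation)
    b: k biases, u: k output weights
    """
    concat = e1 + e2  # list concatenation [e1; e2]
    hidden = []
    for s in range(len(W)):
        # Bilinear term: e1^T W[s] e2 relates the two vectors directly.
        bilinear = sum(e1[i] * W[s][i][j] * e2[j]
                       for i in range(len(e1)) for j in range(len(e2)))
        linear = sum(V[s][t] * concat[t] for t in range(len(concat)))
        hidden.append(math.tanh(bilinear + linear + b[s]))
    return sum(u[s] * hidden[s] for s in range(len(W)))

# Toy parameters: one tensor slice equal to the identity, no linear part.
W = [[[1.0, 0.0], [0.0, 1.0]]]
V = [[0.0, 0.0, 0.0, 0.0]]
b = [0.0]
u = [1.0]

score_aligned = ntn_score([1.0, 0.0], [1.0, 0.0], W, V, b, u)
score_orthogonal = ntn_score([1.0, 0.0], [0.0, 1.0], W, V, b, u)
```

With the identity slice the bilinear term reduces to a dot product, so aligned entity vectors score higher than orthogonal ones. Each additional slice adds another way for the two vectors to interact.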
Neural Theorem Prover (NTP) 
We introduce neural networks for end-to-end differentiable proving of queries to knowledge bases by operating on dense vector representations of symbols. These neural networks are constructed recursively by taking inspiration from the backward chaining algorithm as used in Prolog. Specifically, we replace symbolic unification with a differentiable computation on vector representations of symbols using a radial basis function kernel, thereby combining symbolic reasoning with learning subsymbolic vector representations. By using gradient descent, the resulting neural network can be trained to infer facts from a given incomplete knowledge base. It learns to (i) place representations of similar symbols in close proximity in a vector space, (ii) make use of such similarities to prove queries, (iii) induce logical rules, and (iv) use provided and induced logical rules for multi-hop reasoning. We demonstrate that this architecture outperforms ComplEx, a state-of-the-art neural link prediction model, on three out of four benchmark knowledge bases while at the same time inducing interpretable function-free first-order logic rules. Towards Neural Theorem Proving at Scale Neural Theorem Prover 
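The replacement of symbolic unification by a radial basis function kernel, as described in the Neural Theorem Prover entry, can be sketched as follows. The symbol embeddings, the bandwidth `mu`, and the use of `min` to combine the unification scores along one proof are assumptions made for this illustration. The real system learns the embeddings by gradient descent.

```python
import math

def rbf_unify(theta_a, theta_b, mu=1.0):
    """Soft unification: 1.0 for identical embeddings, near 0.0 for distant ones."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(theta_a, theta_b))
    return math.exp(-squared_distance / (2.0 * mu ** 2))

def proof_score(unification_scores):
    """A proof is only as strong as its weakest unification step (assumed rule)."""
    return min(unification_scores)

# Hypothetical 2-d symbol embeddings.
embeddings = {
    "grandfatherOf": [1.0, 0.0],
    "grandpaOf": [0.9, 0.1],
    "locatedIn": [-1.0, 0.5],
}

similar = rbf_unify(embeddings["grandfatherOf"], embeddings["grandpaOf"])
dissimilar = rbf_unify(embeddings["grandfatherOf"], embeddings["locatedIn"])
identical = rbf_unify(embeddings["grandpaOf"], embeddings["grandpaOf"])
weakest = proof_score([similar, dissimilar])
```

Since the score is a smooth function of the embeddings, a query about `grandpaOf` can partially match a rule stated for `grandfatherOf`, and training can pull such symbols closer together.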
Neural Turing Machine (NTM) 
One of the major objectives of Artificial Intelligence is to design learning algorithms that are executed on general-purpose computational machines such as the human brain. The Neural Turing Machine (NTM) is a step towards realizing such a computational machine. This paper attempts a systematic review of the Neural Turing Machine. First, the mind-map and taxonomy of machine learning, neural networks, and Turing machines are introduced. Next, NTM is inspected in terms of concepts, structure, variety of versions, implemented tasks, comparisons, etc. Finally, the paper discusses open issues and concludes with several directions for future work. 
Neural Turing Machines (NTM) 
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. Neural Turing Machines are fully differentiable computers that use backpropagation to learn their own programming. 
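The "attentional processes" in the Neural Turing Machines entry can be illustrated with a content-based read: the controller emits a key, every memory row is scored by cosine similarity to that key, and the read vector is the softmax-weighted average of the rows. This is a minimal sketch of one addressing mechanism only. The function names, the sharpness parameter `beta` and the toy memory are assumptions, and rows and key are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)

def content_weights(memory, key, beta):
    """Softmax over beta-scaled similarities: one attention weight per row."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Differentiable read: weighted average of all memory rows."""
    width = len(memory[0])
    return [sum(w * row[c] for w, row in zip(weights, memory))
            for c in range(width)]

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = content_weights(memory, key=[1.0, 0.0], beta=10.0)
read_vector = read(memory, weights)
```

Every row contributes a little to the read, so the whole operation has gradients with respect to both the key and the memory contents. That is what lets the combined system be trained end-to-end.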
Neural Variational Hybrid Collaborative Filtering (VDMF) 
Collaborative Filtering (CF) is one of the most widely used methods for Recommender Systems. Because of their Bayesian nature and non-linearity, deep generative models, e.g. the Variational Autoencoder (VAE), have been applied to the CF task and have achieved great performance. However, most VAE-based methods suffer from matrix sparsity and consider the prior of users’ latent factors to be the same, which leads to poor latent representations of users and items. Additionally, most existing methods model the latent factors of users only but not of items, which makes them unable to recommend items to a new user. To tackle these problems, we propose a Neural Variational Hybrid Collaborative Filtering, VDMF. Specifically, we consider both the generative processes of users and items, and the prior of latent factors of users and items to be side-information-specific, which enables our model to alleviate matrix sparsity and learn better latent representations of users and items. For inference purposes, we derived a Stochastic Gradient Variational Bayes (SGVB) algorithm to analytically approximate the intractable distributions of latent factors of users and items. Experiments conducted on two large datasets have shown that our methods significantly outperform the state-of-the-art CF methods, including the VAE-based methods. 
Neural Vector Space Model (NVSM) 
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. 
NeuralDater  Document date is essential for many important tasks, such as document retrieval, summarization, event detection, etc. While existing approaches for these tasks assume accurate knowledge of the document date, this is not always available, especially for arbitrary documents from the Web. Document Dating is a challenging problem which requires inference over the temporal structure of the document. Prior document dating systems have largely relied on handcrafted features while ignoring such document-internal structures. In this paper, we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach which jointly exploits the syntactic and temporal graph structures of a document in a principled way. To the best of our knowledge, this is the first application of deep learning to the problem of document dating. Through extensive experiments on real-world datasets, we find that NeuralDater significantly outperforms the state-of-the-art baseline by 19% absolute (45% relative) accuracy points. 
Neural-Guided RANSAC (NG-RANSAC) 
We present Neural-Guided RANSAC (NG-RANSAC), an extension to the classic RANSAC algorithm from robust optimization. NG-RANSAC uses prior information to improve model hypothesis search, increasing the chance of finding outlier-free minimal sets. Previous works use heuristic side-information like hand-crafted descriptor distance to guide hypothesis search. In contrast, we learn hypothesis search in a principled fashion that lets us optimize an arbitrary task loss during training, leading to large improvements on classic computer vision tasks. We present two further extensions to NG-RANSAC. Firstly, using the inlier count itself as training signal allows us to train neural guidance in a self-supervised fashion. Secondly, we combine neural guidance with differentiable RANSAC to build neural networks which focus on certain parts of the input data and make the output predictions as good as possible. We evaluate NG-RANSAC on a wide array of computer vision tasks, namely estimation of epipolar geometry, horizon line estimation and camera re-localization. We achieve superior or competitive results compared to state-of-the-art robust estimators, including very recent, learned ones. 
NeurAll  Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are independently explored and modeled. In this paper, we propose a joint multi-task network design called NeurAll for learning all tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. Other potential benefits are improved accuracy for some tasks and reduced development effort. The design also offers scalability to add more tasks, leveraging existing features and achieving better generalization. We survey various CNN-based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly a review and partly a position paper, with a demonstration of several preliminary results that are promising for future research. Firstly, we show that an efficient two-task model performing semantic segmentation and object detection achieves similar accuracies compared to separate models on various datasets with minimized runtime. We then illustrate that using depth regression as an auxiliary task improves semantic segmentation and that multi-stream semantic segmentation outperforms one-stream semantic segmentation. The two-task network achieves 30 fps on an automotive-grade low-power SOC for 1280×384 image resolution. 
Neurally Directed Program Search (NDPS) 
We study the problem of generating interpretable and verifiable policies through reinforcement learning. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim in Programmatically Interpretable Reinforcement Learning is to find a policy that can be represented in a high-level programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging non-smooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural ‘oracle’. We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also find that a well-designed policy language can serve as a regularizer, and result in the discovery of policies that lead to smoother trajectories and are more easily transferred to environments not encountered during training. 
Neurally Plausible Alternating Optimization-Based Online Dictionary Learning (NOODL) 
We consider the dictionary learning problem, where the aim is to model the given data as a linear combination of a few columns of a matrix known as a dictionary, where the sparse weights forming the linear combination are known as coefficients. Since the dictionary and coefficients parameterizing the linear model are both unknown, the corresponding optimization is inherently non-convex. This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. Yet, these provide guarantees only on the recovery of the dictionary, without explicit recovery guarantees on the coefficients. Moreover, any estimation error in the dictionary adversely impacts the ability to successfully localize and estimate the coefficients. This potentially limits the utility of existing provable dictionary learning methods in applications where coefficient recovery is of interest. To this end, we develop NOODL: a simple Neurally plausible alternating Optimization-based Online Dictionary Learning algorithm, which recovers both the dictionary and coefficients exactly at a geometric rate, when initialized appropriately. Our algorithm, NOODL, is also scalable and amenable to large-scale distributed implementations in neural architectures, by which we mean that it only involves simple linear and non-linear operations. Finally, we corroborate these theoretical results via experimental evaluation of the proposed algorithm against the current state-of-the-art techniques. 
Neuralogram  We propose the Neuralogram, a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture. Through a series of probing signals, we show how our representation can encapsulate pitch, timbre and rhythm-based information, and other attributes. This representation suggests a method for revealing meaningful relationships in arbitrarily long audio signals that are not readily represented by existing algorithms. This has the potential for numerous applications in audio understanding, music recommendation and metadata extraction, to name a few. 
neuralRank  Widespread applications of deep learning have led to a plethora of pre-trained neural network models for common tasks. Such models are often adapted from other models via transfer learning. The models may have varying training sets, training algorithms, network architectures, and hyper-parameters. For a given application, what is the most suitable model in a model repository? This is a critical question for practical deployments but it has not received much attention. This paper introduces the novel problem of searching and ranking models based on suitability relative to a target dataset and proposes a ranking algorithm called neuralRank. The key idea behind this algorithm is to base model suitability on the discriminating power of a model, using a novel metric to measure it. With experimental results on the MNIST, Fashion, and CIFAR-10 datasets, we demonstrate that (1) neuralRank is independent of the domain, the training set, and the network architecture and (2) the models ranked highly by neuralRank tend to have higher model accuracy in practice. 
NeuralSort  Sorting input objects is an important step in many machine learning pipelines. However, the sorting operator is non-differentiable with respect to its inputs, which prohibits end-to-end gradient-based optimization. In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct arg max. This relaxation permits straight-through optimization of any computational graph involving a sorting operation. Further, we use this relaxation to enable gradient-based stochastic optimization over the combinatorially large space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce family of distributions over permutations. We demonstrate the usefulness of our framework on three tasks that require learning semantic orderings of high-dimensional objects, including a fully differentiable, parameterized extension of the k-nearest neighbors algorithm. 
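The relaxation in the NeuralSort entry can be sketched with the commonly quoted formula: row i of the relaxed permutation matrix is softmax(((n + 1 - 2i) s - A 1) / τ), where A[j][k] = |s_j - s_k| and rows follow descending order of the scores s. The function names, the temperature value and the toy scores below are assumptions for illustration.

```python
import math

def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def neural_sort(scores, tau=1.0):
    """Relaxed permutation matrix that sorts `scores` in descending order.

    Row i (1-based) puts most of its mass on the index of the i-th largest
    score; as tau -> 0 each row approaches a one-hot vector.
    """
    n = len(scores)
    # (A 1)_j = sum_k |s_j - s_k|
    abs_diff_sums = [sum(abs(s_j - s_k) for s_k in scores) for s_j in scores]
    rows = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * s_j - d_j) / tau
                  for s_j, d_j in zip(scores, abs_diff_sums)]
        rows.append(softmax(logits))
    return rows

scores = [2.0, 5.0, 1.0]
relaxed = neural_sort(scores, tau=0.1)
argmaxes = [row.index(max(row)) for row in relaxed]
```

Each row is a probability vector with its own arg max, matching the "unimodal row-stochastic" description in the entry. Multiplying the relaxed matrix by a vector of items gives a soft sorted version that gradients can flow through.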
Neuroevolution  Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, topology and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network’s performance at a task. For example, the outcome of a game (i.e. whether one player won or lost) can be easily measured without providing labeled examples of desired strategies. Neuroevolution can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology. Adaptive Genomic Evolution of Neural Network Topologies (AGENT) for State-to-Action Mapping in Autonomous Agents 
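The idea that only a performance measure is needed can be illustrated with a small sketch. It evolves just the weights of a fixed 2-2-1 network on the XOR task using elitist selection and Gaussian mutation; evolving the topology, which the entry also mentions, is deliberately left out. All names (forward, fitness, evolve) and all hyperparameter values are hypothetical choices for illustration.

```python
import numpy as np

XOR_INPUTS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
XOR_TARGETS = np.array([0, 1, 1, 0], dtype=float)
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias

def forward(weights, x):
    """Evaluate a fixed 2-2-1 network whose parameters are packed in `weights`."""
    w1 = weights[0:4].reshape(2, 2)
    b1 = weights[4:6]
    w2 = weights[6:8]
    b2 = weights[8]
    hidden = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

def fitness(weights):
    """Performance measure: negative mean squared error (higher is better)."""
    return -np.mean((forward(weights, XOR_INPUTS) - XOR_TARGETS) ** 2)

def evolve(generations=200, pop_size=50, n_elite=5, sigma=0.5, seed=0):
    """Elitist evolutionary search over weight vectors; no gradients are used."""
    rng = np.random.default_rng(seed)
    population = rng.normal(0.0, 1.0, size=(pop_size, N_WEIGHTS))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]          # best individuals first
        elite = population[order[:n_elite]]       # survivors, kept unchanged
        history.append(scores[order[0]])
        # Children are mutated copies of randomly chosen elite parents.
        parents = elite[rng.integers(0, n_elite, size=pop_size - n_elite)]
        children = parents + rng.normal(0.0, sigma, size=parents.shape)
        population = np.vstack([elite, children])
    best = max(population, key=fitness)
    return best, history

best_weights, best_per_generation = evolve()
```

Because the elite individuals are copied unchanged into the next generation, the best fitness recorded in the history can never decrease from one generation to the next.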
NeuroFuzzy  In the field of artificial intelligence, neurofuzzy refers to combinations of artificial neural networks and fuzzy logic. Neurofuzzy was proposed by J. S. R. Jang. Neurofuzzy hybridization results in a hybrid intelligent system that synergizes these two techniques by combining the humanlike reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. Neurofuzzy hybridization is widely termed as Fuzzy Neural Network (FNN) or NeuroFuzzy System (NFS) in the literature. Neurofuzzy system (the more popular term is used henceforth) incorporates the humanlike reasoning style of fuzzy systems through the use of fuzzy sets and a linguistic model consisting of a set of IFTHEN fuzzy rules. The main strength of neurofuzzy systems is that they are universal approximators with the ability to solicit interpretable IFTHEN rules. The strength of neurofuzzy systems involves two contradictory requirements in fuzzy modeling: interpretability versus accuracy. In practice, one of the two properties prevails. The neurofuzzy in fuzzy modeling research field is divided into two areas: linguistic fuzzy modeling that is focused on interpretability, mainly the Mamdani model; and precise fuzzy modeling that is focused on accuracy, mainly the TakagiSugenoKang (TSK) model. Although generally assumed to be the realization of a fuzzy system through connectionist networks, this term is also used to describe some other configurations including: · Deriving fuzzy rules from trained RBF networks. · Fuzzy logic based tuning of neural network training parameters. · Fuzzy logic criteria for increasing a network size. · Realising fuzzy membership function through clustering algorithms in unsupervised learning in SOMs and neural networks. · Representing fuzzification, fuzzy inference and defuzzification through multilayers feedforward connectionist networks. It must be pointed out that interpretability of the Mamdanitype neurofuzzy systems can be lost. 
To improve the interpretability of neuro-fuzzy systems, certain measures must be taken, wherein important aspects of interpretability of neuro-fuzzy systems are also discussed. A recent research line addresses the data stream mining case, where neuro-fuzzy systems are sequentially updated with new incoming samples on demand and on-the-fly. Thereby, system updates do not only include a recursive adaptation of model parameters, but also a dynamic evolution and pruning of model components (neurons, rules), in order to handle concept drift and dynamically changing system behavior adequately and to keep the systems/models ‘up-to-date’ anytime. Comprehensive surveys of various evolving neuro-fuzzy systems approaches can be found in the literature. frbs 
NeuroFuzzy System  Modern neuro-fuzzy systems are usually represented as special multilayer feedforward neural networks (see for example models like ANFIS, FuNe, Fuzzy RuleNet, GARIC, or NEFCLASS and NEFCON). However, fuzzifications of other neural network architectures are also considered, for example self-organizing feature maps. In those neuro-fuzzy networks, connection weights and propagation and activation functions differ from common neural networks. Although there are a lot of different approaches, we usually use the term neuro-fuzzy system for approaches which display the following properties: · A neuro-fuzzy system is based on a fuzzy system which is trained by a learning algorithm derived from neural network theory. The (heuristic) learning procedure operates on local information, and causes only local modifications in the underlying fuzzy system. · A neuro-fuzzy system can be viewed as a 3-layer feedforward neural network. The first layer represents input variables, the middle (hidden) layer represents fuzzy rules and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy) connection weights. It is not necessary to represent a fuzzy system like this to apply a learning algorithm to it. However, it can be convenient, because it represents the data flow of input processing and learning within the model. Remark: Sometimes a 5-layer architecture is used, where the fuzzy sets are represented in the units of the second and fourth layer. · A neuro-fuzzy system can always (i.e. before, during and after learning) be interpreted as a system of fuzzy rules. It is also possible to create the system out of training data from scratch, as it is possible to initialize it by prior knowledge in form of fuzzy rules. Remark: Not all neuro-fuzzy models specify learning procedures for fuzzy rule creation. · The learning procedure of a neuro-fuzzy system takes the semantic properties of the underlying fuzzy system into account. 
This results in constraints on the possible modifications applicable to the system parameters. Remark: Not all neuro-fuzzy approaches have this property. · A neuro-fuzzy system approximates an n-dimensional (unknown) function that is partially defined by the training data. The fuzzy rules encoded within the system represent vague samples, and can be viewed as prototypes of the training data. A neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has nothing to do with fuzzy logic in the narrow sense. frbs 
NeuroIndex  The article describes a new data structure called neuroindex. It is an alternative to well-known file indexes. The neuroindex is fundamentally different because it stores weight coefficients in a neural network. It does not store references of the form ‘keyword-position in a file’. 
Neuroinformatics  Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grain experimental data. Neuroinformaticians provide computational tools, mathematical models, and create interoperable databases for clinicians and research scientists. Neuroscience is a heterogeneous field, consisting of many and various sub-disciplines (e.g., Cognitive Psychology, Behavioral Neuroscience, and Behavioral Genetics). In order for our understanding of the brain to continue to deepen, it is necessary that these sub-disciplines are able to share data and findings in a meaningful way; Neuroinformaticians facilitate this. Neuroinformatics stands at the intersection of neuroscience and information science. Other fields, like genomics, have demonstrated the effectiveness of freely-distributed databases and the application of theoretical and computational models for solving complex problems. In Neuroinformatics, such facilities allow researchers to more easily quantitatively confirm their working theories by computational modeling. Additionally, neuroinformatics fosters collaborative research, an important fact that facilitates the field’s interest in studying the multi-level complexity of the brain. There are three main directions where neuroinformatics has to be applied: 1. the development of tools and databases for management and sharing of neuroscience data at all levels of analysis, 2. the development of tools for analyzing and modeling neuroscience data, 3. the development of computational models of the nervous system and neural processes. 
Neuromorphic Engineering  Neuromorphic engineering, also known as neuromorphic computing, is a concept developed by Carver Mead in the late 1980s, describing the use of very-large-scale integration (VLSI) systems containing electronic analog circuits to mimic neuro-biological architectures present in the nervous system. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement models of neural systems (for perception, motor control, or multisensory integration). The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, and transistors. A key aspect of neuromorphic engineering is understanding how the morphology of individual neurons, circuits, applications, and overall architectures creates desirable computations, affects how information is represented, influences robustness to damage, incorporates learning and development, adapts to local change (plasticity), and facilitates evolutionary change. Neuromorphic engineering is an interdisciplinary subject that takes inspiration from biology, physics, mathematics, computer science, and electronic engineering to design artificial neural systems, such as vision systems, head-eye systems, auditory processors, and autonomous robots, whose physical architecture and design principles are based on those of biological nervous systems. 
Neuromorphic Hardware  Hyperparameters and learning algorithms for neuromorphic hardware are usually chosen by hand. In contrast, the hyperparameters and learning algorithms of networks of neurons in the brain, which they aim to emulate, have been optimized through extensive evolutionary and developmental processes for specific ranges of computing and learning tasks. Occasionally this process has been emulated through genetic algorithms, but these themselves require hand-design of their details and tend to provide a limited range of improvements. We employ instead other powerful gradient-free optimization tools, such as cross-entropy methods and evolutionary strategies, in order to port the function of biological optimization processes to neuromorphic hardware. As an example, we show that this method produces neuromorphic agents that learn very efficiently from rewards. In particular, meta-plasticity, i.e., the optimization of the learning rule which they use, substantially enhances reward-based learning capability of the hardware. In addition, we demonstrate for the first time Learning-to-Learn benefits from such hardware, in particular, the capability to extract abstract knowledge from prior learning experiences that speeds up the learning of new but related tasks. Learning-to-Learn is especially suited for accelerated neuromorphic hardware, since it makes it feasible to carry out the required very large number of network computations. 
NEURON  Natural language interfaces for relational databases have been explored for several decades. The majority of the work has focused on translating natural language sentences to SQL queries or narrating SQL queries in natural language. Scant attention has been paid to natural language understanding of query execution plans (QEP) of SQL queries. In this demonstration, we present a novel generic system called NEURON that facilitates natural language interaction with QEPs. NEURON accepts a SQL query (which may include joins, aggregation, nesting, among other things) as input, executes it, and generates a natural language-based description (both in text and voice form) of the execution strategy deployed by the underlying RDBMS. Furthermore, it facilitates understanding of various features related to the QEP through a natural language-based question answering framework. NEURON can be potentially useful to database application developers in comprehending query execution strategies and to database instructors and students for pedagogical support. 
NeuronBlocks  NeuronBlocks is a NLP deep learning modeling toolkit that helps engineers/researchers to build endtoend pipelines for neural network model training for NLP tasks. The main goal of this toolkit is to minimize developing cost for NLP deep neural network model building, including both training and inference stages. For more details, please check our paper: NeuronBlocks — Building Your NLP DNN Models Like Playing Lego at https://…/1904.09535. NeuronBlocks consists of two major components: Block Zoo and Model Zoo. • In Block Zoo, we provide commonly used neural network components as building blocks for model architecture design. • In Model Zoo, we provide a suite of NLP models for common NLP tasks, in the form of JSON configuration files. 
Neurons Merging Layer (NMLayer) 
Deep supervised hashing has become an active topic in web search and information retrieval. It generates hashing bits by the output neurons of a deep hashing network. During binary discretization, there often exists much redundancy among hashing bits that degenerates retrieval performance in terms of both storage and accuracy. This paper formulates the redundancy problem in deep supervised hashing as a graph learning problem and proposes a novel layer, named Neurons Merging Layer (NMLayer). The NMLayer constructs a graph to model the adjacency relationship among different neurons. Specifically, it learns the relationship by the defined active and frozen phases. According to the learned relationship, the NMLayer merges the redundant neurons together to balance the importance of each output neuron. Based on the NMLayer, we further propose a progressive optimization strategy for training a deep hashing network. That is, multiple NMLayers are progressively trained to learn a more compact hashing code from a long redundant code. Extensive experiments on four datasets demonstrate that our proposed method outperforms stateoftheart hashing methods. 
NeuroOptimization  Mathematical optimization is widely used in various research fields. With a carefully-designed objective function, mathematical optimization can be quite helpful in solving many problems. However, objective functions are usually hand-crafted and designing a good one can be quite challenging. In this paper, we propose a novel framework to learn the objective function based on a neural network. The basic idea is to consider the neural network as an objective function, and the input as an optimization variable. For the learning of the objective function from the training data, two processes are conducted: In the inner process, the optimization variable (the input of the network) is optimized to minimize the objective function (the network output), while fixing the network weights. In the outer process, on the other hand, the weights are optimized based on how close the final solution of the inner process is to the desired solution. After learning the objective function, the solution for the test set is obtained in the same manner as in the inner process. The potential and applicability of our approach are demonstrated by the experiments on toy examples and a computer vision task, optical flow. 
NeuroSymbolic Concept Learner (NSCL) 
We propose the NeuroSymbolic Concept Learner (NSCL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an objectbased scene representation and translates sentences into executable, symbolic programs. To bridge the learning of two modules, we use a neurosymbolic reasoning module that executes these programs on the latent scene representation. Analogical to human concept learning, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide the searching over the large compositional space of images and language. Extensive experiments demonstrate the accuracy and efficiency of our model on learning visual concepts, word representations, and semantic parsing of sentences. Further, our method allows easy generalization to new object attributes, compositions, language concepts, scenes and questions, and even new program domains. It also empowers applications including visual question answering and bidirectional imagetext retrieval. 
NeuroTreeNet (NTN) 
It is widely recognized that deeper networks or networks with more feature maps have better performance. Existing studies mainly focus on extending the network depth and increasing the feature maps of networks. At the same time, the horizontal expansion network (e.g. Inception Model) as an alternative way to improve network performance has not been fully investigated. Accordingly, we proposed NeuroTreeNet (NTN), a new horizontal extension network built through the combination of random forest and Inception Model. Based on the tree structure, in which each branch represents a network and the root node features are shared with child nodes, network parameters are effectively reduced. By combining all features of leaf nodes, even fewer feature maps achieved better performance. In addition, the relationship between tree structure and the performance of NTN was investigated in depth. Compared to other networks (e.g. VDSR_5) with parameters of equal magnitude, our model showed preferable performance in the super resolution reconstruction task. 
NeuroX  We present a toolkit to facilitate the interpretation and understanding of neural network models. The toolkit provides several methods to identify salient neurons with respect to the model itself or an external task. A user can visualize selected neurons, ablate them to measure their effect on the model accuracy, and manipulate them to control the behavior of the model at the test time. Such an analysis has a potential to serve as a springboard in various research directions, such as understanding the model, better architectural choices, model distillation and controlling data biases. 
Neutrosophic Logic  A logic in which each proposition is estimated to have the percentage of truth in a subset T, the percentage of indeterminacy in a subset I, and the percentage of falsity in a subset F, is called Neutrosophic Logic. We use a subset of truth (or indeterminacy, or falsity), instead of a number only, because in many cases we are not able to exactly determine the percentages of truth and of falsity but only to approximate them: for example a proposition is between 30-40% true and between 60-70% false, or even worse: between 30-40% or 45-50% true (according to various analyzers), and 60% or between 66-70% false. The subsets are not necessarily intervals, but any sets (discrete, continuous, open or closed or half-open/half-closed intervals, intersections or unions of the previous sets, etc.) in accordance with the given proposition. A subset may have one element only in special cases of this logic. 
NEUZZ  Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even the stateoftheart fuzzers are not very efficient at finding hardtotrigger software bugs. Coverageguided evolutionary fuzzers, while fast and scalable, often get stuck at fruitless sequences of random mutations. By contrast, more systematic techniques like symbolic and concolic execution incur significant performance overhead and struggle to scale to larger programs. We design, implement, and evaluate NEUZZ, an efficient fuzzer that guides the fuzzing input generation process using deep neural networks. NEUZZ efficiently learns a differentiable neural approximation of the target program logic. The differentiability of the surrogate neural program, unlike the original target program, allows us to use efficient optimization techniques like gradient descent to identify promising mutations that are more likely to trigger hardtoreach code in the target program. We evaluate NEUZZ on 10 popular realworld programs and demonstrate that NEUZZ consistently outperforms AFL, a stateoftheart evolutionary fuzzer, both at finding new bugs and achieving higher edge coverage. In total, NEUZZ found 36 previously unknown bugs that AFL failed to find and achieved, on average, 70 more edge coverage than AFL. Our results also demonstrate that NEUZZ can achieve average 9 more edge coverage while taking 16 less training time than other learningenabled fuzzers. 
Newick Format  In mathematics, Newick tree format (or Newick notation or New Hampshire tree format) is a way of representing graphtheoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick’s restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first treedrawing programs in Felsenstein’s PHYLIP package. ggtree 
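As an illustration of the parentheses-and-commas notation, the sketch below parses a Newick string into nested dictionaries. It is a minimal reader written for this entry, assuming unquoted node names, optional ":length" suffixes and a terminating semicolon; quoted labels and bracketed comments are not handled, and the function name parse_newick and the dictionary keys are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with the keys
    'name', 'length' (float or None) and 'children' (list)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def read_until(stops):
        # Collect characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_subtree():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                          # consume "("
            children.append(parse_subtree())
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse_subtree())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                          # consume ")"
        name = read_until(",):;")             # may be empty for unnamed nodes
        length = None
        if text[pos] == ":":
            pos += 1                          # consume ":"
            length = float(read_until(",);"))
        return {"name": name, "length": length, "children": children}

    root = parse_subtree()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected character at position {pos}")
    return root

tree = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
leaf_names = [child["name"] for child in tree["children"]]
```

Each pair of parentheses encloses the children of one internal node, and the optional text after the closing parenthesis names that node and gives the length of the edge to its parent.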
NewSQL  NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) readwrite workloads while still maintaining the ACID guarantees of a traditional database system. 
Newton  This article introduces Newton, a specification language for notating the analytic form, units of measure, and sensor signal properties for physicalobjectspecific invariants and general physical laws. We designed Newton to provide a means for hardware designers (e.g., sensor integrated circuit manufacturers, computing hardware architects, or mechanical engineers) to specify properties of the physical environments in which embedded computing systems will be deployed (e.g., a sensing platform deployed on a bridge versus worn by a human). Compilers and other program analysis tools for embedded systems can use a library interface to the Newton compiler to obtain information about the sensors, sensor signals, and intersignal relationships imposed by the structure and materials properties of a given physical system. The information encoded within Newton specifications could enable new compiletime transformations that exploit information about the physical world. 
Newton Scheme (NS) 
We introduce a neural network (NN) strictly governed by Newton’s Law, with the nature-required basis functions derived from fundamental classical mechanics. Then, by classifying the training model as a quick procedure of ‘force pattern’ recognition, we developed the Newton physics-based NS scheme. Once the force pattern is confirmed, the neural network simply checks the ‘pattern stability’ instead of performing continuous fitting through computationally expensive, big data-driven processing. In the given system of physical laws, once the field is confirmed, the mathematical bases for the force field description are actually not divergent but denumerable, which can save the function representations from the exhaustible available mathematical bases. In this work, we incorporated Newton’s Law into deep learning technology and proposed the Newton Scheme (NS). Under NS, the user first identifies the path pattern, such as constant acceleration movement. The object recognition technology first loads mass information; then the NS finds the matching physical pattern and describes and predicts the trajectory of the movements with nearly zero error. We compare the major contribution of this NS with the TCN, GRU and other physics-inspired ‘FIND-PDE’ methods to demonstrate fundamental and extended applications of how the NS works for free-falling, pendulum and curved soccer ball motion. The NS methodology provides more opportunity for future deep learning advances. 
Newtontype Alternating Minimization Algorithm (NAMA) 
We propose NAMA (Newtontype Alternating Minimization Algorithm) for solving structured nonsmooth convex optimization problems where the sum of two functions is to be minimized, one being strongly convex and the other composed with a linear mapping. The proposed algorithm is a linesearch method over a continuous, realvalued, exact penalty function for the corresponding dual problem, which is computed by evaluating the augmented Lagrangian at the primal points obtained by alternating minimizations. As a consequence, NAMA relies on exactly the same computations as the classical alternating minimization algorithm (AMA), also known as the dual proximal gradient method. Under standard assumptions the proposed algorithm possesses strong convergence properties, while under mild additional assumptions the asymptotic convergence is superlinear, provided that the search directions are chosen according to quasiNewton formulas. Due to its simplicity, the proposed method is well suited for embedded applications and largescale problems. Experiments show that using limitedmemory directions in NAMA greatly improves the convergence speed over AMA and its accelerated variant. 
Next Hit Predictor (NHP) 
Our goal is to predict the location of the next crime in a crime series, based on the identified previous offenses in the series. We build a predictive model called Next Hit Predictor (NHP) that finds the most likely location of the next serial crime via a carefully designed risk model. The risk model follows the paradigm of a selfexciting point process which consists of a background crime risk and triggered risks stimulated by previous offenses in the series. Thus, NHP creates a risk map for a crime series at hand. To train the risk model, we formulate a convex learning objective that considers pairwise rankings of locations and use stochastic gradient descent to learn the optimal parameters. Next Hit Predictor incorporates both spatialtemporal features and geographical characteristics of prior crime locations in the series. Next Hit Predictor has demonstrated promising results on decades’ worth of serial crime data collected by the Crime Analysis Unit of the Cambridge Police Department in Massachusetts, USA. 
NEXTSUM  Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimization problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far. We show that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance, in addition to automatically predicting how long the target summary should be. We perform experiments on the New York Times Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model summarization baselines by significant margins. We also show that the lengths of summaries produced by our system correlate with the lengths of the human-written gold standards. 
NeymanPearson Classification  
NeymanPearson Criterion (NPC) 
We propose a new model selection criterion, the NeymanPearson criterion (NPC), for asymmetric binary classification problems such as cancer diagnosis, where the two types of classification errors have vastly different priorities. The NPC is a general predictionbased criterion that works for most classification methods including logistic regression, support vector machines, and random forests. We study the theoretical model selection properties of the NPC for nonparametric plugin methods. Simulation studies show that the NPC outperforms the classical predictionbased criterion that minimizes the overall classification error under various asymmetric classification scenarios. A real data case study of breast cancer suggests that the NPC is a practical criterion that leads to the discovery of novel gene markers with both high sensitivity and specificity for breast cancer diagnosis. The NPC is available in an R package NPcriterion. 
NGra  Recent deep learning models have moved beyond lowdimensional regular grids such as image, video, and speech, to highdimensional graphstructured data, such as social networks, brain connections, and knowledge graphs. This evolution has led to large graphbased irregular and sparse models that go beyond what existing deep learning frameworks are designed for. Further, these models are not easily amenable to efficient, at scale, acceleration on parallel hardwares (e.g. GPUs). We introduce NGra, the first parallel processing framework for graphbased deep neural networks (GNNs). NGra presents a new SAGANN model for expressing deep neural networks as vertex programs with each layer in welldefined (Scatter, ApplyEdge, Gather, ApplyVertex) graph operation stages. This model not only allows GNNs to be expressed intuitively, but also facilitates the mapping to an efficient dataflow representation. NGra addresses the scalability challenge transparently through automatic graph partitioning and chunkbased stream processing out of GPU core or over multiple GPUs, which carefully considers data locality, data movement, and overlapping of parallel processing and data movement. NGra further achieves efficiency through highly optimized Scatter/Gather operators on GPUs despite its sparsity. Our evaluation shows that NGra scales to large real graphs that none of the existing frameworks can handle directly, while achieving up to about 4 times speedup even at small scales over the multiplebaseline design on TensorFlow. 
NGram  In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”. Larger sizes are sometimes referred to by the value of n, e.g., “four-gram”, “five-gram”, and so on. 
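The definition above translates directly into a sliding window over the sequence. The sketch below is a minimal illustration; the function name ngrams and the sample sentence are chosen here for the example and do not come from the entry.

```python
from collections import Counter

def ngrams(items, n):
    """Return all contiguous length-n subsequences of `items` as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows (none if n > L).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "to be or not to be".split()
bigrams = ngrams(words, 2)              # word-level bigrams
bigram_counts = Counter(bigrams)        # how often each bigram occurs
char_trigrams = ngrams("abcd", 3)       # letter-level trigrams
```

The same function works for any item type (words, letters, phonemes, base pairs) because it only relies on slicing the input sequence.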
NGram Machine (NGM) 
Deep neural networks (DNNs) had great success on NLP tasks such as language modeling, machine translation and certain question answering (QA) tasks. However, the success is limited at more knowledge intensive tasks such as QA from a big corpus. Existing end-to-end deep QA models (Miller et al., 2016; Weston et al., 2014) need to read the entire text after observing the question, and therefore their complexity in responding to a question is linear in the text size. This is prohibitive for practical tasks such as QA from Wikipedia, a novel, or the Web. We propose to solve this scalability issue by using symbolic meaning representations, which can be indexed and retrieved efficiently with complexity that is independent of the text size. More specifically, we use sequence-to-sequence models to encode knowledge symbolically and generate programs to answer questions from the encoded knowledge. We apply our approach, called the NGram Machine (NGM), to the bAbI tasks (Weston et al., 2015) and a special version of them (‘life-long bAbI’) which has stories of up to 10 million sentences. Our experiments show that NGM can solve both of these tasks accurately and efficiently. Unlike fully differentiable memory models, NGM’s time complexity and answering quality are not affected by the story length. The whole system of NGM is trained end-to-end with REINFORCE (Williams, 1992). To avoid high variance in gradient estimation, which is typical in discrete latent variable models, we use beam search instead of sampling. To tackle the exponentially large search space, we use a stabilized auto-encoding objective and a structure tweak procedure to iteratively reduce and refine the search space. 
Niching  Simply put, niching is a class of methods that try to converge to more than one solution during a single run. Niching is the idea of segmenting the population of a genetic algorithm (GA) into disjoint sets, so that there is at least one member in each ‘interesting’ region of the fitness function; generally this means covering more than one local optimum. Algorithm of the Week: Niching in Genetic Algorithms 
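One well-known way to realize niching is fitness sharing, in which an individual's fitness is divided by the number of similar individuals around it, so crowded regions become less attractive. The entry does not name a specific technique, so the sketch below is an assumption chosen for illustration. The one-dimensional population, the niche `radius`, and the function names are all hypothetical.

```python
def sharing(distance, radius, alpha=1.0):
    """Sharing function: 1 at distance 0, falling to 0 at the niche radius."""
    if distance < radius:
        return 1.0 - (distance / radius) ** alpha
    return 0.0


def shared_fitness(population, raw_fitness, radius):
    """Divide each raw fitness by the individual's niche count."""
    result = []
    for x in population:
        # The niche count includes the individual itself, so it is at least 1.
        niche_count = sum(sharing(abs(x - y), radius) for y in population)
        result.append(raw_fitness(x) / niche_count)
    return result


# Three individuals crowd one region; one sits alone in another region.
population = [0.0, 0.1, 0.2, 5.0]
scores = shared_fitness(population, lambda x: 1.0, radius=1.0)
```

With equal raw fitness everywhere, the isolated individual keeps its full score while the crowded ones are penalized, which pushes selection toward maintaining members in several regions at once.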
NIMFA  NIMFA is an opensource Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of stateoftheart factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representation. NIMFA’s componentbased implementation and hierarchical design should help the users to employ already implemented techniques or design and code new strategies for matrix factorization tasks. 
NL4Py  NL4Py is a NetLogo controller software for Python, for the rapid, parallel execution of NetLogo models. NL4Py provides both headless (no graphical user interface) and GUI NetLogo workspace control through Python. Spurred on by the increasing availability of opensource computation and machine learning libraries on the Python package index, there is an increasing demand for such rapid, parallel execution of agentbased models through Python. NetLogo, being the language of choice for a majority of agentbased modeling driven research projects, requires an integration to Python for researchers looking to perform statistical analyses of agentbased model output using these libraries. Unfortunately, until the recent introduction of PyNetLogo, and now NL4Py, such a controller was unavailable. This article provides a detailed introduction into the usage of NL4Py and explains its clientserver software architecture, highlighting architectural differences to PyNetLogo. A stepbystep demonstration of global sensitivity analysis and parameter calibration of the Wolf Sheep Predation model is then performed through NL4Py. Finally, NL4Py’s performance is benchmarked against PyNetLogo and its combination with IPyParallel, and shown to provide significant savings in execution time over both configurations. 
NLDpMRI  Fast data acquisition in Magnetic Resonance Imaging (MRI) is in high demand, and scan time depends directly on the number of acquired k-space samples. The most common issues in deep learning-based MRI reconstruction approaches are generalizability and transferability. To apply these approaches to a different MRI scanner configuration, the network must be trained from scratch each time with a new training dataset, acquired under the new configuration, in order to provide good reconstruction performance. Here, we propose a new parallel imaging method based on deep neural networks, called NLDpMRI, to reduce any structured aliasing ambiguities related to the different k-space undersampling patterns for accelerated data acquisition. Two loss functions, one non-regularized and one regularized, are proposed for parallel MRI reconstruction using deep network optimization, and we reconstruct MR images by optimizing the proposed loss functions over the network parameters. Unlike other deep learning-based MRI reconstruction approaches, our method does not include a training step in which the network learns from a large number of training samples; it needs only the single undersampled multi-coil k-space data for reconstruction. The proposed method can also handle k-space data with different undersampling patterns and different numbers of coils. Unlike most deep learning-based MRI reconstruction methods, our method operates on real-world acquisitions in complex data format, not on simulated data, real-valued data, or data with added simulated phase. Experimental results show that the proposed method outperforms the current state-of-the-art GRAPPA reconstruction method. 
NMNet  Feature correspondence selection is pivotal to many featurematching based tasks in computer vision. Searching for spatially knearest neighbors is a common strategy for extracting local information in many previous works. However, there is no guarantee that the spatially knearest neighbors of correspondences are consistent because the spatial distribution of false correspondences is often irregular. To address this issue, we present a compatibilityspecific mining method to search for consistent neighbors. Moreover, in order to extract and aggregate more reliable features from neighbors, we propose a hierarchical network named NMNet with a series of convolution layers taking the generated graph as input, which is insensitive to the order of correspondences. Our experimental results have shown the proposed method achieves the stateoftheart performance on four datasets with various inlier ratios and varying numbers of feature consistencies. 
NNCubes  Visual exploration of large multidimensional datasets has seen tremendous progress in recent years, allowing users to express rich data queries that produce informative visual summaries, all in real time. However, a limitation of current techniques is their lack of guidance. Exploration in existing methods is typically driven by data aggregation queries, but these are unable to suggest interesting aggregations and are limited in helping the user understand the types of queries that lead to certain aggregations. To tackle this problem, it is necessary to understand how the space of queries relates to their aggregation results. We present NNCubes: neural networks that are surrogate models for data cube techniques. NNCubes learns a function that takes as input a given query, for instance a geographic region and temporal interval, and outputs an aggregation of the query. The learned function serves as a real-time, low-memory approximator for aggregation queries. Moreover, using neural networks as querying engines opens up new ways to guide user interactions that would be challenging to do with existing techniques. First, we show how to use the network for discovering queries that lead to user-specified aggregation results, thus providing a form of direct manipulation. Second, our networks are designed in such a way that we learn meaningful 2D projections of the individual inputs, namely projections that are predictive of the aggregation operation. We use these learned projections to allow the user to explore the space of aggregation queries, to help discover trends and patterns in the data. We demonstrate both of these forms of guidance using NNCubes on a variety of datasets. 
nndependabilitykit  nndependabilitykit is an opensource toolbox to support safety engineering of neural networks. The key functionality of nndependabilitykit includes (a) novel dependability metrics for indicating sufficient elimination of uncertainties in the product life cycle, (b) formal reasoning engine for ensuring that the generalization does not lead to undesired behaviors, and (c) runtime monitoring for reasoning whether a decision of a neural network in operation time is supported by prior similarities in the training data. 
NNStreamer  We propose nnstreamer, a software system that handles neural networks as filters of stream pipelines, applying the stream processing paradigm to neural network applications. A new trend accompanying the spread of deep neural network applications is on-device AI; i.e., processing neural networks directly on mobile devices or edge/IoT devices instead of cloud servers. Emerging privacy issues, data transmission costs, and operational costs signify the need for on-device AI, especially when a huge number of devices with real-time data processing are deployed. Nnstreamer efficiently handles neural networks with complex data stream pipelines on devices, improving the overall performance significantly with minimal effort. In addition, nnstreamer simplifies neural network pipeline implementations and allows off-the-shelf multimedia stream filters to be reused directly, which significantly reduces development costs. Nnstreamer is already being deployed with a product releasing soon and is open source software applicable to a wide range of hardware architectures and software platforms. 
No Free Lunch Theorem (NFL) 
In mathematical folklore, the ‘no free lunch’ theorem (sometimes pluralized) of David Wolpert and William Macready appears in the 1997 ‘No Free Lunch Theorems for Optimization’. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). In 2005, Wolpert and Macready themselves indicated that the first theorem in their paper ‘state[s] that any two optimization algorithms are equivalent when their performance is averaged across all possible problems’. The 1997 theorems of Wolpert and Macready are mathematically technical and some find them unintuitive. The folkloric ‘no free lunch’ (NFL) theorem is an easily stated and easily understood consequence of theorems Wolpert and Macready actually prove. It is weaker than the proven theorems, and thus does not encapsulate them. Various investigators have extended the work of Wolpert and Macready substantively. http://…/No_free_lunch_in_search_and_optimization 
Node Attribution Method (NAMA) 
To address the difficulty that convolutional neural networks (CNNs) have in processing non-image relational data, Kipf et al. proposed the graph convolutional neural network (GCN). The core idea is to perform a twofold information fusion for each node in a given graph during each iteration: the fusion of graph structure information and the fusion of node feature dimensions. GCN has been widely used in scene semantic relationship analysis, natural language processing, and few-shot learning because of its generalization ability. However, because its twofold information fusion involves mathematically irreversible calculations, it is hard for a GCN to explain the reason behind each node classification prediction (i.e., attribution analysis). Existing attribution analysis methods cannot be directly applied to the GCN because, unlike the independent inputs of a CNN, the inputs of a GCN are correlated. As a result, existing attribution methods obtain only the partial contribution of the target node’s features to the final decision of the GCN; the complete contribution, and the contribution from the features of neighbor nodes, cannot be obtained. To this end, we propose a gradient attribution analysis method for GCN, NAM (Node Attribution Method), which obtains the contribution of the target node and its neighbor nodes to the GCN output. We also propose the NIV (Node Importance Visualization) method to visualize the target node of the GCN and its neighbor nodes based on their contribution values. We use perturbation analysis to verify the effect of NAM on the citation network dataset. The experimental results show that NAM can well learn the contribution of each node to the node classification prediction. 
Node Link Diagram  Graphs are frequently drawn as nodelink diagrams in which the vertices are represented as disks, boxes, or textual labels and the edges are represented as line segments, polylines, or curves in the Euclidean plane. Nodelink diagrams can be traced back to the 13th century work of Ramon Llull, who drew diagrams of this type for complete graphs in order to analyze all pairwise combinations among sets of metaphysical concepts. 
node2bits  Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations. However, traditional user stitching approaches, such as grouping or blocking, require quadratic pairwise comparisons between a massive number of user activities, thus posing both computational and storage challenges. Recent works, which are often application-specific, heuristically seek to reduce the number of comparisons, but they suffer from low precision and recall. To solve the problem in an application-independent way, we take a heterogeneous network-based approach in which users (nodes) interact with content (e.g., sessions, websites), and may have attributes (e.g., location). We propose node2bits, an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes. node2bits leverages feature-based temporal walks to encapsulate short- and long-term interactions between nodes in heterogeneous web networks, and adopts SimHash to obtain compact, binary representations and avoid the quadratic complexity for similarity search. Extensive experiments on large-scale real networks show that node2bits outperforms traditional techniques and existing works that generate real-valued embeddings by up to 5.16% in F1 score on user stitching, while taking only up to 1.56% as much storage. 
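The SimHash step mentioned above can be sketched as follows. This is a common token-hashing variant of SimHash, not the node2bits implementation, whose details the entry does not give. Using MD5 as the per-feature hash, unit feature weights, and the names `simhash` and `hamming` are assumptions made for illustration.

```python
import hashlib


def simhash(features, bits=64):
    """Compress a collection of string features into one `bits`-bit fingerprint.

    Assumes `bits` is a multiple of 8 and at most 128 (the MD5 digest size).
    """
    totals = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            totals[i] += 1 if (value >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


profile_a = ["device:phone", "site:news", "site:mail", "city:oslo"]
profile_b = ["device:laptop", "site:news", "site:mail", "city:oslo"]
distance = hamming(simhash(profile_a), simhash(profile_b))
```

Feature sets that overlap heavily tend to produce fingerprints with a small Hamming distance, so candidate matches can be found by comparing or bucketing compact bit strings rather than comparing all pairs of raw feature vectors.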
Noise Engineered Modematching GAN (NEMGAN) 
Conditional generation refers to the process of sampling from an unknown distribution conditioned on semantics of the data. This can be achieved by augmenting the generative model with the desired semantic labels, although this is not straightforward in an unsupervised setting where the semantic label of every data sample is unknown. In this paper, we address this issue by proposing a method that can generate samples conditioned on the properties of a latent distribution engineered in accordance with a certain data prior. In particular, a latent space inversion network is trained in tandem with a generative adversarial network such that the modal properties of the latent space distribution are induced in the data generating distribution. We demonstrate that our model, despite being fully unsupervised, is effective in learning meaningful representations through its mode matching property. We validate our method on multiple unsupervised tasks such as conditional generation, attribute discovery and inference using three real-world image datasets, namely MNIST, CIFAR-10 and CelebA, and show that the results are comparable to the state-of-the-art methods. 
Noise Sensitivity Score (NSS) 
Deep Neural Networks (DNNs) have greatly advanced the field of computer vision by achieving state-of-the-art performance in various vision tasks. These results are not limited to the field of vision but can also be seen in speech recognition and machine translation tasks. Recently, DNNs have been found to fail badly when tested with samples that are crafted by making imperceptible changes to the original input images. This causes a gap between the validation and adversarial performance of a DNN. An effective and generalizable robustness metric for evaluating the performance of a DNN on these adversarial inputs is still missing from the literature. In this paper, we propose the Noise Sensitivity Score (NSS), a metric that quantifies the performance of a DNN on a specific input under different forms of fixdirectional attacks. A mathematical explanation is provided to give a deeper understanding of the proposed metric. By leveraging the NSS, we also propose a skewness-based dataset robustness metric for evaluating a DNN’s adversarial performance on a given dataset. Extensive experiments using widely used state-of-the-art architectures along with popular classification datasets, such as MNIST, CIFAR-10, CIFAR-100, and ImageNet, are used to validate the effectiveness and generalization of our proposed metrics. Instead of simply measuring a DNN’s adversarial robustness in the input domain, as in previous works, the proposed NSS is built on a mathematical understanding of the adversarial attack and gives a more explicit explanation of the robustness. 
NOise Tolerant Ensemble RCNN (NOTERCNN) 
The labeling cost of a large number of bounding boxes is one of the main challenges for training modern object detectors. To reduce the dependence on expensive bounding box annotations, we propose a new semi-supervised object detection formulation, in which a few seed box-level annotations and a large number of image-level annotations are used to train the detector. We adopt a training-mining framework, which is widely used in weakly supervised object detection tasks. However, the mining process inherently introduces various kinds of labelling noise: false negatives, false positives and inaccurate boundaries, which can be harmful for training standard object detectors (e.g. Faster RCNN). We propose a novel NOise Tolerant Ensemble RCNN (NOTE-RCNN) object detector to handle such noisy labels. Compared to standard Faster RCNN, it has three highlights: an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve the mining precision, masking of the negative sample loss in the box predictor to avoid the harm of false negative labels, and training of the box regression head only on seed annotations to eliminate the harm from inaccurate boundaries of mined bounding boxes. We evaluate the methods on the ILSVRC 2013 and MSCOCO 2017 datasets; we observe that the detection accuracy consistently improves as we iterate between mining and training steps, and state-of-the-art performance is achieved. 
Noise2Void (N2V) 
The field of image denoising is currently dominated by discriminative deep learning methods that are trained on pairs of noisy input and clean target images. Recently it has been shown that such methods can also be trained without clean targets. Instead, independent pairs of noisy images can be used, in an approach known as Noise2Noise (N2N). Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. It requires neither noisy image pairs nor clean target images. Consequently, N2V allows us to train directly on the body of data to be denoised and can therefore be applied when other methods cannot. Especially interesting is the application to biomedical image data, where the acquisition of training targets, clean or noisy, is frequently not possible. We compare the performance of N2V to approaches that have clean target images and/or noisy image pairs available. Intuitively, N2V cannot be expected to outperform methods that have more information available during training. Still, we observe that the denoising performance of Noise2Void drops only moderately and compares favorably to training-free denoising methods. 
NoiseContrastive Estimation (NCE) 
Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noisecontrastive estimation (NCE) was introduced where unnormalised models are estimated by learning to distinguish between data and auxiliary noise. An open question is how to best choose the auxiliary noise distribution. We here propose a new method that addresses this issue. The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem but in contrast to NCE, the proposed method leverages the observed data when generating noise samples. The noise can thus be generated in a semiautomated manner. We first present the underlying theory of the new method, show that score matching emerges as a limiting case, validate the method on continuous and discrete valued synthetic data, and show that we can expect an improved performance compared to NCE when the data lie in a lowerdimensional manifold. Then we demonstrate its applicability in unsupervised deep learning by estimating a fourlayer neural image model. 
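As a minimal sketch of the original NCE principle described above (not of the new method this entry proposes), the objective can be written as a logistic-regression loss that separates data samples from noise samples. The function names, the one-dimensional unnormalised Gaussian model, and the choice of a standard normal noise distribution are assumptions made for illustration. The log-normaliser `c` is treated as an ordinary model parameter.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(data, noise, log_model, log_noise):
    """Negative NCE objective for an unnormalised log-density `log_model`."""
    nu = len(noise) / len(data)  # noise-to-data sample ratio

    def logit(u):
        # Log-odds that u came from the model rather than the noise.
        return log_model(u) - log_noise(u) - math.log(nu)

    data_term = sum(log_sigmoid(logit(x)) for x in data)
    # log(1 - sigmoid(z)) equals log_sigmoid(-z).
    noise_term = sum(log_sigmoid(-logit(y)) for y in noise)
    return -(data_term + noise_term) / len(data)


def log_standard_normal(u):
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)


def make_log_model(c):
    # Unnormalised Gaussian; c plays the role of the learned log-normaliser.
    return lambda u: -0.5 * u * u + c


data = [-1.2, 0.3, 0.8, -0.1]
noise = [0.5, -0.7, 1.5, 0.0]
loss = nce_loss(data, noise, make_log_model(0.0), log_standard_normal)
```

Minimising this loss over the model parameters (here only `c`) by any gradient-based optimiser estimates the model without ever computing the partition function.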
Noisin  Recurrent neural networks (RNNs) are powerful models of sequential data. They have been successfully used in domains such as text and speech. However, RNNs are susceptible to overfitting; regularization is important. In this paper we develop Noisin, a new method for regularizing RNNs. Noisin injects random noise into the hidden states of the RNN and then maximizes the corresponding marginal likelihood of the data. We show how Noisin applies to any RNN and we study many different types of noise. Noisin is unbiased–it preserves the underlying RNN on average. We characterize how Noisin regularizes its RNN both theoretically and empirically. On language modeling benchmarks, Noisin improves over dropout by as much as 12.2% on the Penn Treebank and 9.4% on the Wikitext2 dataset. We also compared the stateoftheart language model of Yang et al. 2017, both with and without Noisin. On the Penn Treebank, the method with Noisin more quickly reaches stateoftheart performance. 
Noisy Expectation Maximization (NEM) 
We present a noise-injected version of the Expectation-Maximization (EM) algorithm: the Noisy Expectation Maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that injected noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. The generalized form of the noisy expectation-maximization (NEM) algorithm allows for arbitrary modes of noise injection, including adding and multiplying noise to the data. We demonstrate these noise benefits on EM algorithms for the Gaussian mixture model (GMM) with both additive and multiplicative NEM noise injection. A separate theorem (not presented here) shows that the noise benefit for independent identically distributed additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced if the data is sparse. Injecting blind noise only slowed convergence. 
Noisy MultiLabel SemiSupervised Dimensionality Reduction (NMLSDR) 
Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on solving the challenge posed by noisy labels in nonstandard settings. This includes situations where only a fraction of the samples are labeled (semisupervised) and each highdimensional sample is associated with multiple labels. In this work, we present a novel semisupervised and multilabel dimensionality reduction method that effectively utilizes information from both noisy multilabels and unlabeled data. With the proposed Noisy multilabel semisupervised dimensionality reduction (NMLSDR) method, the noisy multilabels are denoised and unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multilabel space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, as well as a realworld case study, demonstrate the effectiveness of the proposed algorithm and show that it outperforms stateoftheart multilabel feature extraction algorithms. 
NoisyNet  We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance. 
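The idea of parametric noise on the weights can be sketched as a linear layer whose effective weight is a learned mean plus a learned scale times fresh random noise. This shows only the forward pass and assumes independent Gaussian noise per parameter. The function name `noisy_linear` and its arguments are hypothetical, and the gradient-descent training of the means and scales is not shown.

```python
import random


def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng):
    """Forward pass of a linear layer with noise-perturbed weights and biases.

    Effective weight = w_mu + w_sigma * eps, with eps drawn from N(0, 1)
    on every call; biases are perturbed in the same way.
    """
    outputs = []
    for mu_row, sigma_row, bias_mu, bias_sigma in zip(w_mu, w_sigma, b_mu, b_sigma):
        total = bias_mu + bias_sigma * rng.gauss(0.0, 1.0)
        for x_i, mu, sigma in zip(x, mu_row, sigma_row):
            total += (mu + sigma * rng.gauss(0.0, 1.0)) * x_i
        outputs.append(total)
    return outputs


rng = random.Random(0)
x = [1.0, 2.0]
w_mu = [[1.0, 0.0], [0.5, 0.5]]
b_mu = [0.0, 1.0]

# With all scales at zero the layer reduces to an ordinary linear layer.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
plain = noisy_linear(x, w_mu, zeros_w, b_mu, zeros_b, rng)

# With non-zero scales each call produces a different, randomly perturbed output.
small_w = [[0.1, 0.1], [0.1, 0.1]]
small_b = [0.1, 0.1]
noisy = noisy_linear(x, w_mu, small_w, b_mu, small_b, rng)
```

Because the scales are parameters, training can shrink or grow the amount of noise for each weight, which is what lets the perturbation act as a learned exploration mechanism rather than a fixed schedule.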
Nomogram  A nomogram, also called a nomograph, alignment chart or abaque, is a graphical calculating device, a two-dimensional diagram designed to allow the approximate graphical computation of a function. The field of nomography was invented in 1884 by the French engineer Philbert Maurice d’Ocagne (1862–1938) and used extensively for many years to provide engineers with fast graphical calculations of complicated formulas to a practical precision. Nomograms use a parallel coordinate system invented by d’Ocagne rather than standard Cartesian coordinates. A nomogram consists of a set of n scales, one for each variable in an equation. Knowing the values of n-1 variables, the value of the unknown variable can be found, or by fixing the values of some variables, the relationship between the unfixed ones can be studied. The result is obtained by laying a straightedge across the known values on the scales and reading the unknown value from where it crosses the scale for that variable. The virtual or drawn line created by the straightedge is called an index line or isopleth. 
Non Metric Space (Approximate) Library (NMSLIB) 
A NonMetric Space Library (‘NMSLIB’ <https://…/nmslib> ) wrapper, which according to the authors ‘is an efficient crossplatform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the ‘NMSLIB’ <https://…/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic nonmetric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or nonmetric spaces. Hence, the main focus is on approximate methods’. The wrapper also includes Approximate Kernel kNearestNeighbor functions based on the ‘NMSLIB’ <https://…/nmslib> ‘Python’ Library. nmslibR 
NonAutOregressive Multiresolution Imputation (NAOMI) 
Missing value imputation is a fundamental problem in modeling spatiotemporal sequences, from motion tracking to the dynamics of physical systems. In this paper, we take a nonautoregressive approach and propose a novel deep generative model: NonAutOregressive Multiresolution Imputation (NAOMI) for imputing longrange spatiotemporal sequences given arbitrary missing patterns. In particular, NAOMI exploits the multiresolution structure of spatiotemporal data to interpolate recursively from coarse to finegrained resolutions. We further enhance our model with adversarial training using an imitation learning objective. When trained on billiards and basketball trajectories, NAOMI demonstrates significant improvement in imputation accuracy (reducing average prediction error by 60% compared to autoregressive counterparts) and generalization capability for long range trajectories in systems of both deterministic and stochastic dynamics. 
Nonconvex Conditional Gradient Sliding (NCGS) 
We investigate a projection-free method, namely conditional gradient sliding (CGS), on batched, stochastic and finite-sum non-convex problems. CGS is a smart combination of Nesterov’s accelerated gradient method and the Frank-Wolfe (FW) method, and outperforms FW in the convex setting by saving gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose non-convex conditional gradient sliding (NCGS), which surpasses the non-convex Frank-Wolfe method in the batched, stochastic and finite-sum settings. 
NonCorrelating Multiplicative Noise (NCMN) 
Multiplicative noise, including dropout, is widely used to regularize deep neural networks (DNNs), and is shown to be effective in a wide range of architectures and tasks. From an information perspective, we consider injecting multiplicative noise into a DNN as training the network to solve the task with noisy information pathways, which leads to the observation that multiplicative noise tends to increase the correlation between features, so as to increase the signaltonoise ratio of information pathways. However, high feature correlation is undesirable, as it increases redundancy in representations. In this work, we propose noncorrelating multiplicative noise (NCMN), which exploits batch normalization to remove the correlation effect in a simple yet effective way. We show that NCMN significantly improves the performance of standard multiplicative noise on image classification tasks, providing a better alternative to dropout for batchnormalized networks. Additionally, we present a unified view of NCMN and shakeshake regularization, which explains the performance gain of the latter. 
NonDeterministic Inference Framework  A random set is a generalisation of a random variable, i.e. a setvalued random variable. The random set theory allows a unification of other uncertainty descriptions such as interval variable, mass belief function in DempsterShafer theory of evidence, possibility theory, and set of probability distributions. The aim of this work is to develop a nondeterministic inference framework, including theory, approximation and sampling method, that deals with the inverse problems in which uncertainty is represented using random sets. The proposed inference method yields the posterior random set based on the intersection of the prior and the measurement induced random sets. That inference method is an extension of Dempster’s rule of combination, and a generalisation of Bayesian inference as well. A direct evaluation of the posterior random set might be impractical. We approximate the posterior random set by a random discrete set whose domain is the set of samples generated using a proposed probability distribution. We use the capacity transform density function of the posterior random set for this proposed distribution. This function has a special property: it is the posterior density function yielded by Bayesian inference of the capacity transform density function of the prior random set. The samples of such proposed probability distribution can be directly obtained using the methods developed in the Bayesian inference framework. With this approximation method, the evaluation of the posterior random set becomes tractable. 
NonDeterministic Turing Machine  In theoretical computer science, a non-deterministic Turing machine is a theoretical model of computation. Such machines are used in thought experiments to examine the abilities and limitations of computers. One of the most important open problems in theoretical computer science is the P vs. NP problem, which concerns the question of how difficult it is to simulate non-deterministic computation with a deterministic computer. 
NonGaussian Component Analysis (NGCA) 
Non-Gaussian component analysis (NGCA) is a problem in multidimensional data analysis. Since its formulation in 2006, NGCA has attracted considerable attention in statistics and machine learning. In this problem, we have a random variable $X$ in $n$-dimensional Euclidean space. There is an unknown subspace $U$ of the $n$-dimensional Euclidean space such that the orthogonal projection of $X$ onto $U$ is standard multidimensional Gaussian and the orthogonal projection of $X$ onto $V$, the orthogonal complement of $U$, is non-Gaussian, in the sense that all its one-dimensional marginals are different from the Gaussian in a certain metric defined in terms of moments. The NGCA problem is to approximate the non-Gaussian subspace $V$ given samples of $X$. Vectors in $V$ correspond to ‘interesting’ directions, whereas vectors in $U$ correspond to the directions where data is very noisy. The most interesting application of the NGCA model is the case when the magnitude of the noise is comparable to that of the true signal, a setting in which traditional noise reduction techniques such as PCA don’t apply directly. NGCA is also related to dimensionality reduction and to other data analysis problems such as ICA. NGCA-like problems have been studied in statistics for a long time using techniques such as projection pursuit. 
Non-Homogeneous Markov Switching Autoregressive Models (MSAR) 
In this paper, non-homogeneous Markov-Switching Autoregressive (MSAR) models are proposed to describe wind time series. In these models, several autoregressive models are used to describe the time evolution of the wind speed and the switching between these different models is controlled by a hidden Markov chain which represents the weather types. We first block the data by month in order to remove seasonal components and propose an MSAR model with non-homogeneous autoregressive models to describe daily components. Then we discuss extensions where the hidden Markov chain is also non-stationary to handle seasonal and interannual fluctuations. NHMSAR 
Non-Interfering Comparison-Exchange (NICE) 
In studying the statistical frequency of exchange in comparison-exchange (CE) networks we discover a new elementary form of comparison-exchange which we name the ‘2op’. The operation supports concurrent and non-interfering operations of two traditional CEs upon one shared element. More than merely improving overall statistical performance, the introduction of NICE (non-interfering CE) networks lowers long-held bounds in the number of stages required for sorting tasks. Code-based CEs also benefit from improved average/worst case run time costs. 
Non-Intrusive Load Monitoring (NILM) 
Non-intrusive load monitoring (NILM), or non-intrusive appliance load monitoring (NIALM), is a process for analyzing changes in the voltage and current going into a house and deducing what appliances are used in the house as well as their individual energy consumption. Electric meters with NILM technology are used by utility companies to survey the specific uses of electric power in different homes. NILM is considered a low-cost alternative to attaching individual monitors on each appliance. It does, however, present privacy concerns. 
Non-Intrusive Probabilistic Power Flow  In this paper, a novel non-intrusive probabilistic power flow (PPF) analysis method based on the low-rank approximation (LRA) is proposed, which can accurately and efficiently estimate the probabilistic characteristics (e.g., mean, variance, probability density function) of the PPF solutions. This method aims at building up a statistically-equivalent surrogate for the PPF solutions through a small number of power flow evaluations. By exploiting the retained tensor-product form of the univariate polynomial basis, a sequential correction-updating scheme is applied, making the total number of unknowns linear rather than exponential in the number of random inputs. Consequently, the LRA method is particularly promising for dealing with high-dimensional problems with a large number of random inputs. Numerical studies on the IEEE 39-bus, 118-bus, and 1354-bus systems show that the proposed method can achieve accurate probabilistic characteristics of the PPF solutions with much less computational effort compared to the Monte Carlo simulations. Even compared to the polynomial chaos expansion method, the LRA method can achieve comparable accuracy, while the LRA method is more capable of handling higher-dimensional problems. Moreover, numerical results reveal that the randomness brought about by the renewable energy resources and loads may inevitably affect the feasibility of dispatch/planning schemes. 
Nonlinear Collaborative Scheme  Conventional research attributes the improvements of generalization ability of deep neural networks either to powerful optimizers or to new network designs. In contrast, in this paper we aim to link the generalization ability of a deep network to optimizing a new objective function. To this end, we propose a nonlinear collaborative scheme for deep network training, whose key technique is combining different loss functions in a nonlinear manner. We find that after adaptively tuning the weights of different loss functions, the proposed objective function can efficiently guide the optimization process. What is more, we demonstrate that, from the mathematical perspective, the nonlinear collaborative scheme can lead to (i) smaller KL divergence with respect to optimal solutions; (ii) data-driven stochastic gradient descent; (iii) tighter PAC-Bayes bound. We also prove that its advantage can be strengthened by increasing the nonlinearity. To some extent, we bridge the gap between learning (i.e., minimizing the new objective function) and generalization (i.e., minimizing a PAC-Bayes bound) in the new scheme. We also interpret our findings through experiments on Residual Networks and DenseNet, showing that our new scheme outperforms single-loss and multi-loss schemes, with or without randomization. 
Nonlinear Dimensionality Reduction (NLDR) 
High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded nonlinear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. [Figure: a 3D dataset of 1000 points in a spiraling band (a.k.a. the Swiss roll) with a rectangular hole in the middle, the original 2D manifold used to generate it, and 2D recoveries of the manifold using the LLE and Hessian LLE algorithms as implemented by the Modular Data Processing toolkit.] Many nonlinear dimensionality reduction (NLDR) methods, also known as manifold learning methods, are related to linear methods. Nonlinear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements. ➚ “Manifold Learning” 
Nonlinear expectation  In probability theory, a nonlinear expectation is a nonlinear generalization of the expectation. Nonlinear expectations are useful in utility theory as they more closely match human behavior than traditional expectations. 
Nonlinear Iterative Partial Least Squares (NIPALS) 
In statistics, nonlinear iterative partial least squares (NIPALS) is an algorithm for computing the first few components in a principal component or partial least squares analysis. For very-high-dimensional datasets, such as those generated in the ‘omics sciences (e.g., genomics, metabolomics), it is usually only necessary to compute the first few principal components. The NIPALS algorithm calculates t1 and p1′ from X. The outer product t1p1′ can then be subtracted from X, leaving the residual matrix E1. This can then be used to calculate subsequent principal components. This results in a dramatic reduction in computational time since calculation of the covariance matrix is avoided. 
Nonlinear Simplex Regression Model  In this paper, we propose a simplex regression model in which both the mean and the dispersion parameters are related to covariates by nonlinear predictors. We provide closed-form expressions for the score function, for Fisher’s information matrix and its inverse. Some diagnostic measures are introduced. We propose a residual, obtained using Fisher’s scoring iterative scheme for the estimation of the parameters that index the regression nonlinear predictor to the mean response, and numerically evaluate its behaviour. We also derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes. We also propose a scheme for the choice of starting values for Fisher’s iterative scheme for nonlinear simplex models. The diagnostic techniques were applied to actual data. The local influence analyses reveal that the simplex models can be a modeling alternative more robust to influential cases than the beta regression models, for both linear and nonlinear models. 
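The deflation loop described above can be sketched in a few lines. This is a minimal illustration for the principal component case only, assuming NumPy; the function name `nipals_pca`, the tolerance, the iteration cap and the choice of starting column are illustrative assumptions, not part of the entry.

```python
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Sketch of NIPALS for PCA: returns scores T and loadings P."""
    E = X - X.mean(axis=0)              # centred data; becomes the residual matrix
    n, m = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    for k in range(n_components):
        # Assumed starting guess: the column of E with the largest variance.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)       # regress columns of E on the score t
            p /= np.linalg.norm(p)      # normalise the loading vector
            t_new = E @ p               # updated score vector
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p
        E = E - np.outer(t, p)          # subtract t p' to leave the next residual
    return T, P
```

Only matrix-vector products with the residual matrix are needed, so the covariance matrix is never formed explicitly.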
Nonlinear Variable Selection based on Derivatives (NVSD) 
We investigate structured sparsity methods for variable selection in regression problems where the target depends nonlinearly on the inputs. We focus on general nonlinear functions not limiting a priori the function space to additive models. We propose two new regularizers based on partial derivatives as nonlinear equivalents of group lasso and elastic net. We formulate the problem within the framework of learning in reproducing kernel Hilbert spaces and show how the variational problem can be reformulated into a more practical finite-dimensional equivalent. We develop a new algorithm derived from the ADMM principles that relies solely on closed forms of the proximal operators. We explore the empirical properties of our new algorithm for Nonlinear Variable Selection based on Derivatives (NVSD) on a set of experiments and confirm favourable properties of our structured-sparsity models and the algorithm in terms of both prediction and variable selection accuracy. 
Nonlinearity Coefficient  For a long time, designing neural architectures that exhibit high performance was considered a dark art that required expert hand-tuning. One of the few well-known guidelines for architecture design is the avoidance of exploding gradients, though even this guideline has remained relatively vague and circumstantial. We introduce the nonlinearity coefficient (NLC), a measurement of the complexity of the function computed by a neural network that is based on the magnitude of the gradient. Via an extensive empirical study, we show that the NLC is a powerful predictor of test error and that attaining a right-sized NLC is essential for optimal performance. The NLC exhibits a range of intriguing and important properties. It is closely tied to the amount of information gained from computing a single network gradient. It is tied to the error incurred when replacing the nonlinearity operations in the network with linear operations. It is not susceptible to the confounders of multiplicative scaling, additive bias and layer width. It is stable from layer to layer. Hence, we argue that the NLC is the first robust predictor of overfitting in deep networks. 
Non-Local Context Encoder (NLCE) 
Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, its vulnerability towards adversarial samples cannot be overlooked. This paper is the first to find that all the CNN-based state-of-the-art biomedical image segmentation models are sensitive to adversarial perturbations. This limits the deployment of these methods in safety-critical biomedical fields. In this paper, we discover that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, a non-local context encoder (NLCE) is proposed to model short- and long-range spatial dependencies and encode global contexts for strengthening feature activations by channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust enhanced pyramid feature representations with NLCE modules, and then integrates the information across different levels. Experiments on both lung and skin lesion segmentation datasets have demonstrated that NLCEN outperforms other state-of-the-art biomedical image segmentation methods against adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods. 
Non-local Neural Network  Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available. 
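The phrase "a weighted sum of the features at all positions" can be illustrated with a small NumPy sketch. The weighting used here (a softmax over dot-product similarities, with no learned projections) is an assumption chosen for brevity; the entry does not specify the weight function, and `nonlocal_response` is a hypothetical name.

```python
import numpy as np

def nonlocal_response(x):
    """x has shape (positions, channels); the output has the same shape.

    Each output row is a weighted sum of ALL input rows, with weights given
    by a softmax over pairwise dot-product similarities (an assumed choice).
    """
    scores = x @ x.T                                    # similarity of every pair of positions
    scores = scores - scores.max(axis=1, keepdims=True) # stabilise the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # each row of weights sums to 1
    return weights @ x                                  # weighted sum over all positions

# Usage: four positions with three channels each.
features = np.arange(12, dtype=float).reshape(4, 3) / 10.0
response = nonlocal_response(features)
```

Unlike a convolution, every output position here depends on every input position, which is what lets the block capture long-range dependencies in a single step.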
Non-Markovian Monte Carlo (NMMC) 
Markov Chain Monte Carlo (MCMC) has been the de facto technique for sampling and inference of large graphs such as online social networks. At the heart of MCMC lies the ability to construct an ergodic Markov chain that attains any given stationary distribution $\boldsymbol{\pi}$, often in the form of random walks or crawling agents on the graph. Most of the works around MCMC, however, presume that the graph is undirected or has reciprocal edges, and become inapplicable when the graph is directed and non-reciprocal. Here we develop a similar framework for directed graphs, which we call Non-Markovian Monte Carlo (NMMC), by establishing a mapping to convert $\boldsymbol{\pi}$ into the quasi-stationary distribution of a carefully constructed transient Markov chain on an extended state space. As applications, we demonstrate how to achieve any given distribution $\boldsymbol{\pi}$ on a directed graph and estimate the eigenvector centrality using a set of non-Markovian, history-dependent random walks on the same graph in a distributed manner. We also provide numerical results on various real-world directed graphs to confirm our theoretical findings, and present several practical enhancements to make our NMMC method ready for practical use in most directed graphs. To the best of our knowledge, the proposed NMMC framework for directed graphs is the first of its kind, unlocking all the limitations set by the standard MCMC methods for undirected graphs. 
NonMaximum Suppression  
Nonmetric Multi-Dimensional Scaling (NMDS) 
Nonmetric multidimensional scaling (MDS, also NMDS and NMS) is an ordination technique that differs in several ways from nearly all other ordination methods. First, in most ordination methods, many axes are calculated, but only a few are viewed, owing to graphical limitations. In MDS, a small number of axes are explicitly chosen prior to the analysis and the data are fitted to those dimensions; there are no hidden axes of variation. Second, most other ordination methods are analytical and therefore result in a single unique solution to a set of data. In contrast, MDS is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or it stops after some pre-specified number of attempts. As a result, an MDS ordination is not a unique solution and a subsequent MDS analysis on the same set of data and following the same methodology will likely result in a somewhat different ordination. Third, MDS is not an eigenvalue-eigenvector technique like principal components analysis or correspondence analysis that ordinates the data such that axis 1 explains the greatest amount of variance, axis 2 explains the next greatest amount of variance, and so on. As a result, an MDS ordination can be rotated, inverted, or centered to any desired configuration. 
Nonnegative Matrix Factorization (NMF) 
Nonnegative matrix factorization (NMF), also nonnegative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This nonnegativity makes the resulting matrices easier to inspect. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as computer vision, document clustering, chemometrics and recommender systems. NMF 
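One common way to approximate the factorization numerically is the multiplicative update rule of Lee and Seung for the squared Frobenius error. The sketch below assumes NumPy and is only one of the group of algorithms the entry mentions; the function name `nmf`, the random initialisation and the small constant `eps` are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps     # nonnegative random start (assumed choice)
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative because each
        # factor is multiplied by a ratio of nonnegative quantities.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: factor a small matrix that is exactly of rank 2.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

The result depends on the starting point, so in practice several random restarts are often compared.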
Nonnegative Matrix Factorization Expectation-Maximization (NMFEM) 
Mixture models are among the most popular tools for model-based clustering. However, when the dimension and the number of clusters are large, the estimation as well as the interpretation of the clusters become challenging. We propose a reduced-dimension mixture model, where the parameters of the K components are combinations of words from a small dictionary – say H words with H ≪ K. Including a Nonnegative Matrix Factorization (NMF) in the EM algorithm allows the dictionary and the parameters of the mixture to be estimated simultaneously. We propose the acronym NMFEM for this algorithm. This original approach is motivated by passenger clustering from ticketing data: we apply NMFEM to ticketing data from two Transdev public transport networks. In this case, the words are easily interpreted as typical slots in a timetable. nmfem 
Nonparallel Support Vector Ordinal Regression (NPSVOR) 
Ordinal regression (OR) is a special multiclass classification problem where an order relation exists among the labels. In recent years, people have shared their opinions and sentimental judgments conveniently through social networks and e-commerce, so that plentiful large-scale OR problems arise. However, few studies have focused on this kind of problem. Nonparallel Support Vector Ordinal Regression (NPSVOR) is an SVM-based OR model, which learns a hyperplane for each rank by solving a series of independent sub-optimization problems and then ensembles those learned hyperplanes to predict. Previous studies focused on its nonlinear case and obtained competitive testing performance, but its training is time consuming, particularly for large-scale data. In this paper, we consider NPSVOR’s linear case and design an efficient training method based on the dual coordinate descent method (DCD). To utilize the order information among labels in prediction, a new prediction function is also proposed. Extensive contrast experiments on the text OR datasets indicate that the carefully implemented DCD is very suitable for training large data. 
Nonparametric Behavior Clustering Inverse Reinforcement Learning  Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without defining the reward function, and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstrations may be collected from various users and aggregated to infer and predict users’ behaviors. In this paper, we introduce the Nonparametric Behavior Clustering IRL algorithm to simultaneously cluster demonstrations and learn multiple reward functions from demonstrations that may be generated from more than one behavior. Our method is iterative: it alternates between clustering demonstrations into different behavior clusters and inverse learning the reward functions until convergence. It is built upon the Expectation-Maximization formulation and nonparametric clustering in the IRL setting. Further, to improve the computation efficiency, we remove the need to completely solve multiple IRL problems for multiple clusters during the iteration steps and introduce a resampling technique to avoid generating too many unlikely clusters. We demonstrate the convergence and efficiency of the proposed method through learning multiple driver behaviors from demonstrations generated from a grid-world environment and continuous trajectories collected from autonomous robot cars using the Gazebo robot simulator. 
Nonparametric Canonical Correlation Analysis (NCCA) 
Canonical correlation analysis (CCA) is a fundamental technique in multi-view data analysis and representation learning. Several nonlinear extensions of the classical linear CCA method have been proposed, including kernel and deep neural network methods. These approaches restrict attention to certain families of nonlinear projections, which the user must specify (by choosing a kernel or a neural network architecture), and are computationally demanding. Interestingly, the theory of nonlinear CCA without any functional restrictions was studied in the population setting by Lancaster as early as the 1950s. However, these results have not inspired practical algorithms. In this paper, we revisit Lancaster’s theory, and use it to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the most correlated nonlinear projections of two random vectors can be expressed in terms of the singular value decomposition of a certain operator associated with their joint density. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without having to compute the inverse of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. PLCCA turns out to have a similar form to the classical linear CCA, but with a nonparametric regression term replacing the linear regression in CCA. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and achieve better performance than kernel CCA and comparable performance to deep CCA. 
Non-Parametric Generalized Linear Model (NPGLM) 
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called ‘Non-Parametric Generalized Linear Model’ (NPGLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NPGLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and the real-world Sina Weibo social network demonstrate the effectiveness of NPGLM in solving the temporal link prediction problem vis-à-vis competitive baselines. 
Nonparametric Neural Networks  Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce *nonparametric neural networks*, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term *adaptive radial-angular gradient descent* or *AdaRad*, and obtain promising results. 
Non-Parametric Transformation Network (NPTN) 
ConvNets have been very effective in many applications where it is required to learn invariances to within-class nuisance transformations. However, through their architecture, ConvNets only enforce invariance to translation. In this paper, we introduce a new class of convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn general invariances and symmetries directly from data. NPTNs are a direct and natural generalization of ConvNets and can be optimized directly using gradient descent. They make no assumption regarding the structure of the invariances present in the data and in that respect are very flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks which establishes the natural connection between the two. We demonstrate the efficacy of NPTNs on natural data such as MNIST and CIFAR-10, where they outperform ConvNet baselines with the same number of parameters. We show they are effective in learning invariances unknown a priori directly from data from scratch. Finally, we apply NPTNs to Capsule Networks and show that they enable them to perform even better. 
Nonparanormal Graphical Model  A nonparanormal graphical model is a semiparametric generalization of a Gaussian graphical model for continuous variables in which it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone transformations. 
NonResponse Bias  Nonresponse bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer. 
NOnstationary Space TIme variable Latent Length scale GP (NOSTILLGP) 
One of the primary aspects of sustainable development involves accurate understanding and modeling of environmental phenomena. Many of these phenomena exhibit variations in both space and time and it is imperative to develop a deeper understanding of techniques that can model space-time dynamics accurately. In this paper we propose NOSTILLGP – NOnstationary Space TIme variable Latent Length scale GP, a generic non-stationary, spatio-temporal Gaussian Process (GP) model. We present several strategies for efficient training of our model, which are necessary for real-world applicability. Extensive empirical validation is performed using three real-world environmental monitoring datasets, with diverse dynamics across space and time. Results from the experiments clearly demonstrate general applicability and effectiveness of our approach for applications in environmental monitoring. 
Nonstationary Stochastic Processes  A stochastic process (a collection of random variables ordered in time, e.g. GDP(t)) is said to be (weakly) stationary if its mean and variance are constant over time, i.e. time invariant (along with its autocovariance). Such a time series will tend to return to its mean (mean reversion) and fluctuations around this mean will have a broadly constant amplitude. Alternatively, a stationary process will not drift too far away from its mean value because of the finite variance. By contrast, a nonstationary time series will have a time-varying mean or a time-varying variance or both. lmenssp 
Non-Stationary Streaming PCA  We consider the problem of streaming principal component analysis (PCA) when the observations are noisy and generated in a non-stationary environment. Given $T$ $p$-dimensional noisy observations sampled from a non-stationary variant of the spiked covariance model, our goal is to construct the best linear $k$-dimensional subspace of the terminal observations. We study the effect of non-stationarity by establishing a lower bound on the number of samples and the corresponding recovery error obtained by any algorithm. We establish the convergence behaviour of the noisy power method using a novel proof technique which may be of independent interest. We conclude that the recovery guarantee of the noisy power method matches the fundamental limit, thereby generalizing existing results on streaming PCA to a non-stationary setting. 
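The contrast between a constant and a time-varying variance can be checked by simulation. The sketch below assumes NumPy and uses two textbook processes chosen for illustration (they are not named in the entry): an AR(1) process with coefficient 0.5, which settles to a constant variance, and a random walk, whose variance grows with time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200
shocks = rng.normal(size=(n_paths, n_steps))    # unit-variance white noise

# Random walk: x_t = x_{t-1} + e_t. Its variance grows roughly linearly in t.
random_walk = np.cumsum(shocks, axis=1)

# AR(1): x_t = phi * x_{t-1} + e_t with |phi| < 1. After a short burn-in its
# variance is close to the constant 1 / (1 - phi**2).
phi = 0.5
ar1 = np.zeros_like(shocks)
ar1[:, 0] = shocks[:, 0]
for t in range(1, n_steps):
    ar1[:, t] = phi * ar1[:, t - 1] + shocks[:, t]

# Variance across simulated paths at each time step.
rw_var = random_walk.var(axis=0)
ar_var = ar1.var(axis=0)
```

Plotting `rw_var` and `ar_var` against time shows the first rising steadily while the second stays flat, which is the distinction the definition draws.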
Non-Uniform Fast Fourier Transform (NUFFT) 
Fourier analysis plays a natural role in a wide variety of applications, from medical imaging to radio astronomy, data analysis and the numerical solution of partial differential equations. When the sampling is uniform and the Fourier transform is desired at equispaced frequencies, the classical fast Fourier transform (FFT) has played a fundamental role in computation. The FFT requires O(N log N) work to compute N Fourier modes from N data points rather than O(N^2) work. When the data is irregular in either the ‘physical’ or ‘frequency’ domain, unfortunately, the FFT does not apply. Over the last twenty years, a number of algorithms have been developed to overcome this limitation – generally referred to as non-uniform FFTs (NUFFT), non-equispaced FFTs (NFFT) or unequally-spaced FFTs (USFFT). They achieve the same O(N log N) computational complexity, but with a larger, precision-dependent, and dimension-dependent constant. http://…/glee_nufft_sirev.pdf https://…/optimizingpythonwithnumpyandnumba 
nonuniform phase-stepping algorithm  We develop an error-free, nonuniform phase-stepping algorithm (nPSA) based on principal component analysis (PCA). PCA-based algorithms typically give phase-demodulation errors when applied to nonuniform phase-shifted interferograms. We present a straightforward way to correct those PCA phase-demodulation errors. We give mathematical formulas to fully analyze PCA-based nPSA (PCAnPSA). These formulas give a) the PCAnPSA frequency transfer function (FTF), b) its corrected Lissajous figure, c) the corrected PCAnPSA formula, d) its harmonic robustness, and e) its signal-to-noise ratio (SNR). We show that the PCAnPSA can be seen as a linear quadrature filter, and as a consequence, one can find its FTF. Using the FTF, we show why plain PCA often fails to demodulate nonuniform phase-shifted fringes. Previous works on PCAnPSA (without FTF) give specific numerical/experimental fringe data to ‘visually demonstrate’ that their new nPSA works better than competitors. This often leads to biased/favorable fringe pattern selections which ‘visually demonstrate’ the superior performance of their new nPSA. This biasing is herein totally avoided because we provide figures-of-merit formulas based on linear systems and stochastic process theories. However, and for illustrative purposes only, we provide specific fringe data phase-demodulation, including comprehensive analysis and comparisons. 
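For reference, the quantity that non-uniform FFT algorithms approximate quickly can be written as a direct sum. The sketch below assumes NumPy and evaluates that sum at quadratic cost; it is not a fast algorithm. The function name `nudft` and the sign convention in the exponent are assumptions, since conventions differ between libraries.

```python
import numpy as np

def nudft(x, c, freqs):
    """Direct non-uniform discrete Fourier transform.

    Computes F[k] = sum_j c[j] * exp(-1j * freqs[k] * x[j]) for sample
    locations x (in radians) that need not be equally spaced.
    Cost is O(len(freqs) * len(x)), i.e. the slow baseline.
    """
    return np.exp(-1j * np.outer(freqs, x)) @ c

# Usage: irregularly placed samples on [0, 2*pi).
rng = np.random.default_rng(0)
x_irregular = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=16))
values = np.cos(3.0 * x_irregular)
modes = nudft(x_irregular, values, np.arange(-8, 8))
```

When the sample locations happen to be equispaced, `x[j] = 2*pi*j/N`, and the frequencies are `0..N-1`, this sum coincides with the ordinary DFT that the FFT computes.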
No-Reference Image Quality Assessment (NRIQA) 
In this paper we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in computational intelligence and image processing communities, since, for many practical applications, e.g. object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g. visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even better than originally captured images which are generally thought to be of the best quality. In this work, we present two main contributions. The first contribution is to develop a new no-reference image quality assessment (NRIQA) model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness and more, and then yields a measure of visual quality using a regression module, which is learned with big-data training samples that are much bigger than the size of relevant image datasets. Results of experiments on nine datasets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-, reduced- and no-reference IQA methods. The second contribution is that a robust image enhancement framework is established based on quality optimization. For an input image, under the guidance of the proposed NRIQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can well enhance natural images, low-contrast images, low-light images and dehazed images. The source code will be released at https://…/publications. 
No-Reward Meta Learning (NoRML) 
Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML’s fine-tune step. Our method has a more expressive update step than MAML, while maintaining MAML’s gradient-based foundation. Additionally, in order to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies’ parameters. We first study our method on a number of synthetic control problems and then validate our method on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks. 
Norm  In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space – save possibly for the zero vector, which is assigned a length of zero. A seminorm, on the other hand, is allowed to assign zero length to some nonzero vectors (in addition to the zero vector). A norm must also satisfy certain properties pertaining to scalability and additivity which are given in the formal definition below. A simple example is the 2dimensional Euclidean space R2 equipped with the Euclidean norm. Elements in this vector space (e.g., (3, 7)) are usually drawn as arrows in a 2dimensional cartesian coordinate system starting at the origin (0, 0). The Euclidean norm assigns to each vector the length of its arrow. Because of this, the Euclidean norm is often known as the magnitude. A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a seminormed vector space. It is often possible to supply a norm for a given vector space in more than one way. 
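As a small illustration of the Euclidean norm described above, the following sketch computes the length of the example vector (3, 7) and checks the scalability and additivity properties numerically. The function names are illustrative choices, not part of any library.

```python
import math

def euclidean_norm(v):
    """Length of the arrow from the origin to the point v."""
    return math.sqrt(sum(x * x for x in v))

def p_norm(v, p):
    """General p-norm for p >= 1; p = 2 gives the Euclidean norm."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = (3, 7)
w = (-1, 2)
length = euclidean_norm(v)  # sqrt(3^2 + 7^2) = sqrt(58)

# Scalability: scaling the vector by 2 scales its length by 2.
scaled_length = euclidean_norm(tuple(2 * x for x in v))

# Additivity (triangle inequality): |v + w| <= |v| + |w|.
sum_length = euclidean_norm(tuple(a + b for a, b in zip(v, w)))
```

The same vector space can carry other norms, for example `p_norm(v, 1)`, which is the sense in which a norm can be supplied in more than one way.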
Normal Beta Prime Prior  introduced by Bai and Ghosh (2018) <arXiv:1807.02421> and Bai and Ghosh (2018) <arXiv:1807.06539>. Normal means estimation and multiple testing for the DirichletLaplace <doi:10.1080/01621459.2014.960967> and horseshoe+ priors <doi:10.1214/16BA1028>. NormalBetaPrime 
Normalization  In statistics and applications of statistics, normalization can have a range of meanings. In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment. 
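The meanings of normalization listed above can be sketched in a few lines. The example below shows rescaling to a common [0, 1] scale, standardizing to zero mean and unit variance, and a basic quantile normalization. The function names are illustrative, ties are not treated specially, and the input is assumed to be non-constant.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values to the common scale [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def quantile_normalize(columns):
    """Bring the quantiles of several equal-length measures into alignment.

    The value of rank k in every column is replaced by the mean of the
    rank-k values across all columns.
    """
    sorted_cols = [sorted(c) for c in columns]
    reference = [mean(vals) for vals in zip(*sorted_cols)]
    result = []
    for col in columns:
        order = sorted(range(len(col)), key=lambda i: col[i])
        out = [0.0] * len(col)
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result

aligned = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After quantile normalization every column contains the same set of values, so the distributions agree exactly while each column keeps its own ordering.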
Normalized Estimation Error Squared (NEES) 
Weak in the NEES?: Autotuning Kalman Filters with Bayesian Optimization 
Normalized Innovation Error Squared (NIS) 
Weak in the NEES?: Autotuning Kalman Filters with Bayesian Optimization 
Normalized Mutual Information (NMI) 
NMI 
Normalized Nonnegative Models (NNM) 
We introduce normalized nonnegative models (NNM) for explorative data analysis. NNMs are partial convexifications of models from probability theory. We demonstrate their value at the example of item recommendation. We show that NNMbased recommender systems satisfy three criteria that all recommender systems should ideally satisfy: high predictive power, computational tractability, and expressive representations of users and items. Expressive user and item representations are important in practice to succinctly summarize the pool of customers and the pool of items. In NNMs, user representations are expressive because each user’s preference can be regarded as normalized mixture of preferences of stereotypical users. The interpretability of item and user representations allow us to arrange properties of items (e.g., genres of movies or topics of documents) or users (e.g., personality traits) hierarchically. 
Normalized Setwise Levenshtein Distance (NSLD) 
This work tackles the problem of fuzzy joining of strings that naturally tokenize into meaningful substrings, e.g., full names. Tokenized-string joins have several established applications in the context of data integration and cleaning. This work is primarily motivated by fraud detection, where attackers slightly modify tokenized strings, e.g., names on accounts, to create numerous identities that they can use to defraud service providers, e.g., Google and LinkedIn. To detect such attacks, all the accounts are pairwise compared, and the resulting similar accounts are considered suspicious and are further investigated. Comparing the tokenized-string features of a large number of accounts requires an intuitive tokenized-string distance that can detect subtle edits introduced by an adversary, and a very scalable algorithm. This is not achievable with existing distance measures, which are unintuitive and hard to tune, and whose join algorithms are serial and hence unscalable. We define a novel intuitive distance measure between tokenized strings, Normalized Setwise Levenshtein Distance (NSLD). To the best of our knowledge, NSLD is the first metric proposed for comparing tokenized strings. We propose a scalable distributed framework, Tokenized-String Joiner (TSJ), that adopts existing scalable string-join algorithms as building blocks to perform NSLD-joins. We carefully engineer optimizations and approximations that dramatically improve the efficiency of TSJ. The effectiveness of the TSJ framework is evident from the evaluation conducted on tens of millions of tokenized-string names from Google accounts. The superiority of the tokenized-string-specific TSJ framework over the general-purpose metric-spaces joining algorithms has been established. 
Not only SQL (NoSQL) 
A NoSQL or Not Only SQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. The data structure (e.g. keyvalue, graph, or document) differs from the RDBMS, and therefore some operations are faster in NoSQL and some in RDBMS. There are differences though and the particular suitability of a given NoSQL DB depends on the problem to be solved (e.g. does the solution use graph algorithms?). The appearance of mature NoSQL databases has reduced the rationale for Java content repository (JCR) implementations. NoSQL databases are finding significant and growing industry use in big data and realtime web applications. NoSQL systems are also referred to as “Not only SQL” to emphasize that they may in fact allow SQLlike query languages to be used. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of lowlevel query languages, the lack of standardized interfaces, and the huge investments already made in SQL by enterprises. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom ctreeACE, Google Spanner and FoundationDB, have made them central to their designs. 
No-U-Turn (NUTS) 
Algorithm by Hoffman and Gelman (2014): Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ϵ and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as (and sometimes more efficiently than) a well-tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ϵ on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all, making it suitable for applications such as BUGS-style automatic inference engines that require efficient 'turnkey' samplers. adnuts 
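The idea of stopping when the trajectory starts to double back can be sketched as follows. This is a deliberately simplified illustration, not the NUTS algorithm itself: it runs a single forward leapfrog trajectory and stops at the first U-turn, whereas NUTS builds the trajectory recursively in both directions and samples from it in a way that keeps the Markov chain valid. The standard normal target and all function names are assumptions made for the example.

```python
def leapfrog(theta, r, grad_log_p, eps):
    """One leapfrog step: half-step momentum, full-step position, half-step momentum."""
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    theta = [ti + eps * ri for ti, ri in zip(theta, r)]
    g = grad_log_p(theta)
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
    return theta, r

def made_u_turn(theta_start, theta, r):
    """True once the momentum points back toward the starting position."""
    return sum((t - s) * ri for s, t, ri in zip(theta_start, theta, r)) < 0

def simulate_until_u_turn(theta0, r0, grad_log_p, eps, max_steps=1000):
    """Take leapfrog steps until the trajectory begins to retrace its path."""
    theta, r = list(theta0), list(r0)
    for n in range(1, max_steps + 1):
        theta, r = leapfrog(theta, r, grad_log_p, eps)
        if made_u_turn(theta0, theta, r):
            return n, theta, r
    return max_steps, theta, r

def grad_std_normal(theta):
    """Gradient of the log density of a standard normal (assumed target)."""
    return [-t for t in theta]

steps, theta_end, r_end = simulate_until_u_turn([1.0], [0.0], grad_std_normal, eps=0.1)
```

With this target the trajectory is a harmonic oscillation, so the stopping rule fires after roughly half a period. The number of steps is thus determined by the geometry of the target rather than set by hand.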
Novel Data Streams (NDS) 
We define NDS as those data streams whose content is initiated directly by the user (patient) themselves. This would exclude data sources such as electronic health records, disease registries, vital statistics, electronic lab reporting, emergency department visits, ambulance call data, school absenteeism, prescription pharmacy sales, serology, amongst others. Although ready access to aggregated information from these excluded sources is novel in many health settings, our focus here is on those streams which are both directly initiated by the user and also not already maintained by public health departments or other health professionals. Despite this more narrow definition, our suggestions for improving NDS surveillance may also be applicable to more established surveillance systems, participatory systems (e.g., Flu Near You, influenzaNet), and new data streams aggregated from established systems, such as Biosense and ISDS DiSTRIBuTE network. While much of the recent focus on using NDS for disease surveillance has centered on Internet search queries and Twitter posts, there are many NDS outside of these two sources. Our aim therefore is to provide a general framework for enhancing and developing NDS surveillance systems, which applies to more than just search data and Tweets. At a minimum, our definition of NDS would include Internet search data and social media, such as Google searches, Google Plus, Facebook, and Twitter posts, as well as Wikipedia access logs, restaurant reservation and review logs, nonprescription pharmacy sales, news source scraping, and prediction markets. 
Novel Integration of the Sample and Thresholded covariance estimators (NOVELIST) 
We propose a 'NOVEL Integration of the Sample and Thresholded covariance estimators' (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy log(p)/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator in which the amount of shrinkage and thresholding is chosen by cross-validation performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. http://…/poster_NOVELIST_Sept2014.pdf novelist 
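The shrinkage described above can be sketched as a convex combination of the sample matrix and its thresholded version. The example below is an illustration under stated assumptions rather than the authors' implementation: it applies soft thresholding to the off-diagonal entries only, takes the threshold `lam` and shrinkage weight `delta` as given (the abstract chooses them by cross-validation), and uses the hypothetical function name `novelist_like`.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam, setting it to zero if |x| <= lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def novelist_like(sample, lam, delta):
    """Shrink a sample covariance (correlation) matrix toward its thresholded version.

    Returns (1 - delta) * sample + delta * thresholded, where only the
    off-diagonal entries are thresholded (an assumption of this sketch).
    """
    p = len(sample)
    estimate = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            s = sample[i][j]
            t = s if i == j else soft_threshold(s, lam)
            estimate[i][j] = (1 - delta) * s + delta * t
    return estimate

S = [[1.0, 0.6, 0.1],
     [0.6, 1.0, -0.3],
     [0.1, -0.3, 1.0]]
estimate = novelist_like(S, lam=0.2, delta=0.5)
```

Setting `delta = 0` returns the sample matrix unchanged and `delta = 1` returns the purely thresholded, sparse matrix. Intermediate values trade off between the two, and no eigendecomposition is needed at any point.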
Novelty Detection  Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches. Novelty detection is one of the fundamental requirements of a good classification system. A machine learning system can never be trained with all the possible object classes and hence the performance of the network will be poor for those classes that are underrepresented in the training set. A good classification system must have the ability to differentiate between known and unknown objects during testing. For this purpose, different models for novelty detection have been proposed. Novelty detection is a hard problem in machine learning since it depends on the statistics of the already known information. A generally applicable, parameterfree method for outlier detection in a highdimensional space is not yet known. Novelty detection finds a variety of applications especially in signal processing, computer vision, pattern recognition, data mining and robotics. Another important application is the detection of a disease or potential fault whose class may be underrepresented in the training set. The statistical approaches to novelty detection may be classified into parametric and nonparametric approaches. Parametric approaches assume a specific statistical distribution (such as a Gaussian distribution) of data and statistical modeling based on data mean and covariance, whereas nonparametric approaches do not make any assumption on the statistical properties of data. http://…/mlsp09a.pdf http://…/mlsp09b.pdf http://…i=10.1.1.3.3578&rep=rep1&type=pdf http://…/smola09a.pdf http://…/karkaliwise2013.pdf 
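A parametric approach of the kind described above can be sketched for one-dimensional data: fit a Gaussian to the training data through its mean and standard deviation, then flag test points that lie too many standard deviations away. The threshold of three standard deviations, the function names, and the sample readings are illustrative assumptions.

```python
from statistics import mean, stdev

def fit_gaussian(train):
    """Estimate the mean and standard deviation of the known (training) data."""
    return mean(train), stdev(train)

def is_novel(x, mu, sigma, z_max=3.0):
    """Flag x as novel if it lies more than z_max standard deviations from the mean."""
    return abs(x - mu) / sigma > z_max

# Made-up sensor readings standing in for the known class.
train = [9.8, 10.1, 10.0, 9.9, 10.2]
mu, sigma = fit_gaussian(train)

known_like = is_novel(10.1, mu, sigma)  # close to the training data
unknown_like = is_novel(12.0, mu, sigma)  # far from anything seen in training
```

In higher dimensions the same idea uses the data covariance, for example through the Mahalanobis distance. A nonparametric approach would replace the Gaussian assumption with something like a nearest-neighbour distance.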
NoveltyOrganizing Team of Classifiers (NOTC) 
In reinforcement learning, there are basically two spaces to search: valuefunction space and policy space. Consequently, there are two fitness functions each with their associated tradeoffs. However, the problem is still perceived as a singleobjective one. Here a multiobjective reinforcement learning algorithm is proposed with a structured novelty map population evolving feedforward neural models. It outperforms a gradient based continuous inputoutput stateofart algorithm in two problems. Contrary to the gradient based algorithm, the proposed one solves both problems with the same parameters and smaller variance of results. Moreover, the results are comparable even with other discrete action algorithms of the literature as well as neuroevolution methods such as NEAT. The proposed method brings also the novelty map population concept, i.e., a novelty mapbased population which is less sensitive to the input distribution and therefore more suitable to create the state space. In fact, the novelty map framework is shown to be less dynamic and more resource efficient than variants of the selforganizing map. Noveltyorganizing team of classifiers in noisy and dynamic environments 
NSCaching  Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into a low-dimensional vector space, which can be used for subsequent algorithms. Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding. Recently, the generative adversarial network (GAN) has been introduced in negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradient and thus obtain better performance. However, using GAN makes the original model more complex and hard to train, where reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with the cache. However, how to sample from and update the cache are two important questions. We carefully design the solutions, which are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a 'distilled' version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. The extensive experiments show that our method can gain significant improvement in various KG embedding models, and outperform the state-of-the-art negative sampling methods based on GAN. 
NSGANet  This paper introduces NSGANet, an evolutionary approach for neural architecture search (NAS). NSGANet is designed with three goals in mind: (1) a NAS procedure for multiple, possibly conflicting, objectives, (2) efficient exploration and exploitation of the space of potential neural network architectures, and (3) output of a diverse set of network architectures spanning a tradeoff frontier of the objectives in a single run. NSGANet is a populationbased search algorithm that explores a space of potential neural network architectures in three steps, namely, a population initialization step that is based on priorknowledge from handcrafted architectures, an exploration step comprising crossover and mutation of architectures and finally an exploitation step that applies the entire history of evaluated neural architectures in the form of a Bayesian Network prior. Experimental results suggest that combining the objectives of minimizing both an error metric and computational complexity, as measured by FLOPS, allows NSGANet to find competitive neural architectures near the Pareto front of both objectives on two different tasks, object classification and object alignment. NSGANet obtains networks that achieve 3.72% (at 4.5 million FLOP) error on CIFAR10 classification and 8.64% (at 26.6 million FLOP) error on the CMUCar alignment task. Code available at: https://…/nsganet 
NSnet  Most textual entailment models focus on lexical gaps between the premise text and the hypothesis, but rarely on knowledge gaps. We focus on filling these knowledge gaps in the Science Entailment task, by leveraging an external structured knowledge base (KB) of science facts. Our new architecture combines standard neural entailment models with a knowledge lookup module. To facilitate this lookup, we propose a factlevel decomposition of the hypothesis, and verifying the resulting subfacts against both the textual premise and the structured KB. Our model, NSnet, learns to aggregate predictions from these heterogeneous data formats. On the SciTail dataset, NSnet outperforms a simpler combination of the two predictions by 3% and the base entailment model by 5%. 
NtMalDetect  As computing systems become increasingly advanced and as users increasingly engage themselves in technology, security has never been a greater concern. In malware detection, static analysis has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and adopt the capability of obfuscating their binaries to execute the same malicious functions, making static analysis virtually inapplicable to newer variants. The approach assessed in this paper uses dynamic analysis of malware, which may generalize better than static analysis to variants. Widely used document classification techniques were assessed in detecting malware by doing such analysis on system call traces, a form of dynamic analysis. Features considered are extracted from system call traces of benign and malicious programs, and the task of classifying these traces is treated as a binary document classification task using sparse features. The system call traces were processed to remove the parameters, leaving only the system call function names. The features were grouped into various n-grams and weighted with Term Frequency-Inverse Document Frequency. Support Vector Machines were used and optimized using a Stochastic Gradient Descent algorithm that implemented L1, L2, and ElasticNet regularization terms, the best of which achieved an accuracy of up to 98% with a 98% recall score. Additional contributions include the identification of significant system call sequences that could be avenues for further research. 
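The feature extraction step described above, grouping system call names into n-grams and weighting them with TF-IDF, can be sketched as follows. The two traces are made up for illustration, and the weighting is one common TF-IDF variant (relative term frequency times the logarithm of the inverse document frequency), not necessarily the one used in the paper.

```python
import math
from collections import Counter

def ngrams(trace, n):
    """Group a sequence of system call names into overlapping n-grams."""
    return [" ".join(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def tfidf(docs):
    """Weight each term of each document by tf * log(N / df).

    docs is a list of term lists; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical traces with parameters already removed.
benign = ["NtOpenFile", "NtReadFile", "NtClose"]
suspicious = ["NtOpenFile", "NtWriteFile", "NtWriteFile", "NtClose"]

bigram_weights = tfidf([ngrams(benign, 2), ngrams(suspicious, 2)])
unigram_weights = tfidf([benign, suspicious])
```

Terms that occur in every trace, such as `NtOpenFile` in the unigram case, receive weight zero, so the sparse feature vectors emphasise call sequences that distinguish one trace from another. These vectors would then be passed to a linear classifier.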
Nucleus Neural Network  Artificial neural networks which model the neurons and connecting architectures in brain have achieved great successes in many problems, especially those with deep layers. In this paper, we propose a nucleus neural network (NNN) and corresponding architecture and parameter learning methods. In a nucleus, there are no regular layers, i.e., a neuron may connect to all the neurons in the nucleus. This architecture gets rid of layer limitation and may lead to more powerful learning capability. It is crucial to determine the connections given numerous neurons. Based on the principle that more relevant input and output neuron pair deserves higher connecting density, we propose an architecture learning model for the nucleus. Moreover, we propose an improved learning method for learning connecting weights and biases with the optimized architecture. We find that this novel architecture is robust to irrelevant components in test data. So we define a super robust learning problem and test the proposed network with one case where the types of image backgrounds in training and test sets are different. Experiments demonstrate that the proposed learner achieves significant improvement over traditional learners on the reconstructed data set. 
Null Hypothesis Significance Testing (NHST) 
Null Hypothesis Significance Testing (NHST) is a statistical method for testing whether the factor under study has an effect on our observations. For example, a t test or an ANOVA test for comparing means is a typical case of NHST. It is probably the most common form of statistical testing used in HCI. http://…/hypothesistestingisonlymostly.html 
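The logic of NHST can be illustrated with a small two-group comparison. The sketch below uses an exact permutation test on the difference in means instead of a t test, because its p-value can be computed with the standard library alone. The data and the function name are made up for the example.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so the
    p-value is the fraction of relabelings whose absolute mean difference
    is at least as large as the observed one.
    """
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(pooled)
    count = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical task completion times (seconds) under two interface designs.
design_a = [1, 2, 3, 4]
design_b = [11, 12, 13, 14]
p = permutation_p_value(design_a, design_b)
```

A small p-value means that a difference this large would rarely arise if the factor had no effect. It is conventionally compared with a significance level, such as 0.05, chosen before the experiment.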
NullHop  Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many stateoftheart (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in lowpower and lowlatency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes; and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28nm process with a clock frequency of 500MHz, the NullHop core is able to reach over 450 GOp/s and efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3TOp/s/W in a core area of 5.8mm2 
Numenta Anomaly Benchmark (NAB) 
Much of the world’s data is streaming, timeseries data, where anomalies give significant information in critical situations; examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in realtime, not batches, and learn while simultaneously making predictions. There are no benchmarks to adequately test and score the efficacy of realtime anomaly detectors. Here we propose the Numenta Anomaly Benchmark (NAB), which attempts to provide a controlled and repeatable environment of opensource tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with realworld timeseries data across a variety of domains, and automatically adapt to changing statistics. Rewarding these characteristics is formalized in NAB, using a scoring algorithm designed for streaming data. NAB evaluates detectors on a benchmark dataset with labeled, realworld timeseries data. We present these components, and give results and analyses for several open source, commerciallyused algorithms. The goal for NAB is to provide a standard, open source framework with which the research community can compare and evaluate different algorithms for detecting anomalies in streaming data. 
Numerical Formal Concept Analysis (nFCA) 
Numerical Formal Concept Analysis (nFCA) technique: Formal Concept Analysis (FCA) is a powerful method in computer science (CS) for identifying overall inherent structures within and between the row and column variables (called objects and attributes in CS) of a binary data set. It is a bit like lifting up the overall hierarchical structure of a forest from a superposition based on simple local information, i.e., pairwise relationships between variables of the data. The objective of nFCA is to combine FCA and statistics to translate what an FCA can offer for binary data to numerical data. The end product of our nFCA is a pair of nFCA graphs, where the H-graph is a clustered lattice graph indicating inherent hierarchical and clustered relations and the I-graph is a complementary tree plot indicating the strength and directions of each of the relations and additional network relationships. The nFCA performs better than the conventional hierarchical clustering methods in terms of the Cophenetic correlation coefficient and the relational structure. nFCA 
Numerical Template Toolbox (NT2) 
The Numerical Template Toolbox (NT2) is an Open Source C++ library aimed at simplifying the development, debugging and optimization of highperformance computing applications by providing a Matlab like syntax that eases the transition between prototype and actual application. RcppNT2 
numpywren  ➚ “LAmbdaPACK” 
nuTonomy scenes (nuScenes) 
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Imagebased benchmark datasets have driven the development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We also define a new metric for 3D detection which consolidates the multiple aspects of the detection task: classification, localization, size, orientation, velocity and attribute estimation. We provide careful dataset analysis as well as baseline performance for lidar and image based detection methods. Data, development kit, and more information are available at www.nuscenes.org. 
nutsflow/ml  Data preprocessing is a fundamental part of any machine learning application and frequently the most timeconsuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nutsflow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning. 
NVIDIA Data Loading Library (DALI) 
Today’s deep learning applications include complex, multistage preprocessing data pipelines that include computeintensive steps mainly carried out on the CPU. For instance, steps such as load data from disk, decode, crop, random resize, color and spatial augmentations and format conversions are carried out on the CPUs, limiting the performance and scalability of training and inference tasks. In addition, the deep learning frameworks today have multiple data preprocessing implementations, resulting in challenges such as portability of training and inference workflows and code maintainability. NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine to accelerate input data preprocessing for deep learning applications. DALI provides both performance and flexibility of accelerating different data pipelines, as a single library, that can be easily integrated into different deep learning training and inference applications. 
NVIDIA Deep Learning GPU Training System (DIGITS) 
The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning in the hands of data scientists and researchers. Quickly design the best deep neural network (DNN) for your data using realtime network behavior visualization. Best of all, DIGITS is a complete system so you don’t have to write any code. Get started with DIGITS in under an hour. 
NyquistShannon Sampling Theorem  In the field of digital signal processing, the sampling theorem is a fundamental bridge between continuoustime signals (often called ‘analog signals’) and discretetime signals (often called ‘digital signals’). It establishes a sufficient condition between a signal’s bandwidth and the sample rate that permits a discrete sequence of samples to capture all the information from the continuoustime signal. Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are bandlimited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuoustime function from the samples. Perfect reconstruction may still be possible when the samplerate criterion is not satisfied, provided other constraints on the signal are known. (See § Sampling of nonbaseband signals below, and Compressed sensing.) The name NyquistShannon sampling theorem honors Harry Nyquist and Claude Shannon. The theorem was also discovered independently by E. T. Whittaker, by Vladimir Kotelnikov, and by others. So it is also known by the names NyquistShannonKotelnikov, WhittakerShannonKotelnikov, WhittakerNyquistKotelnikovShannon, and cardinal theorem of interpolation. http://…Nyquist%E2%80%93Shannonsamplingtheorem 
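Two consequences of the theorem can be checked numerically. The sketch below first shows aliasing: a 7 Hz sine sampled at 10 Hz, which is below the sufficient rate of twice its frequency, produces exactly the same samples as a sine at 7 − 10 = −3 Hz. It then evaluates the reconstruction formula (Whittaker-Shannon interpolation), truncated to the finite set of available samples. The frequencies, sample counts and function names are illustrative choices.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Samples of sin(2*pi*f*t) taken at t = n / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, rate_hz, t):
    """Whittaker-Shannon interpolation, truncated to the given samples."""
    return sum(s * sinc(t * rate_hz - n) for n, s in enumerate(samples))

rate = 10.0
undersampled = sample_sine(7.0, rate, 20)  # 7 Hz exceeds rate / 2 = 5 Hz
alias = sample_sine(-3.0, rate, 20)        # indistinguishable from the 7 Hz samples

# A 2 Hz sine is bandlimited below rate / 2, so its samples determine it.
samples = sample_sine(2.0, rate, 200)
t = 10.03  # an instant between two samples, well inside the sampled window
approx = reconstruct(samples, rate, t)
exact = math.sin(2 * math.pi * 2.0 * t)
```

Because only finitely many samples are used, `approx` matches `exact` only approximately. The perfect reconstruction promised by the theorem requires the full infinite sequence of samples.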