F#  F# (pronounced eff sharp) is a strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming techniques. F# is most often used as a cross-platform CLI language, but can also be used to generate JavaScript and GPU code. F# is developed by the F# Software Foundation, Microsoft and open contributors. An open source, cross-platform compiler for F# is available from the F# Software Foundation. F# is also a fully supported language in Visual Studio and Xamarin Studio. Other tools supporting F# development include Mono, MonoDevelop, SharpDevelop and WebSharper. F# originated from ML and has been influenced by OCaml, C#, Python, Haskell, Scala and Erlang.
F1 Score  
Facebook 20 Tasks (FB20) 

Facebook AI Research (FAIR) 
Facebook Artificial Intelligence Researchers (FAIR) seek to understand and develop systems with human-level intelligence by advancing the longer-term academic problems surrounding AI. Our research covers the full spectrum of topics related to AI, and to deriving knowledge from data: theory, algorithms, applications, software infrastructure and hardware infrastructure. Long-term objectives of understanding intelligence and building intelligent machines are bold and ambitious, and we know that making significant progress towards AI can’t be done in isolation. That’s why we actively engage with the research community through publications, open source software, participation in technical conferences and workshops, and collaborations with colleagues in academia. Human and Smart Machine Co-Learning with Brain Computer Interface
Faceted Classification  A faceted classification is a classification scheme used in organizing knowledge into a systematic order. A faceted classification uses semantic categories, either general or subject-specific, that are combined to create the full classification entry. Many library classification systems use a combination of a fixed, enumerative taxonomy of concepts with subordinate facets that further refine the topic. There are two primary types of classification used for information organization: enumerative and faceted. An enumerative classification contains a full set of entries for all concepts. A faceted classification system uses a set of semantically cohesive categories that are combined as needed to create an expression of a concept. In this way, the faceted classification is not limited to already defined concepts. While this makes the classification quite flexible, it also makes the resulting expression of topics complex. To the extent possible, facets represent ‘clearly defined, mutually exclusive, and collectively exhaustive aspects of a subject. The premise is that any subject or class can be analyzed into its component parts (i.e., its aspects, properties, or characteristics).’ Some commonly used general-purpose facets are time, place, and form.
Facets Dive  Dive is a tool for interactively exploring up to tens of thousands of multidimensional data points, allowing users to seamlessly switch between a high-level overview and low-level details. Each example is represented as a single item in the visualization, and the points can be positioned by faceting/bucketing in multiple dimensions by their feature values. Combining smooth animation and zooming with faceting and filtering, Dive makes it easy to spot patterns and outliers in complex data sets.
Facets Overview  Overview gives a high-level view of one or more data sets. It produces a visual feature-by-feature statistical analysis, and can also be used to compare statistics across two or more data sets. The tool can process both numeric and string features, including multiple instances of a number or string per feature. Overview can help uncover issues with datasets, including the following: • Unexpected feature values • Missing feature values for a large number of examples • Training/serving skew • Training/test/validation set skew Key aspects of the visualization are outlier detection and distribution comparison across multiple datasets. Interesting values (such as a high proportion of missing data, or very different distributions of a feature across multiple datasets) are highlighted in red. Features can be sorted by values of interest such as the number of missing values or the skew between the different datasets.
FactChecker  We present a novel natural language query interface, the FactChecker, aimed at text summaries of relational data sets. The tool focuses on natural language claims that translate into an SQL query and a claimed query result. Similar in spirit to a spell checker, the FactChecker marks up text passages that seem to be inconsistent with the actual data. At the heart of the system is a probabilistic model that reasons about the input document in a holistic fashion. Based on claim keywords and the document structure, it maps each text claim to a probability distribution over associated query translations. By efficiently executing tens to hundreds of thousands of candidate translations for a typical input document, the system maps text claims to correctness probabilities. This process becomes practical via a specialized processing backend, avoiding redundant work via query merging and result caching. Verification is an interactive process in which users are shown tentative results, enabling them to take corrective actions if necessary. Our system was tested on a set of 53 public articles containing 392 claims. Our test cases include articles from major newspapers, summaries of survey results, and Wikipedia articles. Our tool revealed erroneous claims in roughly a third of test cases. A detailed user study shows that users of our tool are on average six times faster at checking text summaries, compared to generic SQL interfaces. In fully automated verification, our tool achieves significantly higher recall and precision than baselines from the areas of natural language query interfaces and fact checking.
Factor Adjusted Robust Multiple Testing  Large-scale multiple testing with correlated and heavy-tailed data arises in a wide range of research areas, from genomics and medical imaging to finance. Conventional methods for estimating the false discovery proportion (FDP) often ignore the effect of heavy-tailedness and the dependence structure among test statistics, and thus may lead to inefficient or even inconsistent estimation. Also, the assumption of joint normality is often imposed, which is too stringent for many applications. To address these challenges, in this paper we propose a factor-adjusted robust procedure for large-scale simultaneous inference with control of the false discovery proportion. We demonstrate that robust factor adjustments are extremely important in both improving the power of the tests and controlling FDP. We identify general conditions under which the proposed method produces a consistent estimate of the FDP. As a byproduct that is of independent interest, we establish an exponential-type deviation inequality for a robust U-type covariance estimator under the spectral norm. Extensive numerical experiments demonstrate the advantage of the proposed method over several state-of-the-art methods, especially when the data are generated from heavy-tailed distributions. Our proposed procedures are implemented in the R package farmtest. FarmTest
Factor Analysis  Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus “error” terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. 
Factor Graph  A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes. Factor graphs generalize constraint graphs. A factor whose value is either 0 or 1 is called a constraint. A constraint graph is a factor graph where all factors are constraints. The max-product algorithm for factor graphs can be viewed as a generalization of the arc-consistency algorithm for constraint processing.
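As a rough illustration of how factorization makes marginals cheaper, the sketch below compares brute-force summation with the message-passing view on a toy graph. The factors `f1` and `f2`, their values, and the function names are all hypothetical; this is a minimal example for a three-variable chain, not a general sum-product implementation.

```python
from itertools import product

# Hypothetical toy factor graph over binary variables a, b, c:
# p(a, b, c) is proportional to f1(a, b) * f2(b, c).
VALUES = (0, 1)
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}
f2 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}


def marginal_b_brute_force():
    """Sum the full product of factors over every joint assignment."""
    totals = {b: 0.0 for b in VALUES}
    for a, b, c in product(VALUES, repeat=3):
        totals[b] += f1[(a, b)] * f2[(b, c)]
    z = sum(totals.values())
    return {b: t / z for b, t in totals.items()}


def marginal_b_sum_product():
    """Multiply the two messages that arrive at b, one from each factor."""
    # Each message sums out the other variable attached to that factor.
    msg_f1_to_b = {b: sum(f1[(a, b)] for a in VALUES) for b in VALUES}
    msg_f2_to_b = {b: sum(f2[(b, c)] for c in VALUES) for b in VALUES}
    unnormalized = {b: msg_f1_to_b[b] * msg_f2_to_b[b] for b in VALUES}
    z = sum(unnormalized.values())
    return {b: u / z for b, u in unnormalized.items()}
```

Both functions return the same marginal for `b`; the second never enumerates joint assignments, which is where the savings come from on larger tree-structured graphs.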
FactorBase  We describe FactorBase, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citizens inside a database. Whereas previous systems like BayesStore support multi-relational inference, FactorBase supports multi-relational learning. A case study on six benchmark databases evaluates how our system supports a challenging machine learning application, namely learning a first-order Bayesian network model for an entire database. Model learning in this setting has to examine a large number of potential statistical associations across data tables. Our implementation shows how the SQL constructs in FactorBase facilitate the fast, modular, and reliable development of highly scalable model learning systems.
Factored Bandits  We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
Factorial Hidden Markov Models (FHMM) 
We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation-Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.
Factorisation Autoencoder (FAE) 

Factorization Machine (FM) 
In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models. libFM: Factorization Machine Library A Boosting Framework of Factorization Machine 
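The linear-time claim can be checked on a small example. The sketch below assumes the common second-order FM form, with a global bias `w0`, linear weights `w`, and one `k`-dimensional factor vector per feature in `V`; the function names are illustrative and this is not the libFM API. It compares the naive pairwise sum with the reformulated sum over factor dimensions.

```python
def fm_predict_naive(x, w0, w, V):
    """O(k * n^2): sum <v_i, v_j> * x_i * x_j over all pairs i < j."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_linear(x, w0, w, V):
    """O(k * n): per factor f, 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Hypothetical sparse input with four features and k = 2 factors.
x = [1.0, 0.0, 2.0, 0.5]
w0 = 0.1
w = [0.2, -0.3, 0.05, 0.4]
V = [[0.1, 0.3], [0.2, -0.1], [-0.4, 0.25], [0.05, 0.6]]
```

Both functions give the same prediction up to floating-point error. In sparse settings the inner sums only need to visit the non-zero entries of `x`, which this dense sketch does not exploit.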
Factorized Adversarial Network (FAN) 
In this paper, we propose Factorized Adversarial Networks (FAN) to solve unsupervised domain adaptation problems for image classification tasks. Our networks map the data distribution into a latent feature space, which is factorized into a domain-specific subspace that contains domain-specific characteristics and a task-specific subspace that retains category information, for both source and target domains, respectively. Unsupervised domain adaptation is achieved by adversarial training to minimize the discrepancy between the distributions of two task-specific subspaces from source and target domains. We demonstrate that the proposed approach outperforms state-of-the-art methods on multiple benchmark datasets used in the literature for unsupervised domain adaptation. Furthermore, we collect two real-world tagging datasets that are much larger than existing benchmark datasets, and get significant improvement upon baselines, proving the practical value of our approach.
Fader Network  This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images.
Failure Rate  Failure rate is the frequency with which an engineered system or component fails, expressed, for example, in failures per hour. It is often denoted by the Greek letter lambda and is important in reliability engineering. The failure rate of a system usually depends on time, with the rate varying over the life cycle of the system. For example, an automobile’s failure rate in its fifth year of service may be many times greater than its failure rate during its first year of service. One does not expect to replace an exhaust pipe, overhaul the brakes, or have major transmission problems in a new vehicle. In practice, the mean time between failures (MTBF, 1/lambda) is often reported instead of the failure rate. This is valid and useful if the failure rate may be assumed constant (an assumption often made for complex units or systems and for electronics), and it is a general agreement in some reliability standards (Military and Aerospace). In this case it relates only to the flat region of the bathtub curve, also called the ‘useful life period’. Because of this, it is incorrect to extrapolate MTBF to give an estimate of the service lifetime of a component, which will typically be much less than suggested by the MTBF due to the much higher failure rates in the ‘end-of-life wear-out’ part of the ‘bathtub curve’. MTBF numbers are preferred because large positive numbers (such as 2000 hours) are more intuitive and easier to remember than very small numbers (such as 0.0005 per hour). The MTBF is an important system parameter in systems where failure rate needs to be managed, in particular for safety systems. The MTBF appears frequently in the engineering design requirements, and governs frequency of required system maintenance and inspections.
In special processes called renewal processes, where the time to recover from failure can be neglected and the likelihood of failure remains constant with respect to time, the failure rate is simply the multiplicative inverse of the MTBF (1/lambda). A similar ratio used in the transport industries, especially in railways and trucking is ‘mean distance between failures’, a variation which attempts to correlate actual loaded distances to similar reliability needs and practices. Failure rates are important factors in the insurance, finance, commerce and regulatory industries and fundamental to the design of safe systems in a wide variety of applications. 
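The relationship between a constant failure rate and the MTBF can be made concrete with a short calculation. The sketch below assumes the constant-rate (exponential) model from the ‘useful life period’ only; the function names are illustrative.

```python
import math


def mtbf_from_rate(failure_rate_per_hour):
    """MTBF in hours for a constant failure rate (MTBF = 1 / lambda)."""
    return 1.0 / failure_rate_per_hour


def survival_probability(failure_rate_per_hour, hours):
    """Probability of no failure up to `hours`, assuming a constant rate."""
    return math.exp(-failure_rate_per_hour * hours)


rate = 0.0005                      # failures per hour, as in the entry above
mtbf = mtbf_from_rate(rate)        # about 2000 hours
p_at_mtbf = survival_probability(rate, mtbf)  # exp(-1), about 0.37
```

Even under the constant-rate assumption, only about 37% of units are expected to run for a full MTBF without failing, which is one reason the MTBF should not be read as a service life.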
Failure Time Analysis  
Fair Forest  The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community and in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods have widespread popularity as they are one of the few to be simultaneously interpretable, non-linear, and easy-to-use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our ‘Fair Forest’ retains the benefits of the tree-based approach, while providing both greater accuracy and fairness than other alternatives, for both ‘group fairness’ and ‘individual fairness’. We also introduce new measures for fairness which are able to handle multinomial and continuous attributes as well as regression problems, as opposed to binary attributes and labels only. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute.
Fair Top-k Ranking (FA*IR) 
We present a formal problem definition and an algorithm to solve the Fair Top-k Ranking problem. The problem consists of creating a ranking of k elements out of a pool of n >> k candidates. The objective is to maximize utility, and maximization is subject to a ranked group fairness constraint. Our definition of ranked group fairness uses the standard notion of protected group to extend the concept of group fairness. It ensures that every prefix of the rank contains a number of protected candidates that is statistically indistinguishable from a given target proportion, or exceeds it. The utility objective favors rankings in which every candidate included in the ranking is more qualified than any candidate not included, and rankings in which candidates are sorted by decreasing qualifications. We describe an efficient algorithm for this problem, which is tested on a series of existing datasets, as well as new datasets. Experimentally, this approach yields a ranking that is similar to the so-called ‘color-blind’ ranking, while respecting the fairness criteria. To the best of our knowledge, FA*IR is the first algorithm grounded in statistical tests that can be used to mitigate biases in ranking against an underrepresented group.
Fairness-aware Generative Adversarial Network (FairGAN) 
Fairness-aware learning is increasingly important in data mining. Discrimination prevention aims to prevent discrimination in the training data before it is used to conduct predictive analysis. In this paper, we focus on fair data generation that ensures the generated data is discrimination-free. Inspired by generative adversarial networks (GAN), we present fairness-aware generative adversarial networks, called FairGAN, which are able to learn a generator producing fair data and also preserving good data utility. Compared with the naive fair data generation models, FairGAN further ensures the classifiers which are trained on generated data can achieve fair classification on real data. Experiments on a real dataset show the effectiveness of FairGAN.
Faithfulness Condition  The faithfulness condition states that all independences among the variables are implied by the causal structure. In particular, this rules out that two or more causal links cancel each other out due to a particular choice of the parameters. This means that, of the two possible explanations for spurious causalities of type I, we accept only the second, with an additional latent confounding variable. Consequently, detection of spurious causalities of type I allows identifying spurious causalities of type II without knowing the confounding variable.
FALKON  Kernel methods provide a principled way to perform nonlinear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large-scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points. FALKON is derived by combining several algorithmic principles, namely stochastic projections, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. Extensive experiments show that state-of-the-art results on available large-scale datasets can be achieved even on a single machine.
False Confidence Theorem (FCT) 
Satellite conjunction analysis is the assessment of collision risk during a close encounter between a satellite and another object in orbit. A counterintuitive phenomenon has emerged in the conjunction analysis literature: probability dilution, in which lower quality data paradoxically appear to reduce the risk of collision. We show that probability dilution is a special case of a broader structural deficiency in epistemic probability distributions. In probabilistic representations of statistical inference, there are always false propositions that have a high probability of being assigned a high degree of belief. This is the false confidence theorem. As a practical matter, its manifestation in satellite conjunction analysis is particularly detrimental. Under ordinary operating conditions, satellite navigators using epistemic probability of collision as their decision-motivating risk metric are rendered incapable of detecting an impending collision. An explicit remedy for false confidence can be found in the Martin–Liu theory of inferential models. In satellite conjunction analysis, we show that $K\sigma$ uncertainty ellipsoids satisfy the Martin–Liu validity criterion. Performing collision avoidance maneuvers based on ellipsoid overlap will ensure that operational collision risk is capped at the user-specified level. An exposition of the false confidence theorem
False Discovery Rate (FDR) 
False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of findings (i.e. studies where the null hypotheses are rejected), FDR procedures are designed to control the expected proportion of incorrectly rejected null hypotheses (“false discoveries”). FDR controlling procedures exert a less stringent control over false discovery compared to familywise error rate (FWER) procedures (such as the Bonferroni correction), which seek to reduce the probability of even one false discovery, as opposed to the expected proportion of false discoveries. Thus FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting a null hypothesis of no effect that is actually true. LocFDRPois
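To make the contrast with FWER control concrete, the sketch below applies the Bonferroni correction and the Benjamini-Hochberg step-up procedure to the same p-values. Benjamini-Hochberg is used here as one well-known FDR-controlling procedure; the entry above does not single it out, and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """FWER control: reject only p-values at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    rank (1-based, in sorted order) with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(example_p))     # 1 rejection
n_bh = sum(benjamini_hochberg(example_p))     # 2 rejections
```

On this input the Bonferroni threshold is 0.005, so only the smallest p-value is rejected, while Benjamini-Hochberg also rejects 0.008 because it falls below the rank-2 threshold of 0.01.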
False Nearest Neighbor (FNN) 
The false nearest neighbor algorithm is an algorithm for estimating the embedding dimension. The concept was proposed by Kennel et al. The main idea is to examine how the number of neighbors of a point along a signal trajectory changes with increasing embedding dimension. In too low an embedding dimension, many of the neighbors will be false, but in an appropriate embedding dimension or higher, the neighbors are real. With increasing dimension, the false neighbors will no longer be neighbors. Therefore, by examining how the number of neighbors changes as a function of dimension, an appropriate embedding can be determined.
False Positive Rate  In statistics, when performing multiple comparisons, the term false positive ratio, also known as the false alarm ratio, usually refers to the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate (or “false alarm rate”) usually refers to the expectancy of the false positive ratio. 
Fama French  In asset pricing and portfolio management the Fama-French three-factor model is a model designed by Eugene Fama and Kenneth French to describe stock returns. Introduction to Fama French
Familia  In the last decade, a variety of topic models have been proposed for text engineering. However, except for Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), most existing topic models are seldom applied or considered in industrial scenarios. This phenomenon is caused by the fact that there are very few convenient tools to support these topic models so far. Intimidated by the demanding expertise and labor of designing and implementing parameter inference algorithms, software engineers are prone to simply resort to PLSA/LDA, without considering whether it is proper for their problem at hand or not. In this paper, we propose a configurable topic modeling framework named Familia, in order to bridge the huge gap between academic research results and current industrial practice. Familia supports an important line of topic models that are widely applicable in text engineering scenarios. In order to relieve the burden on software engineers without knowledge of Bayesian networks, Familia is able to conduct automatic parameter inference for a variety of topic models. Simply by changing the data organization of Familia, software engineers are able to easily explore a broad spectrum of existing topic models or even design their own topic models, and find the one that best suits the problem at hand. With its superior extendability, Familia has a novel sampling mechanism that strikes a balance between effectiveness and efficiency of parameter inference. Furthermore, Familia is essentially a big topic modeling framework that supports parallel parameter inference and distributed parameter storage. The utilities and necessity of Familia are demonstrated in real-life industrial applications. Familia would significantly enlarge software engineers’ arsenal of topic models and pave the way for utilizing highly customized topic models in real-life problems.
Familywise Error Rate (FWER) 
In statistics, familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypothesis tests. Coarse-to-fine Multiple Testing Strategies
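A quick calculation shows how fast the FWER grows with the number of tests. The sketch below assumes that all null hypotheses are true and that the tests are independent, which is a simplification the definition above does not require; the function name is illustrative.

```python
def fwer_independent(alpha, m):
    """P(at least one false rejection) across m independent tests at level
    alpha, assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


uncorrected = fwer_independent(0.05, 10)       # roughly 0.40
bonferroni = fwer_independent(0.05 / 10, 10)   # just under 0.05
```

Testing each of ten hypotheses at 0.05 gives roughly a 40% chance of at least one false discovery under these assumptions, while testing each at 0.05 / 10 keeps the familywise rate below 0.05.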
Fan Chart  In time series analysis, a fan chart is a chart that joins a simple line chart for observed past data with ranges for possible values of future data, together with a line showing a central estimate or most likely value for the future outcomes. As predictions become increasingly uncertain the further into the future one goes, these forecast ranges spread out, creating distinctive wedge or ‘fan’ shapes, hence the term. Alternative forms of the chart can also include uncertainty for past data, such as preliminary data that is subject to revision. The term ‘fan chart’ was coined by the Bank of England, which has been using these charts and this term since 1997 in its ‘Inflation Report’ to describe its best forecast of future inflation to the general public. Fan charts have been used extensively in finance and monetary policy, for instance to represent forecasts of inflation. fanplot
Farewell’s Linear Increments Model (FLIM) 
FLIM fits linear models for the observed increments in a longitudinal dataset, and imputes missing values according to the models. FLIM 
Fast Alternating Minimization (FAM) 

Fast and Accurate Timing Error Prediction Framework (FATE) 
Deep neural networks (DNN) are increasingly being accelerated on application-specific hardware such as the Google TPU designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration for timing speculation requires detailed gate-level timing simulations that can be time-consuming for large DNNs that execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and accurate timing simulations of DNN accelerators like the Google TPU. FATE proposes two novel ideas: (i) DelayNet, a DNN based timing model for MAC units; and (ii) a statistical sampling methodology that reduces the number of MAC operations for which timing simulations are performed. We show that FATE results in between 8 and 58 times speedup in timing simulations, while introducing less than 2% error in classification accuracy estimates. We demonstrate the use of FATE by comparing a conventional DNN accelerator that uses 2’s complement (2C) arithmetic with an alternative implementation that uses signed magnitude representations (SMR). We show that the SMR implementation provides 18% more energy savings for the same classification accuracy than 2C, a result that might be of independent interest.
Fast and Asymptotically efficient Distributed Estimator (FADE) 
Consider a set of agents that wish to estimate a vector of parameters of their mutual interest. For this estimation goal, agents can sense and communicate. When sensing, an agent measures (in additive Gaussian noise) linear combinations of the unknown vector of parameters. When communicating, an agent can broadcast information to a few other agents, by using the channels that happen to be randomly at its disposal at the time. To coordinate the agents towards their estimation goal, we propose a novel algorithm called FADE (Fast and Asymptotically efficient Distributed Estimator), in which agents collaborate at discrete time-steps; at each time-step, agents sense and communicate just once, while also updating their own estimate of the unknown vector of parameters. FADE enjoys five attractive features: first, it is an intuitive estimator, simple to derive; second, it withstands dynamic networks, that is, networks whose communication channels change randomly over time; third, it is strongly consistent in that, as time-steps play out, each agent’s local estimate converges (almost surely) to the true vector of parameters; fourth, it is both asymptotically unbiased and efficient, which means that, across time, each agent’s estimate becomes unbiased and the mean-square error (MSE) of each agent’s estimate vanishes to zero at the same rate as the MSE of the optimal estimator at an almighty central node; fifth, and most importantly, when compared with a state-of-the-art consensus+innovation (CI) algorithm, it yields estimates with outstandingly lower mean-square errors for the same number of communications (for example, in a sparsely connected network model with 50 agents, we find through numerical simulations that the reduction can be dramatic, reaching several orders of magnitude).
Fast and Frugal Trees (FFT) 
Fast and Frugal Trees (FFTs) are very simple decision trees for classifying cases (e.g., breast cancer patients) into one of two classes (e.g., no cancer vs. true cancer). FFTs can be preferable to more complex algorithms (such as logistic regression) because they are easy to communicate and implement, and are robust against noisy data. FFTrees 
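To make the idea concrete, here is a minimal sketch of how such a tree could be evaluated: each node looks at a single cue and either exits with a class or passes the case on to the next node. The cue names, thresholds and the `fft_classify` helper are hypothetical illustrations (not clinically meaningful values), and this is not the interface of the FFTrees package.

```python
from typing import Callable, Dict, List, Tuple

# One node = (cue name, test on the cue value, class returned when the test is True).
Node = Tuple[str, Callable[[float], bool], str]


def fft_classify(case: Dict[str, float], nodes: List[Node], default: str) -> str:
    """Walk the nodes in order; exit at the first node whose test fires."""
    for cue, test, label in nodes:
        if test(case[cue]):
            return label
    # No node produced an exit, so the final node's other branch applies.
    return default


# Hypothetical cues and thresholds, chosen only to illustrate the control flow.
nodes: List[Node] = [
    ("cue_a", lambda v: v > 20, "cancer"),
    ("cue_b", lambda v: v < 30, "no cancer"),
    ("cue_c", lambda v: v > 0.7, "cancer"),
]

print(fft_classify({"cue_a": 25, "cue_b": 50, "cue_c": 0.1}, nodes, "no cancer"))
```

Because every node inspects exactly one cue and most cases exit early, the whole classifier can be read aloud as a short list of if-then rules.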
Fast and Robust Twin Support Vector Machine (FRTSVM) 
Twin support vector machine (TSVM) is a powerful learning algorithm that solves a pair of smaller SVM-type problems. However, some specific issues remain to be solved when it is applied to real applications, e.g., low efficiency and noisy data. In this paper, we propose a Fast and Robust TSVM (FRTSVM) to deal with these issues. In FRTSVM, we propose an effective fuzzy membership function to ease the effects of noisy inputs. We apply the fuzzy membership to each input instance and reformulate the TSVMs such that different input instances can make different contributions to the learning of the separating hyperplanes. To further speed up the training procedure, we develop an efficient coordinate descent algorithm with shrinking to solve the pair of quadratic programming problems (QPPs) involved in FRTSVM. Moreover, the theoretical foundations of the proposed model are analyzed in detail. The experimental results on several artificial and benchmark datasets indicate that FRTSVM not only achieves fast learning speed but also shows robust classification performance. ➘ “Twin Support Vector Machine” 
Fast Boosted Decision Trees (FastBDT) 
Stochastic gradientboosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speedoptimized and cachefriendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fittingphase and applicationphase, in comparison with popular implementations in software frameworks like TMVA, scikitlearn and XGBoost. The concepts used to optimize the execution time and performance studies are discussed in detail in this paper. The key ideas include: An equalfrequency binning on the input data, which allows replacing expensive floatingpoint with integer operations, while at the same time increasing the quality of the classification; a cachefriendly linear access pattern to the input data, in contrast to usual implementations, which exhibit a random access pattern. FastBDT provides interfaces to C/C++, Python and TMVA. It is extensively used in the field of high energy physics by the Belle II experiment. 
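The equal-frequency binning idea can be sketched as follows: bin boundaries are taken from the sorted training values so that each bin receives roughly the same number of samples, after which every floating-point feature value is replaced by a small integer bin index. This is an illustrative sketch only; the function names are assumptions and it does not reproduce FastBDT's actual implementation.

```python
from bisect import bisect_right
from typing import List


def equal_frequency_edges(values: List[float], n_bins: int) -> List[float]:
    """Return n_bins - 1 interior boundaries taken at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def to_bin_index(value: float, edges: List[float]) -> int:
    """Map a float to an integer bin index in 0 .. len(edges)."""
    return bisect_right(edges, value)


values = [0.1, 0.5, 0.2, 9.0, 3.0, 4.0, 7.5, 100.0]
edges = equal_frequency_edges(values, n_bins=4)
binned = [to_bin_index(v, edges) for v in values]
print(edges, binned)
```

Note that the outlier 100.0 simply lands in the last bin, so the binned representation is insensitive to the scale of extreme values.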
Fast Causal Inference (FCI) 
Causally insufficient structures (models with latent or hidden variables, or with confounding etc.) of joint probability distributions have been the subject of intense study not only in statistics, but also in various AI systems. In AI, belief networks, being representations of joint probability distributions with an underlying directed acyclic graph structure, are paid special attention because efficient reasoning (uncertainty propagation) methods have been developed for belief network structures. Algorithms have therefore been developed to acquire the belief network structure from data. As artifacts due to variable hiding negatively influence the performance of derived belief networks, models with latent variables have been studied and several algorithms for learning belief network structure under causal insufficiency have also been developed. Regrettably, some of them are already known to be erroneous (e.g. the IC algorithm of [Pearl:Verma:91]). This paper is devoted to another algorithm, the Fast Causal Inference (FCI) Algorithm of [Spirtes:93]. It is proven by a specially constructed example that this algorithm, as it stands in [Spirtes:93], is also erroneous. The fundamental reason for the failure of this algorithm is the temporary introduction of non-real links between nodes of the network with the intention of later removal. While for trivial dependency structures these non-real links may actually be removed, this may not be the case for complex ones, e.g. for the case described in this paper. A remedy for this failure is proposed. 
Fast Compressed Neural Networks (FCNN) 
FCNN (Fast Compressed Neural Networks) is a free open source C++ library for Artificial Neural Network computations. It is easy to use and extend, is written in modern C++ and is very fast (to the author’s best knowledge it is the fastest freely available neural network library). All FCNN classes are templated to support both single and double precision computations. Main features are listed under the Features tab. The internal representation of a network in FCNN differs from that of all other libraries, allowing true code modularisation with simultaneous speed improvements. FCNN4R 
Fast Data  Fast Data is ‘data in motion’, data in the process of being collected or moved between applications as part of a transaction or business process flow. Fast Data is realtime data not yet stored as big data. It offers an opportunity for immediate response based on insights derived from deep analytics of incoming data streams. Fast Data processing sits in front of the big data fire hose, sifting through the massive amounts of incoming information to identify actionable business opportunities or threats. 
Fast Library for Approximate Nearest Neighbors (FLANN) 
FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset. FLANN is written in C++ and contains bindings for the following languages: C, MATLAB and Python. 
Fast Linear Iterative Clustering (FLIC) 
Benefiting from its high efficiency and simplicity, Simple Linear Iterative Clustering (SLIC) remains one of the most popular oversegmentation tools. However, due to explicit enforcement of spatial similarity for region continuity, the boundary adaptation of SLIC is suboptimal. It also has drawbacks on convergence rate as a result of both the fixed search region and separately doing the assignment step and the update step. In this paper, we propose an alternative approach to fix the inherent limitations of SLIC. In our approach, each pixel actively searches its corresponding segment under the help of its neighboring pixels, which naturally enables region coherence without being harmful to boundary adaptation. We also jointly perform the assignment and update steps, allowing high convergence rate. Extensive evaluations on Berkeley segmentation benchmark verify that our method outperforms competitive methods under various evaluation metrics. It also has the lowest time cost among existing methods (approximately 30fps for a 481×321 image on a single CPU core). 
Fast Oriented Text Spotting (FOTS) 
Incidental scene text spotting is considered one of the most difficult and valuable challenges in the document analysis community. Most existing methods treat text detection and recognition as separate tasks. In this work, we propose a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network for simultaneous detection and recognition, sharing computation and visual information between the two complementary tasks. Specifically, RoIRotate is introduced to share convolutional features between detection and recognition. Benefiting from the convolution sharing strategy, our FOTS has little computation overhead compared to a baseline text detection network, and the joint training method learns more generic features, making our method perform better than two-stage methods. Experiments on the ICDAR 2015, ICDAR 2017 MLT, and ICDAR 2013 datasets demonstrate that the proposed method outperforms state-of-the-art methods significantly, which further allows us to develop the first real-time oriented text spotting system, which surpasses all previous state-of-the-art results by more than 5% on the ICDAR 2015 text spotting task while keeping 22.6 fps. 
Fast Parallel Proximal Algorithm (FPPA) 
Learningbased Image Reconstruction via Parallel Proximal Algorithm 
Fast Rotation Forest  Ensemble approaches in classification are a very popular research area in recent years. An ensemble consists of a set of individual classifiers such as neural networks or decision trees whose predictions are combined for classifying new instances. A method is used here for generating classifier ensembles based on feature extraction. In the base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. It is a technique that is useful for the extraction and classification of data. The purpose is to reduce the dimensionality of a data set. Then the Decision tree is used to classify the data set. Rotation Forest and Extended Space Forest algorithms are used to calculate the accuracy. A novel approach Fast Rotation Forest is introduced to enrich the accuracy rate. The idea of the fast rotation approach is to encourage simultaneously individual accuracy and specificity within the ensemble. By comparing Random forest and Extended Space Forest, Fast Rotation Forest yields high accuracy. 
Fast Similarity Search (FastSS) 
Fast Similarity Search (FastSS) performs an exhaustive similarity search in a dictionary, based on the edit distance model of string similarity. The algorithm uses deletions to model the edit distance. For a dictionary containing n words, and given a maximum number of spelling errors k, FastSS creates an index of all n words containing up to k deletions. At search time each query is mutated to generate a deletion neighborhood, which is compared to the indexed deletion dictionary. 
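The indexing and lookup steps described above can be sketched in a few lines. The sketch below builds the deletion neighborhood of every dictionary word, indexes the words by those variants, and at query time intersects the query's own deletion neighborhood with the index. Matching deletion variants only yields candidates, so a final edit-distance check is added here as a verification step; all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def deletion_neighborhood(word: str, k: int) -> Set[str]:
    """All strings obtained from `word` by deleting at most k characters."""
    variants = {word}
    for d in range(1, min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), d):
            skip = set(positions)
            variants.add("".join(c for i, c in enumerate(word) if i not in skip))
    return variants


def build_index(words: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map every deletion variant to the dictionary words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for variant in deletion_neighborhood(word, k):
            index[variant].add(word)
    return index


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def search(query: str, index: Dict[str, Set[str]], k: int) -> List[str]:
    """Collect candidates sharing a deletion variant, then verify the distance."""
    candidates: Set[str] = set()
    for variant in deletion_neighborhood(query, k):
        candidates |= index.get(variant, set())
    return sorted(w for w in candidates if edit_distance(query, w) <= k)


index = build_index(["test", "text", "tent", "toast", "rest"], k=1)
print(search("tezt", index, k=1))
```

The trade-off is the usual one for precomputed indexes: lookups touch only a handful of hash keys, while the index grows quickly with k and with word length.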
Fast Temporal Pattern Mining with Extended Vertical Lists  Temporal Pattern Mining (TPM) is the problem of mining predictive complex temporal patterns from multivariate time series in a supervised setting. We develop a new method called Fast Temporal Pattern Mining with Extended Vertical Lists. This method utilizes an extension of the Apriori property which requires a more complex pattern to appear within records only at places where all of its subpatterns are detected as well. The approach is based on a novel data structure called the Extended Vertical List that tracks positions of the first state of the pattern inside records. Extensive computational results indicate that the new method is significantly faster than the previous version of the algorithm for TPM. However, the speed-up comes at the expense of memory usage. 
Fast Weight Long ShortTerm Memory  Associative memory using fast weights is a shortterm memory mechanism that substantially improves the memory capacity and time scale of recurrent neural networks (RNNs). As recent studies introduced fast weights only to regular RNNs, it is unknown whether fast weight memory is beneficial to gated RNNs. In this work, we report a significant synergy between long shortterm memory (LSTM) networks and fast weight associative memories. We show that this combination, in learning associative retrieval tasks, results in much faster training and lower test error, a performance boost most prominent at high memory task difficulties. 
FastICA  FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. The algorithm is based on a fixed-point iteration scheme maximizing non-Gaussianity as a measure of statistical independence. It can also be derived as an approximative Newton iteration. 
FastNode2Vec  Node2Vec is a state-of-the-art general-purpose feature learning method for network analysis. However, current solutions cannot run Node2Vec on large-scale graphs with billions of vertices and edges, which are common in real-world applications. The existing distributed Node2Vec on Spark incurs significant space and time overhead. It runs out of memory even for mid-sized graphs with millions of vertices. Moreover, it considers at most 30 edges for every vertex in generating random walks, causing poor result quality. In this paper, we propose FastNode2Vec, a family of efficient Node2Vec random walk algorithms on a Pregel-like graph computation framework. FastNode2Vec computes transition probabilities during random walks to reduce memory space consumption and computation overhead for large-scale graphs. The Pregel-like scheme avoids the space and time overhead of Spark’s read-only RDD structures and shuffle operations. Moreover, we propose a number of optimization techniques to further reduce the computation overhead for popular vertices with large degrees. Empirical evaluation shows that FastNode2Vec is capable of computing Node2Vec on graphs with billions of vertices and edges on a mid-sized machine cluster. Compared to SparkNode2Vec, FastNode2Vec achieves 7.7–122x speedups. 
FastSlow Recurrent Neural Networks (FSRNN) 
Processing sequential data of variable length is a major challenge in a wide range of applications, such as speech recognition, language modeling, generative image modeling and machine translation. Here, we address this challenge by proposing a novel recurrent neural network (RNN) architecture, the FastSlow RNN (FSRNN). The FSRNN incorporates the strengths of both multiscale RNNs and deep transition RNNs as it processes sequential data on different timescales and learns complex transition functions from one time step to the next. We evaluate the FSRNN on two character level language modeling data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve state of the art results to $1.19$ and $1.25$ bitspercharacter (BPC), respectively. In addition, an ensemble of two FSRNNs achieves $1.20$ BPC on Hutter Prize Wikipedia outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FSRNN, which explains the improved performance compared to other RNN architectures. Our approach is general as any kind of RNN cell is a possible building block for the FSRNN architecture, and thus can be flexibly applied to different tasks. 
fastText  fastText is a library for efficient learning of word representations and sentence classification. Analysis and Optimization of fastText Linear Text Classifier 
Fault Tree Analysis (FTA) 
Fault tree analysis (FTA) is a top down, deductive failure analysis in which an undesired state of a system is analyzed using Boolean logic to combine a series of lowerlevel events. This analysis method is mainly used in the fields of safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk or to determine (or get a feeling for) event rates of a safety accident or a particular system level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other highhazard industries; but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to causeelimination technique used to detect bugs. In aerospace, the more general term ‘system Failure Condition’ is used for the ‘undesired state’ / Top event of the fault tree. These conditions are classified by the severity of their effects. The most severe conditions require the most extensive fault tree analysis. These ‘system Failure Conditions’ and their classification are often previously determined in the functional Hazard analysis. Fault Tree Analysis (FTA): Concepts and Applications 
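As a small worked illustration of combining lower-level events with Boolean logic, the sketch below computes the probability of a top event from basic-event probabilities through AND and OR gates. It assumes the basic events are statistically independent, which real analyses must justify; the event names and probabilities are invented for the example.

```python
from math import prod
from typing import Iterable


def and_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs only if ALL inputs occur (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs if ANY input occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities for a redundant pump system.
p_pump_a_fails = 0.1
p_pump_b_fails = 0.1
p_power_loss = 0.01

# Cooling is lost if both pumps fail, or if power is lost.
p_both_pumps_fail = and_gate([p_pump_a_fails, p_pump_b_fails])
p_top_event = or_gate([p_both_pumps_fail, p_power_loss])
print(round(p_top_event, 4))
```

The example also shows why redundancy helps: the AND gate turns two 10% failures into a 1% contribution, leaving the single-point power failure as an equally large driver of the top event.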
Fay Herriot Model  smallarea 
FearNet  Incremental class learning involves sequentially learning classes in bursts of examples from the same class. This violates the assumptions that underlie methods for training standard deep neural networks, and will cause them to suffer from catastrophic forgetting. Arguably, the best method for incremental class learning is iCaRL, but it requires storing training examples for each class, making it challenging to scale. Here, we propose FearNet for incremental class learning. FearNet is a generative model that does not store previous examples, making it memory efficient. FearNet uses a braininspired dualmemory system in which new memories are consolidated from a network for recent memories inspired by the mammalian hippocampal complex to a network for longterm storage inspired by medial prefrontal cortex. Memory consolidation is inspired by mechanisms that occur during sleep. FearNet also uses a module inspired by the basolateral amygdala for determining which memory system to use for recall. FearNet achieves stateoftheart performance at incremental class learning on image (CIFAR100, CUB200) and audio classification (AudioSet) benchmarks. 
Feature Baggingbased Outlier Detection (FBOD) 
In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different set of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find the better quality outliers. Experiments performed on several synthetic and real life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide nontrivial improvements over the base algorithm. HighDimOut 
Feature Engineering  Feature engineering is the process of determining which predictor variables will contribute the most to the predictive power of a machine learning algorithm. There are two commonly used methods for making this selection – the Forward Selection Procedure starts with no variables in the model. You then iteratively add variables and test the predictive accuracy of the model until adding more variables no longer makes a positive effect. Next, the Backward Elimination Procedure begins with all the variables in the model. You proceed by removing variables and testing the predictive accuracy of the model. 
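Both procedures are greedy loops around a model-scoring function, which the following sketch makes explicit. The `score` callable stands in for "fit the model on these variables and measure predictive accuracy"; here it is replaced by a made-up toy function so the example runs on its own. All names and numbers are assumptions for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

Score = Callable[[FrozenSet[str]], float]


def forward_selection(candidates: Sequence[str], score: Score) -> List[str]:
    """Start empty; repeatedly add the variable that improves the score most."""
    selected: List[str] = []
    remaining = list(candidates)
    best = score(frozenset())
    while remaining:
        new_best, feature = max((score(frozenset(selected + [f])), f) for f in remaining)
        if new_best <= best:  # adding more variables no longer helps
            break
        selected.append(feature)
        remaining.remove(feature)
        best = new_best
    return selected


def backward_elimination(candidates: Sequence[str], score: Score) -> List[str]:
    """Start with everything; repeatedly drop the variable whose removal hurts least."""
    selected = list(candidates)
    best = score(frozenset(selected))
    while selected:
        new_best, feature = max(
            (score(frozenset(g for g in selected if g != f)), f) for f in selected
        )
        if new_best < best:  # every removal makes the model worse
            break
        selected.remove(feature)
        best = new_best
    return selected


def toy_score(features: FrozenSet[str]) -> float:
    """Stand-in for validation accuracy: two useful variables, small cost per variable."""
    gains = {"age": 0.2, "income": 0.1}
    return 0.5 + sum(gains.get(f, 0.0) for f in sorted(features)) - 0.01 * len(features)


variables = ["age", "income", "noise_1", "noise_2"]
print(forward_selection(variables, toy_score))
print(sorted(backward_elimination(variables, toy_score)))
```

In practice the scoring function would refit the model for every trial subset, so the cost of both procedures grows roughly quadratically with the number of candidate variables.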
Feature Engineering Wrapper (FEW) 
We propose a general wrapper for feature learning that interfaces with other machine learning methods to compose effective data representations. The proposed feature engineering wrapper (FEW) uses genetic programming to represent and evolve individual features tailored to the machine learning method with which it is paired. In order to maintain feature diversity, lexicase survival is introduced, a method based on lexicase selection. This survival method preserves semantically unique individuals in the population based on their ability to solve difficult subsets of training cases, thereby yielding a population of uncorrelated features. We demonstrate FEW with five different off-the-shelf machine learning methods and test it on a set of real-world and synthetic regression problems with dimensions varying across three orders of magnitude. The results show that FEW is able to improve model test predictions across problems for several ML methods. We discuss and test the scalability of FEW in comparison to other feature composition strategies, most notably polynomial feature expansion. 
Feature Evolvable Streaming Learning  Learning with streaming data has attracted much attention during the past few years. Though most studies consider data stream with fixed features, in real practice the features may be evolvable. For example, features of data gathered by limitedlifespan sensors will change when these sensors are substituted by new ones. In this paper, we propose a novel learning paradigm: Feature Evolvable Streaming Learning where old features would vanish and new features will occur. Rather than relying on only the current features, we attempt to recover the vanished features and exploit it to improve performance. Specifically, we learn two models from the recovered features and the current features, respectively. To benefit from the recovered features, we develop two ensemble methods. In the first method, we combine the predictions from two models and theoretically show that with assistance of old features, the performance on new features can be improved. In the second approach, we dynamically select the best single prediction and establish a better performance guarantee when the best model switches. Experiments on both synthetic and real data validate the effectiveness of our proposal. 
Feature Fusion Single Shot Multibox Detector (FSSD) 
SSD (Single Shot Multibox Detector) is one of the best object detection algorithms, with both high accuracy and fast speed. However, SSD’s feature pyramid detection method makes it hard to fuse the features from different scales. In this paper, we propose FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module which can improve the performance significantly over SSD with just a small drop in speed. In the feature fusion module, features from different layers with different scales are concatenated together, followed by some down-sampling blocks to generate a new feature pyramid, which is fed to multibox detectors to predict the final detection results. On the Pascal VOC 2007 test, our network can achieve 82.7 mAP (mean average precision) at the speed of 65.8 FPS (frames per second) with the input size 300$\times$300 using a single Nvidia 1080Ti GPU. In addition, our result on COCO is also better than the conventional SSD by a large margin. Our FSSD outperforms many state-of-the-art object detection algorithms in both accuracy and speed. Code will be made publicly available. 
Feature Learning  Feature learning or representation learning is a set of techniques in machine learning that learn a transformation of “raw” inputs to a representation that can be effectively exploited in a supervised learning task such as classification. Feature learning algorithms themselves may be either unsupervised or supervised, and include autoencoders, dictionary learning, matrix factorization, restricted Boltzmann machines and various forms of clustering. 
Feature Matching  MatchBench: An Evaluation of Feature Matchers 
Feature Refine Net (FRN) 
This paper presents a method that can accurately detect heads, especially small heads, in indoor scenes. To achieve this, we propose a novel Feature Refine Net (FRN) and a cascaded multi-scale architecture. FRN exploits the multi-scale hierarchical features created by deep convolutional neural networks. The proposed channel weighting method enables FRN to make use of features alternatively and effectively. To improve the performance of small head detection, we propose a cascaded multi-scale architecture which has two detectors. One, called the global detector, is responsible for detecting large objects and acquiring the global distribution information. The other, called the local detector, is specialized for small object detection and makes use of the information provided by the global detector. Due to the lack of head detection datasets, we have collected and labeled a new large dataset named SCUT-HEAD that includes 4405 images with 111251 heads annotated. Experiments show that our method achieves state-of-the-art performance on SCUT-HEAD. 
Feature Scaling  Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step. http://…/2014_about_feature_scaling.html 
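Two common ways to standardize the range of a feature are min-max scaling to [0, 1] and z-score standardization to zero mean and unit variance. The sketch below shows both on a plain list of numbers; the function names are assumptions, and constant features are mapped to zeros to avoid dividing by zero.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
print(min_max_scale(heights_cm))
print(standardize(heights_cm))
```

When scaling is part of a predictive pipeline, the minimum, maximum, mean and standard deviation should be computed on the training data only and then reused for test data.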
Feature Selection  In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context. Feature selection techniques are a subset of the more general field of feature extraction. 
FEAture Selection for compilation Tasks (FEAST) 
The success of the application of machinelearning techniques to compilation tasks can be largely attributed to the recent development and advancement of program characterization, a process that numerically or structurally quantifies a target program. While great achievements have been made in identifying key features to characterize programs, choosing a correct set of features for a specific compiler task remains an ad hoc procedure. In order to guarantee a comprehensive coverage of features, compiler engineers usually need to select excessive number of features. This, unfortunately, would potentially lead to a selection of multiple similar features, which in turn could create a new problem of bias that emphasizes certain aspects of a program’s characteristics, hence reducing the accuracy and performance of the target compiler task. In this paper, we propose FEAture Selection for compilation Tasks (FEAST), an efficient and automated framework for determining the most relevant and representative features from a feature pool. Specifically, FEAST utilizes widely used statistics and machinelearning tools, including LASSO, sequential forward and backward selection, for automatic feature selection, and can in general be applied to any numerical feature set. This paper further proposes an automated approach to compiler parameter assignment for assessing the performance of FEAST. Intensive experimental results demonstrate that, under the compiler parameter assignment task, FEAST can achieve comparable results with about 18% of features that are automatically selected from the entire feature pool. We also inspect these selected features and discuss their roles in program execution. 
Feature Squeezing  Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous studies to defend against adversarial examples mostly focused on refining the DNN models. They have either shown limited success or suffer from expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training. 
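The bit-depth variant of this detection scheme can be sketched as follows: quantize each pixel to fewer levels, run the model on both the original and the squeezed input, and flag the input when the two predictions disagree by more than a threshold. The L1 distance, the threshold value and the tiny stand-in `toy_model` are assumptions made so the example is self-contained; a real detector would use a trained network and a threshold tuned on validation data.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], List[float]]


def reduce_bit_depth(pixels: Sequence[float], bits: int) -> List[float]:
    """Quantize pixel intensities in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [int(p * levels + 0.5) / levels for p in pixels]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def looks_adversarial(model: Model, pixels: Sequence[float], bits: int, threshold: float) -> bool:
    """Flag the input if squeezing changes the model's prediction too much."""
    original = model(pixels)
    squeezed = model(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels: Sequence[float]) -> List[float]:
    """Hypothetical two-class 'classifier' that thresholds the mean brightness."""
    bright = sum(pixels) / len(pixels) > 0.5
    return [1.0, 0.0] if bright else [0.0, 1.0]


clean = [0.9, 0.8, 0.9, 0.7]
suspicious = [0.9, 0.45, 0.45, 0.45]
print(looks_adversarial(toy_model, clean, bits=1, threshold=1.0))
print(looks_adversarial(toy_model, suspicious, bits=1, threshold=1.0))
```

The detector leaves the model untouched, which is why it can be combined with defenses that change the model itself.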
FeATure TransfEr Network (FATTEN) 
The problem of data augmentation in feature space is considered. A new architecture, denoted the FeATure TransfEr Network (FATTEN), is proposed for the modeling of feature trajectories induced by variations of object pose. This architecture exploits a parametrization of the pose manifold in terms of pose and appearance. This leads to a deep encoder/decoder network architecture, where the encoder factors into an appearance and a pose predictor. Unlike previous attempts at trajectory transfer, FATTEN can be efficiently trained endtoend, with no need to train separate feature transfer functions. This is realized by supplying the decoder with information about a target pose and the use of a multitask loss that penalizes category and posemismatches. In result, FATTEN discourages discontinuous or nonsmooth trajectories that fail to capture the structure of the pose manifold, and generalizes well on object recognition tasks involving large pose variation. Experimental results on the artificial ModelNet database show that it can successfully learn to map source features to target features of a desired pose, while preserving class identity. Most notably, by using feature space transfer for data augmentation (w.r.t. pose and depth) on SUNRGBD objects, we demonstrate considerable performance improvements on one/fewshot object recognition in a transfer learning setup, compared to current stateoftheart methods. 
FeatureBradleyTerryLuce (fBTL) 
We consider the problem of ranking a set of items from pairwise comparisons in the presence of features associated with the items. Recent works have established that $O(n\log(n))$ samples are needed to rank well when there is no feature information present. However, this might be suboptimal in the presence of associated features. We introduce a new probabilistic preference model called the featureBradleyTerryLuce (fBTL) model that generalizes the standard BTL model to incorporate feature information. We present a new least squares based algorithm called fBTLLS which we show requires far fewer than $O(n\log(n))$ pairs to obtain a good ranking; precisely, our new sample complexity bound is $O(\alpha\log \alpha)$, where $\alpha$ denotes the number of ‘independent items’ of the set, and in general $\alpha \ll n$. Our analysis is novel and makes use of tools from classical graph matching theory to provide tighter bounds that shed light on the true complexity of the ranking problem, capturing the item dependencies in terms of their feature representations. This was not possible with earlier matrix completion based tools used for this problem. We also prove an information theoretic lower bound on the required sample complexity for recovering the underlying ranking, which essentially shows the tightness of our proposed algorithms. The efficacy of our proposed algorithms is validated through extensive experimental evaluations on a variety of synthetic and real world datasets. 
FeatureBradleyTerryLuce Least Squares (fBTLLS) 
➚ “FeatureBradleyTerryLuce” 
FeatureDistributed Stochastic Variance Reduced Gradient (FDSVRG) 
Linear classification has been widely used in many highdimensional applications like text classification. To perform linear classification for largescale tasks, we often need to design distributed learning methods on a cluster of multiple machines. In this paper, we propose a new distributed learning method, called featuredistributed stochastic variance reduced gradient (FDSVRG) for highdimensional linear classification. Unlike most existing distributed learning methods which are instancedistributed, FDSVRG is featuredistributed. FDSVRG has lower communication cost than other instancedistributed methods when the data dimensionality is larger than the number of data instances. Experimental results on real data demonstrate that FDSVRG can outperform other stateoftheart distributed methods for highdimensional linear classification in terms of both communication cost and wallclock time, when the dimensionality is larger than the number of instances in training data. 
FeatureFu  FeatureFu contains a collection of library/tools for advanced feature engineering, such as using extended sexpression based feature transformation, to derive features on top of other features, or convert a light weighted model (logistical regression or decision tree) into a feature, in an intuitive way without touching any code. 
Feature-Label Memory Network  Deep learning typically requires training a very capable architecture using large datasets. However, many important learning problems demand an ability to draw valid inferences from small datasets, and such problems pose a particular challenge for deep learning. In this regard, various studies on ‘meta-learning’ are being actively conducted. Recent work has suggested a Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory and use this data to make accurate predictions. In models such as MANN, the input data samples and their appropriate labels from the previous step are bound together in the same memory locations. This often leads to memory interference when performing a task, as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we address this issue by presenting a more robust MANN. We revisit the idea of meta-learning and propose a new memory augmented neural network by explicitly splitting the external memory into feature and label memories. The feature memory is used to store the features of input data samples and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input, and based on that feature, it retrieves the label information of the input from the label memory unit. In order for the network to function in this framework, a new memory-writing module is designed to encode label information into the label memory in accordance with the meta-learning task structure. Here, we demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks using the Omniglot and MNIST datasets. 
Feature-Level Domain Adaptation (FLDA) 
Domain adaptation is the supervised learning setting in which the training and test data originate from different domains: the so-called source and target domains. In this paper, we propose and study a domain adaptation approach, called feature-level domain adaptation (FLDA), that models the dependence between two domains by means of a feature-level transfer distribution. The domain-adapted classifier is trained by minimizing the expected loss under this transfer distribution. Our empirical evaluation of FLDA focuses on problems with binary and count features in which the domain adaptation can be naturally modeled via a dropout distribution, which allows the final classifier to adapt to the importance of specific features in the target data. Our experimental evaluation suggests that under certain conditions, FLDA converges to the classifier trained on the target distribution. Experiments with our domain adaptation approach on several real-world problems show that FLDA performs on par with state-of-the-art techniques in domain adaptation. 
Featuretools  Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning. 
Featurized Bidirectional Generative Adversarial Network (FBGAN) 
Deep neural networks have been demonstrated to be vulnerable to adversarial attacks, where small perturbations are intentionally added to the original inputs to fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), to capture the semantic features of the input and filter the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space, and mutual information is applied to disentangle the semantically meaningful features. After the bidirectional mapping, the adversarial data can be reconstructed into denoised data, which can then be fed into the classifier for classification. We empirically show the quality of the reconstructed images and the effectiveness of the defense. 
FedMark  The Web of Data (WoD) has experienced phenomenal growth in the past. This growth is mainly fueled by tireless volunteers, government subsidies, and open data legislation. The majority of commercial data has not yet made the transition to the WoD. The problem is that it is not clear how publishers of commercial data can monetize their data in this new setting. Advertisement, which is one of the main financial engines of the World Wide Web, cannot be applied to the Web of Data, as such unwanted data can easily be filtered out automatically. This raises the question of how the WoD can (i) maintain its growth when subsidies disappear and (ii) give commercial data providers financial incentives to share their wealth of data. In this paper, we propose a marketplace for the WoD as a solution for this data monetization problem. Our approach allows a customer to transparently buy data from a combination of different providers. To that end, we introduce two different approaches for deciding which data elements to buy and compare their performance. We also introduce FedMark, a prototypical implementation of our marketplace that represents a first step towards an economically viable WoD beyond subsidies. 
Feedback Generative Adversarial Network (FBGAN) 
Generative Adversarial Networks (GANs) represent an attractive and novel approach to generate realistic data, such as genes, proteins, or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding for proteins of variable length. We propose a novel feedback-loop architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyzer. The proposed architecture also has the advantage that the analyzer need not be differentiable. We apply the feedback-loop mechanism to two examples: 1) generating synthetic genes coding for antimicrobial peptides, and 2) optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics demonstrates that the GAN-generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated datapoints for useful properties in domains beyond genomics. 
Feedback Networks  Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one such successive representation. However, an alternative that can achieve the same goal is a feedback-based approach in which the representation is formed in an iterative manner based on feedback received from the previous iteration’s output. We establish that a feedback-based approach has several fundamental advantages over feedforward: it enables making early predictions at query time, its output naturally conforms to a hierarchical structure in the label space (e.g. a taxonomy), and it provides a new basis for Curriculum Learning. We observe that feedback networks develop a considerably different representation compared to feedforward counterparts, in line with the aforementioned advantages. We put forth a general feedback-based learning architecture with endpoint results on par with or better than existing feedforward networks, with the addition of the above advantages. We also investigate several mechanisms in feedback architectures (e.g. skip connections in time) and design choices (e.g. feedback length). We hope this study offers new perspectives in the quest for more natural and practical learning models. 
Feedforward Neural Network Language Model (NNLM) 
The probabilistic feedforward neural network language model consists of input, projection, hidden and output layers. At the input layer, N previous words are encoded using 1-of-V coding, where V is the size of the vocabulary. The input layer is then projected to a projection layer P that has dimensionality N × D, using a shared projection matrix. As only N inputs are active at any given time, composition of the projection layer is a relatively cheap operation. The NNLM architecture becomes complex for computation between the projection and the hidden layer, as values in the projection layer are dense. For a common choice of N = 10, the size of the projection layer (P) might be 500 to 2000, while the hidden layer size H is typically 500 to 1000 units. Moreover, the hidden layer is used to compute a probability distribution over all the words in the vocabulary, resulting in an output layer with dimensionality V. Thus, the computational complexity per training example is Q = N × D + N × D × H + H × V, where the dominating term is H × V. However, several practical solutions have been proposed for avoiding it: either using hierarchical versions of the softmax, or avoiding normalized models completely by using models that are not normalized during training. With binary tree representations of the vocabulary, the number of output units that need to be evaluated can go down to around log_2(V). Thus, most of the complexity is then caused by the term N × D × H. 
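The per-example cost described above, Q = N × D + N × D × H + H × V, can be evaluated with a small sketch like the one below. The function name, the dictionary keys, and the example sizes (D = 100, H = 500, V = 1,000,000) are illustrative assumptions; replacing V with log_2(V) output units models the hierarchical softmax case only approximately.

```python
import math

def nnlm_complexity(n, d, h, v, hierarchical_softmax=False):
    """Per-example cost terms of the feedforward NNLM.

    n: number of context words, d: embedding size per word,
    h: hidden layer size, v: vocabulary size.
    With hierarchical_softmax=True, roughly log2(v) output units
    are evaluated instead of v.
    """
    output_units = math.log2(v) if hierarchical_softmax else v
    terms = {
        "projection": n * d,        # N x D
        "hidden": n * d * h,        # N x D x H
        "output": h * output_units, # H x V, or about H x log2(V)
    }
    terms["total"] = sum(terms.values())
    return terms

if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(nnlm_complexity(10, 100, 500, 1_000_000))
    print(nnlm_complexity(10, 100, 500, 1_000_000, hierarchical_softmax=True))
```

With the full softmax the output term dominates; with the tree-structured output the hidden term N × D × H becomes the largest, which matches the conclusion of the entry.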
Feed-Forward Neural Network Lattice Decoding Algorithm  Neural network decoding algorithms were recently introduced by Nachmani et al. to decode high-density parity-check (HDPC) codes. In contrast with iterative decoding algorithms such as the sum-product or min-sum algorithms, in which the weight of each edge is set to $1$, in the neural network decoding algorithms the weight of every edge depends on its impact on the transmitted codeword. In this paper, we provide a novel feed-forward neural network lattice decoding algorithm suitable for decoding lattices constructed based on Construction A, whose underlying codes have HDPC matrices. We first establish the concept of a feed-forward neural network for HDPC codes and improve their decoding algorithms compared to Nachmani et al. We then apply our proposed decoder to a Construction A lattice with an HDPC underlying code, for which the well-known iterative decoding algorithms perform poorly. The main advantage of our proposed algorithm is that instead of assigning and training weights for all edges, which turns out to be time-consuming especially for high-density parity-check matrices, we concentrate on the edges that are present in most $4$-cycles and whose removal gives a girth-$6$ Tanner graph. This approach, with slight modifications that use updated LLRs instead of initial ones, simultaneously accelerates the training process and improves the error performance of our proposed decoding algorithm. 
Feedforward Sequential Memory Networks (FSMN) 
We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural network equipped with learnable sequential memory blocks in the hidden layers. In this work, we have applied FSMN to several language modeling (LM) tasks. Experimental results have shown that the memory blocks in FSMN can learn effective representations of long history. Experiments have shown that FSMN-based language models can significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs. 
Fence Methods  Fence methods are a class of model selection strategies for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4), 1669-1692. <DOI:10.1214/07-AOS517> <https://…/1216237296>. 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. <DOI:10.1016/j.spl.2008.10.014> <https://…A_simplified_adaptive_fence_procedure>. 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. <http://…/12001x2010001eng.pdf>. 4. Jiang J., Nguyen T., Rao J.S. (2011), Invisible Fence Methods and the Identification of Differentially Expressed Gene Sets. Statistics and Its Interface, 4, 403-415. <http://…/SII201100040003a014.pdf>. 5. Nguyen T., Jiang J. (2012), Restricted Fence Method for Covariate Selection in Longitudinal Data Analysis. Biostatistics, 13(2), 303-314. <DOI:10.1093/biostatistics/kxr046> <https://…cemethodforcovariateselectionin>. 6. Nguyen T., Peng J., Jiang J. (2014), Fence Methods for Backcross Experiments. Journal of Statistical Computation and Simulation, 84(3), 644-662. <DOI:10.1080/00949655.2012.721885> <https://…/>. 7. Jiang J. (2014), The Fence Methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. <DOI:10.1155/2014/830821>. 8. Jiang J., Nguyen T. (2015), The Fence Methods, World Scientific, Singapore. <https://…/plp>. fence 
FeUdal Networks (FuNs) 
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels, allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker. The Worker generates primitive actions at every tick of the environment. The decoupled structure of FuN conveys several benefits: in addition to facilitating very long timescale credit assignment, it also encourages the emergence of sub-policies associated with different goals set by the Manager. These properties allow FuN to dramatically outperform a strong baseline agent on tasks that involve long-term credit assignment or memorisation. We demonstrate the performance of our proposed system on a range of tasks from the ATARI suite and also from a 3D DeepMind Lab environment. 
Few-Shot Classification  In few-shot classification, a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. ➘ “One-Shot Learning” 
Fidelity-Weighted Learning (FWL) 
Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowdsourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose ‘fidelity-weighted learning’ (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations. 
Fiducial Inference  Fiducial inference is one of a number of different types of statistical inference. These are rules, intended for general application, by which conclusions can be drawn from samples of data. In modern statistical practice, attempts to work with fiducial inference have fallen out of fashion in favour of frequentist inference, Bayesian inference and decision theory. However, fiducial inference is important in the history of statistics since its development led to the parallel development of concepts and tools in theoretical statistics that are widely used. Some current research in statistical methodology is either explicitly linked to fiducial inference or is closely connected to it. Multivariate Subjective Fiducial Inference 
Field-aware Factorization Machines (FFM) 
Field-aware factorization machines (FFM) have been used to win two click-through rate prediction competitions hosted by Criteo and Avazu. In these slides we introduce the formulation of FFM together with the well-known linear model, the degree-2 polynomial model, and factorization machines. 
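The entry above does not spell out the formulation, so the sketch below assumes the form commonly given in the FFM literature: each feature j keeps one latent vector per field f, and the interaction of features j1 and j2 uses the vector of j1 for the field of j2 and vice versa, i.e. the sum over pairs j1 < j2 of (w[j1, field(j2)] · w[j2, field(j1)]) x[j1] x[j2]. All names and the toy data are hypothetical, and the linear and bias terms are omitted.

```python
def ffm_score(x, fields, w):
    """Pairwise interaction term of a field-aware factorization machine.

    x:      dict mapping feature index -> feature value (non-zero entries)
    fields: dict mapping feature index -> field identifier
    w:      dict mapping (feature index, field identifier) -> latent vector
    """
    feats = sorted(x)
    total = 0.0
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            j1, j2 = feats[a], feats[b]
            v1 = w[(j1, fields[j2])]  # j1's vector for the field of j2
            v2 = w[(j2, fields[j1])]  # j2's vector for the field of j1
            dot = sum(p * q for p, q in zip(v1, v2))
            total += dot * x[j1] * x[j2]
    return total

if __name__ == "__main__":
    # Toy example: feature 0 belongs to field "A", feature 1 to field "B".
    x = {0: 1.0, 1: 1.0}
    fields = {0: "A", 1: "B"}
    w = {
        (0, "A"): [0.0, 0.0], (0, "B"): [1.0, 2.0],
        (1, "A"): [3.0, 4.0], (1, "B"): [0.0, 0.0],
    }
    print(ffm_score(x, fields, w))
```

A plain factorization machine is the special case in which every feature uses the same latent vector regardless of the other feature's field.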
FiloDB  FiloDB is a new open-source distributed, versioned, and columnar analytical database designed for modern streaming workloads. · Distributed – FiloDB is designed from the beginning to run on best-of-breed distributed, scale-out storage platforms such as Apache Cassandra. Queries run in parallel in Apache Spark for scale-out ad-hoc analysis. · Columnar – FiloDB brings breakthrough performance levels for analytical queries by using a columnar storage layout with different space-saving techniques like dictionary compression. True columnar querying techniques are on the roadmap. The current performance is comparable to Parquet, and one to two orders of magnitude faster than Spark on Cassandra 2.x for analytical queries. For the POC performance comparison, please see the cassandra-gdelt repo. · Versioned – At the same time, row-level and column-level operations and built-in versioning give FiloDB far more flexibility than can be achieved using file-based technologies like Parquet alone. · Designed for streaming – Enables easy exactly-once ingestion from Kafka for streaming events, time series, and IoT applications, yet enables extremely fast ad-hoc analysis using the ease of use of SQL. Each row is keyed by a partition and sort key, and writes using the same key are idempotent. FiloDB does the hard work of keeping data stored in an efficient and sorted format. FiloDB is easy to use! You can use Spark SQL for both ingestion (including from Streaming!) and querying. Connect Tableau or any other JDBC analysis tool to Spark SQL, and easily ingest data from any source with Spark support (JSON, CSV, traditional database, Kafka, etc.). FiloDB is a great fit for bulk analytical workloads, or streaming / event data. It is not optimized for heavily transactional, update-oriented workflows. Introducing FiloDB 
Filter Bubble  A filter bubble is a result of a personalized search in which a website algorithm selectively guesses what information a user would like to see based on information about the user (such as location, past click behavior and search history) and, as a result, users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles. Prime examples are Google Personalized Search results and Facebook’s personalized news stream. The term was coined by internet activist Eli Pariser in his book by the same name; according to Pariser, users get less exposure to conflicting viewpoints and are isolated intellectually in their own informational bubble. Pariser related an example in which one user searched Google for “BP” and got investment news about British Petroleum while another searcher got information about the Deepwater Horizon oil spill and that the two search results pages were “strikingly different”. The bubble effect may have negative implications for civic discourse, according to Pariser, but there are contrasting views suggesting the effect is minimal and addressable. 
Filtering Variational Objectives (FIVOs) 
The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the success of the ELBO as a surrogate MLE objective, we consider the extension of the ELBO to a family of lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We show that the tightness of such bounds is asymptotically related to the variance of the underlying estimator. We introduce a special case, the filtering variational objectives (FIVOs), which take the same arguments as the ELBO and pass them through a particle filter to form a tighter bound. FIVOs can be optimized tractably with stochastic gradients, and are particularly suited to MLE in sequential latent variable models. In standard sequential generative modeling tasks we present uniform improvements over models trained with the ELBO, including some whole nat-per-timestep improvements. 
FinBrain  Artificial intelligence (AI) is the core technology of technological revolution and industrial transformation. As one of the new intelligent needs in the AI 2.0 era, financial intelligence has elicited much attention from academia and industry. In our current dynamic capital market, financial intelligence demonstrates a fast and accurate machine learning capability to handle complex data and has gradually acquired the potential to become a ‘financial brain’. In this work, we survey existing studies on financial intelligence. First, we describe the concept of financial intelligence and elaborate on its position in the financial technology field. Second, we introduce the development of financial intelligence and review state-of-the-art techniques in wealth management, risk management, financial security, financial consulting, and blockchain. Finally, we propose a research framework called FinBrain and summarize four open issues, namely, explainable financial agents and causality, perception and prediction under uncertainty, risk-sensitive and robust decision making, and multi-agent game and mechanism design. We believe that these research directions can lay the foundation for the development of AI 2.0 in the finance field. 
Fine-Grained Pattern Matching  Processing streaming time series data from sensors with low latency and limited computing resources has become a critical problem with the growth of Industry 4.0 and the Industrial Internet of Things (IIoT). To tackle real-world challenges in this area, such as equipment health monitoring by comparing the incoming data stream with known fault patterns, we formulate a new problem called ‘fine-grained pattern matching’. It allows users to define varied deviations for different segments of a given pattern, and fuzzy breakpoints between adjacent segments, which dramatically increases the complexity compared with the traditional pattern matching problem over streams. In this paper, we propose a novel 2-phase approach to solve this problem. In the pruning phase, we propose the ELB (Equal Length Block) representation and the BSP (Block-Skipping Pruning) policy, which efficiently filter out unmatched subsequences with the guarantee of no false dismissals. In the post-processing phase, we provide an algorithm to further examine the possible matches in linear complexity. We conducted an extensive experimental evaluation on synthetic and real-world datasets, which illustrates that our algorithm outperforms the brute-force method and MSM, a multi-step filter mechanism over the multi-scaled representation, by orders of magnitude. 
Fine-Tuned Language Model (FitLaM) 
Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly outperforms the state-of-the-art on five text classification tasks, reducing the error by 18-24% on the majority of datasets. We open-source our pretrained models and code to enable adoption by the community. 
Finite First-Order Theory  We present the finite first-order theory (FFOT) machine, which provides an atemporal description of computation. We then develop a concept of complexity for the FFOT machine, and prove that the class of problems decidable by an FFOT machine with polynomial resources is NP intersect co-NP. 
Finite Primitiveness  ‘Finite Primitiveness’ by Mauldin and Urbanski. 
Firefighter Problem  The dynamics of the spread of infectious diseases is crucial in determining their risk and offering ways to contain them. We study sequential vaccination of individuals in networks. In the original (deterministic) version of the Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by a firefighter and then the fire spreads to all unprotected neighbors of the nodes on fire. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting, where the infection is stochastic. We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget. We derive methods for calculating upper and lower bounds of the expected number of infected individuals, as well as provide estimates on the budget needed for containment in expectation. We calculate these explicitly on trees, d-dimensional grids, and Erdős-Rényi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first-order acquaintance vaccination policy. 
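The deterministic process described above (protect b nodes, then let the fire spread to every unprotected neighbor, until nothing more can burn) can be simulated with a short sketch. The protection policy used here, which protects the lowest-numbered threatened neighbors of burning nodes, is a simple assumption for illustration and is not the policy analyzed in the entry; the function name and graph encoding are also hypothetical.

```python
def simulate_firefighter(graph, source, budget):
    """Simulate the deterministic Firefighter problem.

    graph:  dict mapping each node to an iterable of its neighbors
    source: node where the fire breaks out
    budget: number of nodes (b) that can be protected per time step
    Returns the final sets of burning and protected nodes.
    """
    burning = {source}
    protected = set()
    while True:
        # Nodes the fire would reach in this step if nothing were protected.
        threatened = sorted(
            {v for u in burning for v in graph[u]} - burning - protected
        )
        if not threatened:
            return burning, protected  # the fire can no longer spread
        # Assumed greedy policy: protect up to `budget` threatened nodes.
        protected.update(threatened[:budget])
        # The fire spreads to all remaining unprotected neighbors.
        burning.update(threatened[budget:])

if __name__ == "__main__":
    # Star graph: center 0 connected to leaves 1, 2, 3.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(simulate_firefighter(star, source=0, budget=1))
```

The probabilistic extension in the entry would replace the deterministic spread step with a random one, for example by infecting each threatened neighbor only with some probability.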
Firefly Algorithm (FA) 
The firefly algorithm (FA) is a nature-inspired metaheuristic optimization algorithm, inspired by the flashing behaviour of fireflies. The primary purpose of a firefly’s flash is to act as a signal system to attract other fireflies. Xin-She Yang formulated this firefly algorithm by assuming: 1. All fireflies are unisexual, so that one firefly will be attracted to all other fireflies; 2. Attractiveness is proportional to their brightness, and for any two fireflies, the less bright one will be attracted by (and thus move to) the brighter one; however, the brightness can decrease as their distance increases; 3. If there are no fireflies brighter than a given firefly, it will move randomly. The brightness should be associated with the objective function. 
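The three assumptions above can be turned into a minimal sketch for minimization, where a lower objective value means a brighter firefly. The update rule used here (attractiveness beta0 · exp(−gamma · r²) plus a uniform random step scaled by alpha) is the commonly quoted form and is assumed rather than taken from the entry; all parameter names and default values are illustrative.

```python
import math
import random

def firefly_minimize(f, dim, n_fireflies=15, n_iter=50, alpha=0.2,
                     beta0=1.0, gamma=1.0, bounds=(-5.0, 5.0), seed=0):
    """Minimal firefly algorithm sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return min(hi, max(lo, value))

    def noise():
        return alpha * (rng.random() - 0.5)

    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(x) for x in xs]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_val = list(xs[best_i]), cost[best_i]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than i
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # fades with distance
                    xs[i] = [clip(a + beta * (b - a) + noise())
                             for a, b in zip(xs[i], xs[j])]
                    cost[i] = f(xs[i])
                    moved = True
            if not moved:  # no brighter firefly: move randomly
                xs[i] = [clip(a + noise()) for a in xs[i]]
                cost[i] = f(xs[i])
            if cost[i] < best_val:
                best_x, best_val = list(xs[i]), cost[i]
    return best_x, best_val

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # toy objective
    print(firefly_minimize(sphere, dim=2))
```

The best-so-far solution is tracked separately because the random move in assumption 3 can make the currently brightest firefly worse.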
First Story Detection (FSD) 
Given a series of documents, the first story is defined as the first document to discuss a specific event, which occurred at a particular time and place. First story detection (FSD) was first defined by Allan in 2002 in terms of topic detection and tracking. 
FISH  Time-evolving stream datasets are ubiquitous in many real-world applications, where their inherent hot keys often evolve over time. Nevertheless, few existing solutions can provide efficient load balancing on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel grouping approach (named FISH), which provides efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded time interval. This makes it possible to accurately identify the recent hot keys for real-time load balancing within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous computation). We also propose to heuristically infer accurate information about remote workers through computation rather than communication, for cost-efficient worker assignment. We have integrated our approach into Apache Storm. Our results on a cluster of 128 nodes for both synthetic and real-world stream datasets show that FISH significantly outperforms the state of the art, reducing the average and the 99th percentile latency by 87.12% and 76.34% (vs. W-Choices), and the memory overhead by 99.96% (vs. Shuffle Grouping). 
Fisher Vector encoding with Variational Auto-Encoder (FV-VAE) 
Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from the activations of a convolutional layer is a fundamental problem. In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of a convolutional layer in a deep generative model by training them in an end-to-end manner. To incorporate the FV encoding strategy into deep generative models, we introduce a Variational Auto-Encoder model, which steers variational inference and learning in a neural network that can be straightforwardly optimized using the standard stochastic gradient method. Unlike the FV characterized by conventional generative models (e.g., Gaussian Mixture Model), which parsimoniously fit a discrete mixture model to the data distribution, the proposed FV-VAE is more flexible in representing the natural properties of data for better generalization. Extensive experiments are conducted on three public datasets, i.e., UCF101, ActivityNet, and CUB-200-2011, in the context of video action recognition and fine-grained image classification, respectively. Superior results are reported when compared to state-of-the-art representations. Most remarkably, our proposed FV-VAE achieves the best published accuracy to date of 94.2% on UCF101. 
Fixed Effects Model  In econometrics and statistics, a fixed effects model is a statistical model that represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random. This is in contrast to random effects models and mixed models in which either all or some of the explanatory variables are treated as if they arise from random causes. Contrast this to the biostatistics definitions, as biostatisticians use “fixed” and “random” effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables). Often the same structure of model, which is usually a linear regression model, can be treated as any of the three types depending on the analyst’s viewpoint, although there may be a natural choice in any given situation. 
Fixed-Point Factorized Networks (FFN) 
In recent years, Deep Neural Network (DNN) based methods have achieved remarkable performance in a wide range of tasks and have been among the most powerful and widely used techniques in computer vision, speech recognition and natural language processing. However, DNN-based methods are both computationally intensive and resource-consuming, which hinders the application of these methods on embedded systems like smartphones. To alleviate this problem, we introduce novel Fixed-point Factorized Networks (FFN) on pre-trained models to reduce the computational complexity as well as the storage requirement of networks. Extensive experiments on the large-scale ImageNet classification task show the effectiveness of our proposed method. 
Flat Clustering and Topic Modeling based on Fast Rank2 NMF (FlatNMF2) 
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing method. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements both in computational time and in quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also to topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains.
Flatland Paradox  https://…/theflatlandparadox 
FlexFlow  ➘ “Sample, Operation, Attribute, and Parameter Dimensions” 
Flexible Deep Neural Network Processing  The recent success of Deep Neural Networks (DNNs) has drastically improved the state of the art for many application domains. While achieving high accuracy, deploying state-of-the-art DNNs is a challenge since they typically require billions of expensive arithmetic computations. In addition, DNNs are typically deployed in ensembles to boost accuracy, which further exacerbates the system requirements. This computational overhead is an issue for many platforms, e.g. data centers and embedded systems, with tight latency and energy budgets. In this article, we introduce a flexible DNN ensemble processing technique, which achieves a large reduction in average inference latency while incurring a small to negligible accuracy drop. Our technique is flexible in that it allows for dynamic adaptation between quality of results (QoR) and execution runtime. We demonstrate the effectiveness of the technique on AlexNet and ResNet-50 using the ImageNet dataset. This technique can also easily handle other types of networks.
Flexible Parametric Model  flexPM 
Flexpoint  Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
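The shared-exponent idea can be illustrated with a minimal sketch. This is not the Flexpoint implementation: the function names, the 16-bit mantissa width, and the rule of picking the exponent from the current maximum magnitude are assumptions made for illustration (the entry only states that the exponent is shared and dynamically adjusted).

```python
import math

MANTISSA_BITS = 16                       # assumed mantissa width
MAX_INT = 2 ** (MANTISSA_BITS - 1) - 1   # largest signed mantissa, 32767


def encode_shared_exponent(values):
    """Encode floats as integer mantissas plus one exponent shared by all."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0 for _ in values], 0
    # Smallest power-of-two scale for which the largest value still fits.
    exponent = math.ceil(math.log2(max_abs / MAX_INT))
    scale = 2.0 ** exponent
    mantissas = [
        max(-MAX_INT - 1, min(MAX_INT, round(v / scale))) for v in values
    ]
    return mantissas, exponent


def decode_shared_exponent(mantissas, exponent):
    """Recover approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


tensor = [0.5, -0.25, 0.125]
mantissas, exponent = encode_shared_exponent(tensor)
restored = decode_shared_exponent(mantissas, exponent)
```

Because every element shares one exponent, arithmetic on the mantissas is plain integer arithmetic; the price is that small values lose precision when one element of the tensor is much larger than the rest.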
Flint  Serverless architectures organized around loosely-coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this paper, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing an actual Spark cluster. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.
Flood Algorithm  With the U-matrix, Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307-313, Springer, 1993) introduced a powerful visual representation of Self-Organizing Map results. We propose an approach that utilizes the U-matrix to identify outlying data points. Then the revised subsample (i.e. the initial sample minus the outlying points) is used to give a robust estimation of location and scatter. restlos
Flotilla  Flotilla is a human friendly service for task execution. It allows you to focus on the work you’re doing rather than how to do it. In other words, Flotilla takes the struggle out of defining and running containerized jobs. 
Flow Classification Algorithm (FCA) 

Flow Map  Flow maps in cartography are a mix of maps and flow charts, that ‘show the movement of objects from one location to another, such as the number of people in a migration, the amount of goods being traded, or the number of packets in a network’. 
Flowr  Flowr is a robust and scalable framework for designing and deploying computing pipelines in an easy-to-use fashion. It implements a scatter-gather approach using computing clusters, simplifying the concept to the use of five simple terms (in submission and dependency types). Most importantly, it is flexible, such that customizing existing pipelines is easy, and since it works across several computing environments (LSF, SGE, Torque, and SLURM), it is portable. GitXiv
FluidNets  We present FluidNets, an approach to automate the design of neural network structures. FluidNets iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network’s performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.
F-Measure  In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test’s accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct positive results divided by the number of all positive results returned, and r is the number of correct positive results divided by the number of positive results that should have been returned. The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and its worst at 0. The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall: F1 = 2pr / (p + r).
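As a small worked example, the harmonic mean of precision and recall can be computed directly from confusion-matrix counts. The function name and the convention of returning 0.0 when the score is undefined are assumptions made here, not part of the definition.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score from raw counts, or 0.0 when it is undefined."""
    returned = true_positives + false_positives   # all positive results returned
    relevant = true_positives + false_negatives   # positives that should be returned
    if returned == 0 or relevant == 0:
        return 0.0
    p = true_positives / returned   # precision
    r = true_positives / relevant   # recall
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)      # harmonic mean of p and r


# 8 correct positives, 2 spurious positives, 8 missed positives:
# precision is 0.8 and recall is 0.5.
score = f1_score(8, 2, 8)
```

The harmonic mean is pulled toward the smaller of the two values, so a classifier cannot reach a high F1 score by maximizing only precision or only recall.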
FML-kNN  Efficient management and analysis of large volumes of data is a demanding task of increasing scientific and industrial importance, as the ubiquitous generation of information governs more and more aspects of human life. In this article, we introduce FML-kNN, a novel distributed processing framework for Big Data that performs probabilistic classification and regression, implemented in Apache Flink. The framework’s core consists of a k-nearest neighbor joins algorithm which, contrary to similar approaches, is executed in a single distributed session and is able to operate on very large volumes of data of variable granularity and dimensionality. We assess FML-kNN’s performance and scalability in a detailed experimental evaluation, in which it is compared to similar methods implemented in the Apache Hadoop, Spark, and Flink distributed processing engines. The results indicate an overall superiority of our framework in all the performed comparisons. Further, we apply FML-kNN in two motivating use cases for water demand management, against real-world domestic water consumption data. In particular, we focus on forecasting water consumption using 1h smart meter data, and extracting consumer characteristics from water use data in the shower. We further discuss the obtained results, demonstrating the framework’s potential in useful knowledge extraction.
FOCA  Modeling an ontology is a hard and time-consuming task. Although methodologies are useful for ontologists to create good ontologies, they do not help with the task of evaluating the quality of the ontology to be reused. For these reasons, it is imperative to evaluate the quality of the ontology after constructing it or before reusing it. The few existing studies usually present only a set of criteria and questions, but no guidelines to evaluate the ontology. The effort to evaluate an ontology is very high as there is a huge dependence on the evaluator’s expertise to understand the criteria and questions in depth. Moreover, the evaluation is still very subjective. This study presents a novel methodology for ontology evaluation, taking into account three fundamental principles: i) it is based on the Goal, Question, Metric approach for empirical evaluation; ii) the goals of the methodologies are based on the roles of knowledge representations combined with specific evaluation criteria; iii) each ontology is evaluated according to the type of ontology. The methodology was empirically evaluated using different ontologists and ontologies of the same domain. The main contributions of this study are: i) defining a step-by-step approach to evaluate the quality of an ontology; ii) proposing an evaluation based on the roles of knowledge representations; iii) the explicit difference of the evaluation according to the type of the ontology; iv) a questionnaire to evaluate the ontologies; v) a statistical model that automatically calculates the quality of the ontologies.
FogLearn  Big data analytics with cloud computing is one of the emerging areas for processing and analytics. Fog computing is the paradigm in which fog devices help to reduce latency and increase throughput by assisting at the edge, close to the client. This paper discusses the emergence of fog computing for mining analytics in big data from geospatial and medical health applications. It proposes and develops a fog-computing-based framework, FogLearn, for the application of K-means clustering to Ganga River Basin management and to real-world feature data for detecting patients suffering from diabetes mellitus. The proposed architecture employs machine learning on a deep learning framework for the analysis of pathological feature data obtained from smart watches worn by patients with diabetes and of geographical parameters from the River Ganga basin geospatial database. The results show that fog computing holds immense promise for the analysis of medical and geospatial big data.
Folium  Python Data. Leaflet.js Maps. Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via Folium. Concept: Folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with colorbrewer color schemes. Creating interactive crime maps with Folium
Folksodriven  The Folksodriven framework makes it possible for data scientists to define an ontology environment in which to search for buried patterns that have some kind of predictive power, in order to build predictive models more effectively. It accomplishes this through abstractions that isolate parameters of the predictive modeling process, both in searching for patterns and in designing the feature set. To reflect the evolving knowledge, this paper considers ontologies based on folksonomies according to a new concept structure called ‘Folksodriven’ to represent folksonomies. The studies on the transformational regulation of the Folksodriven tags are therefore regarded as important for adaptive folksonomy classifications in an evolving environment used by Intelligent Systems to represent knowledge sharing. Folksodriven tags are used to categorize salient data points so they can be fed to a machine-learning system, thereby ‘featurizing’ the data.
FolksoDriven Cloud (FDC)
In this paper we present the FolksoDriven Cloud (FDC), built on Cloud and Semantic technologies. Cloud computing has emerged in recent years as the new paradigm for the provision of on-demand distributed computing resources. The Semantic Web can be used to relate different data and descriptions of services, and to annotate the provenance of repositories on ontologies. The FDC service is composed of a back-end, which submits and monitors the documents, and a user front-end, which allows users to schedule on-demand operations and to watch the progress of running processes. The impact of the proposed method is illustrated on a user since its inception.
Folksonomy  A folksonomy is a system in which users apply public tags to online items, typically to aid them in re-finding those items. This can give rise to a classification system based on those tags and their frequencies, in contrast to a taxonomic classification specified by the owners of the content when it is published. This practice is also known as collaborative tagging, social classification, social indexing, and social tagging. However, these terms have slightly different meanings than folksonomy. Folksonomy was originally the result of personal free tagging of information for one’s own retrieval. Social tagging is the application of tags in an open online environment where the tags of other users are available to others. Collaborative tagging (also known as group tagging) is tagging performed by a group of users. This type of folksonomy is commonly used in cooperative and collaborative projects such as research, content repositories, and social bookmarking. The term was coined by Thomas Vander Wal in 2004 as a portmanteau of folk and taxonomy. Folksonomies became popular as part of social software applications such as social bookmarking and photograph annotation that enable users to collectively classify and find information via shared tags. Some websites include tag clouds as a way to visualize tags in a folksonomy. Folksonomies can be used for K-12 education, business, and higher education. More specifically, folksonomies may be implemented for social bookmarking, teacher resource repositories, e-learning systems, collaborative learning, collaborative research, and professional development.
Follow The (Proximally) Regularized Leader (FTRL) 
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
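The per-coordinate update usually given for FTRL-Proximal with L1 and L2 regularization on logistic loss can be sketched as follows. The class name, the sparse-dictionary feature encoding, and the default hyperparameters are assumptions for illustration; this is a toy sketch, not the deployed system described above.

```python
import math


class FTRLProximal:
    """Toy per-coordinate FTRL-Proximal for logistic regression."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate adjusted gradient sums
        self.n = {}  # per-coordinate sums of squared gradients

    def weight(self, i):
        """Closed-form weight; exactly zero when |z_i| <= l1 (sparsity)."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(i, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """x maps feature name to value; returns the predicted probability."""
        s = sum(self.weight(i) * v for i, v in x.items())
        s = max(-35.0, min(35.0, s))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            n = self.n.get(i, 0.0)
            # Change in the per-coordinate learning-rate schedule.
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p


model = FTRLProximal(l1=0.0)
first_prediction = model.update({"bias": 1.0}, 1)
```

Only `z` and `n` are stored; weights are derived on demand, which is what lets coordinates with small accumulated gradients stay exactly zero under the L1 term.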
Follow the Leader (FTL) 
A natural algorithm to use in the OCO framework is Follow the Leader, which tries to minimize the regret over all of the previous time steps. https://…/notes.pdf 
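A minimal sketch of Follow the Leader, assuming one-dimensional squared losses f_t(w) = (w - z_t)^2, for which the minimizer of the cumulative past loss is simply the running mean. The function name, the choice of loss, and the arbitrary first play `w0` are assumptions for illustration.

```python
def follow_the_leader(targets, w0=0.0):
    """Play FTL against squared losses f_t(w) = (w - z_t)^2.

    At round t the learner plays the point that minimizes the total loss
    over rounds 0..t-1, which for squared loss is the mean of past targets.
    """
    plays = []
    running_sum = 0.0
    for t, z in enumerate(targets):
        w = w0 if t == 0 else running_sum / t  # leader on the past rounds
        plays.append(w)
        running_sum += z  # the loss for round t is revealed after playing
    return plays


plays = follow_the_leader([1.0, 3.0, 5.0])
```

For other loss families the leader generally has no closed form, and each round requires solving an optimization problem over all past losses.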
Follow the Regularized Leader (FTRL,FoReL) 
To avoid the failure of FTL we can try to “regularize” the weight vectors by adding a penalty function R(w) to the objective. This yields the FoReL algorithm. https://…/notes.pdf http://…/lecture3.pdf 
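As a sketch of the regularized update, assume scalar linear losses f_t(w) = g_t * w and the quadratic penalty R(w) = w^2 / (2 * eta). Minimizing the sum of past losses plus R(w) then has the closed form w = -eta * (g_1 + ... + g_{t-1}). The function name, the loss family, and the parameter `eta` are assumptions for illustration.

```python
def forel_linear(gradients, eta):
    """FoReL on linear losses f_t(w) = g_t * w with R(w) = w^2 / (2 * eta).

    Each play minimizes sum_{s<t} g_s * w + w^2 / (2 * eta), whose
    minimizer is w = -eta * sum_{s<t} g_s.
    """
    plays = []
    grad_sum = 0.0
    for g in gradients:
        plays.append(-eta * grad_sum)
        grad_sum += g  # the gradient of round t is revealed after playing
    return plays


# Alternating gradients make unregularized FTL jump between extremes;
# the regularized plays stay within eta of the origin.
plays = forel_linear([1.0, -1.0, 1.0, -1.0], eta=0.5)
```

With this choice of R, each step moves the play by -eta times the latest gradient, so FoReL on linear losses coincides with online gradient descent with step size eta.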
Force Directed Graph  Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy. While graph drawing can be a difficult problem, force-directed algorithms, being physical simulations, usually require no special knowledge about graph theory such as planarity. Force Layout Force-Directed Graph qrage
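A minimal two-dimensional sketch of the simulation approach: all node pairs repel, edges pull their endpoints together, and a cooling schedule limits how far a node may move per step. The force laws (repulsion k^2/d, attraction d^2/k), the circular starting layout, and all parameter names are assumptions in the style of common spring embedders, not a specific published algorithm.

```python
import math


def force_directed_layout(nodes, edges, iterations=200, k=1.0, start_temp=0.1):
    """Return {node: (x, y)} after a simple force simulation."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    for step in range(iterations):
        temp = start_temp * (1 - step / iterations)  # linear cooling
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion of magnitude k^2 / d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction of magnitude d^2 / k along every edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0:
                move = min(length, temp)
                pos[v][0] += disp[v][0] / length * move
                pos[v][1] += disp[v][1] / length * move
    return {v: (p[0], p[1]) for v, p in pos.items()}


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Path graph a - b - c: the unconnected ends should drift apart.
layout = force_directed_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

The all-pairs repulsion loop makes each iteration quadratic in the number of nodes, which is acceptable for small graphs; larger graphs typically need a spatial approximation.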
Forest Packing  Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool because they provide insight into model operation that is critical to interpreting learned results. While decision forests are trivially parallelizable, the traversals of tree data structures incur many random memory accesses and are very slow. We present memory packing techniques that reorganize learned forests to minimize cache misses during classification. The resulting layout is hierarchical. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in cache lines. We extend this layout with out-of-order execution and cache-line prefetching to increase memory throughput. Together, these optimizations increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.
Formal Concept Analysis (FCA) 
In information science, formal concept analysis is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the set of objects sharing the same values for a certain set of properties; and each subconcept in the hierarchy contains a subset of the objects in the concepts above it. The term was introduced by Rudolf Wille in 1984, and builds on applied lattice and order theory that was developed by Garrett Birkhoff and others in the 1930s. Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology. 
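To make the derivation concrete, here is a brute-force sketch that enumerates every formal concept of a small context by closing each attribute subset. The function name and the toy animal context are invented for illustration, and the exhaustive enumeration is exponential in the number of attributes, so it is suitable only for tiny examples.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all (extent, intent) pairs of a formal context.

    `context` maps each object to the set of attributes it has. A concept
    pairs a set of objects (extent) with exactly the attributes they all
    share (intent), such that no further object has all those attributes.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        # Attributes shared by every object in objs.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            ext = extent(set(attrs))
            concepts.add((ext, intent(ext)))
    return concepts


animals = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
concepts = formal_concepts(animals)
```

Ordering the resulting concepts by inclusion of their extents yields the concept hierarchy: here the concept with extent {duck} sits below both the "flies" and the "swims" concepts.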
Fortified Network  Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data which differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks, which fortifies the hidden layers in a deep network by identifying when the hidden states are off the data manifold, and maps these hidden states back to parts of the data manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to the gradient masking problem and (iii) show the advantage of doing this fortification in the hidden layers instead of the input space.
Forward Search  The Forward Search is a powerful general method, incorporating flexible data-driven trimming, for the detection of outliers and unsuspected structure in data and so for building robust models. Starting from small subsets of data, observations that are close to the fitted model are added to the observations used in parameter estimation. As this subset grows we monitor parameter estimates, test statistics and measures of fit such as residuals. ForwardSearch,forward
Forward Slice  We propose a method for stochastic optimization: ‘Forward Slice’. We evaluate its performance and apply it to design problems in Section 3. At its core, our method is based on the procedure that Neal (2003) called the ‘slice sampling’ procedure, which was originally developed as a Markov chain Monte Carlo sampling procedure to draw samples from a target distribution. The slice sampling method relies on an auxiliary variable which defines a level at which we slice the target density to obtain regions from which we draw samples of the target distribution. Similar to Neal’s method, our procedure uses an auxiliary variable for stochastic optimization that also defines the slices, but of an objective function to be maximized (or minimized). Moreover, unlike with Neal’s method, the auxiliary variable in our approach is not sampled and takes on non-decreasing values in the sequential iterations of the procedure so that, for a given pre-specified tolerance, at the end of the procedure we attain the maxima and the argument of the maxima (or close values given the selected tolerance level).
Forward Thinking  We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered.
Forward Thinking Deep Random Forest  The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time, and once a layer is trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance. In the case where the neurons of deep neural nets are replaced with decision trees, we call the result a Forward Thinking Deep Random Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the MNIST dataset. We also provide a general mathematical formulation that allows for other types of deep learning problems to be considered. 
fpgaConvNet  In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly, from high-throughput video surveillance to the very low-latency requirements of autonomous cars. In this context, FPGAs can provide a potential platform that can be optimally configured based on the different performance needs. However, the complexity of ConvNet models keeps increasing, making their mapping to an FPGA device a challenging task. This work presents fpgaConvNet, an end-to-end framework for mapping ConvNets on FPGAs. The proposed framework employs an automated design methodology based on the Synchronous Dataflow (SDF) paradigm and defines a set of SDF transformations in order to efficiently explore the architectural design space. By selectively optimising for throughput, latency or multi-objective criteria, the presented tool is able to efficiently explore the design space and generate hardware designs from high-level ConvNet specifications, explicitly optimised for the performance metric of interest. Overall, our framework yields designs that improve the performance by up to 6.65x over highly optimised embedded GPU designs for the same power constraints in embedded environments.
FP-Growth Algorithm  In Data Mining, the task of finding frequent patterns in large databases is very important and has been studied extensively in the past few years. Unfortunately, this task is computationally expensive, especially when a large number of patterns exist. The FP-Growth Algorithm, proposed by Han, is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure, named the frequent-pattern tree (FP-tree), for storing compressed and crucial information about frequent patterns. In his study, Han showed that his method outperforms other popular methods for mining frequent patterns, e.g. the Apriori Algorithm and TreeProjection. Some later works showed that FP-Growth performs better than other methods, including Eclat and Relim. The popularity and efficiency of the FP-Growth Algorithm have led to many studies that propose variations to improve its performance.
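The prefix-tree structure can be sketched as follows. This covers only the construction of the FP-tree (count items, drop infrequent ones, insert each transaction in descending frequency order); the recursive mining step of FP-Growth is omitted. The class and function names, and the alphabetical tie-break between equally frequent items, are assumptions for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item and how many transactions pass it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and return (root, frequent_item_counts)."""
    # First pass: support count of every item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode()
    # Second pass: insert each transaction with its frequent items sorted
    # by descending support, so common prefixes are shared.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, frequent


baskets = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(baskets, min_support=2)
```

Because the most frequent items sit nearest the root, transactions that share them collapse onto the same path, which is where the compression comes from.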
Fractal AI  Fractal AI is a theory for general artificial intelligence. It allows deriving new mathematical tools that constitute the foundations for a new kind of stochastic calculus, by modelling information using cellular automaton-like structures instead of smooth functions. In the included repository we present a new Agent, derived from the first principles of the theory, which is capable of solving Atari games several orders of magnitude more efficiently than other similar techniques, like Monte Carlo Tree Search. The code provided shows how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games, without previous learning and using fewer than 1000 samples to calculate each one of the actions, where standard MCTS uses 3 million samples. Among other things, Fractal AI makes it possible to generate a huge database of top performing examples with very little computation, transforming Reinforcement Learning into a supervised problem. The algorithm presented is capable of solving the exploration vs exploitation dilemma in both the discrete and continuous cases, while maintaining control over any aspect of the behavior of the Agent. More generally, the new techniques presented here have direct applications to other areas such as non-equilibrium thermodynamics, chemistry, quantum physics, economics, information theory, and non-linear control theory.
Fractional Imputation  
Fractional Langevin Monte Carlo (FLMC) 
Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques that are based on Langevin diffusions have started receiving increasing attention. These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this study, we extend classical LMC and develop a novel Fractional LMC (FLMC) framework that is based on a family of heavy-tailed distributions, called $\alpha$-stable L\'{e}vy distributions. As opposed to classical approaches, the proposed approach can possess large jumps while targeting the correct distribution, which would be beneficial for efficient exploration of the state space. We develop novel computational methods that can scale up to large-scale problems and we provide formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multi-modal settings, improved convergence rates, and robustness to algorithm parameters.
Frailty Model  Frailty models are extensions of the proportional hazards model, which is best known as the Cox model (Cox, 1972), the most popular model in survival analysis. Normally, in most clinical applications, survival analysis implicitly assumes a homogeneous population to be studied. This means that all individuals sampled into that study are in principle subject to the same risk (e.g., risk of death, risk of disease recurrence). In many applications, the study population cannot be assumed to be homogeneous but must be considered as a heterogeneous sample, i.e. a mixture of individuals with different hazards. For example, in many cases it is impossible to measure all relevant covariates related to the disease of interest, sometimes for economic reasons, sometimes because the importance of some covariates is still unknown. The frailty approach is a statistical modelling concept which aims to account for heterogeneity caused by unmeasured covariates. In statistical terms, a frailty model is a random effect model for time-to-event data, where the random effect (the frailty) has a multiplicative effect on the baseline hazard function. parfm,frailtySurv
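The multiplicative structure described above is commonly written as follows. The notation is an assumption chosen for illustration: $h_0(t)$ is the baseline hazard, $x_i$ the measured covariates of individual $i$ with coefficients $\beta$, and $Z_i > 0$ the unobserved frailty.

$$
h_i(t \mid Z_i) = Z_i \, h_0(t) \, \exp\!\left(x_i^{\top} \beta\right), \qquad Z_i > 0 .
$$

Setting $Z_i = 1$ for every individual recovers the ordinary proportional hazards model. A frequent modelling choice is a gamma-distributed frailty with mean 1, so that the frailty variance alone measures the unexplained heterogeneity.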
Francy  Data visualization and interaction with large data sets is known to be essential and critical in many businesses today, and the same applies to research and teaching, in this case, when exploring large and complex mathematical objects. GAP is a computer algebra system for computational discrete algebra with an emphasis on computational group theory. The existing XGAP package for GAP works exclusively on the X Window System. It lacks abstraction between its mathematical and graphical cores, making it difficult to extend, maintain, or port. In this paper, we present Francy, a graphical semantics package for GAP. Francy is responsible for creating a representational structure that can be rendered using many GUI frameworks independent from any particular programming language or operating system. Building on this, we use state of the art web technologies that take advantage of an improved REPL environment, which is currently under development for GAP. The integration of this project with Jupyter provides a rich graphical environment full of features enhancing the usability and accessibility of GAP. 
Frank-Wolfe Type Boosting Algorithm (FWBoost) 
Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. By using exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods, in terms of making calls to a base learning algorithm with different weight updates. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost is not degraded with larger rounds in boosting, which is consistent with the theoretical analysis.
Freedman’s Paradox  In statistical analysis, Freedman’s paradox, named after David Freedman, describes a problem in model selection whereby predictor variables with no explanatory power can appear artificially important. Freedman demonstrated (through simulation and asymptotic calculation) that this is a common occurrence when the number of variables is similar to the number of data points. Recently, new information-theoretic estimators have been developed in an attempt to reduce this problem, in addition to the accompanying issue of model selection bias, whereby estimators of predictor variables that have a weak relationship with the response variable are biased. Freedman’s Paradox
Freemium  Freemium is a pricing strategy by which a product or service (typically a digital offering such as software, media, games or web services) is provided free of charge, but money (premium) is charged for proprietary features, functionality, or virtual goods. The word “freemium” is a portmanteau neologism combining the two aspects of the business model: “free” and “premium”. 
Freestyle Multilingual Image Question Answering (FMIQA) 
The Freestyle Multilingual Image Question Answering (FMIQA) dataset is used to train and evaluate our mQA model. It contains over 120,000 images and 250,000 freestyle Chinese question-answer pairs and their English translations. The quality of the answers generated by our mQA model on this dataset is evaluated by human judges through a Turing Test.
Frequency-Based Kernel Kalman Filter (FKKF) 
One main challenge for the design of networks is that traffic load is not generally known in advance. This makes it hard to adequately devote resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse-grained manner by aggregating flows, fine-grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper presents, to the best of our knowledge, the first approach to fine-grained per-flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows’ behavior based on measurements. Our FKKF relies on the well-known Kalman Filter in combination with a kernel to support the prediction of nonlinear functions. Furthermore, we change the operating space from time to frequency space. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after gleaning their key characteristics, with a Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic on average across 17 out of 20 groups of flows with an average prediction error of 6.43% around 0.49 (average) seconds in advance, whilst existing coarse-grained approaches exhibit prediction errors of 77% at best.
Frequent Pattern Mining  The problem of frequent pattern mining is that of finding relationships among the items in a database. The problem can be stated as follows. Given a database D with transactions T1 … TN, determine all patterns P that are present in at least a fraction s of the transactions. The fraction s is referred to as the minimum support. The parameter s can be expressed either as an absolute number, or as a fraction of the total number of transactions in the database. Each transaction Ti can be considered a sparse binary vector, or as a set of discrete values representing the identifiers of the binary attributes that are instantiated to the value of 1. The problem was originally proposed in the context of market basket data in order to find frequent groups of items that are bought together. Thus, in this scenario, each attribute corresponds to an item in a superstore, and the binary value represents whether or not it is present in the transaction. Since the problem was originally proposed, it has been applied to numerous other applications in the context of data mining, Web log mining, sequential pattern mining, and software bug analysis.
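The problem statement above can be illustrated with a small brute-force sketch. The function name `frequent_patterns`, the toy `baskets` data, and the choice to express the minimum support as a fraction are assumptions made for this example. Practical miners such as Apriori or FP-growth prune the search far more aggressively.

```python
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    transactions: list of sets of items.
    min_support:  minimum fraction s of transactions that must contain the pattern.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = set(candidate)
            # Support = fraction of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[candidate] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
patterns = frequent_patterns(baskets, min_support=0.6)
```

With these five baskets and s = 0.6, the four single items and the pair (bread, milk) meet the minimum support.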
Frequent Sequence Mining  
Frequentist Information Criterion (FIC) 
The failure of the information-based Akaike Information Criterion (AIC) in the context of singular models can be rectified by the definition of a Frequentist Information Criterion (FIC). FIC applies a frequentist approximation to the computation of the model complexity, which can be estimated analytically in many contexts. Like AIC, FIC can be understood as an unbiased estimator of the model predictive performance and is therefore identical to AIC for regular models in the large-observation-number limit. In the presence of unidentifiable parameters, the complexity exhibits a more general, non-AIC-like scaling. For instance, both BIC-like ($\log N$) and Hannan-Quinn-like ($\log\log N$) scaling with observation number $N$ are observed. Unlike the Bayesian model selection approach, FIC is free from ad hoc prior probability distributions and appears to be widely applicable to model selection problems. Finally we demonstrate that FIC (information-based inference) is equivalent to frequentist inference for an important class of models.
Frequently Updated Timestamped Structured Data (FUTS) 
The Internet, and hence IoT, contains potentially billions of Frequently Updated Timestamped Structured (FUTS) data sources, such as real-time traffic reports, air pollution detection, temperature monitoring, crop monitoring, etc. FUTS data sources contain states and updates of physical world things.
Friedman Test  The Friedman test is a nonparametric statistical test developed by Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test. 
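The ranking procedure described above can be sketched as follows. This is a minimal illustration that assumes no ties within a block, so the tie correction is omitted. The function name and the sample data are hypothetical.

```python
def friedman_statistic(blocks):
    """Friedman statistic Q for a complete block design without ties.

    blocks: list of rows, one per block (e.g. subject), each holding one
            measurement per treatment.
    Returns (Q, rank_sums), where rank_sums[j] is the column sum of ranks.
    """
    n = len(blocks)       # number of blocks (rows)
    k = len(blocks[0])    # number of treatments (columns)
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank the values within this row: the smallest value gets rank 1.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    # Q = 12 / (n k (k + 1)) * sum_j R_j^2  -  3 n (k + 1)
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Hypothetical data: 4 blocks (rows) measured under 3 treatments (columns).
data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.5, 5.0],
    [0.5, 1.5, 1.0],
    [4.0, 6.0, 9.0],
]
q, rank_sums = friedman_statistic(data)
```

Under the null hypothesis of no treatment differences, Q is commonly compared against a chi-square distribution with k - 1 degrees of freedom. That approximation is rough for designs as small as this one.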
Frozen Analytics  Frozen analytics is used to create and prototype rule and scoring systems, using cross-validation, training sets, sampling, and traditional machine learning algorithms.
Full Reference Image Quality Assessment (FRIQA) 
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called ‘perceptual losses’? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FRIQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
full-FORCE  Trained recurrent networks are powerful tools for modeling dynamic neural computations. We present a target-based method for modifying the full connectivity matrix of a recurrent network to train it to perform tasks involving temporally complex input/output transformations. The method introduces a second network during training to provide suitable ‘target’ dynamics useful for performing the task. Because it exploits the full recurrent connectivity, the method produces networks that perform tasks with fewer neurons and greater noise robustness than traditional least-squares (FORCE) approaches. In addition, we show how introducing additional input signals into the target-generating network, which act as task hints, greatly extends the range of tasks that can be learned and provides control over the complexity and nature of the dynamics of the trained, task-performing network.
Fully Conditional Specification (FCS) 
In this method, an imputation model for each variable with missing values is specified. This method is an iterative MCMC procedure. In each iteration, it sequentially imputes missing values starting from the first variable with missing values. smcfcs 
Fully Convolution Networks (FCN) 
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build ‘fully convolutional’ networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image. Improving Fully Convolution Network for Semantic Segmentation
Fully Convolutional Two-Stream Fusion Network (FCTSFN) 
In this paper, we propose a novel fully convolutional two-stream fusion network (FCTSFN) for interactive image segmentation. The proposed network includes two sub-networks: a two-stream late fusion network (TSLFN) that predicts the foreground at a reduced resolution, and a multi-scale refining network (MSRN) that refines the foreground at full resolution. The TSLFN includes two distinct deep streams followed by a fusion network. The intuition is that, since user interactions carry more direct information on foreground/background than the image itself, the two-stream structure of the TSLFN reduces the number of layers between the pure user interaction features and the network output, allowing the user interactions to have a more direct impact on the segmentation result. The MSRN fuses the features from different layers of the TSLFN at different scales, in order to seek local-to-global information on the foreground and refine the segmentation result at full resolution. We conduct comprehensive experiments on four benchmark datasets. The results show that the proposed network achieves competitive performance compared to current state-of-the-art interactive image segmentation methods.
Functional Additive Regression (FAR) 
We suggest a new method, called Functional Additive Regression, or FAR, for efficiently performing high-dimensional functional regression. FAR extends the usual linear regression model involving a functional predictor, $X(t)$, and a scalar response, $Y$, in two key respects. First, FAR uses a penalized least squares optimization approach to efficiently deal with high-dimensional problems involving a large number of functional predictors. Second, FAR extends beyond the standard linear regression setting to fit general nonlinear additive models. We demonstrate that FAR can be implemented with a wide range of penalty functions using a highly efficient coordinate descent algorithm. Theoretical results are developed which provide motivation for the FAR optimization criterion. Finally, we show through simulations and two real data sets that FAR can significantly outperform competing methods.
Functional Causal Model  
Functional Data Analysis (FDA) 
Functional data analysis is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. The continuum is often time, but may also be spatial location, wavelength, probability, etc. http://…/annurevstatistics010814020413 
Functional Decision Theory  This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory (CDT) and evidential decision theory (EDT). Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, ‘Which output of this very function would yield the best outcome?’ Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
Functional Dynamic Principal Components Analysis  freqdom.fda
Functional Linear Array Model (FLAM) 
The functional linear array model (FLAM) is a unified model class for functional regression models including function-on-scalar, scalar-on-function and function-on-function regression. Mean, median, quantile as well as generalized additive regression models for functional or scalar responses are contained as special cases in this general framework. Our implementation features a broad variety of covariate effects, such as linear, smooth and interaction effects of grouping variables, scalar and functional covariates. Computational efficiency is achieved by representing the model as a generalized linear array model. While the array structure requires a common grid for functional responses, missing values are allowed. Estimation is conducted using a boosting algorithm, which allows for numerous covariates and automatic, data-driven model selection. To illustrate the flexibility of the model class we use three applications on curing of resin for car production, heat values of fossil fuels and Canadian climate data (the last one in the electronic supplement). These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively, as well as additional capabilities such as robust regression, spatial functional regression, model selection and accommodation of missing values. An implementation of our methods is provided in the R add-on package FDboost. FDboost
Functional Principal Component Analysis (FPCA) 
Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space $L^2$ that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied to represent random functions, or in functional regression and classification.
Functional Regression  This paper deals with functional regression, in which the input attributes as well as the response are functions. To deal with this problem, we develop a functional reproducing kernel Hilbert space approach; here, a kernel is an operator acting on a function and yielding a function. We demonstrate basic properties of these functional RKHS, as well as a representer theorem for this setting; we investigate the construction of kernels; we provide some experimental insight. 
Fundamental Theorem of Linear Algebra  In mathematics, the fundamental theorem of linear algebra makes several statements regarding vector spaces. These may be stated concretely in terms of the rank $r$ of an $m \times n$ matrix $A$ and its singular value decomposition.
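The statements can be summarized as below, following the standard textbook formulation rather than anything quoted in the entry. The notation is an assumption of this sketch: $C(\cdot)$ denotes a column space, $N(\cdot)$ a null space, and $A = U \Sigma V^\top$ is a full singular value decomposition of a real $m \times n$ matrix of rank $r$.

```latex
\begin{aligned}
&\dim C(A) = r, && \dim N(A^\top) = m - r, && N(A^\top) = C(A)^{\perp} \subseteq \mathbb{R}^m,\\
&\dim C(A^\top) = r, && \dim N(A) = n - r, && N(A) = C(A^\top)^{\perp} \subseteq \mathbb{R}^n.
\end{aligned}
```

In terms of the SVD, the first $r$ columns of $U$ form an orthonormal basis of $C(A)$ and the remaining $m - r$ columns a basis of $N(A^\top)$. Likewise, the first $r$ columns of $V$ span $C(A^\top)$ and the remaining $n - r$ columns span $N(A)$.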
funFEM  A novel model-based clustering method for time series (and more generally functional data), called FunFEM. It is based on the discriminative functional mixture (DFM) model, which models the data in a single discriminative functional subspace. This subspace afterwards allows insightful visualizations of the clustered data. funFEM
funHDDC  General procedure for clustering functional data which adapts the efficient clustering method HDDC, originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. By constraining model parameters within and between groups, a family of parsimonious models is obtained that can fit a variety of situations. An estimation procedure based on the EM algorithm is proposed for estimating both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs better than or similarly to classical clustering methods while providing useful interpretations of the groups. funHDDC
Fused Gromov-Wasserstein Distance  Optimal transport has recently gained a lot of interest in the machine learning community thanks to its ability to compare probability distributions while respecting the underlying space’s geometry. Wasserstein distance deals with feature information through its metric or cost function, but fails to exploit the structural information, i.e., the specific relations existing among the components of the distribution. Recently adapted to a machine learning context, the Gromov-Wasserstein distance defines a metric well suited for comparing distributions that live in different metric spaces by exploiting their inner structural information. In this paper we propose a new optimal transport distance, called the Fused Gromov-Wasserstein distance, capable of leveraging both structural and feature information by combining both views, and prove its metric properties over very general manifolds. We also define the barycenter of structured objects as their Fréchet mean, leveraging both feature and structural information. We illustrate the versatility of the method for problems where structured objects are involved, computing barycenters in graph and time series contexts. We also use this new distance for graph classification, where we obtain results comparable or superior to state-of-the-art graph kernel methods and end-to-end graph CNN approaches.
Fused Lasso  fuser 
FusedGAN  We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. Fidelity, diversity and controllable sampling are the main quality measures of a good image generation model. Most existing models are insufficient in all three aspects. The FusedGAN can perform controllable sampling of diverse images with very high fidelity. We argue that controllability can be achieved by disentangling the generation process into various stages. In contrast to stacked GANs, where multiple stages of GANs are trained separately with full supervision of labeled intermediate images, the FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike existing methods, which require full supervision with paired conditions and images, the FusedGAN can effectively leverage more abundant images without corresponding conditions in training, to produce more diverse samples with high fidelity. We achieve this by fusing two generators: one for unconditional image generation, and the other for conditional image generation, where the two partly share a common latent space, thereby disentangling the generation. We demonstrate the efficacy of the FusedGAN in fine-grained image generation tasks such as text-to-image and attribute-to-face generation.
Fusion Graph Convolutional Network  Semi-supervised node classification involves learning to classify unlabelled nodes given a partially labeled graph. In transductive learning, all unlabelled nodes to be classified are observed during training, and in inductive learning, predictions are to be made for nodes not seen at training. In this paper, we focus on both these settings for node classification in attributed graphs, i.e., graphs in which nodes have additional features. State-of-the-art models for node classification on such attributed graphs use differentiable recursive functions. These differentiable recursive functions enable aggregation and filtering of neighborhood information from multiple hops (depths). Despite being powerful, these variants are limited in their ability to combine information from different hops efficiently. In this work, we analyze this limitation of recursive graph functions in terms of their representation capacity to effectively capture multi-hop neighborhood information. Further, we provide a simple fusion component which is mathematically motivated to address this limitation and improve the existing models to explicitly learn the importance of information from different hops. This proposed mechanism is shown to improve over existing methods across 8 popular datasets from different domains. Specifically, our model improves the Graph Convolutional Network (GCN) and a variant of GraphSAGE by a significant margin, providing highly competitive state-of-the-art results.
Fusion Subspace Clustering  Modern inference and learning often hinge on identifying low-dimensional structures that approximate large-scale data. Subspace clustering achieves this through a union of linear subspaces. However, in contemporary applications data is increasingly often incomplete, rendering standard (full-data) methods inapplicable. On the other hand, existing incomplete-data methods present major drawbacks, like lifting an already high-dimensional problem, or requiring a super-polynomial number of samples. Motivated by this, we introduce a new subspace clustering algorithm inspired by fusion penalties. The main idea is to permanently assign each datum to a subspace of its own, and minimize the distance between the subspaces of all data, so that subspaces of the same cluster get fused together. Our approach is entirely new to both full and missing data and, unlike other methods, it directly allows noise, requires no liftings, allows low-, high-, and even full-rank data, approaches optimal (information-theoretic) sampling rates, and does not rely on other methods such as low-rank matrix completion to handle missing data. Furthermore, our extensive experiments on both real and synthetic data show that our approach performs comparably to the state-of-the-art with complete data, and dramatically better if data is missing.
Future  In computer science, future, promise, and delay refer to constructs used for synchronization in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is yet incomplete. The term promise was proposed in 1976 by Daniel P. Friedman and David Wise, and Peter Hibbard called it eventual. A somewhat similar concept, future, was introduced in 1977 in a paper by Henry Baker and Carl Hewitt. The terms future, promise, and delay are often used interchangeably, although some differences in usage between future and promise exist. Specifically, when usage is distinguished, a future is a read-only placeholder view of a variable, while a promise is a writable, single-assignment container which sets the value of the future. Notably, a future may be defined without specifying which specific promise will set its value, and different possible promises may set the value of a given future, though this can be done only once for a given future. In other cases a future and a promise are created together and associated with each other: the future is the value, the promise is the function that sets the value – essentially the return value (future) of an asynchronous function (promise). Setting the value of a future is also called resolving, fulfilling, or binding it. future
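The read side and the write side can be illustrated with Python's standard `concurrent.futures.Future`. This is only a sketch. Python merges both roles into one object, and its documentation intends `set_result` for executor implementations and tests, so the split into a "promise" side and a "future" side below is a labelling chosen for this example. The helper `start_computation` is hypothetical.

```python
import threading
from concurrent.futures import Future


def start_computation(x):
    """Return a future immediately; a worker thread resolves it later."""
    fut = Future()  # placeholder for a result that is not yet known

    def worker():
        # "Promise" side: set (resolve) the value of the future.
        fut.set_result(x * x)

    threading.Thread(target=worker).start()
    return fut


fut = start_computation(7)
# "Future" side: block until the value has been set, then read it.
value = fut.result(timeout=5)
```

The caller only ever reads through `result()`, while the worker is the single place where the value is written.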
FuzzerGym  Fuzzing is a commonly used technique designed to test software by automatically crafting program inputs. Currently, the most successful fuzzing algorithms emphasize simple, low-overhead strategies with the ability to efficiently monitor program state during execution. Through compile-time instrumentation, these approaches have access to numerous aspects of program state including coverage, data flow, and heterogeneous fault detection and classification. However, existing approaches utilize blind random mutation strategies when generating test inputs. We present a different approach that uses this state information to optimize mutation operators using reinforcement learning (RL). By integrating OpenAI Gym with libFuzzer we are able to simultaneously leverage advancements in reinforcement learning as well as fuzzing to achieve deeper coverage across several varied benchmarks. Our technique connects the rich, efficient program monitors provided by LLVM Sanitizers with a deep neural net to learn mutation selection strategies directly from the input data. The cross-language, asynchronous architecture we developed enables us to apply any OpenAI Gym compatible deep reinforcement learning algorithm to any fuzzing problem with minimal slowdown.
Fuzzy Bayesian Learning  In this paper we propose a novel approach for learning from data using rule-based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic datasets and also a real-world example in the financial services industry. Then we demonstrate how the method can be extended for knowledge extraction to select, in a Bayesian way, the individual rules that best explain the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful.
Fuzzy Clustering  Fuzzy clustering is a class of algorithms for cluster analysis in which the allocation of data points to clusters is not “hard” (all-or-nothing) but “fuzzy” in the same sense as fuzzy logic.
Fuzzy clustering by Local Approximation of MEmberships Clustering (FLAME) 
Fuzzy clustering by Local Approximation of MEmberships (FLAME) is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects. The key feature of this algorithm is that the neighborhood relationships among neighboring objects in the feature space are used to constrain the memberships of neighboring objects in the fuzzy membership space. 
Fuzzy C-Means Clustering (FCM) 
In fuzzy clustering, every point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster. Thus, points on the edge of a cluster may be in the cluster to a lesser degree than points in the center of the cluster. Any point $x$ has a set of coefficients $w_k(x)$ giving the degree of being in the $k$th cluster. With fuzzy c-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster.
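The weighted centroid described above can be sketched as follows. The fuzzifier exponent `m` is an assumption taken from the standard fuzzy c-means formulation, in which memberships are raised to the power `m` before averaging. It is not mentioned in the entry, and `m = 1` gives the plain membership-weighted mean. The function name and the sample data are hypothetical.

```python
def fuzzy_centroid(points, memberships, m=2.0):
    """Centroid of one cluster as a membership-weighted mean of all points.

    points:      list of equal-length coordinate tuples.
    memberships: w_k(x) for each point, its degree of belonging to cluster k.
    m:           fuzzifier exponent (assumed; standard FCM uses w_k(x) ** m).
    """
    dim = len(points[0])
    weights = [w ** m for w in memberships]
    total = sum(weights)
    return [
        sum(p[d] * w for p, w in zip(points, weights)) / total
        for d in range(dim)
    ]


# Hypothetical data: three points and their memberships in cluster k.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
w_k = [1.0, 0.5, 0.0]
centroid = fuzzy_centroid(points, w_k)
plain = fuzzy_centroid(points, w_k, m=1.0)
```

The point with membership 0 does not influence the centroid at all, and raising the memberships to a power above 1 pulls the centroid further towards the points that belong most strongly to the cluster.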
Fuzzy Cognitive Map  A fuzzy cognitive map is a cognitive map within which the relations between the elements (e.g. concepts, events, project resources) of a “mental landscape” can be used to compute the “strength of impact” of these elements. The theory behind that computation is fuzzy logic. FCMapper, fcm
Fuzzy Constraint Linear Discriminant Analysis (FCLDA) 
In this paper we introduce a fuzzy constraint linear discriminant analysis (FCLDA). The FCLDA tries to minimize misclassification error based on a modified perceptron criterion that helps handle the uncertainty near the decision boundary by means of a fuzzy linear programming approach with fuzzy resources. The proposed method has low computational complexity because of its linear characteristics and can deal with noisy data with different degrees of tolerance. The results obtained verify the success of the algorithm when dealing with different problems. A comparison of FCLDA and LDA shows the superiority of FCLDA in classification tasks.
Fuzzy GP Reinforcement Learning (FGPRL) 
Autonomously training interpretable control strategies, called policies, using preexisting plant trajectory data is of great interest in industrial applications. Fuzzy controllers have been used in industry for decades as interpretable and efficient system controllers. In this study, we introduce a fuzzy genetic programming (GP) approach called fuzzy GP reinforcement learning (FGPRL) that can select the relevant state features, determine the size of the required fuzzy rule set, and automatically adjust all the controller parameters simultaneously. Each GP individual’s fitness is computed using model-based batch reinforcement learning (RL), which first trains a model using available system samples and subsequently performs Monte Carlo rollouts to predict each policy candidate’s performance. We compare FGPRL to an extended version of a related method called fuzzy particle swarm reinforcement learning (FPSRL), which uses swarm intelligence to tune the fuzzy policy parameters. Experiments using an industrial benchmark show that FGPRL is able to autonomously learn interpretable fuzzy policies with high control performance.
Fuzzy Supervised Learning with Binary Meta-Feature (FSLBM) 
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary Meta-Feature (FSLBM) method for big data classification tasks. The study of real-time algorithms addresses several major concerns, namely accuracy, memory consumption, the ability to stretch assumptions, and time complexity. Attaining a fast computational model providing fuzzy logic and supervised learning is one of the main challenges in machine learning. In this research paper, we present the FSLBM algorithm as an efficient solution for supervised learning with fuzzy logic processing, using a binary meta-feature representation with Hamming distance and a hash function to relax assumptions. While many studies have focused on reducing time complexity and increasing accuracy during the last decade, the novel contribution of this proposed solution comes through the integration of Hamming distance, a hash function, binary meta-features, and binary classification to provide a real-time supervised method. The Hash Table (HT) component gives fast access to existing indices and, therefore, the generation of new indices in constant time, which supersedes existing fuzzy supervised algorithms with better or comparable results. To summarize, the main contribution of this technique for real-time Fuzzy Supervised Learning is to represent the hypothesis through binary input as a meta-feature space and to create the Fuzzy Supervised Hash table to train and validate the model.