S3Pool Feature pooling layers (e.g., max pooling) in convolutional neural networks (CNNs) serve the dual purpose of providing increasingly abstract representations as well as yielding computational savings in subsequent convolutional layers. We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner. Our starting point in this work is the observation that this regularly spaced downsampling arising from non-overlapping windows, although intuitive from a signal processing perspective (which has the goal of signal reconstruction), is not necessarily optimal for \emph{learning} (where the goal is to generalize). We study this aspect and propose a novel pooling strategy with stochastic spatial sampling (S3Pool), where the regular downsampling is replaced by a more general stochastic version. We observe that this general stochasticity acts as a strong regularizer, and can also be seen as doing implicit data augmentation by introducing distortions in the feature maps. We further introduce a mechanism to control the amount of distortion to suit different datasets and architectures. To demonstrate the effectiveness of the proposed approach, we perform extensive experiments on several popular image classification benchmarks, observing excellent improvements over baseline models. Experimental code is available at https://…/s3pool. SaberLDA Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear to the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that improves locality, makes efficient utilization of GPU warps, and reduces memory consumption. xperiments show that SaberLDA can learn from billions-token-scale data with up to 10,000 topics, which is almost two orders of magnitude larger than that of the previous GPU-based systems. With a single GPU card, SaberLDA is able to earn 10,000 topics from a dataset of billions of tokens in a few hours, which is only achievable with clusters with tens of machines before. Sac2Vec Network representation learning (also known as information network embedding) has been the central piece of research in social and information network analytics for the last couple of years. An information network can be viewed as a linked structure of a set of entities. A set of linked web pages and documents, a set of users in a social network are common examples of information network. Typically a node in the information network is formed with a unique id, some content information and the links to its direct neighbors. Information network representation techniques traditionally use only link structure of the network. But the textual or other types of content in each node plays an important role to understand the underlying semantics of the network. In this paper, we propose Sac2Vec, a network representation technique using structure and content. It is a multi-layered graph approach which uses a random walk to generate the node embedding. Our approach is simple and computationally fast, yet able to use the content as a complement to structure and the vice-versa. Experimental evaluations on three real world publicly available datasets show the merit of our approach compared to state-of-the-art algorithms in the domain. SAFE This paper presents a practical approach for detecting non-stationarity in time series prediction. This method is called SAFE and works by monitoring the evolution of the spectral contents of time series through a distance function. This method is designed to work in combination with state-of-the-art machine learning methods in real time by informing the online predictors to perform necessary adaptation when a non-stationarity presents. We also propose an algorithm to proportionally include some past data in the adaption process to overcome the Catastrophic Forgetting problem. To validate our hypothesis and test the effectiveness of our approach, we present comprehensive experiments in different elements of the approach involving artificial and real-world datasets. The experiments show that the proposed method is able to significantly save computational resources in term of processor or GPU cycles while maintaining high prediction performances. Safe Policy-Model Iteration(SPMI) In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy. Safe Reinforcement Learning Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning. SafePredict SafePredict is a novel meta-algorithm that works with any base prediction algorithm for online data to guarantee an arbitrarily chosen correctness rate, $1-\epsilon$, by allowing refusals. Allowing refusals means that the meta-algorithm may refuse to emit a prediction produced by the base algorithm on occasion so that the error rate on non-refused predictions does not exceed $\epsilon$. The SafePredict error bound does not rely on any assumptions on the data distribution or the base predictor. When the base predictor happens not to exceed the target error rate $\epsilon$, SafePredict refuses only a finite number of times. When the error rate of the base predictor changes through time SafePredict makes use of a weight-shifting heuristic that adapts to these changes without knowing when the changes occur yet still maintains the correctness guarantee. Empirical results show that (i) SafePredict compares favorably with state-of-the art confidence based refusal mechanisms which fail to offer robust error guarantees; and (ii) combining SafePredict with such refusal mechanisms can in many cases further reduce the number of refusals. Our software (currently in Python) is included in the supplementary material. SAFFRON In the online false discovery rate (FDR) problem, one observes a possibly infinite sequence of $p$-values $P_1,P_2,\dots$, each testing a different null hypothesis, and an algorithm must pick a sequence of rejection thresholds $\alpha_1,\alpha_2,\dots$ in an online fashion, effectively rejecting the $k$-th null hypothesis whenever $P_k \leq \alpha_k$. Importantly, $\alpha_k$ must be a function of the past, and cannot depend on $P_k$ or any of the later unseen $p$-values, and must be chosen to guarantee that for any time $t$, the FDR up to time $t$ is less than some pre-determined quantity $\alpha \in (0,1)$. In this work, we present a powerful new framework for online FDR control that we refer to as SAFFRON. Like older alpha-investing (AI) algorithms, SAFFRON starts off with an error budget, called alpha-wealth, that it intelligently allocates to different tests over time, earning back some wealth on making a new discovery. However, unlike older methods, SAFFRON’s threshold sequence is based on a novel estimate of the alpha fraction that it allocates to true null hypotheses. In the offline setting, algorithms that employ an estimate of the proportion of true nulls are called adaptive methods, and SAFFRON can be seen as an online analogue of the famous offline Storey-BH adaptive procedure. Just as Storey-BH is typically more powerful than the Benjamini-Hochberg (BH) procedure under independence, we demonstrate that SAFFRON is also more powerful than its non-adaptive counterparts, such as LORD and other generalized alpha-investing algorithms. Further, a monotone version of the original AI algorithm is recovered as a special case of SAFFRON, that is often more stable and powerful than the original. Lastly, the derivation of SAFFRON provides a novel template for deriving new online FDR rules. SAF-Pooling Major winning Convolutional Neural Networks (CNNs), such as VGGNet, ResNet, DenseNet, \etc, include tens to hundreds of millions of parameters, which impose considerable computation and memory overheads. This limits their practical usage in training and optimizing for real-world applications. On the contrary, light-weight architectures, such as SqueezeNet, are being proposed to address this issue. However, they mainly suffer from low accuracy, as they have compromised between the processing power and efficiency. These inefficiencies mostly stem from following an ad-hoc designing procedure. In this work, we discuss and propose several crucial design principles for an efficient architecture design and elaborate intuitions concerning different aspects of the design procedure. Furthermore, we introduce a new layer called {\it SAF-pooling} to improve the generalization power of the network while keeping it simple by choosing best features. Based on such principles, we propose a simple architecture called {\it SimpNet}. We empirically show that SimpNet provides a good trade-off between the computation/memory efficiency and the accuracy solely based on these primitive but crucial principles. SimpNet outperforms the deeper and more complex architectures such as VGGNet, ResNet, WideResidualNet \etc, on several well-known benchmarks, while having 2 to 25 times fewer number of parameters and operations. We obtain state-of-the-art results (in terms of a balance between the accuracy and the number of involved parameters) on standard datasets, such as CIFAR10, CIFAR100, MNIST and SVHN. The implementations are available at \href{url}{https://…/SimpNet}. SAGA In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as choosing a subgraph with the largest group consensus score in a completely connected graph defined over the item affinity matrix. We propose a fast greedy algorithm with strong theoretical guarantees, and show that the proposed algorithm compares favorably to the state-of-the-art group recommendation algorithms according to commonly used relevance and coverage performance measures on benchmark dataset. Saliency Detection Within our line of sight there are always things that stand out more than others. If you find yourself gazing over a city from a height for example, you may be drawn to a nearby skyscraper, a flashing light or even a red coat someone is wearing below. Saliency is the aspect of any stimulus that makes it stand out from the crowd. The reason a particular stimulus has such salience may be due to contrast i.e. a white line on a black background or as a result of emotional or cognitive factors. For example, we may hone in on something because we are actively looking for it or because it triggers something in our past or memory. Saliency is most commonly discussed in relation to the visual system but it is employed by every perceptual system such as sound and touch. If we are hungry the smell of a favourite food may be highly salient for example. The mechanisms by which humans grant certain stimuli more attentional focus than others probably holds root in our evolutionary past. Our limited cognitive resources require a way to identify the most relevant stimuli for learning and or survival. The world is full of stimuli everywhere you turn and we cannot attend to all of these at once. How does our visual system know where to focus? Saliency Methods Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. Saliency Prediction Same Place Different Time(SPDT) Sammon Mapping Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality () by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection. It is particularly suited for use in exploratory data analysis. The method was proposed by John W. Sammon in 1969. It is considered a non-linear approach as the mapping cannot be represented as a linear combination of the original variables as possible in techniques such as principal component analysis, which also makes it more difficult to use for classification applications. Sample Size Determination(SSD) Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined based on the expense of data collection, and the need to have sufficient statistical power. In complicated studies there may be several different sample sizes involved in the study: for example, in a survey sampling involving stratified sampling there would be different sample sizes for each population. In a census, data are collected on the entire population, hence the sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group. ssd Sample Size Optimization(SSO) Finding the minimal sample size for a query regarding given error constraints. MISS: Finding Optimal Sample Sizes for Approximate Analytics Sample, Explore, Modify, Model and Assess(SEMMA) SEMMA is an acronym that stands for Sample, Explore, Modify, Model and Assess. It is a list of sequential steps developed by SAS Institute Inc., one of the largest producers of statistics and business intelligence software. It guides the implementation of data mining applications. Although SEMMA is often considered to be a general data mining methodology, SAS claims that it is “rather a logical organisation of the functional tool set of” one of their products, SAS Enterprise Miner, “for carrying out the core tasks of data mining”. Sample, Operation, Attribute, and Parameter Dimensions(SOAP) The computational requirements for training deep neural networks (DNNs) have grown to the point that it is now standard practice to parallelize training. Existing deep learning systems commonly use data or model parallelism, but unfortunately, these strategies often result in suboptimal parallelization performance. In this paper, we define a more comprehensive search space of parallelization strategies for DNNs called SOAP, which includes strategies to parallelize a DNN in the Sample, Operation, Attribute, and Parameter dimensions. We also propose FlexFlow, a deep learning framework that uses guided randomized search of the SOAP space to find a fast parallelization strategy for a specific parallel machine. To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy’s performance and is three orders of magnitude faster than prior approaches that have to execute each strategy. We evaluate FlexFlow with six real-world DNN benchmarks on two GPU clusters and show that FlexFlow can increase training throughput by up to 3.8x over state-of-the-art approaches, even when including its search time, and also improves scalability. Sampled Weighted Min-Hashing(SWMH) We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to automatically mine topics from large-scale corpora. SWMH generates multiple random partitions of the corpus vocabulary based on term co-occurrence and agglomerates highly overlapping inter-partition cells to produce the mined topics. While other approaches define a topic as a probabilistic distribution over a vocabulary, SWMH topics are ordered subsets of such vocabulary. Interestingly, the topics mined by SWMH underlie themes from the corpus at different levels of granularity. We extensively evaluate the meaningfulness of the mined topics both qualitatively and quantitatively on the NIPS (1.7 K documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora. Additionally, we compare the quality of SWMH with Online LDA topics for document representation in classification. Sampling Clustering We propose an efficient graph-based divisive cluster analysis approach called sampling clustering. It constructs a lite informative dendrogram by recursively dividing a graph into subgraphs. In each recursive call, a graph is sampled first with a set of vertices being removed to disconnect latent clusters, then condensed by adding edges to the remaining vertices to avoid graph fragmentation caused by vertex removals. We also present some sampling and condensing methods and discuss the effectiveness in this paper. Our implementations run in linear time and achieve outstanding performance on various types of datasets. Experimental results show that they outperform state-of-the-art clustering algorithms with significantly less computing resources requirements. Sampling Error In statistics, sampling error is incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics on the sample, such as means and quantiles, generally differ from parameters on the entire population. For example, if one measures the height of a thousand individuals from a country of one million, the average height of the thousand is typically not the same as the average height of all one million people in the country. Since sampling is typically done to determine the characteristics of a whole population, the difference between the sample and population values is considered a sampling error. Exact measurement of sampling error is generally not feasible since the true population values are unknown; however, sampling error can often be estimated by probabilistic modeling of the sample. Sampling Importance Resampling(SIR) Sampling Importance Resampling allows us to sample from the posterior distribution, p(theta|data) where p(theta|data)?L(theta;data)×p(theta) by resampling from a series of draws from the prior, p(theta). Denote one of those n draws from the prior distribution, p(theta), as thetai. Then draw i from the prior sample is drawn with replacement into the posterior sample with probability qi … Samsara Apache Mahout introduces a new math environment we call Samsara, for its theme of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized. Mahout-Samsara is here to help people create their own math while providing some off-the-shelf algorithm implementations. At its core are general linear algebra and statistical operations along with the data structures to support them. You can use is as a library or customize it in Scala with Mahout-specific extensions that look something like R. Mahout-Samsara comes with an interactive shell that runs distributed operations on a Spark cluster. This make prototyping or task submission much easier and allows users to customize algorithms with a whole new degree of freedom. Sankey Diagram Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. They are typically used to visualize energy or material or cost transfers between processes. SAP HANA(HANA) SAP HANA has completely transformed the database industry by combining database, data processing, and application platform capabilities in a single in-memory platform. The platform also provides libraries for predictive, planning, text processing, spatial, and business analytics – all on the same architecture. This makes it possible for applications and analytics to be rethought without information processing latency, and sense-and-response solutions can work on massive quantities of real-time data for immediate answers without building pre-aggregates. Simply put – this makes SAP HANA the platform for building and deploying next-generation, real-time applications and analytics. SAP River River is a programming model and a programming language where you define your application (Data Model, Queries & Business Logic) and upon deployment every run-time artifact is deployed onto a DB (such as HANA) and a run-time container (such as XS to run the JavaScript which handles the business logic side). SAP River is an easy way to make SAP HANA Applications. Develop and test an application backend, in a matter of minutes, that runs on SAP HANA – SAP’s in-memory database and application platform. SAP River is a new way of developing native applications on SAP HANA. River consists of a language, a programming model and a set of tools, which allow the developer to focus on the business intent of the application, and largely ignore issues of implementation and optimization. These aspects are taken care of automatically by the language tools, which choose, on compilation, the most appropriate run-time context for each part of the application. River allows a developer to specify the data model, the application business logic as well as access control, all in a single integrated specification. River is compatible with existing SAP HANA objects, like tables, views, stored procedures and XSJS procedures. River code is in fact cross-compiled into these same native runtime objects, which are automatically exposed via an OData API. The result is a simpler development process, increased developer productivity, and application code that is easier to understand and to maintain. Sapphire RDF data in the linked open data (LOD) cloud is very valuable for many different applications. In order to unlock the full value of this data, users should be able to issue complex queries on the RDF datasets in the LOD cloud. SPARQL can express such complex queries, but constructing SPARQL queries can be a challenge to users since it requires knowing the structure and vocabulary of the datasets being queried. In this paper, we introduce Sapphire, a tool that helps users write syntactically and semantically correct SPARQL queries without prior knowledge of the queried datasets. Sapphire interactively helps the user while typing the query by providing auto-complete suggestions based on the queried data. After a query is issued, Sapphire provides suggestions on ways to change the query to better match the needs of the user. We evaluated Sapphire based on performance experiments and a user study and showed it to be superior to competing approaches. SAQL Recently, advanced cyber attacks, which consist of a sequence of steps that involve many vulnerabilities and hosts, compromise the security of many well-protected businesses. This has led to the solutions that ubiquitously monitor system activities in each host (big data) as a series of events, and search for anomalies (abnormal behaviors) for triaging risky events. Since fighting against these attacks is a time-critical mission to prevent further damage, these solutions face challenges in incorporating expert knowledge to perform timely anomaly detection over the large-scale provenance data. To address these challenges, we propose a novel stream-based query system that takes as input, a real-time event feed aggregated from multiple hosts in an enterprise, and provides an anomaly query engine that queries the event feed to identify abnormal behaviors based on the specified anomalies. To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 1.1TB of real system monitoring data (containing 3.3 billion events). Our evaluations on a broad set of attack behaviors and micro-benchmarks show that our system has a low detection latency (<2s) and a high system throughput (110,000 events/s; supporting ~4000 hosts), and is more efficient in memory utilization than the existing stream-based complex event processing systems. SA-Siam Observing that Semantic features learned in an image classification task and Appearance features learned in a similarity matching task complement each other, we build a twofold Siamese network, named SA-Siam, for real-time object tracking. SA-Siam is composed of a semantic branch and an appearance branch. Each branch is a similarity-learning Siamese network. An important design choice in SA-Siam is to separately train the two branches to keep the heterogeneity of the two types of features. In addition, we propose a channel attention mechanism for the semantic branch. Channel-wise weights are computed according to the channel activations around the target position. While the inherited architecture from SiamFC \cite{SiamFC} allows our tracker to operate beyond real-time, the twofold design and the attention mechanism significantly improve the tracking performance. The proposed SA-Siam outperforms all other real-time trackers by a large margin on OTB-2013/50/100 benchmarks. SAX Transformation ➘ “Symbolic Aggregate Approximation” TSMining s-bAbI In this study, we investigate the limits of the current state of the art AI system for detecting buffer overflows and compare it with current static analysis tools. To do so, we developed a code generator, s-bAbI, capable of producing an arbitrarily large number of code samples of controlled complexity. We found that the static analysis engines we examined have good precision, but poor recall on this dataset, except for a sound static analyzer that has good precision and recall. We found that the state of the art AI system, a memory network modeled after Choi et al. [1], can achieve similar performance to the static analysis engines, but requires an exhaustive amount of training data in order to do so. Our work points towards future approaches that may solve these problems; namely, using representations of code that can capture appropriate scope information and using deep learning methods that are able to perform arithmetic operations. SBG-Sketch Applications in various domains rely on processing graph streams, e.g., communication logs of a cloud-troubleshooting system, road-network traffic updates, and interactions on a social network. A labeled-graph stream refers to a sequence of streamed edges that form a labeled graph. Label-aware applications need to filter the graph stream before performing a graph operation. Due to the large volume and high velocity of these streams, it is often more practical to incrementally build a lossy-compressed version of the graph, and use this lossy version to approximately evaluate graph queries. Challenges arise when the queries are unknown in advance but are associated with filtering predicates based on edge labels. Surprisingly common, and especially challenging, are labeled-graph streams that have highly skewed label distributions that might also vary over time. This paper introduces Self-Balanced Graph Sketch (SBG-Sketch, for short), a graphical sketch for summarizing and querying labeled-graph streams that can cope with all these challenges. SBG-Sketch maintains synopsis for both the edge attributes (e.g., edge weight) as well as the topology of the streamed graph. SBG-Sketch allows efficient processing of graph-traversal queries, e.g., reachability queries. Experimental results over a variety of real graph streams show SBG-Sketch to reduce the estimation errors of state-of-the-art methods by up to 99%. Scala Scala is an object-functional programming language for general software applications. Scala has full support for functional programming and a very strong static type system. This allows programs written in Scala to be very concise and thus smaller in size than other general-purpose programming languages. Many of Scala’s design decisions were inspired by criticism of the shortcomings of Java. Scala source code is intended to be compiled to Java bytecode, so that the resulting executable code runs on a Java virtual machine. Java libraries may be used directly in Scala code and vice versa (language interoperability). Like Java, Scala is object-oriented, and uses a curly-brace syntax reminiscent of the C programming language. Unlike Java, Scala has many features of functional programming languages like Scheme, Standard ML and Haskell, including currying, type inference, immutability, lazy evaluation, and pattern matching. It also has an advanced type system supporting algebraic data types, covariance and contravariance, higher-order types, and anonymous types. Other features of Scala not present in Java include operator overloading, optional parameters, named parameters, raw strings, and no checked exceptions. The name Scala is a portmanteau of ‘scalable’ and ‘language’, signifying that it is designed to grow with the demands of its users. Scalable Advanced Massive Online Analysis(SAMOA) SAMOA (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. samoa is written in Java, is open source, and is available at http://samoa-project.net under the Apache Software License version 2.0. http://samoa.incubator.apache.org http://samoa-project.net https://…able-advanced-massive-online-analysis.pdf Scalable Bayesian Multi-relational Factorization with Side Information using MCMC(Macau) We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundred millions of observations, and sparse entity features with millions of dimensions. To achieve the scale up, we specially designed sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show performance and advanced features of Macau in a set of experiments, including challenging drug-protein activity prediction task. Scalable Generalized Dynamic Topic Model Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models, and also limit scalability. In this paper, we present several new results around DTMs. First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have a long-term memory or are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models based on ideas around stochastic variational inference and sparse Gaussian processes. This way we can train a rich family of DTMs to massive data. Our experiments on several large-scale datasets show that our generalized model allows us to find interesting patterns that were not accessible by previous approaches. Scalable Logo Self-co-Learning(SL^2) Existing logo detection methods usually consider a small number of logo classes and limited images per class with a strong assumption of requiring tedious object bounding box annotations, therefore not scalable to real-world dynamic applications. In this work, we tackle these challenges by exploring the webly data learning principle without the need for exhaustive manual labelling. Specifically, we propose a novel incremental learning approach, called Scalable Logo Self-co-Learning (SL^2), capable of automatically self-discovering informative training images from noisy web data for progressively improving model capability in a cross-model co-learning manner. Moreover, we introduce a very large (2,190,757 images of 194 logo classes) logo dataset ‘WebLogo-2M’ by an automatic web data collection and processing method. Extensive comparative evaluations demonstrate the superiority of the proposed SL^2 method over the state-of-the-art strongly and weakly supervised detection models and contemporary webly data learning approaches. Scalable Machine Learning Scalability has become one of those core concept slash buzzwords of Big Data. It’s all about scaling out, web scale, and so on. In principle, the idea is to be able to take one piece of code and then throw any number of computers at it to make it fast. The terms “scalable” and “large scale” have been used in machine learning circles long before there was Big Data. There had always been certain problems which lead to a large amount of data, for example in bioinformatics, or when dealing with large number of text documents. So finding learning algorithms, or more generally data analysis algorithms which can deal with a very large set of data was always a relevant question. Scalable Online Learning(SOL) SOL is an open-source library for scalable online learning algorithms, and is particularly suitable for learning with high-dimensional data. The library provides a family of regular and sparse online learning algorithms for large-scale binary and multi-class classification tasks with high efficiency, scalability, portability, and extensibility. SOL was implemented in C++, and provided with a collection of easy-to-use command-line tools, python wrappers and library calls for users and developers, as well as comprehensive documents for both beginners and advanced users. SOL is not only a practical machine learning toolbox, but also a comprehensive experimental platform for online learning research. Experiments demonstrate that SOL is highly efficient and scalable for large-scale machine learning with high-dimensional data. Scalding Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs. Scale Adaptive Dictionary Learning(SADL) Dictionary learning has been widely used in many image processing tasks. In most of these methods, the number of basis vectors is either set by experience or coarsely evaluated empirically. In this paper we propose a new Scale Adaptive Dictionary Learning (SADL) framework, which jointly estimates suitable scales and corresponding atoms in an adaptive fashion according to the training data, without the need of prior information. We design an atom counting function and develop a reliable numerical scheme to solve the challenging optimization problem. Extensive experiments on texture and video datasets demonstrate quantitatively and visually that our method can estimate the scale, without damaging the sparse reconstruction ability. Scale Invariant Probabilistic Neural Network(SPNN) Proposed by Specht (1990) . spnn Scaled Cayley Orthogonal Recurrent Neural Network(scoRNN) Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) have been used to address this issue and in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs. Scaled Exponentially-Regularized Linear Unit(SERLU) Recently, self-normalizing neural networks (SNNs) have been proposed with the intention to avoid batch or weight normalization. The key step in SNNs is to properly scale the exponential linear unit (referred to as SELU) to inherently incorporate normalization based on central limit theory. SELU is a monotonically increasing function, where it has an approximately constant negative output for large negative input. In this work, we propose a new activation function to break the monotonicity property of SELU while still preserving the self-normalizing property. Differently from SELU, the new function introduces a bump-shaped function in the region of negative input by regularizing a linear function with a scaled exponential function, which is referred to as a scaled exponentially-regularized linear unit (SERLU). The bump-shaped function has approximately zero response to large negative input while being able to push the output of SERLU towards zero mean statistically. To effectively combat over-fitting, we develop a so-called shift-dropout for SERLU, which includes standard dropout as a special case. Experimental results on MNIST, CIFAR10 and CIFAR100 show that SERLU-based neural networks provide consistently promising results in comparison to other 5 activation functions including ELU, SELU, Swish, Leakly ReLU and ReLU. Scaled Sparse Linear Regression Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for the least squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Scale-Free Identifier Network(sfIN) We propose scale-free Identifier Network(sfIN), a novel model for event identification in documents. In general, sfIN first encodes a document into multi-scale memory stacks, then extracts special events via conducting multi-scale actions, which can be considered as a special type of sequence labelling. The design of large scale actions makes it more efficient processing a long document. The whole model is trained with both supervised learning and reinforcement learning. Scaling Invariable Benford Distance For the first time, we introduce ‘Scaling invariable Benford distance’ and ‘Benford cyclic graph’, which can be used to analyze any data set. Using the quantity and the graph, we analyze some date sets with common distributions, such as normal, exponent, etc., find that different data set has a much different value of ‘Scaling invariable Benford distance’ and different figure feature of ‘Benford cyclic graph’. We also explore the influence of data size on ‘Scaling invariable Benford distance’, and find that it firstly reduces with data size increasing, then approximate to a fixed value when the size is large enough. Scattering Transforms(ScatterNets) Scattering Transforms (or ScatterNets) introduced by Mallat are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) – the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets. ScatterNet Hybrid Deep Learning(SHDL) The paper proposes the ScatterNet Hybrid Deep Learning (SHDL) network that extracts invariant and discriminative image representations for object recognition. SHDL framework is constructed with a multi-layer ScatterNet front-end, an unsupervised learning middle, and a supervised learning back-end module. Each layer of the SHDL network is automatically designed as an explicit optimization problem leading to an optimal deep learning architecture with improved computational performance as compared to the more usual deep network architectures. SHDL network produces the state-of-the-art classification performance against unsupervised and semi-supervised learning (GANs) on two image datasets. Advantages of the SHDL network over supervised methods (NIN, VGG) are also demonstrated with experiments performed on training datasets of reduced size. Scatterplot Smoothing In statistics, several scatterplot smoothing methods are available to fit a function through the points of a scatterplot to best represent the relationship between the variables. Scatterplots may be smoothed by fitting a line to the data points in a diagram. This line attempts to display the non-random component of the association between the variables in a 2D scatter plot. Smoothing attempts to separate the non-random behaviour in the data from the random fluctuations, removing or reducing these fluctuations, and allows prediction of the response based value of the explanatory variable. Scene-LSTM We develop a human movement trajectory prediction system that incorporates the scene information (Scene-LSTM) as well as human movement trajectories (Pedestrian movement LSTM) in the prediction process within static crowded scenes. We superimpose a two-level grid structure (scene is divided into grid cells each modeled by a scene-LSTM, which are further divided into smaller sub-grids for finer spatial granularity) and explore common human trajectories occurring in the grid cell (e.g., making a right or left turn onto sidewalks coming out of an alley; or standing still at bus/train stops). Two coupled LSTM networks, Pedestrian movement LSTMs (one per target) and the corresponding Scene-LSTMs (one per grid-cell) are trained simultaneously to predict the next movements. We show that such common path information greatly influences prediction of future movement. We further design a scene data filter that holds important non-linear movement information. The scene data filter allows us to select the relevant parts of the information from the grid cell’s memory relative to a target’s state. We evaluate and compare two versions of our method with the Linear and several existing LSTM-based methods on five crowded video sequences from the UCY [1] and ETH [2] datasets. The results show that our method reduces the location displacement errors compared to related methods and specifically about 80% reduction compared to social interaction methods. Scheduled Auxiliary Control(SAC-X) We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment – enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach. Scheduling Theory A branch of applied mathematics (a division of operations research) concerned with mathematical formulations and solution methods of problems of optimal ordering and coordination in time of certain operations. Scheduling theory includes questions on the development of optimal schedules (Gantt charts, graphs) for performing finite (or repetitive) sets of operations. The area of application of results in scheduling theory include management, production, transportation, computer systems, construction, etc. The problems that scheduling theory deals with are usually formulated as optimization problems for a process of processing a finite set of jobs in a system with limited resources. A finite set of jobs is what distinguishes scheduling models from similar models in queueing theory, where basically infinite flows of activities are considered. In all other respects the starting points of the two theories are close. In scheduling theory, the time of arrival for every job into the system is specified. Within the system the job has to pass several processing stages, depending on the conditions of the problem. For every stage, feasible sets of resources are given, as well as the processing time depending on the resources used. The possibility of interruptions in the processing of certain jobs (so-called pre-emptions) can also be stipulated. Constraints on the processing sequence are usually described by a transitive anti-reflexive binary relation. Algorithms for the evaluation of characteristics of large partially ordered sets of jobs constitute the essence of the part of scheduling theory called network analysis (cf. Network model; Network planning). Sometimes, in scheduling models durations of re-adjustments are specified that are necessary when one job in process is replaced by another, as well as certain other conditions. Schelling’s Model of Segregation In 1971, the American economist Thomas Schelling created an agent-based model that might help explain why segregation is so difficult to combat. His model of segregation showed that even when individuals (or ‘agents’) didn’t mind being surrounded or living by agents of a different race, they would still choose to segregate themselves from other agents over time! Although the model is quite simple, it gives a fascinating look at how individuals might self-segregate, even when they have no explicit desire to do so. Scientific Data Mining Scientific data mining is defined as data mining applied to scientific problems, rather than database marketing, finance, or business-driven applications. Scientific data mining distinguishes itself in the sense that the nature of the datasets is often very different from traditional marketdriven data mining applications. The datasets now might involve vast amounts of precise and continuous data, and accounting for underlying system nonlinearities can be extremely challenging from a machine learning point of view. Scientific Information Extractor(SciIE) We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature. scikit scikit-learn: Machine Learning in Python · Simple and efficient tools for data mining and data analysis · Accessible to everybody, and reusable in various contexts · Built on NumPy, SciPy, and matplotlib · Open source, commercially usable – BSD license Scikit-Multiflow Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of the art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and MEKA. Development follows the FOSS principles and quality is enforced by complying with PEP8 guidelines and using continuous integration and automatic testing. Scikit-Multiflow: A Multi-output Streaming Framework SciTokens The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. Score Function In statistics, the score, score function, efficient score or informant indicates how sensitively a likelihood function L(theta,X) depends on its parameter theta. Explicitly, the score for theta is the gradient of the log-likelihood with respect to theta. The score plays an important role in several aspects of inference. For example: · in formulating a test statistic for a locally most powerful test; · in approximating the error in a maximum likelihood estimate; · in demonstrating the asymptotic sufficiency of a maximum likelihood estimate; · in the formulation of confidence intervals; · in demonstrations of the Cramér-Rao inequality. The score function also plays an important role in computational statistics, as it can play a part in the computation of maximum likelihood estimates. Scoring Rule In decision theory, a score function, or scoring rule, measures the accuracy of probabilistic predictions. It is applicable to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes. The set of possible outcomes can be either binary or categorical in nature, and the probabilities assigned to this set of outcomes must sum to one (where each individual probability is in the range of 0 to 1). A score can be thought of as either a measure of the “calibration” of a set of probabilistic predictions, or as a “cost function” or “loss function”. If a cost is levied in proportion to a proper scoring rule, the minimal expected cost corresponds to reporting the true set of probabilities. Proper scoring rules are used in meteorology, finance, and pattern classification where a forecaster or algorithm will attempt to minimize the average score to yield refined, calibrated probabilities (i.e. accurate probabilities). Various scoring rules have also been used to assess the predictive accuracy of football forecast models. Scott-Knott Scott-Knott is an hierarchical clustering algorithm used in the application of ANOVA, when the researcher is comparing treatment means, with a very important characteristic: it does not present any overlapping in its grouping results. We wrote a code, in R, that performs this algorithm starting from vectors, matrix, data.frame, aov or aov.list objects. The results are presented with letters representing groups, as well as through graphics using different colors to differentiate distinct groups. This R package, named ScottKnott is the main topic of this article. ScottKnott,ScottKnottESD SCOUT Finding the right cloud configuration for workloads is an essential step to ensure good performance and contain running costs. A poor choice of cloud configuration decreases application performance and increases running cost significantly. While Bayesian Optimization is effective and applicable to any workloads, it is fragile because performance and workload are hard to model (to predict). In this paper, we propose a novel method, SCOUT. The central insight of SCOUT is that using prior measurements, even those for different workloads, improves search performance and reduces search cost. At its core, SCOUT extracts search hints (inference of resource requirements) from low-level performance metrics. Such hints enable SCOUT to navigate through the search space more efficiently—only spotlight region will be searched. We evaluate SCOUT with 107 workloads on Apache Hadoop and Spark. The experimental results demonstrate that our approach finds better cloud configurations with a lower search cost than state of the art methods. Based on this work, we conclude that (i) low-level performance information is necessary for finding the right cloud configuration in an effective, efficient and reliable way, and (ii) a search method can be guided by historical data, thereby reducing cost and improving performance. ScoutBot ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user’s instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot responses were initiated by a human wizard in previous interactions. The demonstration will show a simulated ground robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot Operating System). Scree Plot A scree plot is a graphical display of the variance of each component in the dataset which is used to determine how many components should be retained in order to explain a high percentage of the variation in the data. The plot shows the variance for the first component and then for the subsequent components, it shows the additional variance that each component is adding. ScreenerNet We propose to learn a curriculum or a syllabus for supervised learning with deep neural networks. Specifically, we learn weights for each sample in training by an attached neural network, called ScreenerNet, to the original network and jointly train them in an end-to-end fashion. We show the networks augmented with our ScreenerNet achieve early convergence with better accuracy than the state-of-the-art rule-based curricular learning methods in extensive experiments using three popular vision datasets including MNIST, CIFAR10 and Pascal VOC2012, and a Cartpole task using Deep Q-learning. SCvx This paper presents the SCvx algorithm, a successive convexification algorithm designed to solve non-convex optimal control problems with global convergence and superlinear convergence-rate guarantees. The proposed algorithm handles nonlinear dynamics and non-convex state and control constraints by linearizing them about the solution of the previous iterate, and solving the resulting convex subproblem to obtain a solution for the current iterate. Additionally, the algorithm incorporates several safe-guarding techniques into each convex subproblem, employing virtual controls and virtual buffer zones to avoid artificial infeasibility, and a trust region to avoid artificial unboundedness. The procedure is repeated in succession, thus turning a difficult non-convex optimal control problem into a sequence of numerically tractable convex subproblems. Using fast and reliable Interior Point Method (IPM) solvers, the convex subproblems can be computed quickly, making the SCvx algorithm well suited for real-time applications. Analysis is presented to show that the algorithm converges both globally and superlinearly, guaranteeing the local optimality of the original problem. The superlinear convergence is obtained by exploiting the structure of optimal control problems, showcasing the superior convergence rate that can be obtained by leveraging specific problem properties when compared to generic nonlinear programming methods. Numerical simulations are performed for an illustrative non-convex quad-rotor motion planning example problem, and corresponding results obtained using Sequential Quadratic Programming (SQP) solver are provided for comparison. Our results show that the convergence rate of the SCvx algorithm is indeed superlinear, and surpasses that of the SQP-based method by converging in less than half the number of iterations. SE3-Nets We introduce SE3-Nets, which are deep networks designed to model rigid body motion from raw point cloud data. Based only on pairs of depth images along with an action vector and point wise data associations, SE3-Nets learn to segment effected object parts and predict their motion resulting from the applied force. Rather than learning point wise flow vectors, SE3-Nets predict SE3 transformations for different parts of the scene. Using simulated depth data of a table top scene and a robot manipulator, we show that the structure underlying SE3-Nets enables them to generate a far more consistent prediction of object motion than traditional flow based networks. Seaborn Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Search Partition Analysis(SPAN) spanr Search Session Markov Decision Process(SSMDP) In e-commerce platforms such as Amazon and TaoBao, ranking items in a search session is a typical multi-step decision-making problem. Learning to rank (LTR) methods have been widely applied to ranking problems. However, such methods often consider different ranking steps in a session to be independent, which conversely may be highly correlated to each other. For better utilizing the correlation between different ranking steps, in this paper, we propose to use reinforcement learning (RL) to learn an optimal ranking policy which maximizes the expected accumulative rewards in a search session. Firstly, we formally define the concept of search session Markov decision process (SSMDP) to formulate the multi-step ranking problem. Secondly, we analyze the property of SSMDP and theoretically prove the necessity of maximizing accumulative rewards. Lastly, we propose a novel policy gradient algorithm for learning an optimal ranking policy, which is able to deal with the problem of high reward variance and unbalanced reward distribution of an SSMDP. Experiments are conducted in simulation and TaoBao search engine. The results demonstrate that our algorithm performs much better than online LTR methods, with more than 40% and 30% growth of total transaction amount in the simulation and the real application, respectively. SEARNN We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the ‘learning to search’ (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropriate surrogate for the test error: by only maximizing the ground truth probability, it fails to exploit the wealth of information offered by structured losses. Further, it introduces discrepancies between training and predicting (such as exposure bias) that may hurt test performance. Instead, SEARNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error. We demonstrate improved performance over MLE on three different tasks: OCR, spelling correction and text chunking. Finally, we propose a subsampling strategy to enable SEARNN to scale to large vocabulary sizes. Seasonal ARIMA(SARIMA) Often time series possess a seasonal component that repeats every s observations. For monthly observations s = 12 (12 in 1 year), for quarterly observations s = 4 (4 in 1 year). In order to deal with seasonality, ARIMA processes have been generalized: SARIMA models have then been formulated. Seasonal Decomposition of Time Series by Loess(STL) Decompose a time series into seasonal, trend and irregular components using loess. Seasonal Hybrid ESD(S-H-ESD) The primary algorithm, Seasonal Hybrid ESD (S-H-ESD), builds upon the Generalized ESD test for detecting anomalies. S-H-ESD can be used to detect both global and local anomalies. This is achieved by employing time series decomposition and using robust statistical metrics, viz., median together with ESD. In addition, for long time series such as 6 months of minutely data, the algorithm employs piecewise approximation. This is rooted to the fact that trend extraction in the presence of anomalies is non-trivial for anomaly detection. AnomalyDetection Second-Order Convolutional Neural Networks Convolutional Neural Networks (CNNs) have been successfully applied to many computer vision tasks, such as image classification. By performing linear combinations and element-wise nonlinear operations, these networks can be thought of as extracting solely first-order information from an input image. In the past, however, second-order statistics computed from handcrafted features, e.g., covariances, have proven highly effective in diverse recognition tasks. In this paper, we introduce a novel class of CNNs that exploit second-order statistics. To this end, we design a series of new layers that (i) extract a covariance matrix from convolutional activations, (ii) compute a parametric, second-order transformation of a matrix, and (iii) perform a parametric vectorization of a matrix. These operations can be assembled to form a Covariance Descriptor Unit (CDU), which replaces the fully-connected layers of standard CNNs. Our experiments demonstrate the benefits of our new architecture, which outperform the first-order CNNs, while relying on up to 90% fewer parameters. Second-Order Pooling SecureStreams The growing adoption of distributed data processing frameworks in a wide diversity of application domains challenges end-to-end integration of properties like security, in particular when considering deployments in the context of large-scale clusters or multi-tenant Cloud infrastructures. This paper therefore introduces SecureStreams, a reactive middleware framework to deploy and process secure streams at scale. Its design combines the high-level reactive dataflow programming paradigm with Intel’s low-level software guard extensions (SGX) in order to guarantee privacy and integrity of the processed data. The experimental results of SecureStreams are promising: while offering a fluent scripting language based on Lua, our middleware delivers high processing throughput, thus enabling developers to implement secure processing pipelines in just few lines of code. Sedano We present Sedano, a system for processing and indexing a continuous stream of business-related news. Sedano defines pipelines whose stages analyze and enrich news items (e.g., newspaper articles and press releases). News data coming from several content sources are stored, processed and then indexed in order to be consumed by Atoka, our business intelligence product. Atoka users can retrieve news about specific companies, filtering according to various facets. Sedano features both an entity-linking phase, which finds mentions of companies in news, and a classification phase, which classifies news according to a set of business events. Its flexible architecture allows Sedano to be deployed on commodity machines while being scalable and fault-tolerant. Seemingly Unrelated Regression(SUR) In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner in (1962), is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated, although some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations. The model can be estimated equation-by-equation using standard ordinary least squares (OLS). Such estimates are consistent, however generally not as efficient as the SUR method, which amounts to feasible generalized least squares with a specific form of the variance-covariance matrix. Two important cases when SUR is in fact equivalent to OLS are when the error terms are in fact uncorrelated between the equations (so that they are truly unrelated) and when each equation contains exactly the same set of regressors on the right-hand-side. The SUR model can be viewed as either the simplification of the general linear model where certain coefficients in matrix B {\displaystyle \mathrm {B} } \Beta are restricted to be equal to zero, or as the generalization of the general linear model where the regressors on the right-hand-side are allowed to be different in each equation. The SUR model can be further generalized into the simultaneous equations model, where the right-hand side regressors are allowed to be the endogenous variables as well. Seglearn Seglearn is an open-source python package for machine learning time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. This package is compatible with scikit-learn and is listed under scikit-learn Related Projects. The package depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage. Segmental Recurrent Neural Network(SRNN) We introduce segmental recurrent neural networks (SRNNs) which define, given an input sequence, a joint probability distribution over segmentations of the input and labelings of the segments. Representations of the input segments (i.e., contiguous subsequences of the input) are computed by encoding their constituent tokens using bidirectional recurrent neural nets, and these ‘segment embeddings’ are used to define compatibility scores with output labels. These local compatibility scores are integrated using a global semi-Markov conditional random field. Both fully supervised training – in which segment boundaries and labels are observed – as well as partially supervised training – in which segment boundaries are latent – are straightforward. Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that, compared to models that do not explicitly represent segments such as BIO tagging schemes and connectionist temporal classification (CTC), SRNNs obtain substantially higher accuracies. Segmented Linear Regression Segmented linear regression with two segments separated by a breakpoint can be useful to quantify an abrupt change of the response function (Yr) of a varying influential factor (x). The breakpoint can be interpreted as a critical, safe, or threshold value beyond or below which (un)desired effects occur. The breakpoint can be important in decision making. cowbell Segmented Regression Segmented regression, also known as piecewise regression or ‘broken-stick regression’, is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval. Segmented regression analysis can also be performed on multivariate data by partitioning the various independent variables. Segmented regression is useful when the independent variables, clustered into different groups, exhibit different relationships between the variables in these regions. The boundaries between the segments are breakpoints. Segmented linear regression is segmented regression whereby the relations in the intervals are obtained by linear regression. segmented Segment-level POlariTy annotations(SPOT) We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e. sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives. SegReg In statistics and data analysis the application software SegReg is a free and user-friendly tool for linear segmented regression analysis to determine the breakpoint where the relation between the dependent variable and the independent variable changes abruptly. Originally the method was developed for the analysis of the influence of soil salinity and depth of the watertable on growth of agricultural crops. However, it can be used for many other types of phenomena and relations, for example: · the change of nutrient contents in plants with time · the number of negative indicator responses at 30% upstream riparian harvest · phosphorus and flow duration on the Saline River Seldon Seldon is an open-source predictive analytics platform. Our proven machine learning algorithms and highly scalable platform serve recommendations to hundreds of millions of people across some of the world’s leading media and e-commerce brands. Seldon VM contains the entire platform pre-configured in a virtual machine for you to get started quickly, to test Seldon with your service data and a movie recommender demo. Selective Clustering Annotated Using Modes of Projections(SCAMP) Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is $\textbf{not}$ a SCAMP tuning parameter. The search phase of SCAMP consists of finding sub-collections of the data matrix, called candidate clusters, that obey shape constraints along each coordinate projection. An extension of the dip test of Hartigan and Hartigan (1985) is developed to assist the search. Selection occurs by scoring each candidate cluster with a preference function that quantifies prior belief about the mixture composition. Clustering proceeds by selecting candidates to maximize their total preference score. SCAMP concludes by annotating each selected cluster with labels that describe how cluster-level statistics compare to certain dataset-level quantities. SCAMP can be run multiple times on a single data matrix. Comparison of annotations obtained across iterations provides a measure of clustering uncertainty. Simulation studies and applications to real data are considered. A C++ implementation with R interface is $\href{https://…/scamp}{available\ online}$. SelectScript We introduce a new declarative language called SELECTSCRIPT. As its name suggests, it is a scripting language inspired primarily by SQL and its relational algebra. It is intended to be used for complex queries on different kinds of world models. Scripts can be dynamically generated and executed, or embedded into the code of foreign programming languages. A first interpreter was therefore developed for Python. Adapting the ideas of language-oriented programming, which enables developers to create their own domain-specific language, we developed a language stub that can be easily adapted and extended to comply with any (discrete) robotic world model or robotic simulator. We will further show how simple SELECT-statements can be used to extract any kind of valuable information in various return formats, thereby going beyond traditional SQL capabilities. Reasoning in complex environments with the SelectScript declarative language Self Driving Data Curation Past. Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest data management problems. Unfortunately, it is still the most time consuming and least enjoyable work of data scientists. So far, successful data curation stories are mainly ad-hoc solutions that are either domain-specific (for example, ETL rules) or task-specific (for example, entity resolution). Present. The power of current data curation solutions are not keeping up with the ever changing data ecosystem in terms of volume, velocity, variety and veracity, mainly due to the high human cost, instead of machine cost, needed for providing the ad-hoc solutions mentioned above. Meanwhile, deep learning is making strides in achieving remarkable successes in areas such as image recognition, natural language processing, and speech recognition. This is largely due to its ability to understanding features that are neither domain-specific nor task-specific. Future. Data curation solutions need to keep the pace with the fast-changing data ecosystem, where the main hope is to devise domain-agnostic and task-agnostic solutions. To this end, we start a new research project, called AutoDC, to unleash the potential of deep learning towards self-driving data curation. We will discuss how different deep learning concepts can be adapted and extended to solve various data curation problems. We showcase some low-hanging fruits about the early encounters between deep learning and data curation happening in AutoDC. We believe that the directions pointed out by this work will not only drive AutoDC towards democratizing data curation, but also serve as a cornerstone for researchers and practitioners to move to a new realm of data curation solutions. Self Exciting Point Process(SEPP) A point process N is called self-exciting if cov(N(s,t),N(t,u))>0 for s