# Distilled News

This is a Lasso; it is used to pick and capture animals. As a non-native English speaker, my first exposure to this word was in supervised learning. In this LASSO data science tutorial, we discuss the strengths of Lasso logistic regression by stepping through how to apply this useful statistical method to classification problems in R and how the Lasso can ‘similarly’ be used to pick and select input variables that are relevant to the classification problem at hand.
Outliers affect the distribution. If a value is significantly below the expected range, it will drag the distribution to the left, making the graph left-skewed or negative. Alternatively, if a value is significantly above the expected range, it will drag the distribution to the right, making the graph right-skewed or positive.
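The effect described above can be checked numerically. The following is a minimal sketch: the helper name `sample_skewness` is hypothetical, and the choice of the moment-based skewness coefficient is an assumption, since the text does not name a specific measure.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Made-up illustrative data: a symmetric sample plus one extreme value.
symmetric = [8, 9, 10, 11, 12]
with_high_outlier = symmetric + [40]
with_low_outlier = symmetric + [-20]

print(sample_skewness(symmetric))          # zero for symmetric data
print(sample_skewness(with_high_outlier))  # positive: right-skewed
print(sample_skewness(with_low_outlier))   # negative: left-skewed
```

A single extreme value is enough to flip the sign of the coefficient, which is why skewness is often inspected before and after outlier handling.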
Works best when the data is linear. If the data is not linear, then we may need to transform the data, add features, or use another model. Sensitive to outliers. Outliers contribute too much to the errors, so will impact the model. We may need to determine the outliers and remove them if necessary.
Jupyter notebooks are interactive documents that contain code, narrative, and plots. They are an excellent place for experimenting with code and data. Notebooks are easily shared, and the 2.6M notebooks on GitHub show just how popular notebooks are! Jupyter notebooks are great, but they are often huge files, with a very specific JSON file format. Let us introduce Jupytext, a Jupyter plugin that reads and writes notebooks as plain text files: either Julia, Python, R scripts, Markdown, or R Markdown documents.
Data and data-empowered algorithms now shape our professional, personal, and political realities. This course introduces students both to critical thinking and practice in understanding how we got here, and the future we now are building together as scholars, scientists, and citizens.
The intellectual content of the class will comprise
• the history of human use of data;
• functional literacy in how data are used to reveal insight and support decisions;
• critical literacy in investigating how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal transactions and experiences; and
• rhetorical literacy in how exploration and analysis of data have become part of our logic and rhetoric of communication and persuasion, especially including visual rhetoric.
While introducing students to recent trends in the computational exploration of data, the course will survey the key concepts of ‘small data’ statistics.
The Plato Research Dialogue System is a flexible framework that can be used to create, train, and evaluate conversational AI agents in various environments. It supports interactions through speech, text, or dialogue acts and each conversational agent can interact with data, human users, or other conversational agents (in a multi-agent setting). Every component of every agent can be trained independently online or offline and Plato provides an easy way of wrapping around virtually any existing model, as long as Plato’s interface is adhered to.
After exploring CNNs for a while, I decided to try another crucial area in Computer Vision: object detection. There are several popular methods in this area, including Faster R-CNN, RetinaNet, YOLOv3, SSD, etc. I tried Faster R-CNN in this article. Here, I want to summarise what I have learned and maybe give you a little inspiration if you are interested in this topic.
When people spoke about personalization, all they meant was that a mail was introduced with a person’s name (‘Dear Franck…’) rather than an anonymous greeting (‘Dear Sir or Madam’). Heidi Unruh, Global VP Content Marketing at e-Spirit, said ‘Today, consumers are bombarded with countless advertising and marketing messages, making it more and more difficult for brands to engage with the customer. Artificial Intelligence might be a solution to this issue.’ (source) Indeed, hyper-personalization will become an important trend in the world of AI. Once again, data is central to making this happen. The more data there is available, the more relevant information and products become. This evolution means the end of the road for the marketing approach of segmentation. Segmentation divides the market into a number of groups with similar needs. This allows the marketeer to match both communication and product characteristics to the needs of each different segment. Based on this approach, marketing teams assume a high degree of homogeneity within and between segments.
Diffractive deep neural network is an optical machine learning framework that uses diffractive surfaces and engineered matter to all-optically perform computation. After its design and training in a computer using modern deep learning methods, each network is physically fabricated, using for example 3-D printing or lithography, to engineer the trained network model into matter. This 3-D structure of engineered matter is composed of transmissive and/or reflective surfaces that altogether perform machine learning tasks through light-matter interaction and optical diffraction, at the speed of light, and without the need for any power, except for the light that illuminates the input object. This is especially significant for recognizing target objects much faster and with significantly less power compared to standard computer based machine learning systems, and might provide major advantages for autonomous vehicles and various defense related applications, among others. Introduced by UCLA researchers, this framework was experimentally validated for object classification and imaging, providing a scalable and energy efficient optical computation framework. In following research, UCLA engineers further improved the inference performance of diffractive optical neural networks by integrating them with standard digital deep neural networks, forming hybrid machine learning models that perform computation partially using light diffraction through matter and partially using a computer.
Learn about some of the advantages of using Amazon Web Services Elastic Compute Cloud (EC2). Then, the first part of the tutorial covers how to launch and connect to Windows virtual machines or instances on EC2. The next part goes over how to set up a basic data science environment (install R, RStudio, and Python) on the instance.
Humans are natural makers; we enjoy the freedom of making things. However, in the context of automation, machines challenge humans’ role in fabrication. Automation wastes humans’ unique skills and disconnects people from real-world materials. To address this issue, researchers proposed the hybrid workflow, which starts from studying how humans work and combines both human and machine specialties in the fabrication process. It allows us to maintain craftsmanship and bring more humanity into digital crafts. Hybrid fabrication values humans’ muscle memory and tactile skill in the crafting process. What if we extend the hybrid idea to intangible processes, like creativity? Can machines go beyond being assistants and also act as creative partners? Can we create ‘Hybrid intelligence,’ which combines machine and human intelligence to create works that neither could produce on its own?
Are you considering creating a chatbot for your company’s website but aren’t sure where to start? Well, you are not alone. Presently, almost every business wants to integrate a chatbot into its system to deliver better customer service and support. Thus, everybody wants to know how to create a chatbot for their business that fulfills its needs and delivers its brand’s persona in the market. So, more people have questions about the best ways to approach creating a chatbot from scratch in order to cut out third-party involvement and do it themselves.
A lot of my work heavily involves time series analysis. One of the great but lesser-known algorithms that I use is change point detection. Change point detection (or CPD) detects abrupt shifts in time series trends (i.e. shifts in a time series’ instantaneous velocity), that can be easily identified via the human eye, but are harder to pinpoint using traditional statistical approaches. CPD is applicable across an array of industries, including finance, manufacturing quality control, energy, medical diagnostics, and human activity analysis.
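To make the idea of a change point concrete, here is a minimal sketch that is not taken from the article: it finds a single change point by choosing the split that minimizes the total squared error of two constant-mean segments. The function name `best_single_change_point` and the example series are hypothetical. Because the text talks about shifts in trend, the example applies the detector to first differences (slopes) rather than to raw levels.

```python
def best_single_change_point(series):
    """Return the index k that best splits `series` into two constant-mean segments.

    The split minimizes the summed squared error of series[:k] and series[k:].
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    candidates = range(1, len(series))
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))

# Hypothetical series: slope 1 for the first ten points, then slope 3.
levels = [float(t) for t in range(10)] + [9.0 + 3.0 * t for t in range(1, 11)]
slopes = [b - a for a, b in zip(levels, levels[1:])]  # first differences

k = best_single_change_point(slopes)
print(k)  # index of the first step taken with the new slope
```

Practical CPD methods handle multiple change points and noise-aware penalties; this exhaustive single-split search only illustrates the underlying cost-minimization idea.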
This article is about identifying outliers through funnel plots using Microsoft Power BI (a visualization tool). Before we move on, let’s see what an outlier is and why it is important to identify them. Outliers are those data points that lie outside the overall pattern of the distribution, and the easiest way to detect outliers is through graphs. Box plots and scatter plots can help detect them easily.
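The box plot mentioned above flags points that lie far outside the interquartile range. A minimal sketch of that rule follows; it is not the funnel-plot method from the article, the function name `iqr_outliers` is hypothetical, and the conventional whisker factor of 1.5 is an assumption.

```python
from statistics import quantiles

def iqr_outliers(values, whisker=1.5):
    """Return the values lying beyond the box-plot whiskers."""
    q1, _median, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Made-up readings with one obviously extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))
```

Different quartile conventions give slightly different whisker positions on small samples, so borderline points may be classified differently by different tools.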
One of the most popular datasets for Machine Learning corresponds to the Titanic accident. Here we play with features within this dataset, trying to discover the effect of the choice of different features on the accuracy of some basic ML algorithms.

# Book Memo: “Creativity in Intelligent Technologies and Data Science”

Third Conference, CIT&DS 2019, Volgograd, Russia, September 16-19, 2019, Proceedings. This two-volume set constitutes the proceedings of the Third Conference on Creativity in Intellectual Technologies and Data Science, CIT&DS 2019, held in Volgograd, Russia, in September 2019. The 67 full papers, 1 short paper and 3 keynote papers presented were carefully reviewed and selected from 231 submissions. The papers are organized in topical sections in the two volumes. Part I: cyber-physical systems and Big Data-driven world. Part II: artificial intelligence and deep learning technologies for creative tasks; intelligent technologies in social engineering.

# Document worth reading: “A Comprehensive Analysis on Adversarial Robustness of Spiking Neural Networks”

In this era of machine learning models, their functionality is being threatened by adversarial attacks. In the face of this struggle to make artificial neural networks robust, finding a model that is resilient to these attacks is very important. In this work, we present, for the first time, a comprehensive analysis of the behavior of more bio-plausible networks, namely Spiking Neural Networks (SNNs), under state-of-the-art adversarial tests. We perform a comparative study of the accuracy degradation between a conventional VGG-9 Artificial Neural Network (ANN) and an equivalent spiking network on the CIFAR-10 dataset in both white-box and black-box settings for different types of single-step and multi-step FGSM (Fast Gradient Sign Method) attacks. We demonstrate that SNNs tend to show more resiliency compared to ANNs under the black-box attack scenario. Additionally, we find that SNN robustness is largely dependent on the corresponding training mechanism. We observe that SNNs trained by spike-based backpropagation are more adversarially robust than the ones obtained by ANN-to-SNN conversion rules in several white-box and black-box scenarios. Finally, we also propose a simple yet effective framework for crafting adversarial attacks from SNNs. Our results suggest that attacks crafted from SNNs following our proposed method are much stronger than those crafted from ANNs.
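For readers unfamiliar with FGSM, the single-step attack perturbs each input coordinate by a fixed amount in the direction of the sign of the loss gradient. The sketch below is a toy illustration under stated assumptions: it attacks a hand-written logistic-regression classifier rather than the VGG-9 or spiking networks from the paper, and the names `predict_proba`, `fgsm_step`, the weights, and the value of `epsilon` are all hypothetical.

```python
import math

def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_step(x, y, weights, bias, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(dLoss/dx)."""
    p = predict_proba(x, weights, bias)
    # For cross-entropy loss on a logistic model, dLoss/dx = (p - y) * w.
    grad = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, signs)]

# Toy white-box setting: the attacker knows the weights and the true label.
weights, bias = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm_step(x, y, weights, bias, epsilon=0.5)

print(predict_proba(x, weights, bias))      # confidence on the clean input
print(predict_proba(x_adv, weights, bias))  # lower confidence after the attack
```

Multi-step variants repeat this step with a smaller `epsilon` and clip the result to stay within an allowed perturbation budget.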

# Let’s get it right

Article: Machinery And Ethics

No one can escape the fact that trying to legislate on the existing connection between man and machine will lead us to the serious problem of shifting our customs and morality into the field of ethics. In fact, it would be foolish to think that any social pact marked between humans and humanoids would not be reflected through human rights, especially when humanoids themselves were exclusively artificial. The starting point is to analyze which are the inherent human rights acquired by the machines so as to be able to consider which aspect is concerned with morality. So one way to simplify the essay is to postulate that the 1948 Charter of Human Rights could form part of the ethical principles that define not only human beings, but also the very concept of humanity.
Algorithms and big data are entering the often shrouded world of alternative dispute resolution. Robots and artificial intelligence seem worlds away from the sensitive and nuanced area of international mediation. Here, battles are largely settled behind closed doors and skilled mediators pick their way through sticky negotiations. Algorithms and big data, however, are fast entering the often mystery-shrouded world of alternative dispute resolution. This is much the result of the rapidly increasing demand for the kind of data analytics being harnessed in US litigation to predict trial outcomes. The incursion of robots into mediation hit a new milestone in February, when Canadian electronic negotiation specialists iCan Systems reputedly became the first company to resolve a dispute in a public court in England and Wales using a ‘robot mediator’.
We study the fair allocation of indivisible goods under the assumption that the goods form an undirected graph and each agent must receive a connected subgraph. Our focus is on well-studied fairness notions including envy-freeness and maximin share fairness. We establish graph-specific maximin share guarantees, which are tight for large classes of graphs in the case of two agents and for paths and stars in the general case. Unlike in previous work, our guarantees are with respect to the complete-graph maximin share, which allows us to compare possible guarantees for different graphs. For instance, we show that for biconnected graphs it is possible to obtain at least $3/4$ of the maximin share, while for the remaining graphs the guarantee is at most $1/2$. In addition, we determine the optimal relaxation of envy-freeness that can be obtained with each graph for two agents, and characterize the set of trees and complete bipartite graphs that always admit an allocation satisfying envy-freeness up to one good (EF1) for three agents. Our work demonstrates several applications of graph-theoretical tools and concepts to fair division problems.
Abuse on the Internet represents an important societal problem of our time. Millions of Internet users face harassment, racism, personal attacks, and other types of abuse on online platforms. The psychological effects of such abuse on individuals can be profound and lasting. Consequently, over the past few years, there has been a substantial research effort towards automated abuse detection in the field of natural language processing (NLP). In this paper, we present a comprehensive survey of the methods that have been proposed to date, thus providing a platform for further development of this area. We describe the existing datasets and review the computational approaches to abuse detection, analyzing their strengths and limitations. We discuss the main trends that emerge, highlight the challenges that remain, outline possible solutions, and propose guidelines for ethics and explainability.
Computational Politics is the study of computational methods to analyze and moderate users’ behaviors related to political activities such as election campaign persuasion, political affiliation, and opinion mining. With the rapid development and ease of access to the Internet, Information Communication Technologies (ICT) have given rise to a massive number of users joining the online communities and to the digitization of analogous data such as political debates. These communities and digitized data contain both explicit and latent information about users and their behaviors related to politics. For researchers, it is essential to utilize data from these sources to develop and design systems that not only provide solutions to computational politics but also help other businesses, such as marketers, to increase the users’ participation and interaction. In this survey, we attempt to categorize main areas in computational politics and summarize the prominent studies in one place to better understand computational politics across different and multidimensional platforms, e.g., online social networks, online forums, and political debates. We then conclude this study by highlighting future research directions, opportunities, and challenges.
Research Findings:
• There is a diversity crisis in the AI sector across gender and race.
• The AI sector needs a profound shift in how it addresses the current diversity crisis.
• The overwhelming focus on ‘women in tech’ is too narrow and likely to privilege white women over others.
• Fixing the ‘pipeline’ won’t fix AI’s diversity problems.
• The use of AI systems for the classification, detection, and prediction of race and gender is in urgent need of re-evaluation.
From massive face-recognition-based surveillance and machine-learning-based decision systems predicting crime recidivism rates, to the move towards automated health diagnostic systems, artificial intelligence (AI) is being used in scenarios that have serious consequences in people’s lives. However, this rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial face recognition systems have much higher error rates for dark skinned women while having minimal errors on light skinned men. A 2016 ProPublica investigation uncovered that machine learning based tools that assess crime recidivism rates in the US are biased against African Americans. Other studies show that natural language processing tools trained on newspapers exhibit societal biases (e.g. finishing the analogy ‘Man is to computer programmer as woman is to X’ by homemaker). At the same time, books such as Weapons of Math Destruction and Automated Inequality detail how people in lower socioeconomic classes in the US are subjected to more automated decision making tools than those who are in the upper class. Thus, these tools are most often used on people towards whom they exhibit the most bias. While many technical solutions have been proposed to alleviate bias in machine learning systems, we have to take a holistic and multifaceted approach. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.

Paper: A Mulching Proposal

The ethical implications of algorithmic systems have been much discussed in both HCI and the broader community of those interested in technology design, development and policy. In this paper, we explore the application of one prominent ethical framework – Fairness, Accountability, and Transparency – to a proposed algorithm that resolves various societal issues around food security and population ageing. Using various standardised forms of algorithmic audit and evaluation, we drastically increase the algorithm’s adherence to the FAT framework, resulting in a more ethical and beneficent system. We discuss how this might serve as a guide to other researchers or practitioners looking to ensure better ethical outcomes from algorithmic systems in their line of work.

# What’s new on arXiv

In this paper we study k-means clustering in the online setting. In the offline setting the main parameters are number of centers, k, and size of the dataset, n. Performance guarantees are given as a function of these parameters. In the online setting new factors come into place: the ordering of the dataset and whether n is known in advance or not. One of the main results of this paper is the discovery that these new factors have dramatic effects on the quality of the clustering algorithms. For example, for constant k: (1) $\Omega(n)$ centers are needed if the order is arbitrary, (2) if the order is random and n is unknown in advance, the number of centers reduces to $\Theta(\log n)$, and (3) if n is known, then the number of centers reduces to a constant. For different values of the new factors, we show upper and lower bounds that are exactly the same up to a constant, thus achieving optimal bounds.
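To illustrate why arrival order matters in the online setting, here is a minimal sketch of classic sequential k-means with exactly k centers. This is not the algorithm analysed in the paper (which may open more than k centers); the function name `sequential_kmeans` and the toy streams are hypothetical.

```python
def sequential_kmeans(stream, k):
    """Process points one at a time, keeping k running-mean centers.

    The first k points seed the centers; each later point moves its nearest
    center towards itself by 1 / (number of points assigned so far).
    """
    centers, counts = [], []
    for point in stream:
        if len(centers) < k:
            centers.append(list(point))
            counts.append(1)
            continue
        nearest = min(
            range(k),
            key=lambda i: sum((c - p) ** 2 for c, p in zip(centers[i], point)),
        )
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        centers[nearest] = [c + rate * (p - c) for c, p in zip(centers[nearest], point)]
    return centers

# Same six 1-D points, two different arrival orders.
interleaved = [(0.0,), (10.0,), (1.0,), (11.0,), (2.0,), (12.0,)]
sorted_order = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]

print(sequential_kmeans(interleaved, k=2))   # centers near the two true groups
print(sequential_kmeans(sorted_order, k=2))  # one center dragged between the groups
```

With the sorted stream both seeds land in the same group, so a fixed budget of k centers cannot recover, which gives some intuition for why more centers are needed under arbitrary ordering.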
Time series prediction with missing values is an important problem of time series analysis since complete data is usually hard to obtain in many real-world applications. To model the generation of time series, the autoregressive (AR) model is a basic and widely used choice, which assumes that each observation in the time series is a noisy linear combination of some previous observations along with a constant shift. To tackle the problem of prediction with missing values, a number of methods were proposed based on various data models. For real application scenarios, how these methods perform over different types of time series with different levels of missing data remains to be investigated. In this paper, we focus on online methods for AR-model-based time series prediction with missing values. We adapted five mainstream methods to fit in such a scenario. We discuss each of them in detail by introducing their core ideas about how to estimate the AR coefficients and their different strategies to deal with missing values. We also present algorithmic implementations for better understanding. In order to comprehensively evaluate and compare these methods, we conduct experiments with various configurations of relevant parameters over both synthetic and real data. From the experimental results, we derive several noteworthy conclusions and show that imputation is a simple but reliable strategy to handle missing values in online prediction tasks.
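The imputation strategy mentioned above can be sketched in a few lines. The following is an illustrative assumption, not one of the five methods from the paper: an AR model whose coefficients are fitted by online gradient descent on the squared error, with each missing observation (`None`) replaced by the model's own prediction. The function name `online_ar_predict` and its parameters are hypothetical.

```python
def online_ar_predict(series, order=2, learning_rate=0.01):
    """One-step-ahead AR prediction with imputation of missing values.

    Returns (predictions, imputed_history). Coefficients and the constant
    shift are updated only when a real observation is available.
    """
    coeffs = [0.0] * order
    shift = 0.0
    history = []       # observed values, with gaps filled by predictions
    predictions = []
    for observed in series:
        enough_history = len(history) >= order
        if enough_history:
            lags = history[-order:][::-1]  # most recent value first
            pred = shift + sum(a * x for a, x in zip(coeffs, lags))
        else:
            # Warm-up: fall back to the last value seen (or 0.0 at the start).
            pred = history[-1] if history else 0.0
        predictions.append(pred)

        if observed is None:
            history.append(pred)  # imputation: use the model's prediction
            continue
        if enough_history:
            error = pred - observed
            coeffs = [a - learning_rate * error * x for a, x in zip(coeffs, lags)]
            shift -= learning_rate * error
        history.append(observed)
    return predictions, history

# Made-up series with one missing value.
preds, filled = online_ar_predict([1.0, 1.0, None, 1.0, 1.0])
print(preds)
print(filled)
```

Early imputed values are only as good as the partially trained model, so in practice a warm-up period or a better initial estimate may be preferable.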
With the proliferation of social media platforms and e-commerce sites, several cross-domain collaborative filtering strategies have been recently introduced to transfer the knowledge of user preferences across domains. The main challenge of cross-domain recommendation is to weigh and learn users’ different behaviors in multiple domains. In this paper, we propose a Cross-Domain collaborative filtering model following a Translation-based strategy, namely CDT. In our model, we learn the embedding space with translation vectors and capture high-order feature interactions in users’ multiple preferences across domains. In doing so, we efficiently compute the transitivity between feature latent embeddings, that is if feature pairs have high interaction weights in the latent space, then feature embeddings with no observed interactions across the domains will be closely related as well. We formulate our objective function as a ranking problem in factorization machines and learn the model’s parameters via gradient descent. In addition, to better capture the non-linearity in user preferences across domains we extend the proposed CDT model by using a deep learning strategy, namely DeepCDT. Our experiments on six publicly available cross-domain tasks demonstrate the effectiveness of the proposed models, outperforming other state-of-the-art cross-domain strategies.
This paper is about a machine learning approach based on the multilinear projection of an unknown function (or probability distribution) to be estimated towards a linear (or multilinear) dimensional space E’. The proposal transforms the problem of predicting the target of an observation x into a problem of determining a consensus among the k nearest neighbors of x’s image within the dimensional space E’. The algorithms that concretize it allow both regression and binary classification. Implementations carried out using Scala/Spark and assessed on a dozen LIBSVM datasets have demonstrated improvements in prediction accuracies in comparison with other prediction algorithms implemented within Spark MLLib such as multilayer perceptrons, logistic regression classifiers and random forests.
In this paper, we generalize the basic notions and results of Dempster-Shafer theory from predicates to formal concepts. Results include the representation of conceptual belief functions as inner measures of suitable probability functions, and a Dempster-Shafer rule of combination on belief functions on formal concepts.
Training of convolutional neural networks (CNNs) on embedded platforms to support on-device learning is earning vital importance in recent days. Designing flexible training hardware is much more challenging than inference hardware, due to design complexity and large computation/memory requirement. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, including Forward Pass (FP), Backward Pass (BP) and Weight Update (WU). We implemented an optimized RTL library to perform training-specific tasks and developed an RTL compiler to automatically generate FPGA-synthesizable RTL based on user-defined constraints. We present a new cyclic weight storage/access scheme for on-chip BRAM and off-chip DRAM to efficiently implement non-transpose and transpose operations during FP and BP phases, respectively. Representative CNNs for CIFAR-10 dataset are implemented and trained on Intel Stratix 10-GX FPGA using proposed hardware architecture, demonstrating up to 479 GOPS performance.
One aim of data mining is the identification of interesting structures in data. Basic properties of the empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originates from one process, or contains subsets related to different states of the data producing process. Data visualization tools should deliver a sensitive picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs are typically kernel density estimates and range from the classical histogram to modern tools like bean or violin plots. Conventional methods have difficulties in visualizing the PDF in case of uniform, multimodal, skewed and clipped data if density estimation parameters remain in a default setting. As a consequence, a new visualization tool called Mirrored Density plot (MD plot) is proposed which is particularly designed to discover interesting structures in continuous features. The MD plot does not require any adjustments of parameters of density estimation which makes the usage compelling for non-experts. The visualization tools are evaluated in comparison to statistical tests for the typical challenges of explorative distribution analysis. The results are presented on bimodal Gaussian and skewed distributions as well as several features with published PDFs. In exploratory data analysis of 12 features describing the quarterly financial statements, when statistical testing becomes a demanding task, only the MD plots can identify the structure of their PDFs. Overall, the MD plot can outperform the methods mentioned above.
Multi-agent systems have a wide range of applications in cooperative and competitive tasks. As the number of agents increases, nonstationarity gets more serious in multi-agent reinforcement learning (MARL), which brings great difficulties to the learning process. Besides, current mainstream algorithms configure an independent network for each agent, so that the memory usage increases linearly with the number of agents, which greatly slows down the interaction with the environment. Inspired by Generative Adversarial Networks (GAN), this paper proposes an iterative update method (IU) to stabilize the nonstationary environment. Further, we add a first-person perspective and represent all agents by only one network, which can change agents’ policies from sequential compute to batch compute. Similar to continual lifelong learning, we realize the iterative update method in this unified representative network (IUUR). In this method, iterative update can greatly alleviate the nonstationarity of the environment, while unified representation can speed up the interaction with the environment and avoid the linear growth of memory usage. Besides, this method does not interfere with decentralized execution and distributed deployment. Experiments show that compared with MADDPG, our algorithm achieves state-of-the-art performance and saves wall-clock time by a large margin, especially with more agents.
This paper describes a new method for Symbolic Regression that finds mathematical expressions from a dataset. This method has a strong mathematical basis. As opposed to other methods such as Genetic Programming, this method is deterministic, and does not involve the creation of a population of initial solutions. Instead, a simple expression is grown until it fits the data. The experiments performed show that the results are as good as those of other Machine Learning methods, in a very low computational time. Another advantage of this technique is that the complexity of the expressions can be limited, so the system can return mathematical expressions that can be easily analysed by the user, in contrast to other techniques like GSGP.
This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain specific languages. In this deliverable report, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the scope of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models, to specific tools required for post-processing weather forecast products. Atlas provides data structures for building various numerical strategies to solve equations on the sphere or limited areas on the sphere. These data structures may contain a distribution of points (grid) and, possibly, a composition of elements (mesh), needed to implement the required numerical operations. Atlas can also represent a given field within a specific spatial projection. Atlas is capable of mapping fields between different grids as part of pre- and post-processing stages or as part of coupling processes whose respective fields are discretised on different grids or meshes.
Training accurate deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task. Though a number of approaches have been proposed for learning with noisy labels, many open issues remain. In this paper, we show that DNN learning with Cross Entropy (CE) exhibits overfitting to noisy labels on some classes (‘easy’ classes), but more surprisingly, it also suffers from significant under learning on some other classes (‘hard’ classes). Intuitively, CE requires an extra term to facilitate learning of hard classes, and more importantly, this term should be noise tolerant, so as to avoid overfitting to noisy labels. Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). Our proposed SL approach simultaneously addresses both the under learning and overfitting problem of CE in the presence of noisy labels. We provide a theoretical analysis of SL and also empirically show, on a range of benchmark and real-world datasets, that SL outperforms state-of-the-art methods. We also show that SL can be easily incorporated into existing methods in order to further enhance their performance.
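The combination of CE with a reverse term can be sketched for a single example. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights `alpha` and `beta`, the constant `log_zero` used in place of the undefined log(0) of a one-hot label, and the function name `symmetric_cross_entropy` are all assumptions not given in the abstract.

```python
import math

def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Sketch of alpha * CE + beta * RCE for one example with a one-hot label.

    CE  = -sum_k q(k) * log p(k)   (q is the one-hot label distribution)
    RCE = -sum_k p(k) * log q(k)   (log 0 replaced by the constant `log_zero`)
    """
    ce = -math.log(max(probs[label], 1e-12))
    # log q(k) is 0 for the labelled class and `log_zero` for all others.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != label)
    return alpha * ce + beta * rce

confident = [0.7, 0.2, 0.1]
print(symmetric_cross_entropy(confident, label=0))  # label agrees with prediction
print(symmetric_cross_entropy(confident, label=2))  # label disagrees (possibly noisy)
```

Under these assumptions the reverse term is bounded by `-log_zero`, whereas plain CE grows without bound as the predicted probability of the labelled class approaches zero, which is one intuition for its noise tolerance.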
This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Reading Comprehension) with IR components to enable end-to-end answer retrieval. Results from the demo system are shown to be high quality in both academic and industry domain specific settings. Finally, we discuss best practices when (pre-)training BERT based MRC models for production systems.
Recent advances in both machine learning and Internet-of-Things have attracted attention to automatic Activity Recognition, where users wear a device with sensors and their outputs are mapped to a predefined set of activities. However, few studies have considered the balance between wearable power consumption and activity recognition accuracy. This is particularly important when part of the computational load happens on the wearable device. In this paper, we present a new methodology to perform feature selection on the device based on Reinforcement Learning (RL) to find the optimum balance between power consumption and accuracy. To accelerate the learning speed, we extend the RL algorithm to address multiple sources of feedback, and use them to tailor the policy in conjunction with estimating the feedback accuracy. We evaluated our system on the SPHERE challenge dataset, a publicly available research dataset. The results show that our proposed method achieves a good trade-off between wearable power consumption and activity recognition accuracy.
In this paper, we introduce a tunable generative adversarial network (TunaGAN) that uses an auxiliary network on top of existing generator networks (Style-GAN) to modify high-resolution face images according to users’ high-level instructions, with good qualitative and quantitative performance. To optimize for feature disentanglement, we also investigate two different latent spaces that could be traversed for modification. The problem of mode collapse is characterized in detail for model robustness. This work could be easily extended to a content-aware image editor based on other GANs and provide insight into mode collapse problems in more general settings.
The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. Motivated by classic work on inductive logic programming, CLUTRR requires that an NLU system infer kinship relations between characters in short stories. Successful performance on this task requires both extracting relationships between entities, as well as inferring the logical rules governing these relationships. CLUTRR allows us to precisely measure a model’s ability for systematic generalization by evaluating on held-out combinations of logical rules, and it allows us to evaluate a model’s robustness by adding curated noise facts. Our empirical results highlight a substantial performance gap between state-of-the-art NLU models (e.g., BERT and MAC) and a graph neural network model that works directly with symbolic inputs—with the graph-based model exhibiting both stronger generalization and greater robustness.
State-of-the-art approaches for Knowledge Base Completion (KBC) exploit deep neural networks trained with both false and true assertions: positive assertions are explicitly taken from the knowledge base, whereas negative ones are generated by random sampling of entities. In this paper, we argue that random sampling is not a good training strategy since it is highly likely to generate a huge number of nonsensical assertions during training, which do not provide a relevant training signal to the system. Hence, it slows down the learning process and decreases accuracy. To address this issue, we propose an alternative approach called Distributional Negative Sampling that generates meaningful negative examples which are highly likely to be false. Our approach achieves a significant improvement in Mean Reciprocal Rank values across two different KBC algorithms on three standard academic benchmarks.
In this paper, we study the adversarial robustness of subspace learning problems. Different from the assumptions made in existing work on robust subspace learning where data samples are contaminated by gross sparse outliers or small dense noises, we consider a more powerful adversary who can first observe the data matrix and then intentionally modify the whole data matrix. We first characterize the optimal rank-one attack strategy that maximizes the subspace distance between the subspace learned from the original data matrix and that learned from the modified data matrix. We then generalize the study to the scenario without the rank constraint and characterize the corresponding optimal attack strategy. Our analysis shows that the optimal strategies depend on the singular values of the original data matrix and the adversary’s energy budget. Finally, we provide numerical experiments and practical applications to demonstrate the efficiency of the attack strategies.
A linear restriction of a function is the same function with its domain restricted to points on a given line. This paper addresses the problem of computing a succinct representation for a linear restriction of a piecewise-linear neural network. This primitive, which we call ExactLine, allows us to exactly characterize the result of applying the network to all of the infinitely many points on a line. In particular, ExactLine computes a partitioning of the given input line segment such that the network is affine on each partition. We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications. First, we show how to exactly determine decision boundaries of an ACAS Xu neural network, providing significantly improved confidence in the results compared to prior work that sampled finitely many points in the input space. Next, we demonstrate how to exactly compute integrated gradients, which are commonly used for neural network attributions, allowing us to show that the prior heuristic-based methods had relative errors of 25-45% and show that a better sampling method can achieve higher accuracy with less computation. Finally, we use ExactLine to empirically falsify the core assumption behind a well-known hypothesis about adversarial examples, and in the process identify interesting properties of adversarially-trained networks.
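The partitioning idea can be illustrated for the simplest case, a single fully-connected ReLU layer. The sketch below is not the paper's ExactLine algorithm; it only shows why a finite partition exists. Along a line segment every pre-activation is affine in the line parameter `t`, so each unit changes sign at most once, and those crossing points are the only places where the layer stops being affine. The function name and the example weights are hypothetical.

```python
import numpy as np

def relu_layer_breakpoints(W, b, start, end):
    """Endpoints of the pieces on which x -> relu(W x + b) is affine
    along x(t) = start + t * (end - start), for t in [0, 1]."""
    z0 = W @ start + b  # pre-activations at t = 0
    z1 = W @ end + b    # pre-activations at t = 1
    ts = {0.0, 1.0}
    for a, c in zip(z0, z1):
        # z(t) = a + t * (c - a) crosses zero inside (0, 1) only on a strict sign change
        if (a < 0 < c) or (c < 0 < a):
            ts.add(float(a / (a - c)))
    return sorted(ts)

# Hypothetical 2-unit layer evaluated on the segment from (0, 0) to (1, 1)
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-0.5, -0.25])
print(relu_layer_breakpoints(W, b, np.array([0.0, 0.0]), np.array([1.0, 1.0])))
```

For this example the breakpoints are `[0.0, 0.25, 0.5, 1.0]`. A deeper network would presumably repeat the same step on each piece, layer by layer, which this sketch does not attempt.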
Recently, Attention-Gated Convolutional Neural Networks (AGCNNs) have performed well on several essential sentence classification tasks and shown robust performance in practical applications. However, AGCNNs require many hyperparameters to be set, and it is not known how sensitive the model’s performance is to them. In this paper, we conduct a sensitivity analysis on the effect of different hyperparameters of AGCNNs, e.g., the kernel window size and the number of feature maps. We also investigate the effect of different combinations of hyperparameter settings on the model’s performance to analyze to what extent different parameter settings contribute to AGCNNs’ performance. Meanwhile, we draw practical advice from a wide range of empirical results. Through the sensitivity analysis experiment, we improve the hyperparameter settings of AGCNNs. Experiments show that our proposals achieve average improvements of 0.81% and 0.67% on AGCNN-NLReLU-rand and AGCNN-SELU-rand, respectively, and average improvements of 0.47% and 0.45% on AGCNN-NLReLU-static and AGCNN-SELU-static, respectively.
In this paper, we investigate the emotion recognition ability of the pre-trained language model BERT. Exploiting BERT’s two-sentence input structure, we adapt BERT to continuous dialogue emotion prediction tasks, which rely heavily on sentence-level context-aware understanding. The experiments show that by mapping the continuous dialogue into a causal utterance pair, which is constructed from the utterance and the reply utterance, models can better capture the emotions of the reply utterance. The present method achieves micro F1 scores of 0.815 and 0.885 on the testing datasets of Friends and EmotionPush, respectively.
Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted in an unforeseen race to develop new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused on flexible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim to provide fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). We present a formalization of graph pattern matching for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.
Adaptive inference is a promising technique to improve the computational efficiency of deep models at test time. In contrast to static models which use the same computation graph for all instances, adaptive networks can dynamically adjust their structure conditioned on each input. While existing research on adaptive inference mainly focuses on designing more advanced architectures, this paper investigates how to train such networks more effectively. Specifically, we consider a typical adaptive deep network with multiple intermediate classifiers. We present three techniques to improve its training efficacy from two aspects: 1) a Gradient Equilibrium algorithm to resolve the conflict of learning of different classifiers; 2) an Inline Subnetwork Collaboration approach and a One-for-all Knowledge Distillation algorithm to enhance the collaboration among classifiers. On multiple datasets (CIFAR-10, CIFAR-100 and ImageNet), we show that the proposed approach consistently leads to further improved efficiency on top of state-of-the-art adaptive deep networks.
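As a rough illustration of how a network with multiple intermediate classifiers can adapt its computation per input, the sketch below stops at the first classifier whose top probability clears a confidence threshold. The exit rule, the `threshold` value and the function name are assumptions for illustration; the abstract does not specify the exit criterion, and the training techniques it proposes are not shown here.

```python
import numpy as np

def early_exit_predict(x, classifiers, threshold=0.9):
    """Return (predicted_class, exit_index) for input x.

    classifiers: non-empty list of callables ordered from shallow to deep,
                 each mapping x to a vector of class probabilities.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    last = len(classifiers) - 1
    for i, clf in enumerate(classifiers):
        p = np.asarray(clf(x))
        # Exit when confident enough, or when the deepest classifier is reached
        if p.max() >= threshold or i == last:
            return int(p.argmax()), i

# Hypothetical stand-ins for three intermediate classifiers
exits = [lambda x: [0.6, 0.4], lambda x: [0.05, 0.95], lambda x: [0.5, 0.5]]
print(early_exit_predict(None, exits))
```

Easy inputs leave at an early exit and skip the deeper classifiers, which is where the test-time savings come from.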
We define a new class of “implicit” deep learning prediction rules that generalize the recursive rules of feedforward neural networks. These models are based on the solution of a fixed-point equation involving a single vector of hidden features. The new framework greatly simplifies the notation of deep learning, and opens up new possibilities, for example in terms of novel architectures and algorithms, robustness analysis and design, interpretability, sparsity, and network architecture optimization.
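A small sketch may help picture a prediction rule defined through a fixed-point equation. The specific form used here, hidden features `x` solving `x = relu(A x + B u)` with output `y = C x + D u`, is an assumption for illustration, as are the matrix names and the plain fixed-point iteration. That iteration is guaranteed to converge when the infinity norm of `A` is below 1, since ReLU is 1-Lipschitz.

```python
import numpy as np

def implicit_predict(A, B, C, D, u, tol=1e-10, max_iter=1000):
    """Solve x = relu(A x + B u) by fixed-point iteration, then return (y, x)
    with y = C x + D u. The model form is an assumed example, not a reference API."""
    x = np.zeros(A.shape[0])
    for _ in range(max_iter):
        x_new = np.maximum(A @ x + B @ u, 0.0)
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return C @ x + D @ u, x

# Hypothetical 2-feature model; A is strictly upper triangular,
# so feature 0 depends on feature 1 much like one layer feeding the next.
A = np.array([[0.0, 0.5], [0.0, 0.0]])
B = np.eye(2)
C = np.array([[1.0, 1.0]])
D = np.zeros((1, 2))
y, x = implicit_predict(A, B, C, D, np.array([1.0, 2.0]))
print(x, y)
```

Here the hidden features settle at `[2, 2]` and the output is `[4]`, which can be checked by substituting back into the fixed-point equation.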
Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows faces two main algorithmic challenges. The first is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert physicians to a likely segmentation failure. Here we propose a novel algorithm, named ‘Eigenrank’, which addresses both of these challenges. Eigenrank can select, for manual labeling, a subset of medical images from a large database such that a U-Net trained on this subset is superior to one trained on a randomly selected subset of the same size. Eigenrank can also be used to pick out cases in a large database where deep learning segmentation will fail. We present our algorithm, followed by results and a discussion of how Eigenrank exploits the Von Neumann information to perform both data subset selection and failure prediction for medical image segmentation using deep learning.
Spiking neural networks (SNNs) are more biologically plausible than conventional artificial neural networks (ANNs). SNNs are well suited to spatiotemporal learning and to energy-efficient, event-driven neuromorphic hardware processors. As an important class of SNNs, recurrent spiking neural networks (RSNNs) possess great computational power. However, the practical application of RSNNs is severely limited by challenges in training. Biologically-inspired unsupervised learning has limited capability in boosting the performance of RSNNs. On the other hand, existing backpropagation (BP) methods suffer from high complexity of unrolling in time, vanishing and exploding gradients, and approximate differentiation of discontinuous spiking activities when applied to RSNNs. To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs. The proposed ST-RSBP directly computes the gradient of a rate-coded loss function defined at the output layer of the network w.r.t. the tunable parameters. The scalability of ST-RSBP is achieved by the proposed spike-train level computation, during which temporal effects of the SNN are captured in both the forward and backward pass of BP. Our ST-RSBP algorithm can be broadly applied to RSNNs with a single recurrent layer or deep RSNNs with multiple feed-forward and recurrent layers. Based upon challenging speech and image datasets including TI46, N-TIDIGITS, and Fashion-MNIST, ST-RSBP is able to train RSNNs with an accuracy surpassing that of the current state-of-the-art SNN BP algorithms and conventional non-spiking deep learning models.
Stochastic variance-reduced gradient (SVRG) is a classical optimization method. Although it is theoretically proved to have better convergence performance than stochastic gradient descent (SGD), the generalization performance of SVRG remains an open question. In this paper we investigate the effects of two training techniques, mini-batching and learning rate decay, on the generalization performance of SVRG, and verify the generalization performance of Batch-SVRG (B-SVRG). In terms of the relationship between optimization and generalization, we believe that the average norm of gradients on each training sample as well as the norm of the average gradient indicate how flat the landscape is and how well the model generalizes. Based on empirical observations of such metrics, we perform a sign switch on B-SVRG and derive a practical algorithm, BatchPlus-SVRG (BP-SVRG), which is numerically shown to enjoy better generalization performance than B-SVRG, and even than SGD in some scenarios of deep neural networks.
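For readers unfamiliar with SVRG, the following sketch shows the basic variance-reduced update on a least-squares problem. It covers plain single-sample SVRG only, not the B-SVRG or BP-SVRG variants from the abstract; the objective, step size, epoch count and function name are assumptions chosen for illustration.

```python
import numpy as np

def svrg_least_squares(X, y, lr=0.02, epochs=50, inner_steps=None, seed=0):
    """Plain SVRG on f(w) = (1 / 2n) * sum_i (x_i . w - y_i)^2 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = inner_steps or n
    w = np.zeros(d)
    for _ in range(epochs):
        # Snapshot the weights and compute the full gradient once per epoch
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n
        for _ in range(m):
            i = rng.integers(n)
            g_i = X[i] * (X[i] @ w - y[i])            # sample gradient at current w
            g_i_snap = X[i] * (X[i] @ w_snap - y[i])  # same sample at the snapshot
            # Variance-reduced step: unbiased, with noise that shrinks as w nears w_snap
            w = w - lr * (g_i - g_i_snap + full_grad)
    return w

# Hypothetical noiseless regression problem
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
w_hat = svrg_least_squares(X, X @ w_true)
print(np.round(w_hat, 3))
```

The correction term `g_i - g_i_snap + full_grad` has the same expectation as the full gradient, which is what separates SVRG from plain SGD.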
To solve the problems in measuring the coefficient of skewness related to extreme values, irregular distance from the middle point and distance between two consecutive numbers, this paper proposes ‘Rank skewness’, a new measure of the coefficient of skewness. Compared with other measures of the coefficient of skewness, the proposed measure performs better, especially for skewed distributions. As an alternative to the five-point summary boxplot, a four-point summary graph is also proposed, which is simpler than the traditional boxplot. It is based on all observations and gives better results than the five-point summary.
To achieve high throughput in PoW-based blockchain systems, a series of methods has been proposed, and DAG is one of the most active and promising fields. We designed and implemented StreamNet, aiming to engineer a scalable and endurable DAG system. When attaching a new block to the DAG, only two tips are selected. One is the ‘parent’ tip, whose definition is the same as in Conflux [29]; the other is selected using the Markov Chain Monte Carlo (MCMC) technique, whose definition is the same as in IOTA [40]. We infer a pivotal chain along the path of each epoch in the graph, and a total order of the graph can be calculated without a centralized authority. To scale up, we leverage the graph streaming property, so that high transaction validation speed is achieved even as the DAG grows. To scale out, we designed the ‘direct signal’ gossip protocol to help disseminate block updates in the network, so that messages can be passed more efficiently. We implemented our system based on IOTA’s reference code (IRI) and ran comprehensive experiments over clusters of different sizes and multiple network topologies.

# Fresh from the Python Package Index

jcopdl
J.COp DL is a deep learning package to complement the PyTorch workflow. It includes PyTorch callbacks and metrics.

latent-dirichlet-allocation
Latent-dirichlet-allocation

TensorKit

TensorKit-plottools

TensorKit-tools

torchtrainer
Focus on building and optimizing PyTorch models, not on training loops. PyTorch model training made simpler without losing control. Focus on optimizing your model! Concepts are heavily inspired by the awesome project [torchsample](https://…/torchsample ) and [Keras](https://…/keras ). Further, besides applying epoch callbacks, it also allows calling callbacks after a specific number of batches (iterations), which is useful for long epoch durations.

TorchVC
Voice Conversion in PyTorch

abode
Python Environment and Package Manager

dibbo
A framework for Distributed Black-Box Optimization

jupyterlab-zenodo

mlgen
MLGen is a tool which helps you to generate machine learning code with ease. MLGen uses a ‘.mlm’ file format, which is a file with YAML-like syntax. As of now, this tool supports Keras and TensorFlow 2.0 (not fully supported).

naturalselection
An all-purpose pythonic genetic algorithm

pyexlatex
Python Extends LaTeX – A High-Level Python API for Creating Latex Documents. This project is aimed at creating LaTeX documents using only Python, without directly writing LaTeX code. Rather than building a direct Python API to LaTeX, this package has its own, simpler API to creating documents.

pysimrel
Simulating data from linear model data

saattrupdan.darwin
An all-purpose pythonic genetic algorithm

som-learn
Self-Organizing Map algorithm.

sqlizer
Orchestration service for SQL-only ETL workflows. In many cases you can use SQL only for ETL (extract/transform/load) pipelines, relying on CTAS (create table as) queries and the built-in import/export features of your RDBMS or data warehouse software (e.g. Redshift).

textsense
TextSense.ai, a text analytics platform.

# R Packages worth a look

Algorithms for Electivity Indices (electivity)
Provides all electivity algorithms (including Vanderploeg and Scavia electivity) that were examined in Lechowicz (1982) <doi:10.1007/BF00349007>, plus the example data that were provided for moth resource utilisation.

Actuarial Functions for Non-Life Insurance Modelling (NetSimR)
Assists actuaries and other insurance modellers in pricing, reserving and capital modelling for non-life insurance and reinsurance modelling. Provides functions that help model excess levels, capping and pure Incurred but not reported claims (pure IBNR). Includes capped mean, exposure curves and increased limit factor curves (ILFs) for LogNormal, Gamma, Pareto, Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes mean, probability density function (pdf), cumulative probability function (cdf) and inverse cumulative probability function for Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes calculating pure IBNR exposure with LogNormal and Gamma distribution for reporting delay.

The Hellinger Correlation (HellCor)
Empirical value of the Hellinger correlation, a new measure of dependence between two continuous random variables that satisfies a set of 7 desirable axioms (existence, symmetry, normalisation, characterisation of independence, weak Gaussian conformity, characterisation of pure dependence, generalised Data Processing Inequality). More details can be found in Geenens and Lafaye De Micheaux (2018) <arXiv:1810.10276>.

Fast Embedding Guided by Self-Organizing Map (EmbedSOM)
Provides a smooth mapping of multidimensional points into low-dimensional space defined by a self-organizing map. Designed to work with ‘FlowSOM’ and flow-cytometry use-cases. See Kratochvil et al. (2019) <doi:10.1101/496869>.

# Document worth reading: “A Selective Overview of Deep Learning”

Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research. A Selective Overview of Deep Learning