“The best technology tools will be those that empower subject matter experts to quickly apply intuitive reasoning:
• Data visualization and advanced analytics that do not require programming or advanced technical skills
• Data integration and workflow tools to rapidly infuse existing data sets with new, untapped data sources that enable a more complete analytic picture – again with no programming required
• Agile application development tools that shield the user from coding, data connectivity and other programming challenges
• Tools for predictive analytics that support intuitive reasoning as to what data attributes and other conditions will impact future performance.” Mike Urbonas (April 11, 2015)

Quadratic Unconstrained Binary Optimization (QUBO) Gaussian Processes are used in many applications to model spatial phenomena. Within this context, a key issue is deciding the set of locations at which to take measurements so as to obtain a better approximation of the underlying function. Current state-of-the-art techniques select such a set to minimize the posterior variance of the Gaussian process. We explore the feasibility of solving this problem by proposing a novel Quadratic Unconstrained Binary Optimization (QUBO) model. In recent years the QUBO formulation has gained increasing attention since it is the input format for the specialized D-Wave quantum annealers. Hence, our contribution takes an important first step towards the sampling optimization of Gaussian processes in the context of quantum computation. The results of our empirical evaluation show that the optimum of the QUBO objective function we derived is a good solution to the above-mentioned problem. In fact, we obtain comparable and in some cases better results than the widely used submodular technique. …
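
To make the QUBO form concrete, here is a minimal sketch of minimising x^T Q x over binary vectors by exhaustive search. The matrix Q below is a hypothetical toy instance (it is not the model derived in the paper): negative diagonal entries stand for the benefit of measuring at a location, and positive off-diagonal entries stand for the redundancy of measuring at two similar locations. Brute force is only feasible for tiny n; an annealer or heuristic solver would take its place in practice.

```python
import itertools
import numpy as np

def qubo_energy(Q, x):
    """Return the QUBO objective x^T Q x for a binary vector x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def solve_qubo_brute_force(Q):
    """Enumerate all 2^n binary vectors and return the minimiser and its energy."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        e = qubo_energy(Q, bits)
        if e < best_e:
            best_x, best_e = np.array(bits), e
    return best_x, best_e

# Hypothetical 3-location instance in upper-triangular form.
Q = np.array([
    [-3.0,  2.5,  0.2],   # location 0 is valuable but redundant with location 1
    [ 0.0, -2.0,  0.3],
    [ 0.0,  0.0, -1.0],
])

best_x, best_e = solve_qubo_brute_force(Q)
print(best_x, best_e)
```

In this toy instance the solver skips location 1 because its overlap penalty with location 0 outweighs its own benefit.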

Point Attention Transformer (PAT) Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, the self-attention transformer is introduced to consume point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Moreover, prior work relies on heuristics that depend on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a ‘soft’ continuous subset in the training phase, and a ‘hard’ discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets at lower computational cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset. …
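
The ‘soft’ versus ‘hard’ subset idea can be sketched with a plain Gumbel-Softmax sampler. This is an illustration only, not the paper's GSS layer: the selection scores `logits` are random here (in a real network they would be learned), and the names `gumbel_softmax`, `points` and `tau` are assumptions made for this example.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng, hard=False):
    """Draw a relaxed one-hot sample over the last axis of `logits`."""
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ U(0, 1).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    z = (logits + g) / tau
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    soft = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    if not hard:
        return soft                                 # 'soft' weights (training)
    # 'Hard' selection (test time): one-hot at the arg-max of each row.
    idx = np.expand_dims(soft.argmax(axis=-1), axis=-1)
    one_hot = np.zeros_like(soft)
    np.put_along_axis(one_hot, idx, 1.0, axis=-1)
    return one_hot

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))    # toy cloud: 6 points in 3-D
logits = rng.normal(size=(2, 6))    # hypothetical scores: 2 output slots x 6 candidates

soft_subset = gumbel_softmax(logits, tau=0.5, rng=rng) @ points             # convex mixtures
hard_subset = gumbel_softmax(logits, tau=0.5, rng=rng, hard=True) @ points  # actual input points
```

Each row of `soft_subset` is a weighted average of input points, which keeps the operation differentiable, while each row of `hard_subset` is exactly one of the input points.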

Autoencoding Variational Transformation (AVT) The learning of Transformation-Equivariant Representations (TERs), introduced by Hinton et al., has been considered a principle for revealing visual structures under various transformations. It contains the celebrated Convolutional Neural Networks (CNNs) as a special case that is equivariant only to translations. In contrast, we seek to train TERs for a generic class of transformations and to train them in an unsupervised fashion. To this end, we present a novel principled method, Autoencoding Variational Transformations (AVT), in contrast to the conventional approach of autoencoding data. Formally, given transformed images, the AVT seeks to train the networks by maximizing the mutual information between the transformations and the representations. This ensures that the resultant TERs of individual images contain the intrinsic information about their visual structures that would equivary extricably under various transformations. Technically, we show that the resultant optimization problem can be efficiently solved by maximizing a variational lower bound of the mutual information. This variational approach introduces a transformation decoder to approximate the intractable posterior of transformations, resulting in an autoencoding architecture that pairs the representation encoder with the transformation decoder. Experiments demonstrate that the proposed AVT model sets a new record for performance on unsupervised tasks, greatly closing the performance gap to supervised models. …

Distill and Transfer Learning (Distral) Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a ‘distilled’ policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable—attributes that are critical in deep reinforcement learning. …
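
The statement that the shared policy is distilled to be the ‘centroid’ of the task policies can be illustrated for a single state with discrete actions. This is a simplified sketch, not the full Distral objective: it assumes the centroid is measured by the summed divergence KL(task policy || shared policy), in which case the minimiser is the arithmetic mean of the task policies. The three task policies are made-up numbers.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for two strictly positive discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical action distributions of three task-specific policies in one state.
task_policies = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
])

def total_divergence(shared):
    """Sum over tasks of KL(task policy || shared policy)."""
    return sum(kl(p, shared) for p in task_policies)

# Minimising the summed KL over the shared policy gives the mean of the task policies.
distilled = task_policies.mean(axis=0)

uniform = np.full(3, 1.0 / 3.0)
print(total_divergence(distilled), total_divergence(uniform))
```

Any other candidate for the shared policy, such as the uniform distribution or one of the task policies themselves, yields a larger total divergence under this assumption.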

“(Graph Databases:) No longer need enterprises make tradeoffs between rich and reach; they can submit highly complex queries against enormous, highly varied data sets without concern that new data with varying structure will break their queries. These innovations are also bringing the capability for asking questions that, in the SQL environment, would otherwise require tens or hundreds of table joins.” Tony Baer (2014)

There are two big unsolved mathematical questions in artificial intelligence (AI): (1) Why is deep learning so successful in classification problems, and (2) why are neural nets based on deep learning at the same time universally unstable, with instabilities that make the networks vulnerable to adversarial attacks? We present a solution to these questions that can be summed up in two words: false structures. Indeed, deep learning does not learn the original structures that humans use when recognising images (cats have whiskers, paws, fur, pointy ears, etc.), but rather different false structures that correlate with the original structure and hence yield the success. However, the false structure, unlike the original structure, is unstable. The false structure is simpler than the original structure, hence easier to learn with less data, and the numerical algorithm used in training will more easily converge to the neural network that captures the false structure. We formally define the concept of false structures and formulate the solution as a conjecture. Given that trained neural networks are always computed with approximations, this conjecture can only be established through a combination of theoretical and computational results, similar to how one establishes a postulate in theoretical physics (e.g. the speed of light is constant). Establishing the conjecture fully will require a vast research program characterising the false structures. We provide the foundations for such a program by establishing the existence of the false structures in practice. Finally, we discuss the far-reaching consequences that the existence of the false structures has for state-of-the-art AI and Smale’s 18th problem. What do AI algorithms actually learn? – On false structures in deep learning

Having scanned all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here those that might be worth a look, worth following, or might inspire you in some way.

• autorank Automated ranking of populations in a repeated measures experiment, e.g., to rank different machine learning approaches tested on the same data.

“Raw numbers are easy to report and analyze, but without the proper context, they can be misleading. Is the effect you’re seeing real, or a simple result of the underlying, obvious distribution? Too many analyses and news stories end up reporting things we already know.” Robert Kosara (23.07.2014)

• torch_maml Gradient checkpointing technique for Model Agnostic Meta Learning

Tuple Plot Complex systems are described with high-dimensional data that is hard to visualise. Inselberg’s parallel coordinates are one representation technique for visualising high-dimensional data. Here we generalise Inselberg’s approach, and use it for visualising trajectories through high dimensional state spaces. We introduce two geometric projections of parallel coordinate representations — ‘plan tuple plots’ and ‘side tuple plots’ — and demonstrate a link between state space and ordinary space representations. We provide examples from many domains to illustrate use of the approach, including Cellular Automata, Random Boolean Networks, coupled logistic maps, reservoir computing, search algorithms, Turing Machines, and flocking. …

Consumer and Producer Based Recommendation (CPRec) User-Generated Content (UGC) is at the core of web applications where users can both produce and consume content. This differs from traditional e-Commerce domains, where content producers and consumers usually belong to two separate groups. In this work, we propose a method, CPRec (consumer and producer based recommendation), for recommending content on UGC-based platforms. Specifically, we learn a core embedding for each user and two transformation matrices to project the user’s core embedding into two ‘role’ embeddings (i.e., a producer role and a consumer role). We model each interaction by the ternary relation between the consumer, the consumed item, and its producer. Empirical studies on two large-scale UGC applications show that our method outperforms standard collaborative filtering methods as well as recent methods that model producer information via item features. …
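
The embedding structure described above (one core embedding per user, projected by two shared matrices into a consumer role and a producer role) can be sketched as follows. The scoring function is an assumption made for illustration, a sum of two dot products, and is not taken from the paper; all weights are random rather than learned, and `producer_of` is a hypothetical item-to-producer lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 5, 8

core = rng.normal(size=(n_users, d))       # one core embedding per user
W_consumer = rng.normal(size=(d, d))       # projects core -> consumer role
W_producer = rng.normal(size=(d, d))       # projects core -> producer role
item_emb = rng.normal(size=(n_items, d))   # one embedding per item
producer_of = np.array([0, 0, 1, 2, 3])    # hypothetical: item index -> producing user

def role_embeddings(user):
    """Return the (consumer, producer) role embeddings of one user."""
    return core[user] @ W_consumer, core[user] @ W_producer

def score(consumer, item):
    """Assumed score for the ternary relation (consumer, item, producer of item)."""
    c, _ = role_embeddings(consumer)
    _, p = role_embeddings(producer_of[item])
    return float(c @ item_emb[item] + c @ p)

# Rank all items for user 3, best first.
ranking = sorted(range(n_items), key=lambda i: score(3, i), reverse=True)
print(ranking)
```

Because both roles derive from the same core vector, what is learned about a user as a consumer also informs how that user is represented as a producer, and vice versa.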

Backdoor Injection Attack Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real-world applications, including those where security is of great concern. Such popularity, however, may attract attackers to exploit the vulnerabilities of the deployed deep learning models and launch attacks against security-sensitive applications. In this paper, we focus on a specific type of data poisoning attack, which we refer to as a backdoor injection attack. The main goal of the adversary performing such an attack is to generate and inject a backdoor into a deep learning model that can be triggered to recognize certain embedded patterns with a target label of the attacker’s choice. Additionally, a backdoor injection attack should occur in a stealthy manner, without undermining the efficacy of the victim model. Specifically, we propose two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model. We consider two attack settings, with backdoor injection carried out either before model training or during model updating. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective and achieve a high attack success rate (above 90%) at a small cost of model accuracy loss (below 1%) with a small injection rate (around 1%), even under the weakest assumption wherein the adversary has no knowledge either of the original training data or the classifier model. …

Graph Learning Neural Network Semi-supervised classification on graph-structured data such as social networks and citation networks, where labels are available for only a small subset of the data, has received increasing attention. This problem is challenging due to the irregularity of graphs. Graph convolutional neural networks (GCN) have recently been proposed to address such problems; they feed the graph topology into the network to guide operations such as graph convolution. Nevertheless, in most cases where the graphs are not given, they are constructed manually and empirically, which tends to be sub-optimal. Hence, we propose Graph Learning Neural Networks (GLNN), which exploit the optimization of graphs (the adjacency matrix in particular) and integrate it into the GCN for semi-supervised node classification. Leveraging spectral graph theory, this essentially combines graph learning and graph convolution into a unified framework. Specifically, we represent features of social/citation networks as graph signals, and propose the objective of graph learning from the graph-signal prior, a sparsity constraint and the properties of a valid adjacency matrix via maximum a posteriori estimation. The optimization objective is then integrated into the loss function of the GCN, leading to joint learning of the adjacency matrix and high-level features. Experimental results show that our proposed GLNN outperforms state-of-the-art approaches on widely adopted social network and citation network datasets. …
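
As a rough illustration of the two ingredients mentioned above, the sketch below implements a standard GCN propagation step with the symmetrically normalised adjacency matrix, plus a hypothetical graph-learning penalty made of a graph-signal smoothness term tr(X^T L X) and an L1 sparsity term. The penalty and its weights `alpha` and `beta` are assumptions for this example and not the exact objective of the GLNN paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(A) @ H @ W, 0.0)

def graph_learning_penalty(A, X, alpha=1.0, beta=0.1):
    """Assumed regulariser: smoothness of features X on graph A plus L1 sparsity of A."""
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial graph Laplacian
    smoothness = np.trace(X.T @ L @ X)             # small when neighbours have similar features
    sparsity = np.abs(A).sum()
    return float(alpha * smoothness + beta * sparsity)

# Toy path graph 0 - 1 - 2 with 2-D node features.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
W = np.random.default_rng(0).normal(size=(2, 4))   # random, untrained layer weights

H1 = gcn_layer(A, X, W)
penalty = graph_learning_penalty(A, X)
```

In a joint-learning setup, a penalty of this kind would be added to the classification loss so that gradients update both the adjacency matrix and the layer weights.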

The construction of artificial general intelligence (AGI) has been a long-term goal of AI research, aiming to deal with complex real-world data and to make reasonable judgments in various cases as a human would. However, current AI creations, referred to as ‘Narrow AI’, are limited to a specific problem. The constraints come from two basic assumptions about data: independent and identically distributed samples, and a single-valued mapping between inputs and outputs. We completely break these constraints and develop the subjectivity learning theory for general intelligence. We assign a mathematical meaning to the philosophical concept of subjectivity and build the data representation of general intelligence. Under the subjectivity representation, the global risk is then constructed as the new learning goal. We prove that subjectivity learning has a lower risk bound than traditional machine learning. Moreover, we propose the principle of empirical global risk minimization (EGRM) as the subjectivity learning process in practice, establish the condition of consistency, and present three variables for controlling the total risk bound. Subjectivity learning is a novel learning theory for unconstrained real data and provides a path to developing AGI. Subjectivity Learning Theory towards Artificial General Intelligence

“It is important to remember that Data Science techniques are tools that we can use to help make better decisions, with an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it.” Damian Mingle (September 15, 2015)