What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or might inspire you in some way.

A binary timeseries storage format, where the time axis is given via an expression.

Tools for Binary and Text data.

A tool for generating data and managing relational databases.

Package for Multiple Objective Decision Analysis

A Data Science project structure pip package

Eisen is a collection of tools to train neural networks for medical image analysis

Helper utilities for managing and tracking experiments.

A library for fair machine learning.

FlowETL is a collection of special-purpose Airflow operators and sensors for use with FlowKit.

A python package that allows importing Jupyter notebooks as python modules

Python library for Natural Language Processing supporting the open annotation standard

Generates a navigable book-like structure for a collection of Jupyter notebooks


If you did not already know

Multiset Dimension
We introduce a variation of the metric dimension, called the multiset dimension. The representation multiset of a vertex $v$ with respect to $W$ (which is a subset of the vertex set of a graph $G$), $r_m (v|W)$, is defined as a multiset of distances between $v$ and the vertices in $W$ together with their multiplicities. If $r_m (u |W) \neq r_m(v|W)$ for every pair of distinct vertices $u$ and $v$, then $W$ is called a resolving set of $G$. If $G$ has a resolving set, then the cardinality of a smallest resolving set is called the multiset dimension of $G$, denoted by $md(G)$. If $G$ does not contain a resolving set, we write $md(G) = \infty$. We present basic results on the multiset dimension. We also study graphs of given diameter and give some sufficient conditions for a graph to have an infinite multiset dimension. …
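The definitions above can be made concrete with a small brute-force sketch. It assumes a connected, unweighted graph given as an adjacency dictionary; the function names are illustrative and not taken from the paper, and the exhaustive search over subsets is only practical for very small graphs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def multiset_representation(dist, v, W):
    """r_m(v|W): distances from v to the vertices of W, kept with multiplicity.

    A sorted tuple serves as a canonical form of the multiset.
    """
    return tuple(sorted(dist[v][w] for w in W))


def is_resolving(dist, vertices, W):
    """W is resolving if all vertices have pairwise distinct representations."""
    reps = [multiset_representation(dist, v, W) for v in vertices]
    return len(set(reps)) == len(reps)


def multiset_dimension(adj):
    """Size of a smallest resolving set, or infinity if none exists.

    Assumes the graph is connected (every distance is defined).
    """
    vertices = list(adj)
    dist = {v: bfs_distances(adj, v) for v in vertices}
    for size in range(1, len(vertices) + 1):
        for W in combinations(vertices, size):
            if is_resolving(dist, vertices, W):
                return size
    return float("inf")


path = {0: [1], 1: [0, 2], 2: [1]}            # path on three vertices
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # complete graph K3
print(multiset_dimension(path))      # 1: an end vertex alone resolves the path
print(multiset_dimension(triangle))  # inf: no subset separates all vertices
```

In the triangle, any two vertices inside $W$ share the same multiset and any two outside $W$ do as well, which is why no resolving set exists.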

Empirical Bayes Geometric Mean (EBGM)
Adjusted estimate for the relative reporting ratio. Example: if EBGM=3.9 for acetaminophen-hepatic failure, then this drug-event combination occurred in the data 3.9 times more frequently than expected under the assumption of no association between the drug and the event. …
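As a rough illustration of the quantity being adjusted, the sketch below computes the plain (unadjusted) relative reporting ratio: the observed count divided by the count expected if drug and event were independent. The counts are made up, and the empirical Bayes shrinkage that turns this ratio into the EBGM is not implemented here.

```python
def relative_reporting_ratio(n_drug_event, n_drug, n_event, n_total):
    """Observed / expected count for one drug-event combination.

    The expected count assumes no association between drug and event:
    E = (reports with the drug) * (reports with the event) / (all reports).
    """
    expected = n_drug * n_event / n_total
    return n_drug_event / expected


# Hypothetical counts chosen so that the ratio matches the 3.9 in the example.
rr = relative_reporting_ratio(n_drug_event=39, n_drug=1000,
                              n_event=100, n_total=10000)
print(rr)  # 3.9
```

The EBGM pulls this raw ratio toward a prior, which matters most when the observed and expected counts are small.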

Jack the Reader (Jack)
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping by component reuse, evaluation of new models on existing datasets as well as integrating new datasets and applying them on a growing set of implemented baseline models. Jack is currently supporting (but not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse. …

Jaya Optimisation Algorithm
An Efficient Multi-core Implementation of the Jaya Optimisation Algorithm

Document worth reading: “Object Detection in 20 Years: A Survey”

Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today’s object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century’s time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc., and makes an in-depth analysis of their challenges as well as technical improvements in recent years. Object Detection in 20 Years: A Survey

What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or might inspire you in some way.

Fast CUDA C++ GMRES implementation for Toeplitz-like (Toeplitz, Hankel, Circulant) matrices and mixed (combinations of Diagonal ones and Toeplitz-like ones) matrices. We propose implementations of the Generalized Minimal Residual Method (GMRES) for solving linear systems based on dense, Toeplitz or mixed matrices. The software consists of a Python module and a C++ library. The mixed matrices consist of the sum of the diagonal and the Toeplitz matrices. The GMRES solver is parallelized for running on an NVIDIA GPGPU accelerator. We report on the efficiency of the parallelization method applying GMRES to the Helmholtz linear system based on the use of Green’s Function Integral Equation Method (GFIEM) for computing electric field distribution in the design domain.
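GMRES only touches the system matrix through matrix-vector products, so the structure described above can be exploited without ever storing a dense matrix. The sketch below is not the package's API; it is a minimal pure-Python illustration, with made-up function names, of multiplying a Toeplitz matrix (given by its first column and first row) and a diagonal-plus-Toeplitz "mixed" matrix by a vector.

```python
def toeplitz_matvec(first_col, first_row, x):
    """Compute T @ x where T[i][j] depends only on i - j.

    first_col holds T[i][0], first_row holds T[0][j]; both share T[0][0].
    This direct version costs O(n^2) operations but only O(n) storage.
    """
    n = len(x)
    result = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            # Below or on the diagonal read the column, above it read the row.
            t = first_col[i - j] if i >= j else first_row[j - i]
            total += t * x[j]
        result.append(total)
    return result


def mixed_matvec(diagonal, first_col, first_row, x):
    """Compute (D + T) @ x for a diagonal matrix D and a Toeplitz matrix T."""
    tx = toeplitz_matvec(first_col, first_row, x)
    return [d * xi + ti for d, xi, ti in zip(diagonal, x, tx)]


# T = [[1, 4, 5],
#      [2, 1, 4],
#      [3, 2, 1]]
print(toeplitz_matvec([1, 2, 3], [1, 4, 5], [1, 1, 1]))          # [10.0, 7.0, 6.0]
print(mixed_matvec([1, 1, 1], [1, 2, 3], [1, 4, 5], [1, 1, 1]))  # [11.0, 8.0, 7.0]
```

Fast implementations typically replace the double loop with an FFT-based product via circulant embedding; the loop version is kept here for readability.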

High-level Regression Utilities

robotframework-sqless is a SQL abstraction library for Robot Framework

State-of-the-art Computer Vision and Object Detection for TensorFlow.

Scorebased Neural Architecture REduction

Search space optimization via gradient boosting regression

Python algorithms used to perform machine learning.

A Library for Homomorphic Encryption Operations on Tensors

Generate C code for microcontrollers from Tensorflow models

A python package capable of performing and illustrating a few clustering methods.

Automated Time Series in Python

A deep learning approach for mapping and dating burned areas using temporal sequences of satellite images

If you did not already know

Gather-Excite
While the use of bottom-up local operators in convolutional neural networks (CNNs) matches well some of the statistics of natural images, it may also prevent such models from capturing contextual long-range feature interactions. In this work, we propose a simple, lightweight approach for better context exploitation in CNNs. We do so by introducing a pair of operators: gather, which efficiently aggregates feature responses from a large spatial extent, and excite, which redistributes the pooled information to local features. The operators are cheap, both in terms of number of added parameters and computational complexity, and can be integrated directly in existing architectures to improve their performance. Experiments on several datasets show that gather-excite can bring benefits comparable to increasing the depth of a CNN at a fraction of the cost. For example, we find ResNet-50 with gather-excite operators is able to outperform its 101-layer counterpart on ImageNet with no additional learnable parameters. We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently-introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics. …
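The gather and excite operators can be sketched in a few lines. The version below assumes the simplest parameter-free setting with a global spatial extent: gather is a per-channel global average, and excite rescales every position of the channel by a sigmoid of that average. It works on plain nested lists and is an illustration only, not the authors' implementation.

```python
import math


def gather_excite(feature_map):
    """Apply a parameter-free, global-extent gather-excite step.

    feature_map: list of channels, each channel a list of rows of floats.
    """
    output = []
    for channel in feature_map:
        values = [x for row in channel for x in row]
        pooled = sum(values) / len(values)       # gather: aggregate the whole map
        gate = 1.0 / (1.0 + math.exp(-pooled))   # sigmoid squashes to (0, 1)
        # excite: redistribute the pooled signal to every local feature
        output.append([[x * gate for x in row] for row in channel])
    return output


features = [[[1.0, -1.0]]]      # one channel, one row, two positions
print(gather_excite(features))  # [[[0.5, -0.5]]] since the channel mean is 0
```

Smaller extents would pool over local windows and upsample the gate back to the input resolution instead of using a single value per channel.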

Generalized k-Nearest Neighbor (GkNN)
Three methods of temporal data upscaling, which may collectively be called the generalized k-nearest neighbor (GkNN) method, are considered. The accuracy of the GkNN simulation of month by month yield is considered (where the term yield denotes the dependent variable). The notion of an eventually well distributed time series is introduced and on the basis of this assumption some properties of the average annual yield and its variance for a GkNN simulation are computed. The total yield over a planning period is determined and a general framework for considering the GkNN algorithm based on the notion of stochastically dependent time series is described and it is shown that for a sufficiently large training set the GkNN simulation has the same statistical properties as the training data. An example of the application of the methodology is given in the problem of simulating yield of a rainwater tank given monthly climatic data. …

Proximal Policy Optimization (PPO)
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a ‘surrogate’ objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time. …
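The abstract does not spell out the surrogate objective. The best-known variant from the PPO paper is the clipped one, which averages $\min(r_t A_t, \mathrm{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t)$ over samples, where $r_t$ is the probability ratio between the new and old policy and $A_t$ is the advantage estimate. A minimal sketch, assuming the ratios and advantages have already been computed elsewhere:

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    epsilon:    clip range; 0.2 is a commonly used value, assumed here
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum makes the objective a pessimistic bound,
        # removing the incentive to push the ratio far from 1.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# A large ratio with a positive advantage is capped at (1 + epsilon) * A.
print(clipped_surrogate([1.5], [1.0]))
# A large ratio with a negative advantage is not clipped, so the penalty stays.
print(clipped_surrogate([1.5], [-1.0]))
```

Because the clipped term bounds how much a single batch can change the policy, the same batch can be reused for several epochs of minibatch updates, which is the property the abstract highlights.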

Residual Hourglass Recurrent Neural Network (RHR-Net)
Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss. Previous work has overcome these issues by using convolutional networks to learn long-range temporal correlations across high-resolution waveforms. These models, however, are limited by memory-intensive dilated convolution and aliasing artifacts from upsampling. We introduce an end-to-end fully-recurrent hourglass-shaped neural network architecture with residual connections for waveform-based single-channel speech enhancement. Our model can efficiently capture long-range temporal dependencies by reducing the feature resolution without information loss. Experimental results show that our model outperforms state-of-the-art approaches in six evaluation metrics. …

Document worth reading: “Recent Advances in Deep Learning for Object Detection”

Object detection is a fundamental visual recognition problem in computer vision and has been widely studied in the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and assign each object instance a corresponding class label. Due to the tremendous successes of deep learning based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. By reviewing a large body of recent related work in literature, we systematically analyze the existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. In the survey, we cover a variety of factors affecting the detection performance in detail, such as detector architectures, feature learning, proposal generation, sampling strategies, etc. Finally, we discuss several future directions to facilitate and spur future research for visual object detection with deep learning. Keywords: Object Detection, Deep Learning, Deep Convolutional Neural Networks Recent Advances in Deep Learning for Object Detection

Document worth reading: “Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning”

Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution—especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others. Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning
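The problem described in the abstract is commonly written as a min-max program. The notation below is an assumption for illustration and does not appear in the excerpt: $\theta$ is the decision, $\xi$ the uncertain parameter, $\ell$ the loss, $\hat{P}_N$ the nominal (for example, empirical) distribution built from $N$ training samples, $W$ the Wasserstein distance, and $\varepsilon$ the radius of the ambiguity set.

```latex
$$
\min_{\theta \in \Theta} \;
\sup_{Q \,:\, W(Q, \hat{P}_N) \le \varepsilon} \;
\mathbb{E}_{\xi \sim Q}\big[ \ell(\theta, \xi) \big]
$$
```

Setting $\varepsilon = 0$ leaves only the nominal distribution in the ambiguity set, so the problem reduces to minimizing the expected loss under $\hat{P}_N$.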

If you did not already know

Productive Machine Learning (Pro-ML)
The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack. As we mapped out the effort, we kept a set of key ideas in place to constrain the solution space and focus our efforts.
• We will leverage and improve best-of-breed components from our existing code base to the maximum extent feasible. We are unlikely to rewrite our entire tech stack, but any particular component is fair game.
• The state of the art is constantly evolving with new algorithms and open source frameworks – we need to be flexible to support our existing major ML algorithms as well as new ones that will emerge.
• We will use an agile-inspired strategy so that each step we take is delivering value by making at least one product line better or providing generally useable improvements to existing components.
• The ability to run the models in real-time is as important as the ability to author or train them. The services hosting the models must be able to be independently upgraded without breaking their downstream or upstream services.
• New models, retrained models, and models using new technologies must be A/B testable in production.
• We must build GDPR privacy requirements into every stage of the solution. …

Random Projection Forest (rpForest)
K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests), for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. …
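The general idea can be sketched as follows. This is a simplified illustration, not the authors' algorithm: each tree splits on a plain Gaussian random direction at the median projection (the paper selects its projections more carefully), and all function names are hypothetical. A query collects the points in its leaf from every tree and ranks only those candidates by exact distance.

```python
import math
import random


def build_tree(points, indices, leaf_size, rng):
    """Recursively split the indexed points along random directions."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = [sum(d * x for d, x in zip(direction, points[i])) for i in indices]
    threshold = sorted(proj)[len(proj) // 2]  # median split
    left = [i for i, p in zip(indices, proj) if p < threshold]
    right = [i for i, p in zip(indices, proj) if p >= threshold]
    if not left or not right:                 # degenerate split: stop here
        return {"leaf": indices}
    return {"direction": direction, "threshold": threshold,
            "left": build_tree(points, left, leaf_size, rng),
            "right": build_tree(points, right, leaf_size, rng)}


def build_forest(points, n_trees, leaf_size, rng):
    """Grow independent trees; each uses its own random directions."""
    everything = list(range(len(points)))
    return [build_tree(points, everything, leaf_size, rng)
            for _ in range(n_trees)]


def query_leaf(tree, q):
    """Route q down one tree with the same rule used during construction."""
    while "leaf" not in tree:
        p = sum(d * x for d, x in zip(tree["direction"], q))
        tree = tree["left"] if p < tree["threshold"] else tree["right"]
    return tree["leaf"]


def knn(forest, points, q, k):
    """Approximate kNN: merge leaf candidates from all trees, rank exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(query_leaf(tree, q))
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]
forest = build_forest(data, n_trees=5, leaf_size=10, rng=rng)
print(knn(forest, data, data[7], k=3))  # indices of approximate neighbors
```

Since the trees are built independently, construction and querying can be spread across cores, and adding trees enlarges the candidate set, which lowers the chance of missing a true neighbor.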

IPMAN
We present a new methodology, called IPMAN, that combines interior point methods and generative adversarial networks to solve constrained optimization problems with feasible sets that are non-convex or not explicitly defined. Our methodology produces $\epsilon$-optimal solutions and demonstrates that, when there are multiple global optima, it learns a distribution over the optimal set. We apply our approach to synthetic examples to demonstrate its effectiveness and to a problem in radiation therapy treatment optimization with a non-convex feasible set. …

Fuzzy Bayesian Learning
In this paper we propose a novel approach for learning from data using rule based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic data-sets and also a real world example in the financial services industry. Then we demonstrate how the method can be extended for knowledge extraction to select the individual rules in a Bayesian way which best explains the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful. …

What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or might inspire you in some way.

Library for creating data input pipeline in pure Tensorflow 2.x

CVRPTW Optimization Models

TensorFlow 2 implementation of Deep Graph Convolutional Neural Networks.

An early version of an educational module that is being developed to make it easier to experiment with different deep learning networks in PyTorch

A pure Python document and graph database engine

FDLSGM: Fast Directed Line Segment Grouping Method

Level set machine learning for image segmentation

A markdown extension for mathematical documents.

Workflow tools for pytorch and ignite

Multi-Output Gaussian Process ToolKit

Non-uniform FFT in 1D, 2D and 3D for CPU and GPU (CUDA)

A package to extract text from PDF