The Tensor Algebra Compiler

Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are limitless, which makes it impossible to implement and optimize them all by hand. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
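To make the "single kernel" point concrete, here is a minimal hand-written sketch of the kind of fused kernel the abstract describes. It is not taco's generated code or its API. It computes the compound expression y = Ax + z in one pass, assuming A is a sparse matrix stored in CSR (compressed sparse row) format and x and z are dense vectors. The function name `csr_spmv_add` and the array names `indptr`, `indices` and `data` are illustrative choices. Storing A in a different format, or making z sparse, would call for a different loop structure. That per-combination rewriting is exactly the burden a kernel-generating compiler aims to remove.

```python
from typing import List


def csr_spmv_add(indptr: List[int], indices: List[int], data: List[float],
                 x: List[float], z: List[float]) -> List[float]:
    """Compute y = A @ x + z in a single pass, with A stored in CSR format.

    Illustrative sketch: indptr[i]..indptr[i+1] delimits the nonzeros of row i,
    indices holds their column numbers and data holds their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = z[i]  # fuse the dense addition into the row loop (no temporary A @ x)
        for p in range(indptr[i], indptr[i + 1]):
            acc += data[p] * x[indices[p]]  # visit only the stored nonzeros
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(csr_spmv_add(indptr, indices, data, [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
# prints [3.5, 3.5, 9.5]
```

Fusing the addition avoids materializing the intermediate vector Ax, which is the memory saving the abstract alludes to.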

Big Data: 50 Fascinating and Free Data Sources for Data Visualization

Have you ever felt frustrated when trying to find data on Google? Pages of relevant websites, but none that meet your expectations? Have you ever felt that your articles are less persuasive without data to support them?

Data Analytics for Internal Audit

Large publicly listed companies not only have external auditors who check the books, but often also a large community of internal auditors. These collaborators provide the company with a sufficient level of assurance regarding adherence to internal and external rules and guidelines. This covers financial aspects (spend, invoices, investments, …), human resources (working time, payroll, …) and also production-related aspects (e.g. food safety and quality). One of the strongest trends in internal auditing communities is the increasingly widespread use of Data Analytics. The term refers to the use of data, statistical methods and statistical thinking as a way of working, in addition to traditional auditing methods such as interviews, document reviews and process reviews. This trend is naturally not limited to audit: many other business processes increasingly rely on data-driven decision making, which shows in the buzz around “Big Data”, “Data Science” and “Business Analytics”. In this article, we describe different approaches to ensure that Data Analytics is used efficiently for controlling and internal audit in a large company.

A Primer on Deep Learning

What is deep learning?
I like to use the following three-part definition as a baseline. Deep learning is:
1. A collection of statistical machine learning techniques
2. Used to learn feature hierarchies
3. Often based on artificial neural networks
That’s it. Not so scary after all. For something that sounds so innocuous under the hood, there’s a lot of rumble in the news about what might be done with DL in the future. To see why it is proving so interesting to so many, let’s start with an example of what has already been done.

Large Scale Deep Learning with TensorFlow

In this video presentation from the Spark Summit 2016 conference in San Francisco, Google’s Jeff Dean examines large scale deep learning with the TensorFlow framework. Jeff joined Google in 1999 and is currently a Google Senior Fellow. He works in Google’s Research division, where he co-founded and leads the Google Brain team, Google’s deep learning research team in Mountain View. He is a co-designer and co-implementer of Google’s distributed computing infrastructure, including the MapReduce, BigTable, Spanner, DistBelief and TensorFlow systems, protocol buffers, LevelDB, and a variety of internal and external libraries and developer tools. He received a Ph.D. in Computer Science from the University of Washington in 1996.

Overview of GANs (Generative Adversarial Networks) – Part I

The purpose of this article series is to provide an overview of GAN research and explain the nature of the contributions. I’m new to this area myself, so this will surely be incomplete, but hopefully it can provide some quick context to other newbies. In Part I we’ll introduce GANs at a high level and summarize the original paper. Feel free to skip to Part II if you’re already familiar with the basics. The series assumes you already know how neural networks work.

ShinyProxy 1.0.2

ShinyProxy is a novel, open source platform to deploy Shiny apps for the enterprise or larger organizations. Since our last blog post, ten new releases of ShinyProxy have seen the light, and with the 1.0.2 release it is time to provide an overview of the main lines of development and the advances made.

Measuring & Monitoring Internet Speed with R

Working remotely has many benefits, but if you work remotely in an area like, say, rural Maine, one of those benefits is not massively speedy internet connections. Being able to go fast and furious on the internet is one of the many things I miss about our time in Seattle, and it is unlikely that we’ll be seeing Google Fiber in my small town any time soon. Another issue is that residential plans from evil giants like Comcast come with things like “bandwidth caps”. I suspect many WFH-ers can live within those limits, but I work with internet-scale data and often shunt extracts or whole datasets to the DatCave™ server farm for local processing. As a result, I pay an extra penalty as a Comcast “Business-class” user, which brings little benefit besides slightly higher QoS and somewhat faster service response times when there are issues.