Time series classification with Tensorflow

Time-series data arise in many fields including finance, signal processing, speech recognition and medicine. A standard approach to time-series problems usually requires manual engineering of features which can then be fed into a machine learning algorithm. Engineering of features generally requires some domain knowledge of the discipline where the data has originated from. For example, if one is dealing with signals (i.e. classification of EEG signals), then possible features would involve power spectra at various frequency bands, Hjorth parameters and several other specialized statistical properties. A similar situation arises in image classification, where manually engineered features (obtained by applying a number of filters) could be used in classification algorithms. However, with the advent of deep learning, it has been shown that convolutional neural networks (CNN) can outperform this strategy. A CNN does not require any manual engineering of features. During training, the CNN learns lots of “filters” with increasing complexity as the layers get deeper, and uses them in a final classifier. In this blog post, I will discuss the use of deep leaning methods to classify time-series data, without the need to manually engineer features. The example I will consider is the classic Human Activity Recognition (HAR) dataset from the UCI repository. The dataset contains the raw time-series data, as well as a pre-processed one with 561 engineered features. I will compare the performance of typical machine learning algorithms which use engineered features with two deep learning methods (convolutional and recurrent neural networks) and show that deep learning can surpass the performance of the former.

Getting Started with Audio Data Analysis using Deep Learning (with case study)

When you get started with data science, you start simple. You go through simple projects like Loan Prediction problem or Big Mart Sales Prediction. These problems have structured data arranged neatly in a tabular format. In other words, you are spoon-fed the hardest part in data science pipeline. The datasets in real life are much more complex. You first have to understand it, collect it from various sources and arrange it in a format which is ready for processing. This is even more difficult when the data is in an unstructured format such as image or audio. This is so because you would have to represent image/audio data in a standard way for it to be useful for analysis.

3 Ways to Create a Strategic Framework for AI Adoption

Artificial intelligence has become a buzzword in boardrooms and C-suite meetings in companies large and small, across all industries. The purported advantages of AI and related technology have led CIOs and other decision makers to engage in a virtual arms race in which no one wants to be left holding legacy technology. While it makes good business sense to invest in new, more powerful tools, there’s a downside to diving headfirst into asset collection without a plan. Leaders who quickly decide to invest in an AI platform will eventually find themselves locked into just one type of AI, which could potentially inhibit future growth and innovation.

HDFS vs. HBase : All you need to know

The sudden increase in the volume of data from the order of gigabytes to zettabytes has created the need for a more organized file system for storage and processing of data. The demand stemming from the data market has brought Hadoop in the limelight making it one of biggest players in the industry. Hadoop Distributed File System (HDFS), the commonly known file system of Hadoop and Hbase (Hadoop’s database) are the most topical and advanced data storage and management systems available in the market.

Creating maps in R using ggplot2 and maps libraries

Here is how we can use the maps, mapdata and ggplot2 libraries to create maps in R. In this particular example, we’re going to create a world map showing the points of Beijing and Shanghai, both cities in China. For this particular map, we will be displaying the Northern Hemisphere from Europe to Asia.

IoT: Penetrating the Possibilities of a Data Driven Economy

Ever since the Internet of Things (IoT) manifested into reality, integrating the physical world with our digital routine, experts and thought leaders have waited for it to transform the dream of a data driven economy into a witnessed possibility. As the concept of Internet of Things continues to evolve and grow, it now appears that the wait is finally over. Welcome to the Industrial Internet of Things (IIoT). This is a concept-turned-reality, which looks set to change the traditional picture of industrial production for years to come.

ASAP: Automatic Smoothing for Attention Prioritization in Streaming Time Series Visualization

Time series visualization of streaming telemetry (i.e., charting of key metrics such as server load over time) is increasingly prevalent in recent application deployments. Existing systems simply plot the raw data streams as they arrive, potentially obscuring large-scale deviations due to local variance and noise. We propose an alternative: to better prioritize attention in time series exploration and monitoring visualizations, smooth the time series as much as possible to remove noise while still retaining large-scale structure. We develop a new technique for automatically smoothing streaming time series that adaptively optimizes this trade-off between noise reduction (i.e., variance) and outlier retention (i.e., kurtosis). We introduce metrics to quantitatively assess the quality of the choice of smoothing parameter and provide an efficient streaming analytics operator, ASAP, that optimizes these metrics by combining techniques from stream processing, user interface design, and signal processing via a novel autocorrelation-based pruning strategy and pixel-aware preaggregation. We demonstrate that ASAP is able to improve users’ accuracy in identifying significant deviations in time series by up to 38.4% while reducing response times by up to 44.3%. Moreover, ASAP delivers these results several orders of magnitude faster than alternative optimization strategies.

Building distributed systems with containers

Five questions for Brendan Burns:
How containers and cluster management have changed systems development, and common patterns for building distributed systems.
1. What prompted you to develop Kubernetes?
2. How have containers and cluster management changed how systems are developed?
3. What are the toughest obstacles developers face in moving to building distributed systems with containers?
4. What are some of the common patterns for building distributed systems?
5. What other parts of the program for Velocity NY are of interest to you?

A Contrarian View on Automation

The ‘Age of Automation’ is upon us. Companies strive to reduce their costs by using technology to replace humans at every opportunity. Business executives fight over experts in artificial intelligence and data science in hopes of attaining a competitive edge over their rivals. Even the wary flock to siren calls of ever-greater efficiency via investments in computers, robotics and software. But is the path to profitability paved with big data? It’s important to consider more contrarian views, so that investments in digital work for us, rather than vice versa. We examine here the tangible and intangible costs and benefits of embarking on a journey towards automation.

Causal Discovery from Temporally Aggregated Time Series

Discovering causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporally aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the procedure that for every k consecutive observations, one is kept, the rest being skipped, and recently some advances have been made in causal discovery from such data. With temporal aggregation, the local averages or sums of k consecutive, non-overlapping observations in the causal process are computed as new observations, and causal discovery from such data is even harder. In this paper, we investigate how to recover causal relations at the original causal frequency from temporally aggregated data when k is known. Assuming the time series at the causal frequency follows a vector autoregressive (VAR) model, we show that the causal structure at the causal frequency is identifiable from aggregated time series if the noise terms are independent and non-Gaussian and some other technical conditions hold. We then present an estimation method based on non-Gaussian state-space modeling and evaluate its performance on both synthetic and real data.

The current state of applied data science

As we enter the latter part of 2017, it’s time to take a look at the common challenges faced by companies interested in using data science and machine learning (ML). Let’s assume your organization is already collecting data at a scale that justifies the use of analytic tools, and that you’ve managed to identify and prioritize use cases where data science can be transformative (including improvements to decision-making or business operations, increasing revenue, etc.). Data gathering and identifying interesting problems are non-trivial, but assuming you’ve gotten a healthy start on these tasks, what challenges remain? Data science is a large topic, so I’ll offer a disclaimer: this post is mainly about the use of supervised machine learning today, and it draws from a series of conversations over the last few months. I’ll have more to say about AI systems in future posts, but such systems clearly rely on more than just supervised learning.

Cognitive Computing Defined

Cognitive computing makes a new class of problems computable. It addresses complex situations that are characterized by ambiguity and uncertainty; in other words it handles human kinds of problems. In these dynamic, information-rich, and shifting situations, data tends to change frequently, and it is often conflicting. The goals of users evolve as they learn more and redefine their objectives. To respond to the fluid nature of users’ understanding of their problems, the cognitive computing system offers a synthesis not just of information sources but of influences, contexts, and insights. To do this, systems often need to weigh conflicting evidence and suggest an answer that is “best” rather than “right”. Cognitive computing systems make context computable. They identify and extract context features such as hour, location, task, history or profile to present an information set that is appropriate for an individual or for a dependent application engaged in a specific process at a specific time and place. They provide machine-aided serendipity by wading through massive collections of diverse information to find patterns and then apply those patterns to respond to the needs of the moment. Cognitive computing systems redefine the nature of the relationship between people and their increasingly pervasive digital environment. They may play the role of assistant or coach for the user, and they may act virtually autonomously in many problem-solving situations. The boundaries of the processes and domains these systems will affect are still elastic and emergent. Their output may be prescriptive, suggestive, instructive, or simply entertaining.

ParlAI: A Dialog Research Software Platform

We introduce ParlAI (pronounced ‘par-lay’), an open-source software platform for dialog research implemented in Python, available at this http URL Its goal is to provide a unified framework for sharing, training and testing of dialog models, integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning; and a repository of machine learning models for comparing with others’ models, and improving upon existing architectures. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Several models are integrated, including neural models such as memory networks, seq2seq and attentive LSTMs.