How to Use IoT Datasets in #AI Applications

Recently, Google launched Dataset Search – a great resource for finding datasets. In this post, I list some IoT datasets that can be used for machine learning or deep learning applications. But finding datasets is only part of the story. A static dataset is not enough for IoT, because much of the interesting analysis happens in streaming mode. To create an end-to-end streaming implementation from a given dataset, we need full-stack skills, which are more complex (and in high demand). In this post, I therefore describe not only the datasets but also a full-stack implementation. An end-to-end flow is described in the book Agile Data Science 2.0 by Russell Jurney, which I use in my teaching on the Data Science for Internet of Things course at the University of Oxford; I demonstrate the implementation from this book below. The views here represent my own.

In understanding an end-to-end application, the first problem is how to capture data from a wide range of IoT devices. The protocol typically used for this is MQTT, a lightweight, publish-subscribe-based messaging protocol used in IoT applications to manage large numbers of IoT devices that often have limited connectivity, bandwidth, and power. MQTT integrates with Apache Kafka, a highly scalable distributed streaming platform that provides longer storage and easy integration with legacy systems. Kafka ingests, stores, processes, and forwards high volumes of data from thousands of IoT devices. (Source: Kai Waehner)
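To make the publish-subscribe model concrete, here is a minimal in-process sketch of MQTT-style topic-based messaging. This is purely illustrative: a real pipeline would use an MQTT broker such as Mosquitto, a client library such as paho-mqtt, and a bridge into Kafka; the `Broker` class and topic names below are made up for this sketch.

```python
# Minimal in-process sketch of MQTT-style topic-based publish/subscribe.
# A real deployment would use an MQTT broker and forward messages into
# Kafka; this toy Broker only illustrates the messaging model.

class Broker:
    def __init__(self):
        self.subscribers = {}  # topic -> list of callback functions

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to every subscriber of this topic.
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

received = []
broker = Broker()
broker.subscribe("sensors/temperature", lambda t, p: received.append((t, p)))
broker.publish("sensors/temperature", {"device": "dev-42", "celsius": 21.5})
broker.publish("sensors/humidity", {"device": "dev-42", "rh": 40})  # no subscriber

print(received)  # only the temperature reading was delivered
```

The key property this illustrates is decoupling: the device publishing to `sensors/temperature` knows nothing about who consumes the reading, which is why the model scales to thousands of devices.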

88 Resources & Tools to Become a Data Scientist

Harvard Business Review has regarded data scientist as the sexiest job of the 21st century. In this article, with the assistance of Octoparse V7, one of the best free web data scraping tools, we have aggregated the resources and tools you may need to become a data scientist.

Supervised Learning – Everything You Need To Know

Supervised learning – a blessing we have in this machine era. It maps inputs to outputs, using labelled training data to infer a function from a set of training examples. The majority of practical machine learning today uses supervised learning.
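A tiny sketch of what "inferring a function from labelled examples" means in practice: fitting a line y ≈ a·x + b to labelled (x, y) pairs with closed-form least squares. The data here is synthetic, chosen so the learned function is easy to check.

```python
# Toy supervised learning: infer y ≈ a*x + b from labelled examples
# via ordinary least squares (closed-form, pure Python).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

# Labelled training data generated by the "true" function y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)        # recovers slope 2.0 and intercept 1.0
print(a * 10 + b)  # prediction for the unseen input x = 10 -> 21.0
```

The last line is the whole point of supervised learning: once the function is inferred from labelled data, it generalizes to inputs it never saw.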

Fraud prevention in peer-to-peer (P2P) transaction networks using Neural Nets: A Node Embedding approach

Since the emergence of digital banking and online shopping, it has never been easier for companies, banks, and customers to trade goods and transfer money. In this new era of mobile eCommerce, peer-to-peer (P2P) transaction platforms (e.g. PayPal, Venmo, Prosper) have emerged as an attractive alternative that bypasses traditional intermediaries, allowing these platforms to offer customers very low (sometimes even zero) transaction fees.

SQL, Python, & R: All in One Platform

Mode Studio connects a SQL editor, Python and R notebooks, and a visualization builder in one platform. Sign up now for access.

Machine Reading Comprehension Part II: Learning to Ask & Answer

In the last post of this series, I introduced the task of machine reading comprehension (MRC) and presented a simple neural architecture for tackling it. In fact, this architecture can be found in many state-of-the-art MRC models, e.g. BiDAF, S-Net, R-Net, match-LSTM, ReasonNet, Document Reader, Reinforced Mnemonic Reader, FusionNet and QANet. I also pointed out an assumption made in this architecture: the answer is always a contiguous span of a given passage. Under this assumption, an answer can be simplified to a pair of two integers, representing its start and end positions in the passage respectively. This greatly reduces the solution space and simplifies training, yielding promising scores on the SQuAD dataset. Unfortunately, beyond artificial datasets this assumption is often not true in practice.
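The span assumption reduces answer extraction to predicting two integers. A minimal sketch, with a hand-picked passage and hand-picked indices standing in for a model's predictions:

```python
# Under the span assumption, an answer is just a pair of token indices
# (start, end) into the passage; training reduces to predicting two integers.

passage = "Kafka is a highly scalable distributed streaming platform".split()

# Suppose the model predicts start=3, end=7 (inclusive) for a question like
# "What kind of platform is Kafka?" -- the indices here are hand-picked.
start, end = 3, 7
answer = " ".join(passage[start:end + 1])
print(answer)  # "highly scalable distributed streaming platform"
```

Any answer that cannot be copied verbatim from the passage – a yes/no answer, or one that must be paraphrased – falls outside this representation, which is exactly the limitation the post goes on to discuss.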

Teach Machine to Comprehend Text and Answer Question with Tensorflow – Part I

Reading comprehension is one of the fundamental skills for human, which one must learn systematically since the elementary school. Do you still remember how the worksheet of your reading class looks like? It usually consists of an article and few questions about its content. To answer these questions, you need to first gather information by collecting answer-related sentences from the article. Sometimes you can directly copy those original sentences from the article as the final answer. This is a trivial ‘gut question’, and every student likes it. Unfortunately (for students), quite often you need to summarize, assert, infer, refine those evidences and finally write the answer in your own words. Drawing inferences about the writer’s intention is especially hard. Back in high school, I was often confused by the questions like ‘why did the poet write a sentence like this?’ How could I possibly know the original intention of an ancient poet a thousand years ago? How could my teacher know it for sure?

RStudio 1.2 Preview: C/C++ and Rcpp

We’ve now discussed the improved support in RStudio v1.2 for SQL, D3, and Python. Today, we’ll talk about IDE support for C/C++ and Rcpp.
The IDE has had excellent support for C/C++ since RStudio v0.99, including:
• Tight integration with the Rcpp package
• Code completion
• Source diagnostics as you edit
• Code snippets
• Auto-indentation
• Navigable list of compilation errors
• Code navigation (go to definition)

Imagine you and your partner are trying to find the perfect restaurant for a pleasant dinner. Knowing this process can lead to hours of arguments, you seek out the oracle of modern life: online reviews. Doing so, you find that your choice, Carlo’s Restaurant, is recommended by a higher percentage of both men and women than your partner’s selection, Sophia’s Restaurant. However, just as you are about to declare victory, your partner, using the same data, triumphantly states that since Sophia’s is recommended by a higher percentage of all users, it is the clear winner. What is going on? Who’s lying here? Has the review site got its calculations wrong? In fact, both you and your partner are right, and you have unknowingly entered the world of Simpson’s Paradox, where a restaurant can be both better and worse than its competitor, exercise can both lower and increase the risk of disease, and the same dataset can be used to prove two opposing arguments. Instead of going out to dinner, perhaps you and your partner should spend the evening discussing this fascinating statistical phenomenon.
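To see how both claims can be true at once, here is a worked example with made-up review counts (the figures are invented for illustration; the imbalance in group sizes is what drives the reversal):

```python
# Hypothetical review counts illustrating Simpson's paradox: Carlo's wins
# within each subgroup, yet Sophia's wins when the groups are pooled.
# All figures are made up for illustration.

reviews = {
    "Carlo's":  {"men": (45, 50),  "women": (30, 100)},  # (recommend, total)
    "Sophia's": {"men": (80, 100), "women": (5, 25)},
}

def rate(recommended, total):
    return recommended / total

# Within each subgroup, Carlo's has the higher recommendation rate:
for group in ("men", "women"):
    c = rate(*reviews["Carlo's"][group])
    s = rate(*reviews["Sophia's"][group])
    print(group, round(c, 2), round(s, 2))  # men: 0.9 vs 0.8, women: 0.3 vs 0.2

# Pooled across all users, Sophia's comes out ahead:
c_rec = sum(r for r, _ in reviews["Carlo's"].values())
c_tot = sum(t for _, t in reviews["Carlo's"].values())
s_rec = sum(r for r, _ in reviews["Sophia's"].values())
s_tot = sum(t for _, t in reviews["Sophia's"].values())
print(round(c_rec / c_tot, 2), round(s_rec / s_tot, 2))  # 0.5 vs 0.68
```

The reversal happens because each restaurant's overall rate is dominated by its largest reviewer group: most of Carlo's reviews come from women (its weaker group), while most of Sophia's come from men (its stronger group).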

Discretification. It’s not a word, but knowing its value can revolutionize your decision making.

The entire purpose of analytics is to provide us with information that allows us to make the best decision possible. Unfortunately, there are times when the data does not make the path forward a clear one, and these difficult decisions can take time to sort out between stakeholders.

For example, let’s say you are in charge of the marketing department of an organization, and you are testing out a specific new campaign. You select a subset of locations to present the advertisement, then track the results. At the end of the trial, you discover that for every dollar spent on the marketing campaign, $4.50 was generated in revenue for your company. Is the campaign worth scaling up?

These types of decisions arise often in the course of using analytics, and very rarely is there an obvious answer as to how to move forward. It’s clear that the trial campaign worked, but did it work well enough to justify a more significant investment? On one hand, it made money! $4.50 per dollar is a solid return, and not likely to be viewed as a terrible decision come your performance review (assuming that when you scale it up it shows similar results). On the other hand, there is not only the risk that it won’t perform as well with a larger sample; you also need to consider how your marketing budget could be better spent. What if you have existing marketing strategies that generate $6 in revenue per dollar invested? Now the $4.50 return doesn’t look so good.

Difficulty in making a decision isn’t just emotionally taxing; it costs your company time and money. Additionally, there are political complications to going against the preferences of one decision maker to satisfy another. How can we avoid the lengthy process of making hard decisions? Discretification! Don’t bother googling it, for you’ll come up empty. However, don’t let its absence from the dictionary prevent you from becoming more decisive. Before we touch on what ‘discretification’ means, let’s revisit two basic types of data.
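The opportunity-cost comparison in the example above can be put in back-of-the-envelope numbers. The $4.50 and $6.00 figures come from the text; the budget is a made-up assumption for illustration:

```python
# Back-of-the-envelope comparison of the two options from the example.
# ROI figures are from the text; the budget is a hypothetical assumption.

budget = 100_000          # hypothetical marketing budget in dollars
new_campaign_roi = 4.50   # revenue per dollar, from the trial campaign
existing_roi = 6.00       # revenue per dollar, from existing strategies

new_revenue = budget * new_campaign_roi
existing_revenue = budget * existing_roi
opportunity_cost = existing_revenue - new_revenue

print(new_revenue)        # 450000.0
print(existing_revenue)   # 600000.0
print(opportunity_cost)   # 150000.0 left on the table by scaling the new campaign
```

Framed this way, a "profitable" campaign can still be the wrong choice: the relevant comparison is not against zero but against the best alternative use of the same budget.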

Beyond Word Embeddings Part 1 – An Overview of Neural NLP Milestones

Since the advent of word2vec, neural word embeddings have become a go-to method for encapsulating distributional semantics in NLP applications. This series will review the strengths and weaknesses of using pre-trained word embeddings and demonstrate how to incorporate more complex semantic representation schemes such as Semantic Role Labeling, Abstract Meaning Representation and Semantic Dependency Parsing into your applications.
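A quick sketch of the core idea behind distributional semantics: words are vectors, and semantic similarity is cosine similarity between them. The three-dimensional vectors below are invented for illustration; real word2vec embeddings typically have hundreds of dimensions and are learned from corpora.

```python
# Toy "word embeddings" (made-up 3-d vectors) and cosine similarity,
# the standard measure of distributional similarity between words.
import math

embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1: similar words
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller: unrelated words
```

The weakness the series goes on to address is visible even here: a single static vector per word cannot capture roles, word senses, or sentence structure, which is what schemes like Semantic Role Labeling add.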

Understanding DevSecOps in Data Science

Some software development methods have redefined how products reach the market and how quickly that happens. DevOps is one example. It combines software development with IT operations, focusing on high velocity and a shorter development life cycle than past methods; moreover, those adhering to DevOps methods deliver frequent features and fixes. DevSecOps is a methodology similar to DevOps in that both sit within an agile framework that breaks projects into smaller chunks. However, DevSecOps incorporates security into every step of the development process. It requires ongoing communication between the development and security departments – departments that traditionally didn’t communicate until the later stages. It’s also useful to note that DevOps and DevSecOps teams both often integrate automation into their practices. In that way, and others, these methodologies are much more similar than different. DevSecOps merely prioritizes security, a factor that developers may know little about and tend not to worry about unless a problem surfaces as a bug.
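"Security in every step" usually shows up concretely as automated checks in the CI pipeline rather than a one-off audit before release. A hypothetical pipeline sketch – every stage name and tool command below is invented for illustration, not a real product:

```yaml
# Illustrative DevSecOps-style CI pipeline: security checks run alongside
# build and test on every commit, not as a final gate before release.
# All stage and tool names here are hypothetical.
stages:
  - lint
  - test
  - security
  - deploy

security:
  static_analysis:       # scan source for known vulnerability patterns
    run: sast-scanner ./src
  dependency_audit:      # flag third-party dependencies with known CVEs
    run: dep-audit --fail-on high
  secrets_scan:          # catch credentials committed by mistake
    run: secret-scan --all-history
```

Because these checks run on every commit, security findings surface while the code is still fresh in the developer's mind, which is the practical payoff of shifting security left.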

Kotlin: The Next Frontier in Modern (Meta)Programming

Today, I’m writing a follow-up to my KotlinConf talk on Kotlin, TornadoFX, and metaprogramming. In the talk, I drew generalized definitions of cross-cutting concerns and explored common forms of metaprogramming, and we looked at TornadoFX features written in Kotlin to examine the functional characteristics of the language. In retrospect, I wish I had focused on my applied research a bit more, so I’m here to talk about where I’ve been and where I’m going!

Aspect-Oriented Programming

In computing, aspect-oriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of cross-cutting concerns. It does so by adding additional behavior (an ‘advice’) to existing code without modifying the code itself, instead separately specifying which code is modified via a ‘pointcut’ specification, such as ‘log all function calls when the function’s name begins with “set”’. This allows behaviors that are not central to the business logic (such as logging) to be added to a program without cluttering the code that is core to the functionality. AOP forms a basis for aspect-oriented software development: AOP includes programming methods and tools that support the modularization of concerns at the level of the source code, while ‘aspect-oriented software development’ refers to a whole engineering discipline.
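The pointcut/advice split can be sketched in a few lines of Python using a class decorator (illustrative only – real AOP frameworks such as AspectJ weave advice at the language or bytecode level; the class and method names below are made up):

```python
# AOP-style sketch: the "pointcut" selects methods whose names begin with
# "set"; the "advice" logs each matched call. The business logic in the
# decorated class is never modified directly.

log = []

def with_logging_aspect(cls):
    # Weave logging advice into every method matched by the pointcut.
    for name in list(vars(cls)):
        if name.startswith("set") and callable(getattr(cls, name)):
            original = getattr(cls, name)

            def advised(self, *args, _orig=original, _name=name, **kwargs):
                log.append(f"calling {_name}")   # the cross-cutting concern
                return _orig(self, *args, **kwargs)

            setattr(cls, name, advised)
    return cls

@with_logging_aspect
class Thermostat:
    def __init__(self):
        self.target = 20

    def set_target(self, value):   # matched by the pointcut -> advice applied
        self.target = value

    def read(self):                # not matched -> left untouched
        return self.target

t = Thermostat()
t.set_target(22)
print(t.read(), log)  # business logic unchanged, logging added from outside
```

Note how `Thermostat` contains no logging code at all: the logging concern lives entirely in the aspect, which is exactly the modularity AOP is after.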