Digital Twin Data Modeling with AutomationML and a Communication Methodology for Data Exchange

In the context of Cyber-Physical Systems, and toward the realization of a Digital Twin system for future manufacturing and product-service systems, we propose the use of AutomationML to model attributes related to the Digital Twin. We also propose this model as a means of data exchange between the different systems connected to the Digital Twin. We present a case study in which an industrial component was modeled and simulated to validate the methodology.

5 Reasons why Businesses Struggle to Adopt Deep Learning

1. Expectations bordering science fiction
2. Lack of data to satiate the giant’s appetite
3. Lots of training data, but none labelled
4. When the cost-benefit tradeoff doesn’t stack up
5. When insights are so smart that people get creeped out

Machine learning – Is the emperor wearing clothes?

Machine learning uses patterns in data to label things. Sounds magical? The core concepts are actually embarrassingly simple. I say ‘embarrassingly’ because if someone made you think it’s mystical, they should be embarrassed. Here, let me fix that for you.

The Case for Just Getting Your Feet Wet with AI

Even if you’re not big enough to have a full-blown data science group, that shouldn’t hold you back from benefiting from AI. The market has evolved so that there are now industry- and process-specific vertical applications available from third-party AI vendors that you can implement. There are just a few things to look out for.

Prediction at Scale with scikit-learn and PySpark Pandas UDFs

scikit-learn is a wonderful tool for machine learning in Python, with great flexibility for implementing pipelines and running experiments (see, e.g., this Civis blog post series), but it’s not really designed for distributed computing on ‘big data’ (e.g., hundreds of millions of records or more). A common predictive modeling scenario, at least at Civis, is having a small or medium amount of labeled data to estimate a model from (e.g., 10,000 records), but a much larger unlabeled dataset to make predictions about. In this scenario, one might want to train a model on a laptop or single server with scikit-learn for ease of use and flexibility, but then apply that model to the large unlabeled dataset more quickly by distributing the computation with PySpark. Using PySpark for distributed prediction might also make sense if your ETL task is already implemented with (or would benefit from being implemented with) PySpark, which is wonderful for data transformations and ETL.

AlexNet Implementation Using Keras

Alex Krizhevsky, Geoffrey Hinton and Ilya Sutskever created a neural network architecture called ‘AlexNet’ and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. They trained their network, which has 60 million parameters and 650,000 neurons, to classify 1.2 million high-resolution images into 1000 different classes. Training was split across two GPUs because a single GPU of that era did not have enough memory to hold the network. The original paper is titled ‘ImageNet Classification with Deep Convolutional Neural Networks’.

Turning Machine Learning Models into APIs in Python

Learn how to create a simple API from a machine learning model in Python using Flask.

Large-Scale Study of Curiosity-Driven Learning

Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of the prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/.
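To make the idea of "prediction error as reward signal" concrete, here is a minimal toy sketch in Python. It is not the paper's implementation, which uses neural networks over learned or random feature spaces. The `TabularForwardModel` class, the state and action names, and the learning rate are all hypothetical and chosen only for illustration.

```python
class TabularForwardModel:
    """Toy forward model (an assumption for illustration, not the paper's network).

    It remembers a running-average next-feature vector per (state, action) pair.
    """

    def __init__(self, feature_dim, learning_rate=0.5):
        self.feature_dim = feature_dim
        self.learning_rate = learning_rate
        self.predictions = {}  # (state, action) -> predicted next features

    def predict(self, state, action):
        # Unseen transitions default to an all-zero prediction.
        return self.predictions.get((state, action), [0.0] * self.feature_dim)

    def update(self, state, action, next_features):
        # Move the stored prediction part of the way toward what was observed.
        old = self.predict(state, action)
        self.predictions[(state, action)] = [
            p + self.learning_rate * (f - p) for p, f in zip(old, next_features)
        ]


def intrinsic_reward(model, state, action, next_features):
    """Curiosity reward: squared error between predicted and observed next features."""
    predicted = model.predict(state, action)
    return sum((p - f) ** 2 for p, f in zip(predicted, next_features))


model = TabularForwardModel(feature_dim=2)
rewards = []
for _ in range(5):
    # The agent repeats the same transition; it becomes predictable, so curiosity fades.
    rewards.append(intrinsic_reward(model, "room_a", "go_left", [1.0, 2.0]))
    model.update("room_a", "go_left", [1.0, 2.0])

# A transition the model has never seen is still surprising.
novel_reward = intrinsic_reward(model, "room_b", "go_left", [1.0, 2.0])
```

In this sketch the reward for the repeated transition shrinks on every visit, while the unseen transition keeps its full reward, which is what pushes a curious agent toward unfamiliar states.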

Curiosity and Procrastination in Reinforcement Learning

Reinforcement learning (RL) is one of the most actively pursued research techniques of machine learning, in which an artificial agent receives a positive reward when it does something right, and negative reward otherwise. This carrot-and-stick approach is simple and universal, and allowed DeepMind to teach the DQN algorithm to play vintage Atari games and AlphaGoZero to play the ancient game of Go. This is also how OpenAI taught its OpenAI-Five algorithm to play the modern video game Dota, and how Google taught robotic arms to grasp new objects. However, despite the successes of RL, there are many challenges to making it an effective technique. Standard RL algorithms struggle with environments where feedback to the agent is sparse – crucially, such environments are common in the real world. As an example, imagine trying to learn how to find your favorite cheese in a large maze-like supermarket. You search and search but the cheese section is nowhere to be found. If at every step you receive no ‘carrot’ and no ‘stick’, there’s no way to tell if you are headed in the right direction or not. In the absence of rewards, what is to stop you from wandering around in circles? Nothing, except perhaps your curiosity, which motivates you to go into a product section that looks unfamiliar to you in pursuit of your sought-after cheese.

Named Entity Recognition and Classification with Scikit-Learn

Named Entity Recognition and Classification (NERC) is a process of recognizing information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions from unstructured text. The goal is to develop practical and domain-independent techniques in order to detect named entities with high accuracy automatically. Last week, we gave an introduction on Named Entity Recognition (NER) in NLTK and SpaCy. Today, we go a step further: training machine learning models for NER using some of Scikit-Learn’s libraries. Let’s get started!

Implementing Automated Machine Learning Systems with Open Source Tools

What if you want to implement an automated machine learning pipeline of your very own, or automate particular aspects of a machine learning pipeline? Rest assured that there is no need to reinvent any wheels.

What is Benford’s Law and why is it important for data science?

We discuss a little-known gem for data analytics: Benford’s law, which tells us about the expected distribution of significant digits in a diverse set of naturally occurring datasets, and how this can be used for anomaly or fraud detection in scientific or technical publications.
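As a rough illustration of the idea, Benford’s law puts the expected share of leading digit d at log10(1 + 1/d), so about 30.1% of values should start with 1. The Python sketch below compares observed leading-digit shares against that expectation. The helper names and the two sample datasets are assumptions made for this example, not taken from the linked article, and a real fraud check would use a proper statistical test rather than a single maximum gap.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a non-zero number."""
    # Strip the sign, then any leading zeros and decimal point: 0.00123 -> '123'.
    return int(str(abs(x)).lstrip("0.")[0])


def first_digit_frequencies(values):
    """Observed share of each leading digit (1-9) among the non-zero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_deviation(values):
    """Largest absolute gap between observed and expected digit shares."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Powers of 2 are a classic sequence that follows Benford's law closely.
powers_of_two = [2 ** n for n in range(1, 1001)]
# Evenly spread leading digits (100..999) do not follow it.
uniform_like = list(range(100, 1000))

print(max_deviation(powers_of_two))
print(max_deviation(uniform_like))
```

A large gap, as for the evenly spread numbers, is the kind of signal that would prompt a closer look at a dataset.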

Multi-Class Text Classification with SKlearn and NLTK in python – A Software Engineering Use Case

Recently, I worked on a software engineering research project. One of the main objectives of the project was to understand the focus areas of work in the development teams. When the size of a software project becomes large, managing the workflow and the development process is more challenging. Therefore, it is essential for the management team and lead developers to understand the type of work that is carried out by the software developers.

SQL at Scale with Apache Spark SQL and DataFrames – Concepts, Architecture and Examples

Relational Databases are here to stay, regardless of the hype as well as the advent of newer databases often popularly termed as ‘NoSQL’ databases. The simple reason is that these databases enforce essential structure and constraints, and provide a nice declarative language to query data, which we love: SQL! However, scale has always been a problem with relational databases. Most enterprises now in the 21st century are loaded with rich data stores and repositories and want to take maximum advantage of their ‘Big Data’ for actionable insights. Relational databases might be popular but they don’t scale very well unless we invest in a proper Big Data management strategy. This involves thinking about potential data sources, data volume, constraints, schemas, ETL (extract-transform-load), access and querying patterns and much more!

Demystifying Convolutional Neural Networks

In the past decade, advances in computer vision have been truly remarkable. Machines can now recognize images and video frames with an accuracy (98 percent) surpassing that of humans (97 percent). The machinery behind this feat is inspired by the functioning of the human brain. Neurologists conducting experiments on cats discovered that similar parts of an image cause similar parts of a cat’s brain to become active. In other words, when a cat looks at a circle, the alpha-zone is activated in its brain; when it looks at a square, the beta-zone is activated. They concluded that an animal’s brain contains zones of neurons that react to specific characteristics of an image, i.e., animals perceive the environment through a layered architecture of neurons in the brain, and every image passes through some sort of feature extractor before going deeper into the brain. Inspired by this, mathematicians set out to create a system that simulates different groups of neurons firing for different aspects of an image and communicating with each other to form a bigger picture.

Lessons Learned from Applying Deep Learning for NLP Without Big Data

As a data scientist, one of your most important skills should be choosing the right modeling techniques and algorithms for your problems. A few months ago I was trying to solve a text classification problem of classifying which news articles will be relevant for my customers.

TensorFlow-Stack-TS – Kickstart Your AI Project

When you start working on an AI project, there are lots of things to consider, and getting a proof of concept going with a bunch of tools is the way to go. But once this phase of the project is over, you want to develop an application or even just a testbed to work on. You need an application environment to develop on. Recently, when I tried setting up an application stack for an AI/TensorFlow-based project, I found that the MEAN-based templates I used in the past are way behind and do not fully integrate the latest technologies. So I decided to put together the libraries I found working for me, as a new full-stack template.

A Tutorial on Fairness in Machine Learning

The content is based on: the tutorial on fairness given by Solon Barocas and Moritz Hardt at NIPS 2017, day 1 and day 4 from CS 294: Fairness in Machine Learning taught by Moritz Hardt at UC Berkeley, and my own understanding of the fairness literature. I highly encourage interested readers to check out the linked NIPS tutorial and the course website.

Getting started with Python environments (using Conda)

Whether you want one or have no idea what it is, you’ll have to deal with environments in Python eventually. If you’re a newbie to Python like myself or a newbie to generally setting up your workspace for any programming language, then you may have dealt with the pain of setting up your environment for the first time. Between the lack of knowledge of what you’re doing and why it’s necessary, the many different online guides that may not have the exact same instructions and numerous Stack Overflow posts that help with very specific situations, I found it confusing to understand what exactly environments were, when they were needed, and how to set up my own. This guide is to help build an intuition for environments and provide some examples of dealing with them, particularly using conda, the package manager for Anaconda.

An Introduction to Deep Learning

During recent years, deep learning has become somewhat of a buzzword in the tech community. We always seem to hear about it in news regarding AI, and yet most people don’t actually know what it is! In this article, I’ll be demystifying the buzzword that is deep learning, and providing an intuition of how it works.

The Periodic Table of Data Structures

We describe the vision of being able to reason about the design space of data structures. We break this down into two questions: 1) Can we know all data structures that are possible to design? 2) Can we compute the performance of arbitrary designs on a given hardware and workload without having to implement the design or even access the target hardware? If those challenges are possible, then an array of exciting opportunities would become feasible such as interactive what-if design to improve the productivity of data systems researchers and engineers, and informed decision making in industrial settings with regard to critical hardware/workload/data structure design issues. Then, even fully automated discovery of new data structure designs becomes possible. Furthermore, the structure of the design space itself provides numerous insights and opportunities such as the existence of design continuums that can lead to data systems with deep adaptivity, and a new understanding of the possible performance trade-offs. Given the universal presence of data structures at the very core of any data-driven field across all sciences and industries, reasoning about their design can have significant benefits, making it more feasible (easier, faster and cheaper) to adopt tailored state-of-the-art storage solutions. And this effect is going to become increasingly more critical as data keeps growing, hardware keeps changing and more applications/fields realize the transformative power and potential of data analytics. This paper presents this vision and surveys first steps that demonstrate its feasibility.

Drawing beautiful maps programmatically with R, sf and ggplot2 – Part 3: Layouts

After the presentation of the basic map concepts and the flexible layered approach implemented in ggplot2, this part illustrates how to achieve complex layouts, for instance with map insets or several maps combined. Depending on the visual information that needs to be displayed, maps and their corresponding data might need to be arranged to create easy-to-read graphical representations. This tutorial provides different approaches to arranging maps in the plot, in order to make the information portrayed more aesthetically appealing and, most importantly, to convey the information better.

Getting started Stamen maps with ggmap

Spatial visualizations really come to life when you have a real map as a background. In R, ggmap is the package that you’ll want to use to get these maps. In what follows, we’ll demonstrate how to use ggmap with the Sacramento dataset in the caret package. For each city, we are going to keep track of the median price of houses in that city, the number of transactions in that city, as well as the mean latitude and longitude of the transactions.

Role of Underscore(_) in Python

In this tutorial, you’re going to learn about the uses of underscore(_) in Python. Many Python users don’t know about the functionalities of underscore(_).
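For orientation, the short sketch below collects some common uses of the underscore in Python. It is not taken from the linked tutorial, and the `Account` class and the sample values are hypothetical. One further use is not shown because it only applies in the interactive interpreter, where `_` holds the result of the last evaluated expression.

```python
# 1. Throwaway variable: the loop index is not needed.
greetings = ["hi" for _ in range(3)]

# 2. Ignoring values while unpacking.
name, _, year = ("Ada", "ignored", 1815)
first, *_ = [1, 2, 3, 4]

# 3. Digit separator in numeric literals (Python 3.6+).
population = 7_500_000_000


# 4. Naming conventions, shown on a hypothetical class.
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # public attribute
        self._balance = balance   # single leading underscore: "internal use" by convention
        self.__pin = "0000"       # double leading underscore: name-mangled to _Account__pin

    def __repr__(self):           # underscores on both sides: special ("dunder") method
        return f"Account({self.owner!r}, {self._balance})"


# 5. Trailing underscore avoids clashing with a keyword or built-in name.
class_ = "economy"

account = Account("Ada", 100)
print(account)
```

Note that the single leading underscore is only a convention, while the double leading underscore actually changes the attribute name that is stored on the object.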

Notes on Feature Preprocessing: The What, the Why, and the How

This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.
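To give a feel for the topic, the sketch below shows three common ways of scaling a numeric feature, using only the Python standard library. The function names and the sample data are assumptions made for this example rather than the article’s own code, and none of the functions handles a constant feature, which would cause a division by zero.

```python
import statistics


def min_max_scale(values):
    """Rescale to [0, 1]; a single outlier can squeeze all other values together."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and scale by the interquartile range (less outlier-sensitive)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical feature with one extreme outlier.
incomes = [30, 32, 35, 38, 40, 41, 45, 1000]

print(min_max_scale(incomes))
print(standardize(incomes))
print(robust_scale(incomes))
```

With the outlier present, min-max scaling pushes the seven typical values into a narrow band near zero, while the median-and-IQR version keeps them spread out. This is one reason the choice of scaler is tied to how outliers are handled.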

Computational Learning Theory (CLT) is a branch of statistics/machine learning/artificial intelligence (in general) which deals with fundamental bounds and theorems about our (man and machine) ability to learn rules and patterns from data. It goes beyond the realm of specific algorithms that we regularly hear about (regression, decision trees, support vector machines or deep neural networks) and tries to answer fundamental questions about the limits and possibilities of the whole enterprise of machine learning. Sounds exciting? Please read on for a quick tour of this realm…

Uber Introduces PyML: Their Secret Weapon for Rapid Machine Learning Development

Uber has been one of the most active companies trying to accelerate the implementation of real-world machine learning solutions. Just this year, Uber has introduced technologies like Michelangelo and Horovod that focus on key building blocks of machine learning solutions in the real world. This week, Uber introduced another piece of its machine learning stack, this time aiming to shorten the cycle from experimentation to product. PyML is a library that enables the rapid development of Python applications in a way that is compatible with their production runtime.

Deep Learning Approaches to understand Human Reasoning

For a doctor who is using deep learning to find out whether a patient has multiple sclerosis, a bare yes-or-no answer from the model is not good enough. For a safety-critical application such as autonomous cars, it is not enough to predict the crash. There is an urgent need for machine learning models to reason about their assertions and articulate that reasoning to humans. The Visual Question Answering work by Devi Parikh and Dhruv Batra, and the work on understanding visual relationships by Fei-Fei Li’s team, are a few leads toward achieving this. But there is a long way to go in terms of learning reasoning structures. So in this blog, we are going to talk about how to integrate reasoning into CNNs and knowledge graphs. For a long time, reasoning has been understood to be a bunch of deductions and inductions. The study of abstract symbolic logic allowed canonicalization of these concepts, as described by John Venn in 1881. It’s like those IQ tests that we take: A implies B, B implies C, so A implies C, and so on. Think of it as a bunch of logical equations.

RConsortium – Building an R Certification

No doubt the demand for R (and for Data Science) has been increasing over the past few years – Data Scientist became the sexiest job of the century according to Harvard Business Review, and Glassdoor listed it as the ‘Best Job in America’ in 2016, 2017 and 2018. No need to argue on that point: data science is now one of the top skills to have on the market. With the growth in interest came a growth of demands both from employers and people looking to improve their skills. To support this, several channels of training have been made available. But even with that, it’s still complex today for the three parties involved in the field:
• students (in a broad sense), who do not know where to start, and if self-taught don’t know how to ‘officially’ validate their skills
• teachers and trainers, who have to face a constantly innovating field, and might not be able to identify the skills needed on the market
• recruiters, who might find it hard to identify key skills, diplomas, and certifications
The RConsortium came to the conclusion that there was no ‘official R certification’ to assess one’s skills when it comes to R. Hence this working group (WG), which is working to establish some ground rules for such a certification.