What is Content Services?

In a previous blog, we looked at how Enterprise Content Management (ECM) is transitioning to become content services. It's a change driven as much by a new way of thinking about information management as by new technologies. While ECM focused on the preservation and protection of content, content services build on that and use innovative technology advances to extend the focus to include information access, sharing and collaboration. Bottom line: The era of trying to implement a monolithic, enterprise-wide ECM platform for the all-inclusive control of content is over. While the objective is still necessary, the traditional approach was too unwieldy and complex to work consistently. Everything from scope to training took too long and rarely met organizational needs. And the growth of vast new pools of digital information has just made the idea even more untenable.

Building an expert system for NLP

The knowledge is extracted by asking the reader to answer a number of questions. Every time the answer to a question is yes, specific tags are collected and stored. Every time the answer to a question is no, specific tags are also collected and stored. Some questions ask the user to select from a list. In this case, all the elements selected in the list are stored. The reader does not have to answer all the questions one by one. Instead, a first set of questions is asked. Based on the answers to the first set of questions, a second set of questions is asked. Based on the answers to the first and second sets, a third set of questions is asked. The process continues until no more questions are triggered by previous answers. Filtering questions are used to ask only relevant questions. For example, a filtering question is: Does the text talk about financing issues? If the answer is no, then all questions on financing issues are automatically skipped. Whoever develops a set of questions with their tags and triggering conditions has developed a guideline for efficiently extracting useful information for a given domain from pieces of text. People with different expertise can develop question sequences independently, and these question sequences are then combined into one. We end up with an expert system combining expertise from different people on which information we should be looking for in a piece of text.
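The round-based questioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the Question class, the run_questionnaire function, the example questions and all tag names are hypothetical, and a triggering condition is modeled as a plain function of the answers collected so far.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Maps a question id to True/False, or to the list of items selected.
Answers = Dict[str, object]


@dataclass
class Question:
    """Hypothetical question record: tags for yes/no plus a trigger."""
    qid: str
    text: str
    yes_tags: List[str] = field(default_factory=list)
    no_tags: List[str] = field(default_factory=list)
    # Triggering condition over previous answers; None means "always ask".
    trigger: Optional[Callable[[Answers], bool]] = None

    def is_triggered(self, answers: Answers) -> bool:
        return self.trigger is None or self.trigger(answers)


def run_questionnaire(questions, answer_fn):
    """Ask questions in rounds until previous answers trigger nothing new."""
    answers: Answers = {}
    tags: List[str] = []
    while True:
        # One round: every unanswered question whose trigger holds,
        # judged on the answers from earlier rounds only.
        batch = [q for q in questions
                 if q.qid not in answers and q.is_triggered(answers)]
        if not batch:
            return answers, tags
        for q in batch:
            reply = answer_fn(q)
            answers[q.qid] = reply
            if isinstance(reply, list):   # list question: store selections
                tags.extend(reply)
            elif reply:                   # yes
                tags.extend(q.yes_tags)
            else:                         # no
                tags.extend(q.no_tags)


# Illustrative questions; the first one acts as a filtering question.
QUESTIONS = [
    Question("financing", "Does the text talk about financing issues?",
             yes_tags=["financing"], no_tags=["no-financing"]),
    Question("loan", "Does the text mention a loan?",
             yes_tags=["loan"],
             trigger=lambda a: a.get("financing") is True),
    Question("loan_types", "Which loan types are mentioned?",
             trigger=lambda a: a.get("loan") is True),
]
```

Answering no to the filtering question means the two loan questions are never triggered, so they are skipped without being asked. Question sets written by different experts could be combined by concatenating their lists, provided the question ids do not clash.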

New Course: Fundamentals of Bayesian Data Analysis in R

Bayesian data analysis is an approach to statistical modeling and machine learning that is becoming more and more popular. It provides a uniform framework to build problem specific models that can be used for both statistical inference and for prediction. This course will introduce you to Bayesian data analysis: What it is, how it works, and why it is a useful tool to have in your data science toolbox.

Principal Component Analysis in R

Principal Component Analysis (PCA) is a useful technique for exploratory data analysis, allowing you to better visualize the variation present in a dataset with many variables. It is particularly helpful in the case of ‘wide’ datasets, where you have many variables for each sample. In this tutorial, you’ll discover PCA in R.

Breakfast with AI – Chapter-1

In this article, we will be going through an amazing research paper in the field of Natural Language Processing titled ‘Sequence to Sequence Learning with Neural Networks’ written by Ilya Sutskever, Oriol Vinyals and Quoc V. Le from Google. It was published in the year 2014 and it has been the backbone of developments in the field of Machine Translation & Text Summarization in today's era.

Breakfast with AI – Chapter-2

In this article, we will be going through an important research paper in the field of Natural Language Processing titled ‘GloVe: Global Vectors for Word Representation’ authored by Jeffrey Pennington, Richard Socher and Christopher D. Manning from Stanford University. This paper was published in the year 2014 and has helped the NLP community in building intricate models that capture the nuances of the English language.

Why Automated Feature Engineering Will Change the Way You Do Machine Learning

There are few certainties in data science – libraries, tools, and algorithms constantly change as better methods are developed. However, one trend that is not going away is the move towards increased levels of automation. Recent years have seen progress in automating model selection and hyperparameter tuning, but the most important aspect of the machine learning pipeline, feature engineering, has largely been neglected. The most capable entry in this critical field is Featuretools, an open-source Python library. In this article, we'll use this library to see how automated feature engineering will change the way you do machine learning for the better.

The Machine Learning Behind Android Smart Linkify

Earlier this week we launched Android 9 Pie, the latest release of Android that uses machine learning to make your phone simpler to use. One of the features in Android 9 is Smart Linkify, a new API that adds clickable links when certain types of entities are detected in text. This is useful when, for example, you receive an address from a friend in a messaging app and want to look it up on a map. With a Smart Linkify-annotated text, it's a lot easier!

Top 10 Roles in AI and Data Science

#0 Data Engineer
#1 Decision-Maker
#2 Analyst
#3 Expert Analyst
#4 Statistician
#5 Applied Machine Learning Engineer
#6 Data Scientist
#7 Analytics Manager / Data Science Leader
#8 Qualitative Expert / Social Scientist
#9 Researcher
#10+ Additional personnel
• Domain expert
• Ethicist
• Software engineer
• Reliability engineer
• UX designer
• Interactive visualizer / graphic designer
• Data collection specialist
• Data product manager
• Project / program manager

Reinforcement Learning: The Business Use Case, Part 1

The whirl of reinforcement learning started with the advent of AlphaGo by DeepMind, the AI system built to play the game Go. Since then, various companies have invested a great deal of time, energy, and research, and today reinforcement learning is one of the hot topics within Deep Learning. That said, most businesses are struggling to find use cases for reinforcement learning or ways to encompass it within their business logic. That shouldn't surprise us. So far, it's been studied only in risk-free, observed environments that are easy to simulate, which means that industries like finance, health, insurance, and tech consultancies are reluctant to risk their own money to explore its applications. What's more, the aspect of ‘risk factoring’ within reinforcement learning puts a high strain on systems. Andrew Ng, the co-chair and cofounder of Coursera, has said that ‘reinforcement learning is a type of machine learning whose hunger for data is even greater than supervised learning. It is really difficult to get enough data for reinforcement learning algorithms. There's more work to be done to translate this to businesses and practice.’

Beyond Basic R – Plotting with ggplot2 and Multiple Plots in One Figure

R can create almost any plot imaginable and, as with most things in R, if you don't know where to start, try Google. The Introduction to R curriculum summarizes some of the most used plots, but cannot begin to expose people to the breadth of plot options that exist.

Temporal Causal Models

Temporal causal modeling attempts to discover key causal relationships in time series data. In temporal causal modeling, you specify a set of target series and a set of candidate inputs to those targets. The procedure then builds an autoregressive time series model for each target and includes only those inputs that have a causal relationship with the target. This approach differs from traditional time series modeling where you must explicitly specify the predictors for a target series. Since temporal causal modeling typically involves building models for multiple related time series, the result is referred to as a model system. In the context of temporal causal modeling, the term causal refers to Granger causality. A time series X is said to ‘Granger cause’ another time series Y if regressing for Y in terms of past values of both X and Y results in a better model for Y than regressing only on past values of Y.
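The Granger comparison in the last sentence can be illustrated with a small, self-contained Python sketch. It is a simplified illustration, not the procedure of any particular product: it uses a single lag by default, fits both regressions by ordinary least squares, and only compares residual sums of squares, whereas a real analysis would apply a formal test (for example an F-test) and choose lags carefully. The function names and the simulated series are assumptions made for the example.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(features, targets):
    """Fit least squares with an intercept; return the residual sum of squares."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def lagged_rows(series_list, lag, n):
    """Rows of past values (lags 1..lag) of each series, for t = lag..n-1."""
    return [[s[t - k] for s in series_list for k in range(1, lag + 1)]
            for t in range(lag, n)]


def granger_rss(x, y, lag=1):
    """Return (RSS using past Y only, RSS using past X and past Y)."""
    targets = y[lag:]
    restricted = residual_sum_of_squares(lagged_rows([y], lag, len(y)), targets)
    full = residual_sum_of_squares(lagged_rows([y, x], lag, len(y)), targets)
    return restricted, full


# Simulated data in which past X genuinely drives Y (an assumption of the demo).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.1))

rss_restricted, rss_full = granger_rss(x, y)
```

Here adding past values of X shrinks the residual sum of squares for Y substantially, which is the sense in which X ‘Granger causes’ Y. Swapping the arguments, as in granger_rss(y, x), should show almost no improvement, because X was generated as independent noise.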

From ‘What If?’ To ‘What Next?’ : Causal Inference and Machine Learning for Intelligent Decision Making

10 Things to Know About Causal Inference

The philosopher David Lewis described causation as ‘something that makes a difference, and the difference it makes must be a difference from what would have happened without it.’ This is more or less the interpretation given to causality by most experimentalists. It is a simple definition but it has many implications that can trip you up. Here are ten ideas implied by this notion of causality that matter for research strategies.
1. A causal claim is a statement about what didn't happen.
2. There is a fundamental problem of causal inference.
3. You can estimate average causal effects even if you cannot observe any individual causal effects.
4. If you know that, on average, A causes B and B causes C, this does not mean that you know that A causes C.
5. The counterfactual model is all about contribution, not attribution.
6. X can cause Y even if there is no ‘causal path’ connecting X and Y.
7. Correlation is not causation.
8. X can cause Y even if X is not a necessary condition or a sufficient condition for Y.
9. Estimating average causal effects does not require that treatment and control groups are identical.
10. There is no causation without manipulation.
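Points 1 to 3 can be made concrete with a short simulation, sketched here in Python. Everything in it is an assumption chosen for illustration (the outcome distributions, the effect size of 3, the 50/50 random assignment, and the function names). Each unit has two potential outcomes but reveals only one, so no individual effect is observable, yet the difference in group means still recovers the average effect under random assignment.

```python
import random
from statistics import mean


def simulate(n=10_000, seed=1):
    """Simulate units with two potential outcomes and a randomized treatment."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10, 2)        # outcome if untreated
        y1 = y0 + rng.gauss(3, 1)    # outcome if treated; effect varies by unit
        treated = rng.random() < 0.5 # random assignment
        observed = y1 if treated else y0  # the other outcome is never seen
        units.append((y0, y1, treated, observed))
    return units


def true_average_effect(units):
    """Average of unit-level effects; knowable only inside a simulation."""
    return mean(y1 - y0 for y0, y1, _, _ in units)


def difference_in_means(units):
    """Estimate of the average effect that uses observed outcomes only."""
    treated = [obs for _, _, t, obs in units if t]
    control = [obs for _, _, t, obs in units if not t]
    return mean(treated) - mean(control)


units = simulate()
estimate = difference_in_means(units)
truth = true_average_effect(units)
```

With real data only the observed column and the treatment indicator would be available, which is why the estimate has to come from comparing groups rather than from individual units.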

Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis

Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data this means estimating the outcome of counter-factual treatments (i.e. including treatments that were not observed) on a patient’s outcome. Compared to classic machine learning methods, evaluation and validation of causal inference analysis is more challenging because ground truth data of counter-factual outcome can never be obtained in any real-world scenario. Here, we present a comprehensive framework for benchmarking algorithms that estimate causal effect. The framework includes unlabeled data for prediction, labeled data for validation, and code for automatic evaluation of algorithm predictions using both established and novel metrics. The data is based on real-world covariates, and the treatment assignments and outcomes are based on simulations, which provides the basis for validation. In this framework we address two questions: one of scaling, and the other of data-censoring.