Conduct and Interpret a Canonical Correlation

The Canonical Correlation is a multivariate analysis of correlation. "Canonical" is the statistical term for analyzing latent variables (which are not directly observed) that represent multiple observed variables; the term also appears in canonical regression analysis and in multivariate discriminant analysis. Canonical Correlation analysis is the analysis of multiple-X multiple-Y correlation. The Canonical Correlation Coefficient measures the strength of association between two canonical variates, where a canonical variate (denoted CV) is a weighted sum of the variables in the analysis. Just as factor analysis is preferable to creating unweighted indices as independent variables in regression analysis, canonical correlation analysis is preferable for analyzing the strength of association between two constructs. This is because it creates an internal structure, for example, assigning different importance to the single item scores that make up the overall score (as found in satisfaction measurements and aptitude testing).
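To make the idea of a canonical variate concrete, the sketch below builds weighted sums of two small variable sets and computes the correlation between them in plain Python. The data and the weight vectors are invented for illustration; in a real analysis, the CCA procedure chooses the weights so that this correlation is maximized.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Hypothetical observed variables: two X items and two Y items per subject.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y1 = [1.5, 2.5, 2.8, 4.2, 4.9]
y2 = [1.0, 2.0, 3.5, 3.0, 5.5]

# Assumed canonical weights (in practice, estimated by the CCA procedure).
wx, wy = (0.7, 0.3), (0.5, 0.5)

# Canonical variates: weighted sums of each variable set.
cv_x = [wx[0] * a + wx[1] * b for a, b in zip(x1, x2)]
cv_y = [wy[0] * a + wy[1] * b for a, b in zip(y1, y2)]

print(round(pearson(cv_x, cv_y), 3))
```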

Bring Order to Chaos: A Graph-Based Journey from Textual Data to Wisdom

Data is everywhere. News, blog posts, emails, videos and chats are just a few examples of the multiple streams of data we encounter on a daily basis. The majority of these streams contain textual data – written language – containing countless facts, observations, perspectives and insights that could make or break your business.

Can We Make Artificial Intelligence Accountable?

Lack of explainability of decisions made by Artificial Intelligence (AI) programs is a major problem. This inability to understand how AI does what it does also stops it from being deployed in areas such as law, healthcare and within enterprises that handle sensitive customer data. Understanding how data is handled, and how AI has reached a certain decision, is even more important in the context of recent data protection regulation, especially GDPR, that heavily penalizes companies who cannot provide an explanation and record as to how a decision has been reached (whether by a human or computer). IBM may have made a major step towards tackling this issue, announcing today a software service to detect bias in AI models and track the decision-making process. This service should allow companies to track AI decisions as they occur, and monitor any ‘biased’ actions to ensure that AI processes are in line with regulation and overall business objectives. If this software can truly explain the decisions taken by even the most complex deep learning algorithms, this development could provide the peace of mind that many companies need before unleashing AI on their data.

7 AI tools mobile developers need to know

Advancements in artificial intelligence (AI) and machine learning have enabled the evolution of the mobile applications we see today. With AI, apps can now recognize speech, images, and gestures, and translate voices with extraordinary success rates. With so many apps hitting the app stores, it is crucial that they stand apart from competitors by meeting the rising standards of consumers. To stay relevant, mobile developers must keep up with these advancements in artificial intelligence. As AI and machine learning become increasingly popular, there is a growing selection of tools and software available for developers to build their apps with. These cloud-based and device-based artificial intelligence tools give developers a way to power their apps with unique features. In this article, we will look at some of these tools and how app developers are using them in their apps.

Facebook enlists AI to tweak web server performance

Facebook has to run lots of live tests to figure out which configurations are best for its HTTP servers, and it sped up the search for optimal settings by employing a machine learning approach called Bayesian optimization to narrow the list of plausible solutions.

The exploitation, injustice, and waste powering our AI

‘Alexa, what time is it?’ It’s a simple question that any person with a watch can answer with minimal effort. But when you ask an Amazon Echo the same question, a vast system powered by natural resources and human labor is activated to drum up the answer. As many of us reckon with Silicon Valley’s impact on the world and consider how it has upended life, work, and even democracy, we must also consider the infrastructure, and the tangible harm it can do, that usually remains hidden beneath these seemingly simple user experiences. It’s an aspect of AI that is nearly impossible to comprehend, let alone visualize, but a new map created by Kate Crawford, AI researcher and co-founder of the AI Now Institute at NYU, and data visualization specialist Vladan Joler attempts this dizzying task anyway.


Solid empowers users and organizations to separate their data from the applications that use it. It allows people to look at the same data with different apps at the same time. It opens brand new avenues for creativity, problem-solving, and commerce. Learn how it came to be.

Open Source FPGA Dev Guide

We like the ICE40 FPGA from Lattice for two reasons: there are cheap development boards like the Icestick available for it and there are open source tools. We’ve based several tutorials on the Icestorm toolchain and it works quite well. However, the open source tools don’t always expose everything that you see from commercial tools. You sometimes have to dig a little to find the right tool or option. Sometimes that’s a good thing. I don’t need to learn yet another fancy IDE and we have plenty of good simulation tools, so why reinvent the wheel? However, if you are only using the basic workflow of Yosys, Arachne-pnr, icepack, and iceprog, you could be missing out on some of the most interesting features. Let’s take a deeper look.
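If you are using that basic workflow, the command sequence looks roughly like this (file names are placeholders; -d 1k targets the Icestick's iCE40-HX1K part, and the PCF file holds your pin constraints):

```shell
# Synthesize Verilog to BLIF with Yosys
yosys -p "synth_ice40 -blif top.blif" top.v

# Place and route for the iCE40 HX1K (Icestick), with pin constraints
arachne-pnr -d 1k -p icestick.pcf -o top.asc top.blif

# Pack the ASCII bitstream into a binary bitstream
icepack top.asc top.bin

# Program the Icestick over USB
iceprog top.bin
```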

RStudio 1.2 Preview: SQL Integration

The RStudio 1.2 Preview Release, available today, dramatically improves support and interoperability with many new programming languages and platforms, including SQL, D3, Python, Stan, and C++. Over the next few weeks on the blog, we’re going to be taking a look at improvements for each of these in turn. Today, we’re looking at SQL, and as a motivating example, we’re going to connect to a sample Chinook database to get a list of album titles.

Getting started with Google Colaboratory for running deep learning applications

We all know that deep learning algorithms improve the accuracy of AI applications to a great extent. But this accuracy comes at the cost of heavy computational processing units, such as GPUs, for developing deep learning models. Many machine learning developers cannot afford a GPU, as they are very costly, and find this a roadblock to learning and developing deep learning applications. To help AI and machine learning developers, Google has released a free cloud-based service, Google Colaboratory: a Jupyter notebook environment with free GPU processing capabilities and no strings attached. It is a ready-to-use service that requires no setup at all. Any AI developer can use this free service to develop deep learning applications using popular AI libraries like TensorFlow, PyTorch, Keras, etc.


Pypeline is a simple yet powerful Python library for creating concurrent data pipelines.
• Pypeline was designed to solve simple, medium-scale data tasks that require concurrency and parallelism, but where frameworks like Spark or Dask feel like overkill.
• Pypeline exposes an easy-to-use, familiar, functional API.
• Pypeline enables you to build pipelines using Processes, Threads, and asyncio.Tasks via exactly the same API.
• Pypeline allows you to control the memory and CPU resources used at each stage of your pipeline.
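The bullet points above can be given a rough standard-library flavor: the sketch below chains an I/O-bound stage on a thread pool into a second transform stage, the pipeline shape Pypeline provides natively. Note this is plain concurrent.futures, not Pypeline's API, and the stage functions and URLs are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def download(url):
    # Stand-in for a slow I/O-bound task (e.g. an HTTP fetch).
    return f"payload-from-{url}"

def parse(payload):
    # Stand-in for a lightweight transform stage.
    return payload.upper()

urls = [f"http://example.com/{i}" for i in range(5)]

# Stage 1: run the I/O-bound stage on a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    payloads = list(pool.map(download, urls))

# Stage 2: feed the results into the next stage of the pipeline.
results = [parse(p) for p in payloads]
print(results[0])
```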

New Course: Factor Analysis in R

The world is full of unobservable variables that can’t be directly measured. You might be interested in a construct such as math ability, personality traits, or workplace climate. When investigating constructs like these, it’s critically important to have a model that matches your theories and data. This course will help you understand dimensionality and show you how to conduct exploratory and confirmatory factor analyses. With these statistical techniques in your toolkit, you’ll be able to develop, refine, and share your measures. These analyses are foundational for diverse fields including psychology, education, political science, economics, and linguistics.

Society of Machines

Society – a group of people living together, collaborating, competing, and conflicting. Look around: you might be interacting with many people, other intelligent lives. When two intelligent lives come into contact, there can be cooperation, conflict, competition, or perhaps nothing at all; they may simply ignore each other. Each has their own self-interest, preferences, and competencies.

Watermarking in Spark Structured Streaming

Handling late-arriving events is a crucial capability for stream processing engines. A solution to this problem is the concept of watermarking, which has been supported by the Structured Streaming API since Spark 2.1.
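The core idea can be sketched in a few lines of plain Python. This is a deliberate simplification of what Spark does (real watermarking also interacts with windowed aggregation state), and the event stream is invented for illustration:

```python
# Minimal sketch of event-time watermarking: track the max event time seen,
# and drop events older than (max event time - allowed lateness).

WATERMARK_DELAY = 10  # allowed lateness, in seconds

max_event_time = 0
accepted, dropped = [], []

# (event_time, value) pairs arriving out of order
events = [(100, "a"), (105, "b"), (96, "late-but-ok"), (112, "c"), (90, "too-late")]

for event_time, value in events:
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - WATERMARK_DELAY
    if event_time >= watermark:
        accepted.append(value)
    else:
        dropped.append(value)  # older than the watermark: discarded

print(accepted, dropped)
```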

Doing Data Science at the command line in Google Cloud Platform

Data engineering is about gathering and collecting data, storing it in a suitable way, doing some processing, and serving it, perhaps to a data scientist. Every data scientist must face many data engineering and data preparation tasks before enjoying the always-anticipated modeling stage. Also, when starting a new project, you must think about the trade-offs of choosing the right language and platform. Leaving the platform aside for the moment: R, Python, Julia, MATLAB, Octave, etc. are some of the languages used in data science, especially R and Python, which have grown considerably in recent years and enjoy great community support. Here, however, I want to explain how you can tackle many initial tasks using your operating system's command line, and why the shell should be part of your stack. We will discover how command-line programs like curl, sed, jq, and csvkit can simplify many of your repetitive tasks: you write less code, it stays portable, and it can even run faster thanks to low-level interfaces. In this post we will use these tools together, piping their commands, without the need for any IDE or notebook, to download, convert, transform, and load data into a big data warehouse.
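As a tiny, self-contained taste of this style (using only standard POSIX tools, so it runs anywhere; curl, jq, and csvkit are left out, and the inline CSV is invented), the pipeline below normalizes a delimiter with sed, projects a column with cut, and extracts the first data value:

```shell
# Emit a small CSV, normalize semicolons to commas with sed,
# project the second column with cut, then take the first data row.
printf 'name;score\nalice;10\nbob;7\n' \
  | sed 's/;/,/g' \
  | cut -d',' -f2 \
  | tail -n +2 \
  | head -n 1
```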

Time series Forecasting – ARIMA models

ARIMA stands for Auto-Regressive Integrated Moving Average. There are seasonal and non-seasonal ARIMA models that can be used for forecasting.
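To ground the "AR" part of the name, the sketch below fits a first-order autoregressive model, AR(1), by least squares on lagged values and uses it for a one-step forecast. This is only the autoregressive building block, not a full ARIMA fit, and the series is invented for illustration.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + noise (mean-centered)."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    den = sum(x * x for x in centered[:-1])
    return num / den, mean

series = [10.0, 10.8, 11.5, 11.2, 12.0, 12.6, 12.4, 13.1]
phi, mean = fit_ar1(series)

# One-step-ahead forecast: pull the last value toward the mean by a factor of phi.
forecast = mean + phi * (series[-1] - mean)
print(round(phi, 3), round(forecast, 3))
```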

How to Visualise Black Box Optimization problems with Gaussian Processes

Black-box optimization is common in machine learning because, more often than not, the process or model we are trying to optimize has no algebraic form that can be solved analytically. Moreover, in use cases where the objective function is expensive to evaluate, a general approach is to create a simpler surrogate model of the objective function that is cheaper to evaluate and is used instead to solve the optimization problem. In a previous article I explained one of these approaches, Bayesian Optimization, which uses Gaussian Processes to approximate the objective function. This article is a Python tutorial showing how you can evaluate and visualise your Gaussian Process surrogates with OPTaaS, an API for Bayesian Optimization.

How to build a Simple Recommender System in Python

In this article we introduce the reader to recommender systems and build a simple recommender system in Python. The system is nowhere close to industry standards and is only meant as an introduction to recommender systems. We assume that the reader has prior experience with scientific packages such as pandas and numpy.
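To preview what such a system looks like, even without pandas or numpy, here is a minimal user-based collaborative filter in plain Python: it scores an unseen item for a user by other users' ratings, weighted by the cosine similarity of their rating vectors. The ratings matrix is invented for illustration.

```python
import math

ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 4},
    "bob":   {"matrix": 4, "titanic": 2, "inception": 5, "up": 3},
    "carol": {"matrix": 1, "titanic": 5, "up": 4},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = math.sqrt(sum(u[i] ** 2 for i in shared))
    nv = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None

print(round(predict("alice", "up"), 2))
```

Alice's predicted rating for "up" lands between bob's 3 and carol's 4, closer to bob's because his tastes are more similar to hers.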

The art of A/B testing

Walk through the beautiful math of statistical significance. A/B testing, also known as split testing, is an experimental technique for determining whether a new design brings an improvement, according to a chosen metric. In web analytics, the idea is to challenge an existing version of a website (A) with a new one (B) by randomly splitting traffic and comparing metrics on each of the splits.
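A common piece of that math is the two-proportion z-test. The stdlib-only sketch below computes the z statistic and two-sided p-value for hypothetical conversion counts from splits A and B (the counts are invented for illustration):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference of two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 10% conversion on A vs 13% on B, 2000 visitors each.
z, p = two_proportion_ztest(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(round(z, 2), round(p, 4))
```

Here the difference is significant at the usual 5% level, so under this metric variant B would be declared the winner.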

The Architect of Artificial intelligence — Deep Learning

Artificial Intelligence has been one of the most remarkable advancements of the decade. People are rushing from explicit software development to building AI-based models, and businesses now rely on data-driven decisions rather than on someone manually defining rules. Everything is turning to AI, from chat-bots to self-driving cars, speech recognition to language translation, robotics to medicine. AI is not a new thing to researchers, though; it has been around since well before the '90s. So what is making it so trendy and open to the world now?

Deep Q Learning

The journey into Reinforcement Learning continues… It’s time to analyze the famous Q-learning algorithm and see how it became the new standard in the field of AI (with a little help from neural networks). First things first: in the last post, we saw the basic concept behind Reinforcement Learning, and we framed the problem using an agent, an environment, a state (S), an action (A), and a reward (R). We talked about how the whole process can be described as a Markov Decision Process, and we introduced the terms Policy and Value. Lastly, we had a quick high-level overview of the basic methods out there. Remember that the goal is to find the optimal policy, and that a policy is a mapping between states and actions. So we need to find which action to take in a given state in order to maximize our expected reward. One way to find the optimal policy is to make use of the value functions (a model-free technique).
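Before bringing neural networks into it, tabular Q-learning itself fits in a few lines. The sketch below learns a Q-table for a tiny one-dimensional corridor MDP (states and rewards invented for illustration) using the standard update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)):

```python
import random

random.seed(0)

# A 5-state corridor: start at state 0, reward 1.0 for reaching state 4.
N_STATES, ACTIONS = 5, (0, 1)  # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Deterministic environment dynamics for the corridor."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the best next action
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should always move right along the corridor.
print([max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)])
```

After training, Q[s][1] (moving right) dominates Q[s][0] in every non-terminal state, so the greedy policy walks straight to the reward.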