MIT researchers show how to detect and address AI bias without loss in accuracy

Bias in AI may lead to nothing worse than poor search results or a poor user experience when a predictive model is deployed in social media, but it can seriously and negatively affect human lives when AI is used for things like health care, autonomous vehicles, criminal justice, or the predictive policing tactics used by law enforcement. In an age when AI is being deployed virtually everywhere, this could lead to ongoing systematic discrimination. That’s why researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a method to reduce bias in AI without reducing the accuracy of predictive results.


AI Privacy and Ethical Compliance Toolkit

New applications of machine learning are raising ethical concerns about a host of issues, including bias, transparency, and privacy. In this tutorial, we will demonstrate tools and capabilities that can help data scientists address these concerns. The tools help bridge the gap between ethicists and regulators on one side, and machine learning practitioners on the other side. Namely, we will present 3 tools:
(1) Privacy-Preserving Face Landmarks Detection: We will show how to design for privacy preservation in a face detection framework. This design approach enables the extraction of facial features without compromising the user’s identity.
(2) Vehicle Data Assurance (VEDA): Autonomous vehicles collect huge amounts of sensor data, which is used to train ML models. We provide a solution, VEDA, to ensure compliance with strict privacy regulations regarding the use and handling of this data, and to increase trust in the collected data and its management lifecycle.
(3) Bias Detection and Remediation: It has been shown that computer vision algorithms can be biased toward certain ages, races or genders, depending on the training datasets. We will show by example how to detect these biases and how tools can be used to rebalance a biased dataset.


AzureRMR: an R interface to Azure Resource Manager

In a previous article I announced AzureR, a new family of packages for working with Azure from R. This article goes into more detail on how you can use AzureRMR, the base package of the AzureR family, to manage resources with Azure Resource Manager.


Word Embeddings and Document Vectors: Part 1. Similarity

Classification hinges on the notion of similarity. This similarity can be as simple as a categorical feature value such as the color or shape of the objects we are classifying, or a more complex function of all categorical and/or continuous feature values that these objects possess. Documents can be classified as well using their quantifiable attributes such as size, file extension etc… Easy! But unfortunately it is the meaning/import of the text contained in the document that we are usually interested in for classification.
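
As a rough illustration of similarity computed from the text itself rather than from file attributes, the sketch below compares documents by the cosine of their word-count vectors. The function names, the naive whitespace tokenizer and the toy sentences are assumptions made for this example; they are not taken from the article.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents, invented for illustration only.
doc1 = bag_of_words("the cat sat on the mat")
doc2 = bag_of_words("the cat lay on the rug")
doc3 = bag_of_words("stock prices fell sharply today")

print(cosine_similarity(doc1, doc2))  # the two documents share several words
print(cosine_similarity(doc1, doc3))  # the two documents share no words
```

Word counts only capture shared vocabulary, not meaning, which is the limitation the abstract points at.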


How To Teach A Computer To See With Convolutional Neural Networks

The field of Computer Vision has made huge progress in the last few years. Convolutional neural networks have greatly boosted the accuracy of image recognition models and have a ton of applications in the real world. In this article I’ll cover how they work, some real-world applications, and how to code one with Python and Keras.


How to build an image duplicate finder

When you download images from the internet you usually find noisy data. Furthermore, popular pictures are all over the place. It is tedious to go through them one by one and try to find duplicates to clean your dataset. With this problem in mind, I built a duplicate finder that finds the duplicates for you so you only need to choose whether you want to delete them. You can find the code in the fastai library. In this post I will explain how I built this tool.


An Introduction to Bayesian Inference a mathematical venture – part 2

In the last article of this venture, we explored the Bayesian Linear regression model. In this article, we’ll explore the mathematics behind Bayesian Naive Bayes for the task of classification. Before we start, I would like to reiterate that the most important assumption made during the derivation of the Naive Bayes solution is that the features are independent of each other. First, we’ll set up the problem and then go over the mathematics step by step.
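
To make the independence assumption concrete, here is a minimal sketch of plain Naive Bayes with fixed, known probabilities. It is not the fully Bayesian treatment the article derives, and the function name and all numbers are invented for illustration. Because the features are assumed independent given the class, the class score is simply the prior multiplied by one term per feature.

```python
def naive_bayes_posterior(priors, likelihoods, features):
    """Return P(class | features), assuming features are independent given the class.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature present | class)}}
    features:    the features observed in the item being classified
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            # The independence assumption lets us multiply per-feature terms.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the class probabilities sum to 1.
    return {label: score / total for label, score in scores.items()}


# Toy numbers, invented for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["offer"])
print(posterior)
```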


Breast Cancer Classification Using Support Vector Machine (SVM)

Breast cancer is the most common cancer amongst women in the world. It accounts for 25% of all cancer cases, and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. Early diagnosis significantly increases the chances of survival. The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous). A tumor is considered malignant if the cells can grow into surrounding tissues or spread to distant areas of the body. A benign tumor does not invade nearby tissue or spread to other parts of the body the way cancerous tumors can. But benign tumors can be serious if they press on vital structures such as blood vessels or nerves. Machine Learning techniques can dramatically improve the level of diagnosis in breast cancer. Research shows that experienced physicians can detect cancer with 79% accuracy, while 91% (sometimes up to 97%) accuracy can be achieved using Machine Learning techniques.


An Introduction to Bayesian Inference a mathematical venture

Hi, this is my first blog article, so bear with me a little; if I have made any mistakes or missed any conventions in writing an article, please let me know in the comments. (Polite feedback about the content is always welcome 🙂 ). In the world of statistics, there are two prominent approaches employed by statisticians and data scientists to solve real-world predictive problems, known as the frequentist and the Bayesian approaches.


When to Trust Robots with Decisions, and When Not To

Smarter and more adaptive machines are rapidly becoming as much a part of our lives as the internet, and more of our decisions are being handed over to intelligent algorithms that learn from ever-increasing volumes and varieties of data. As these ‘robots’ become a bigger part of our lives, we don’t have any framework for evaluating which decisions we should be comfortable delegating to algorithms and which ones humans should retain. That’s surprising, given the high stakes involved. I propose a risk-oriented framework for deciding when and how to allocate decision problems between humans and machine-based decision makers. I’ve developed this framework based on the experiences that my collaborators and I have had implementing prediction systems over the last 25 years in domains like finance, healthcare, education, and sports. The framework differentiates problems along two independent dimensions: predictability and cost per error.


Introduction to Machine Learning in Python

In this tutorial, you will be introduced to the world of Machine Learning (ML) with Python. To understand ML practically, you will be using a well-known machine learning algorithm called K-Nearest Neighbor (KNN) with Python.
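
For readers who have not met KNN before, the idea can be sketched in a few lines of plain Python: find the k training points closest to the query and take a majority vote over their labels. The function name, the choice of Euclidean distance and the toy points are assumptions for this example; the tutorial itself may use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)  # Euclidean distance (Python 3.8+)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters, invented for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # the three nearest points are all labelled "a"
print(knn_predict(points, labels, (5.1, 5.0)))  # the three nearest points are all labelled "b"
```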


What is the simplest way to prevent Overfitting?

The dropout technique in my understanding is tricky but practical. The term ‘dropout’ is used for a technique which drops out some nodes of the network. Dropping out can be seen as temporarily deactivating or ignoring neurons of the network. This technique is applied in the training phase to reduce overfitting.
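
A minimal sketch of the idea, assuming the common "inverted dropout" variant in which the surviving activations are rescaled during training. The function name, the `rate` parameter and the rescaling step are assumptions of this example and are not described in the post.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate` during training.

    Surviving activations are scaled by 1 / (1 - rate) so their expected value is
    unchanged; outside training the input is returned untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded so repeated runs drop the same nodes
layer_output = [0.5, 1.0, 1.5, 2.0]
print(dropout(layer_output, rate=0.5, rng=rng))         # each entry is either zeroed or doubled
print(dropout(layer_output, rate=0.5, training=False))  # unchanged outside the training phase
```

Because a different random subset of nodes is ignored on each training step, the network cannot rely too heavily on any single neuron.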


Linear Regression – Understanding the Theory

I decided to transition from web development to data science. To do so, I opted to learn data science the boring way: reading books, taking notes, and practicing on my own. Like I said, it is not the sexiest way to learn something new, but it enforces your discipline and you build a good work methodology, which are key to being a good data scientist. In this post, you will learn what a linear regression is and why it is useful to know. I will show the math behind it and how to assess the accuracy of the model in different cases. Finally, I will briefly touch upon multiple linear regression and how to add an interaction effect. Assuming you’re not scared of equations, and you already know a bit about data science, let’s get to it!
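
As a small taste of that math, the sketch below fits a one-variable linear regression by ordinary least squares and scores it with R². The function names and the data are invented for illustration, and R² is only one of several ways to assess accuracy; the post may well use others.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data lying roughly on a straight line, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```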


K as in Keras…a DeepLearning Classifier!

Hope you have an idea what this post is all about. Yes, you are right! It’s about building a simple classification model using the Keras API. As we all know, Keras is one of the simplest, most user-friendly and most popular deep learning libraries at the moment, and it runs on top of TensorFlow/Theano. Complete documentation on Keras is here. Keras is popular because of the below guiding principles.


The three levels of Natural Language Processing for your business

In the past years, the tech world has seen a surge of Natural Language Processing (NLP) applications in various areas, including adtech, publishing, customer service and market intelligence. According to Gartner’s hype cycle, NLP reached the peak of inflated expectations in 2018. Many businesses see it as a ‘go-to’ solution to generate value from the 80% of business-relevant data that comes in unstructured form. To put it simply – NLP is wildly adopted with wildly variable success. In this article, I share some practical advice for the smooth integration of NLP into your tech stack. The advice summarizes the experience I have accumulated on my journey with NLP – through academia, a number of industry projects, and my own company which develops NLP-driven applications for international market intelligence. The article does not provide technical details but focusses on organisational factors including hiring, communication and expectation management.


8 things I’ve learnt being a data guy

At a Python meet-up last week, my most dreaded question came up: ‘So what do you do?’ The feeling of dread comes from the anxiety of having to explain how I have the responsibility of three different job titles. So what do I say? I am a data developer, a data scientist and a data engineer. Quite a mouthful, especially when I remark that I’m also a rocket engineer. Today will mark my 268th day at Lexer and what a ride it’s been! As I distill some of my technical and business learnings, this post is intended to give the outsider a sneak peek into the fascinating world of startups.