Article: The Data Science Diversity Gap: Where Are the Women?

Data science and related fields, including artificial intelligence, business intelligence, and big data, are seeing tremendous growth. Data is important in numerous industries, from healthcare to transportation, making data scientists a must-have role in most companies. As more technology emerges, even more data can be collected, which only increases the need for experts.

Paper: Asymptotically Unambitious Artificial General Intelligence

General intelligence, the ability to solve arbitrary solvable problems, is supposed by many to be artificially constructible. Narrow intelligence, the ability to solve a given particularly difficult problem, has seen impressive recent development. Notable examples include self-driving cars, Go engines, image classifiers, and translators. Artificial General Intelligence (AGI) presents dangers that narrow intelligence does not: if something smarter than us across every domain were indifferent to our concerns, it would be an existential threat to humanity, just as we threaten many species despite no ill will. Even the theory of how to maintain the alignment of an AGI’s goals with our own has proven highly elusive. We present the first algorithm we are aware of for asymptotically unambitious AGI, where ‘unambitiousness’ includes not seeking arbitrary power. Thus, we identify an exception to the Instrumental Convergence Thesis, which is roughly that by default, an AGI would seek power, including over us.

Paper: Food for thought: Ethical considerations of user trust in computer vision

In computer vision research, especially when novel applications of tools are developed, ethical implications around user perceptions of trust in the underlying technology should be considered and supported. Here, we describe an example of the incorporation of such considerations within the long-term care sector for tracking resident food and fluid intake. We highlight our recent user study conducted to develop a Goldilocks quality horizontal prototype designed to support trust cues in which perceived trust in our horizontal prototype was higher than the existing system in place. We discuss the importance and need for user engagement as part of ongoing computer vision-driven technology development and describe several important factors related to trust that are relevant to developing decision-making tools.

Paper: Defending Against Neural Fake News

Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary’s point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like `Link Found Between Vaccines and Autism,’ Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation. Developing robust verification techniques against generators like Grover is critical. We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias — and sampling strategies that alleviate its effects — both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.

Paper: FairSearch: A Tool For Fairness in Ranked Search Results

Ranked search results and recommendations have become the main mechanism by which we find content, products, places, and people online. With hiring, selecting, purchasing, and dating being increasingly mediated by algorithms, rankings may determine career and business opportunities, educational placement, access to benefits, and even social and reproductive success. It is therefore of societal and ethical importance to ask whether search results can demote, marginalize, or exclude individuals of unprivileged groups or promote products with undesired features. In this paper we present FairSearch, the first fair open source search API to provide fairness notions in ranked search results. We implement two algorithms from the fair ranking literature, namely FA*IR (Zehlike et al., 2017) and DELTR (Zehlike and Castillo, 2018) and provide them as stand-alone libraries in Python and Java. Additionally we implement interfaces to Elasticsearch for both algorithms, that use the aforementioned Java libraries and are then provided as Elasticsearch plugins. Elasticsearch is a well-known search engine API based on Apache Lucene. With our plugins we enable search engine developers who wish to ensure fair search results of different styles to easily integrate DELTR and FA*IR into their existing Elasticsearch environment.

Paper: Fairness and Missing Values

The causes underlying unfair decision making are complex, being internalised in different ways by decision makers, other actors dealing with data and models, and ultimately by the individuals being affected by these decisions. One frequent manifestation of all these latent causes arises in the form of missing values: protected groups are more reluctant to give information that could be used against them, delicate information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. As a result, missing values and bias in data are two phenomena that are tightly coupled. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we claim that fairness research should not miss the opportunity to deal properly with missing data. To support this claim, (1) we analyse the sources of missing data and bias, and we map the common causes, (2) we find that rows containing missing values are usually fairer than the rest, which should not be treated as the uncomfortable ugly data that different techniques and libraries get rid of at the first occasion, and (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique deals with them directly or by imputation methods). We end the paper with a series of recommended procedures about what to do with missing data when aiming for fair decision making.

Paper: Better Future through AI: Avoiding Pitfalls and Guiding AI Towards its Full Potential

Artificial Intelligence (AI) technology is rapidly changing many areas of society. While there is tremendous potential in this transition, there are several pitfalls as well. Using the history of computing and the world-wide web as a guide, in this article we identify those pitfalls and actions that lead AI development to its full potential. If done right, AI will be instrumental in achieving the goals we set for economy, society, and the world in general.

Article: AI and Cognitive Computing

Understanding the difference is critical for understanding the future of work. AI and Cognitive Computing are often interchangeable terms to people who are not working in the technology industry. Both imply that computers are now responsible for performing job functions that a human used to perform. In fact, there’s a big difference between AI and Cognitive Computing. Understanding the difference will be intrinsic in facilitating the work of a person working in the intersection of these two.