R.I.P. data governance: Data enablement is the clear path forward

Let’s be clear – we do not intend to bury data governance, but rather to place it within the context of a more comprehensive approach: data enablement. The goal of data governance – ensuring the quality of an organization’s data across the data lifecycle – is noble enough. But the processes involved often end up stymieing transformative progress. At their worst, they are black holes that drain resources from the organization and yield no tangible benefit. Data governance still holds value, but it is only one piece of the puzzle. Enterprises today should shift their focus to the larger picture: data enablement. By building a program around data enablement, enterprises can ensure that the right data is delivered to the right resource at the right time. Data enablement requires innovative thinking, vision, people, processes, and technologies.


How Spotify’s Algorithm Knows Exactly What You Want to Listen To

Spotify is doing everything it can to get you to listen to more music. The company has created algorithms to govern everything from your personalized home screen to curated playlists like Discover Weekly, and it continues to experiment with new ways to understand music and why people listen to one song or genre over another. While competitors like Apple Music, Amazon Prime Music, and Google Music rely on a mix of paid human curators and community-created playlists, Spotify’s main differentiator is the level of customization and music discovery it offers customers. Spotify needs to continue building out these algorithms because it’s the only way to create custom listening experiences for each of its more than 200 million users. As Spotify struggles to grow its business, that differentiator needs to be a compelling reason to subscribe to the service. The home screen of the Spotify app is a prime example of how algorithms govern a listening experience. Its goal is to quickly help users find something they will enjoy listening to, according to a presentation by Spotify research director Mounia Lalmas-Roelleke at the Web Conference earlier this year.


AI Coverage Best Practices, According to AI Researchers

Interest in Artificial Intelligence (AI) has skyrocketed in recent years, among both the media and the general public. At the same time, media coverage of AI has varied wildly in quality – at one end, tabloid and clickbait outlets have produced outrageously inaccurate portrayals of AI that reflect science fiction more than reality. At the other end, news outlets such as The New York Times and Wired have specialized reporters, such as Cade Metz and Tom Simonite, who consistently write well-researched and accurate portrayals of AI. But even responsible media coverage can inadvertently propagate subtle misconceptions of AI through choice of wording, imagery, or analogy. As AI researchers, we are both invested in and sensitive to how AI is portrayed in the media. In this article, we suggest a list of best practices for media coverage of AI, some of which may not be obvious to people without a technical background in AI. Being a set of best practices, this list does not represent what even we as researchers always do; rather, these are principles to keep in mind and try to stick to (or set aside, as good judgment dictates). The list is inspired both by our own observations and by those of the AI researchers we surveyed online and at the Stanford AI Lab. We hope it will be useful to journalists, researchers, and anyone who reads or writes about AI.


How to Create an AI Center of Excellence for Enterprise

Organizations are spending millions of dollars on AI strategies, and those investments have allowed them to edge ahead of competitors. They may have just one or several projects under their belt but find themselves asking, ‘Now what?’ Getting a return on AI investment will likely take time and iteration: tons of training data, algorithms, use cases, and several stakeholders must all come together to uncover the value of AI. Building an AI Center of Excellence within your organization will help establish a long-term vision, build sustainable programs, and deliver consistently impactful improvements to your customer experience.


Why you should care about robotic process automation

In a classic 1983 paper, cognitive psychologist Lisanne Bainbridge drew attention to a curious irony: so-called ‘automated’ systems were, in practice, usually attended by one or more human operators. The more advanced the system, she observed, ‘the more crucial…the contribution of the human operator.’ Bainbridge believed that automation designers were in denial about this, however. As she saw it, designers approached the problem of automation as if the human factor were not, in fact, a factor. This resulted in systems that left their (inevitable) human operators ‘with an arbitrary collection of tasks’ with respect to which ‘little thought may have been given to providing support.’ This is precisely the kind of problem that robotic process automation (RPA) aims to address.


A Visual Description of Multicollinearity

Multicollinearity is one of those terms in statistics that is often defined in one of two ways:
1. In very mathematical terms that make no sense – I mean, what is a linear combination anyway?
2. In completely oversimplified terms that avoid the math – it’s just high correlation, right?
So what is it really? In English?
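Before diving into the article, a tiny sketch can make the “linear combination” idea concrete. This is a minimal, hypothetical example (not from the article): x3 is built exactly from x1 and x2, so the data is perfectly multicollinear even though no single pairwise correlation reaches 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = 2 * x1 - 3 * x2  # a "linear combination": a*x1 + b*x2

X = np.column_stack([x1, x2, x3])
# Pairwise correlations are well below 1 (about 0.55 and -0.83 for x3)...
print(np.corrcoef(X, rowvar=False).round(2))
# ...yet the data matrix has rank 2, not 3: one column is redundant,
# which is exactly what "multicollinearity" means.
print(np.linalg.matrix_rank(X))
```

The point: checking pairwise correlations alone can miss multicollinearity, because the redundancy may only appear when several variables are combined.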


RecSim: A Configurable Simulation Platform for Recommender Systems

Significant advances in machine learning, speech recognition, and language technologies are rapidly transforming the way in which recommender systems engage with users. As a result, collaborative interactive recommenders (CIRs) – recommender systems that engage in a deliberate sequence of interactions with a user to best meet that user’s needs – have emerged as a tangible goal for online services.


RecSim: A Configurable Recommender Systems Simulation Platform

RecSim is a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users. RecSim allows the creation of new environments that reflect particular aspects of user behavior and item structure at a level of abstraction well-suited to pushing the limits of current reinforcement learning (RL) and RS techniques in sequential interactive recommendation problems. Environments can be easily configured that vary assumptions about: user preferences and item familiarity; user latent state and its dynamics; and choice models and other user response behavior. We outline how RecSim offers value to RL and RS researchers and practitioners, and how it can serve as a vehicle for academic-industrial collaboration.
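For a flavor of what “configurable” looks like in practice, here is a minimal sketch modeled on the example in the RecSim repository’s README. The module path, config keys, and Gym-style reset/step interface are taken from that README as best we recall; treat the details as assumptions rather than a definitive usage guide.

```python
from recsim.environments import interest_evolution

# One of RecSim's bundled environments; the config varies assumptions
# about the candidate corpus and the recommendation slate.
env_config = {
    'num_candidates': 10,     # documents available to recommend from
    'slate_size': 2,          # documents shown per interaction
    'resample_documents': True,
    'seed': 0,
}
env = interest_evolution.create_environment(env_config)

observation = env.reset()
# A slate is a list of indices into the candidate documents. Here we
# naively recommend the first two candidates; an RL agent would choose
# the slate from the observation instead.
observation, reward, done, info = env.step([0, 1])
```

Swapping in different user-choice models or latent-state dynamics is, per the abstract, a matter of configuring a new environment rather than rewriting the simulator.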


A Pirate’s Guide to Accuracy, Precision, Recall, and Other Scores

Whether you’re inventing a new classification algorithm or investigating the efficacy of a new drug, getting results is not the end of the process. Your last step is to determine how correct those results are. There are a great number of methods and implementations for this task. Like many aspects of data science, there is no single best measure of result quality; the problem domain and the data in question determine the appropriate approaches.
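As a quick refresher on the scores the guide covers, here is a toy, self-contained example (not from the article) computing accuracy, precision, and recall by hand from the four confusion-matrix counts.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2 true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1 false positive
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 2 false negatives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 5 true negatives

accuracy = (tp + tn) / len(y_true)  # 0.7: fraction of all calls that were right
precision = tp / (tp + fp)          # 2/3: of items flagged positive, how many were
recall = tp / (tp + fn)             # 1/2: of actual positives, how many were caught
print(accuracy, precision, recall)
```

Note how the three numbers disagree on the same predictions – which is exactly why the choice of metric has to follow the problem domain.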


Local-First Software: You Own Your Data, in spite of the Cloud

Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost. In this article we propose local-first software, a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data. We survey existing approaches to data storage and sharing, ranging from email attachments to web apps to Firebase-backed mobile apps, and we examine the trade-offs of each. We look at Conflict-free Replicated Data Types (CRDTs): data structures that are multi-user from the ground up while also being fundamentally local and private. CRDTs have the potential to be a foundational technology for realizing local-first software. We share some of our findings from developing local-first software prototypes at the Ink & Switch research lab over the course of several years. These experiments test the viability of CRDTs in practice, and explore the user interface challenges for this new data model. Lastly, we suggest some next steps for moving towards local-first software: for researchers, for app developers, and a startup opportunity for entrepreneurs.
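To see why CRDTs make offline collaboration possible, here is a minimal, illustrative sketch (our own, not from the paper) of one of the simplest CRDTs, a grow-only counter. Each replica increments only its own slot, and merging takes the element-wise maximum, so merges are commutative, associative, and idempotent – replicas converge regardless of the order in which updates arrive.

```python
class GCounter:
    """Grow-only counter CRDT: one count slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, n=1):
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise max makes merging safe to repeat and reorder.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()  # two edits made offline on device A
b.increment()                 # one edit made offline on device B
a.merge(b); b.merge(a)        # sync in either order, any number of times
assert a.value() == b.value() == 3  # both replicas converge
```

Real local-first apps need richer CRDTs (for text, lists, and maps), but the convergence guarantee works on the same principle.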


How to recognize AI snake oil

Much of what’s being sold as ‘AI’ today is snake oil – it does not and cannot work. Why is this happening? How can we recognize flawed AI claims and push back?