Extract features of Music

Feature extraction is a very important part of analyzing data and finding relations between different things. Models cannot understand raw audio data directly, so feature extraction is used to convert it into an understandable format. It is a process that explains most of the data, but in an understandable way. Feature extraction is required for classification, prediction and recommendation algorithms. In this blog, we will extract features of music files that will help us classify them into different genres or recommend music based on your favorites. We will learn different techniques used for extracting features of music. The audio signal is a three-dimensional signal in which the three axes represent time, amplitude and frequency.
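As a rough illustration of what a "feature" is, the sketch below computes two common time-domain descriptors, zero-crossing rate and RMS energy, on a synthetic 440 Hz tone. The choice of these two features and the names `zero_crossing_rate`, `rms_energy`, `sample_rate` and `tone` are assumptions made for this example; the post itself works on real music files and may use other techniques.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def rms_energy(samples):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical input: one second of a 440 Hz sine wave sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

zcr = zero_crossing_rate(tone)  # close to 2 * 440 / 8000 for a pure tone
rms = rms_energy(tone)          # close to 1 / sqrt(2) for a unit sine
print(f"zero-crossing rate: {zcr:.4f}, RMS energy: {rms:.4f}")
```

Each audio clip is reduced to a handful of numbers like these, which a classifier or recommender can then compare across tracks.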

DevOps for Machine Learning

Until very recently, most organizations have seen two distinct, non-overlapping work streams when building an AI-enabled application: a development path and a data science path. Often, both groups are actually building similarly scripted functional solutions using something like Python or C/F#. Further, once a data scientist finishes the evaluation and model selection step of the data science process, I’ve found there to be a ‘confusion vacuum’ when it comes to best practices around integrating into existing or augmenting new business processes, with each side not fully understanding how to support the other or when to engage. Much of the convergence, in my opinion, has been fueled by the growing popularity and usage of container services like Docker and Kubernetes, especially in the DevOps world. So, how do these swimlanes converge, you ask? I’m glad you did!

Dashboard for Sales Trends in Retail

Retail is probably the most talked-about industry when it comes to disruption these days. Empty malls are a common blog topic, and an unusually high number of bankruptcies spans all subsectors. The familiar names that filed for bankruptcy in the last few years range from the well-known Sears, ToysRUs and Limited Brands to the lesser-known Aerosoles, Gander Mountain and the Walking Company. Given all this hyperactivity and my interest in Retail, when I had to pick a topic for my Visualization project I quickly jumped on a readily available Retail dataset for exploration. Exploring this dataset has helped me understand the industry better and has also thrown some surprising insights at me.

Creating a custom classifier for Text Cleaning

Recently I’ve been studying NLP more than other data science fields, and one challenge that I face more often than not is the cleaning part of the process. Building NLP models requires many pre-processing steps, and if the data is not properly treated, it could result in poor models, which is precisely what we want to avoid. In this article, we’re going to focus on PDF documents. The goal here is to open a PDF file, convert it to plain text, understand the need for data cleaning and build a machine learning model for that purpose.

Compensating for NLP’s Lack of Understanding

The saying ‘a picture is worth a thousand words’ does something of an injustice to the medium of language. It suggests that words are an inefficient form of communication when in fact the opposite is true. When humans use language to communicate, so much is left out because the speaker and listener share experience of the same world, which makes explicit statements about that shared world unnecessary in everyday speech. For example, if I say to you ‘the vase is on its side, rolling along the table,’ I don’t need to also tell you that the vase is made of fragile stuff (it’s a reasonable assumption that it is), or that the table doesn’t have edges that will stop the vase’s rolling, or that as a result the vase will likely roll off the table, or that gravity will make the vase fall to the floor, which is hard and will therefore cause the fragile vase to shatter. It’s enough for me to say ‘the vase is on its side, rolling along the table’ for you to know the vase will likely smash to pieces unless someone intervenes.

An overview of the NLP ecosystem in R

At BNOSAC, R is used a lot to perform text analytics, as it is an excellent tool that provides anything a data scientist needs to perform data analysis on text in a business setting. For users unfamiliar with all the possibilities that the wealth of R packages offers for text analytics, we’ve made this small mindmap showing a list of techniques and R packages that are used frequently in text mining projects set up by BNOSAC.

Alternatives to Logistic Regression

Logistic regression (LR) models estimate the probability of a binary response, based on one or more predictor variables. Unlike linear regression models, the dependent variables are categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn’t work well for all situations.
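To make "estimate the probability of a binary response" concrete, here is a minimal sketch of how a fitted LR model turns predictor values into a probability: a weighted sum passed through the logistic (sigmoid) function. The function name `predict_proba` and the example weights are hypothetical and not taken from the article.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    # Numerically stable form of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients for two predictors.
weights, bias = [0.8, -1.2], 0.1
p = predict_proba(weights, bias, [2.0, 0.5])
print(f"P(y = 1 | x) = {p:.3f}")
```

The output is always strictly between 0 and 1, which is what distinguishes this from a linear regression on the same predictors.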

Adversarial Robustness

Generative adversarial neural networks (GANs) are one of the most active areas of research in the deep learning ecosystem. Conceptually, GANs are a form of unsupervised learning in which two neural networks build knowledge by competing against each other in a zero-sum game. While GANs are a great mechanism for knowledge acquisition, they can also be used to generate attacks against deep neural networks. In a very well-known example, a GAN attacker can cause imperceptible changes in training images to trick a classification model.

A Mini-Introduction To Information Theory

This article consists of a very short introduction to classical and quantum information theory. Basic properties of the classical Shannon entropy and the quantum von Neumann entropy are described, along with related concepts such as classical and quantum relative entropy, conditional entropy, and mutual information. A few more detailed topics are considered in the quantum case.
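For the classical side, the quantities named above can be sketched in a few lines. The example below computes Shannon entropy in bits and the mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table; the function names and the example distributions are assumptions chosen for illustration, and the quantum analogues are not covered here.

```python
import math

def shannon_entropy(probs):
    """H = -sum p * log2(p), in bits; zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    hxy = shannon_entropy([p for row in joint for p in row])
    return shannon_entropy(px) + shannon_entropy(py) - hxy

fair_coin = shannon_entropy([0.5, 0.5])
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
correlated = mutual_information([[0.5, 0.0], [0.0, 0.5]])
print(fair_coin, independent, correlated)
```

Two independent fair coins share no information, while two perfectly correlated coins share one full bit.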

A little trick for debugging Shiny

The first thing to do is to insert an action button, and a browser() in the observeEvent() watching this button. This is a standard approach: at any time, you just press this button, and you’re inside the Shiny Application. From there, you can access the value of the reactiveValues and run the reactive elements, accessing the values they have at the moment you pressed the button. This approach works, and it’s robust. But here’s the issue: it’s kind of cumbersome to add/remove or comment/uncomment this button when you want to show the app, take screenshots, or simply have a full view of the app.

3 Types of Regression in One Picture

Interesting picture comparing linear, logistic and Poisson regression.

Advanced Analytic Platforms – Incumbents Fall – Challengers Rise

The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out and once again there are big changes in the leaderboard. Some major incumbents have fallen and some new challengers have emerged.

Eliminate the Extraneous: The Art and (Data) Science of Holding a ‘Conversation’

In layman’s terms, we’re talking about the bar on the side of the page that says, ‘Hey Zank, you seem to love 90’s rock. You’re going to love this clip of Pearl Jam.’ In data science speak, this gets called a recommender algorithm, a recommendation engine, or simply a recommender. We’ll stick with the last of these for the rest of the post. These recommenders are everywhere, from YouTube, to Amazon, to Netflix. What distinguishes the good, the bad, and the just plain ugly is the notion of ‘conversation.’ In this post I’ll explain how to use a mixture of common sense, math, and of course data (it is 2019, after all) to provide users with an experience that is conversational. Without further ado, let’s dig in!

What is a Recurrent NNs and Gated Recurrent Unit (GRUS)

Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many sequential data tasks and are used by, among others, Apple’s Siri and Google’s Voice Search. Their great advantage is that the algorithm remembers its input, thanks to an internal memory. But despite their recent popularity, only a limited number of resources exist. The goal of this article is to thoroughly explain how RNNs work, their biggest issues and how to solve them, and to introduce GRUs, describing their basic components in detail.
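The "internal memory" and the gates of a GRU can be sketched with a single scalar hidden state. This is a minimal illustration under stated assumptions, not the article's code: the parameter names in `params` are hypothetical, real GRUs use weight matrices and vectors, and the update convention used here, h_new = (1 - z) * h + z * candidate, is one of two that appear in references (some swap the roles of z and 1 - z).

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU time step for scalar input x and scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    # Candidate state: the reset gate controls how much of h is used.
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # Blend the old state and the candidate according to the update gate.
    return (1.0 - z) * h + z * candidate

# Hypothetical, untrained parameter values.
params = {
    "wz": 0.5, "uz": -0.3, "bz": 0.0,
    "wr": 0.8, "ur": 0.2, "br": 0.0,
    "wh": 1.0, "uh": 0.7, "bh": 0.0,
}

h = 0.0  # initial hidden state
for x in [1.0, 0.5, -1.0]:
    h = gru_step(x, h, params)
print(f"final hidden state: {h:.4f}")
```

The hidden state `h` is carried from one step to the next, which is how the network "remembers" earlier inputs in the sequence.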

Spectral clustering

Clustering is a widely used unsupervised learning method. The grouping is such that points in a cluster are similar to each other, and less similar to points in other clusters. Thus, it is up to the algorithm to find patterns in the data and group it for us and, depending on the algorithm used, we may end up with different clusters.

Self-Supervised GANs

If you aren’t familiar with Generative Adversarial Networks (GANs), they are a massively popular generative modeling technique formed by pitting two Deep Neural Networks, a generator and a discriminator, against each other. This adversarial loss has sparked the interest of many Deep Learning and Artificial Intelligence researchers. However, despite the beauty of the GAN formulation and the eye-opening results of the state-of-the-art architectures, GANs are generally very difficult to train. One of the best ways to get better results with GANs is to provide class labels. This is the basis of the conditional-GAN model. This article will show how Self-Supervised Learning can overcome the need for class labels for training GANs and rival the performance of conditional-GAN models.

LDA Topic Modeling: An Explanation

Topic modeling is the process of identifying topics in a set of documents. This can be useful for search engines, customer service automation, and any other instance where knowing the topics of documents is important. There are multiple methods of going about doing this, but here I will explain one: Latent Dirichlet Allocation (LDA).

2018 Lindberg-King Lecture: The Best Way to Predict the Future is to Create It. But Is It Already Too Late?

(CIT): Computer science pioneer Alan Curtis Kay, Ph.D., will deliver this year’s Lindberg-King Lecture in the Lister Hill Auditorium. His talk is titled, ‘The Best Way to Predict the Future is to Create It. But Is It Already Too Late?’ A child prodigy, Dr. Kay was an original member of the seminal Xerox-PARC group, and for his myriad innovations in computer science was awarded computer science’s highest honor: the Turing Award. He has been elected a Fellow of the American Academy of Arts and Sciences, the National Academy of Engineering, and the Royal Society of Arts. He is the president of the Viewpoints Research Institute and an adjunct professor of computer science at the University of California, Los Angeles. The Lindberg-King Lecture honors former NLM Director Donald A.B. Lindberg, M.D., and former NLM Deputy Director of Research and Education Donald West King, M.D. The event is co-sponsored by the NLM, Friends of the National Library of Medicine, and the American Medical Informatics Association.