‘Who’s the best scorer in the NBA?’ is a question that comes up a lot in conversations with my friends. Names like Kevin Durant, James Harden, and Steph Curry are always mentioned. It’s often difficult to settle on a single answer; the question becomes more nuanced once distinctions are made among scorers. How do we distinguish talent when taking into account the different situations in which players score?
Travis CI is a common tool for building R packages and is, in my opinion, the best platform for using R in continuous integration. Some of the most downloaded R packages, such as testthat, magick, and covr, are built on this platform. I also built my package RTest there. During the setup I ran into some trouble, and in this guide I’m sharing the knowledge I gained.
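For context, a minimal `.travis.yml` for an R package might look like the following. This is a sketch, not RTest’s actual configuration; a real package may need system dependencies or additional R versions:

```yaml
# Minimal Travis CI configuration for an R package (illustrative sketch)
language: r
cache: packages          # cache installed R packages between builds
warnings_are_errors: true
r:
  - release
```

Travis then runs `R CMD check` on the package for each build.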
Modeling time series data can be challenging, so it makes sense that some data enthusiasts (including myself) put off learning this topic until they absolutely have to. Before you can apply machine learning models to time series data, you have to transform it into an ‘ingestible’ format for your models, and this often involves calculating lagged variables, which capture auto-correlation (i.e., how past values of a variable influence its future values) and thus unlock predictive value.
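As a minimal sketch of what calculating lagged variables looks like in practice (the `sales` column and its values are hypothetical, standing in for any target series):

```python
import pandas as pd

# Toy series; 'sales' is a hypothetical target column.
df = pd.DataFrame({"sales": [10, 12, 13, 15, 14, 16]})

# Lagged features: the series shifted 1 and 2 steps into the past.
# A model can then learn how past values relate to the current one.
df["sales_lag1"] = df["sales"].shift(1)
df["sales_lag2"] = df["sales"].shift(2)

# The first rows have no history, so their lags are NaN and are usually dropped.
model_ready = df.dropna()
```

Each row of `model_ready` now pairs the current value with its own recent past, which is the ‘ingestible’ format most tabular ML models expect.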
Speech is the most natural form of communication for us; it’s second nature. And now, our machines have started to recognize our speech and they’re getting better and better at communicating with us. Current voice assistants and devices like Amazon Alexa and Google Home are getting more and more popular each month; they are changing how we shop, how we search, how we interact with our devices and even each other.
This article proposes a novel feature extraction approach for speech/music classification based on generalized Gaussian distribution descriptors extracted from the IIR-CQT spectrogram representation. The IIR-CQT spectrogram provides superior temporal resolution at high frequencies and better spectral resolution at low frequencies compared to conventional short-time Fourier transform analysis, which provides uniform frequency resolution. Multi-level decomposition of the spectrogram image is then performed using the Nonsubsampled Contourlet Transform (NSCT), which is a fully shift-invariant, multi-scale, and multi-direction expansion that can preserve the edges of the textural pattern of speech and music. The generalized Gaussian distribution (GGD) parameters are estimated using maximum likelihood estimation (MLE) from the NSCT subbands to create the image feature descriptor. The chaos crow search algorithm is employed to choose the most relevant feature subset and to discard redundant features, and finally the extreme learning machine classifier categorizes the input audio segment as speech or music. The experimental results show that the proposed feature descriptor is effective and outperforms existing approaches to speech/music classification. In addition, mismatched training and testing results are also presented.
Imagine a world where every computer system is customized specifically to your own personality. It learns the nuances of how you communicate and how you wish to be communicated with. Interacting with a computer system becomes more intuitive than ever, and technological literacy skyrockets. These are the potential outcomes you could see in a future where reinforcement learning is the norm. In this article, we are going to break down reinforcement learning and dissect some of the components that come together to make up a reinforcement learning system.
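The core components (an agent, an environment, rewards, and a value estimate updated from experience) can be sketched in a few lines. This toy uses tabular Q-learning on a hypothetical 5-state corridor, not any particular system from the article:

```python
import random

random.seed(0)

# A 5-state corridor: reward 1 for reaching the rightmost state.
N_STATES = 5
ACTIONS = (0, 1)                 # 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(state, action):
    """Environment: deterministic moves, terminal reward at the right end."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for _ in range(300):                              # episodes
    s = random.randrange(N_STATES - 1)            # random non-terminal start
    done = False
    while not done:
        # epsilon-greedy action selection: explore sometimes, exploit otherwise
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the greedy policy at every state is “move right”, learned purely from trial, error, and reward, which is the essence of the systems the article dissects.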
A comparative study of algorithms like Monte-Carlo Control and Temporal-Difference Control used to solve games like Blackjack.
I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description and elaboration of the method, please see our arXiv preprint. For more details on how to use the package, please see the package’s vignette.)
Learning the theoretical background for data science or machine learning can be a daunting experience, as it involves multiple fields of mathematics and a long list of online resources. In this piece, my goal is to suggest resources for building the mathematical background necessary to get up and running in practical data science or research work. These suggestions are derived from my own experience in the data science field and from following the latest resources suggested by the community.
Understanding whether a tweet is meant as positive or negative is something humans rarely have problems with. For computers, however, it is an entirely different story: complicated sentence structure, sarcasm, figurative language, etc. make it difficult for computers to judge the meaning and sentiment of a sentence. However, automatically assessing the sentiment of a tweet would allow for large-scale opinion-mining of the population on all sorts of issues and could help us understand why certain groups of the population hold certain opinions. On a more fundamental level, understanding the sentiment of text is a key part of natural language understanding and thus an essential task to solve if we want computers to be able to communicate efficiently with us. In this blog post, I will present the results of a small research project carried out as part of the SoBigData project at the University of Sheffield. We tested different approaches to processing text and analysed how much of the sentiment they are able to pick up. Read on for a full tour of the project and the results!
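To see why this is hard, consider the simplest possible baseline: counting words from a sentiment lexicon. The mini-lexicon below is hypothetical and far smaller than anything used in practice, but it illustrates the word-level approach that the learned methods in the project are compared against:

```python
# Hypothetical mini-lexicon; real systems learn from labelled tweets or
# use lexicons with tens of thousands of weighted entries.
POSITIVE = {"love", "great", "happy", "excellent"}
NEGATIVE = {"hate", "awful", "sad", "terrible"}

def sentiment_score(tweet):
    """Positive minus negative word counts; > 0 suggests a positive tweet."""
    tokens = tweet.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
```

A baseline like this is exactly what sarcasm and figurative language defeat: “oh great, another delay” scores as positive, which is why approaches that model context and sentence structure are needed.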
As more and more systems leverage ML models in their decision-making processes, it will become increasingly important to consider how malicious actors might exploit these models, and how to design defenses against those attacks. The purpose of this post is to share some of my recent learnings on this topic.
There are a lot of articles out there explaining common mistakes new Data Scientists make, but they focus mainly on working practices rather than on the Machine Learning (ML) process itself. This article covers just that: the kinds of mistakes a Data Scientist can make in the ML pipeline, and a few ways to address them…
The genetic algorithm owes its form to biomimicry, not derivation from first principles. So, unlike the workings of conventional optimization algorithms, which are typically apparent from the underlying mathematical derivations, the workings of the genetic algorithm require elucidation. Attempts to explain how genetic algorithms work can be divided into two camps: those based to a lesser or greater extent on the scientific method, and those that reject the scientific method in favor of logical positivism.
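The biomimetic mechanics themselves (selection, crossover, mutation, elitism) are easy to state even if their workings resist first-principles explanation. A toy sketch on the classic OneMax problem, where fitness is simply the number of 1-bits (all names and parameter values here are illustrative):

```python
import random

random.seed(42)

TARGET_LEN = 20   # maximize the number of 1-bits in a 20-bit string (OneMax)

def fitness(bits):
    return sum(bits)

def mutate(bits, rate=0.05):
    # Flip each bit independently with a small probability.
    return [b ^ (random.random() < rate) for b in bits]

def crossover(a, b):
    # Single-point crossover: splice a prefix of one parent onto the other.
    point = random.randrange(1, TARGET_LEN)
    return a[:point] + b[point:]

def evolve(pop_size=30, generations=60):
    pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size)]
        pop = children + parents[:2]              # elitism: keep the two best
    return max(pop, key=fitness)

best = evolve()
```

Why selection pressure plus recombination reliably drives such a population toward the optimum is precisely the question the two camps of explanation described above try to answer.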