Is The Data Science Profession At Risk of Automation?

Can Quality Forecasts Really Be Produced On-Demand? And What Does That Mean for the Data Science Profession?


How to beat resistance to AI projects: 3 steps

1. Find a business problem that AI can solve
2. Evaluate the feasibility and potential with AI pilots
3. Showcase ROI and humanize the AI model’s results


Turn your previous Python projects into awesome tools – with Tkinter

So you wrote your first scripts in a Jupyter Notebook, and now you frantically run every cell each time you want to use them. It’s not very practical, is it? Sure, you can install extensions and widgets, and I even came across an article where someone created a dashboard that updated in real time inside the notebook. But wouldn’t it be cooler to create your own program that you could start with a simple click? As a side project, I decided to make some videos about the code in my articles, so I would love to hear from you regarding the format, duration, and any other feedback. Thanks!
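To give a flavor of the idea, here is a minimal Tkinter sketch of a window that runs a script at the click of a button; the `run_analysis` function is a hypothetical stand-in for whatever your notebook cells used to do, not code from the article:

```python
import tkinter as tk
from tkinter import messagebox

def run_analysis():
    # Hypothetical stand-in for the logic that used to live in notebook cells.
    result = sum(range(10))
    messagebox.showinfo("Done", f"Analysis finished: result = {result}")

root = tk.Tk()
root.title("My Analysis Tool")

tk.Label(root, text="Click the button to run the analysis.").pack(padx=20, pady=10)
tk.Button(root, text="Run", command=run_analysis).pack(pady=(0, 15))

root.mainloop()
```

Saved as a .py file, this starts with a double click (or a desktop shortcut) instead of a running notebook server.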


Master Dimensionality Reduction with these 5 Must-Know Applications of Singular Value Decomposition (SVD) in Data Science

• Singular Value Decomposition (SVD) is a common dimensionality reduction technique in data science
• We will discuss 5 must-know applications of SVD here and understand their role in data science
• We will also see three different ways of implementing SVD in Python
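As a taste of the implementations covered, here is a minimal sketch of one approach, using NumPy's `numpy.linalg.svd` to build a low-rank approximation of a small matrix (the data and the choice of rank are illustrative only, not from the article):

```python
import numpy as np

# A small data matrix (rows = samples, columns = features).
A = np.array([[2.0, 4.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 5.0],
              [3.0, 1.0, 2.0]])

# Full SVD: A = U @ diag(s) @ Vt, with singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values: this truncation is the
# dimensionality reduction step the applications build on.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", s)
print("rank-2 approximation:\n", A_k)
```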


Text mining in education

The explosive growth of online education environments is generating a massive volume of data, especially text from forums, chats, social networks, assessments, and essays, among other sources. This raises exciting challenges: how do we mine text data to find useful knowledge for educational stakeholders? Despite the increasing number of educational applications of text mining published recently, we have not found any paper surveying them. To fill this gap, this work presents a systematic overview of the current status of the Educational Text Mining field. Our goal is to answer three main research questions: Which text mining techniques are most used in educational environments? Which educational resources are most used? And what are the main applications or educational goals? Finally, we outline our conclusions and the most interesting future trends.


N-Shot Learning: Learning More with Less Data

If AI is the new electricity, then data is the new coal. The rapid growth of artificial intelligence and deep learning has the potential to impact countless lives and change the world in ways that, until now, were only dreamed of in comic books. Unfortunately, just as we’ve seen a hazardous depletion in the amount of available coal, many AI applications have little or no data accessible to them. New technology has made up for a lack of physical resources; likewise, new techniques are needed to allow applications with little data to perform satisfactorily. This is the issue at the heart of what is becoming a very popular field: N-shot Learning.


Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment

This is an excerpt from a survey that sought to evaluate the relevance of machine learning in operations today, assess the current state of machine learning adoption, and identify the tools used for machine learning. A link to the full report is inside.


An Introduction to Reproducible Analyses in R

Earlier this week I had a lot of fun running a one-day workshop for the Royal Society of Biology titled ‘An Introduction to Reproducible Analyses in R’. It was intended to introduce researchers at all stages of their careers to using R to make their analyses and figures more reproducible. We ran the course because the increase in the complexity and scale of biological data means biologists are increasingly required to develop the data skills needed to design reproducible workflows for the simulation, collection, organisation, processing, analysis, and presentation of data. I believe developing such data skills requires at least some coding, which makes your work (everything you do with your raw data) explicitly described, totally transparent, and completely reproducible. However, learning to code can be a daunting prospect for many biologists! That’s why ‘An Introduction to Reproducible Analyses in R’ was developed.


The Shiny Developer Series

Shiny is one of the best ways to build interactive documents, dashboards, and data science applications. But advancing your skills with Shiny does not come without challenges. Shiny developers often have a stronger background in applied statistics than in areas useful for optimizing an application, such as programming, web development, and user-interface design. Though there are many packages and tools that make developing advanced Shiny apps easier, new developers may not know these tools exist or how to find them. Shiny developers are also often siloed: though the Shiny developer community is huge, there is rarely someone sitting next to you to sound out ideas about your app. With these challenges in mind, the RStudio Community has partnered with Eric Nantz of the R-Podcast to create the Shiny Developer Series.


Unsupervised Data Augmentation

The more data we have, the better the performance we can achieve. However, annotating a large amount of training data is often too expensive. Proper data augmentation is therefore a useful way to boost model performance. The authors of Unsupervised Data Augmentation (Xie et al., 2019) proposed UDA, which helps us build a better model by leveraging several data augmentation methods on unlabeled data.
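For intuition, here is a minimal sketch of the consistency-training idea at the core of UDA, written in PyTorch; the model, the weighting factor `lam`, and the noise-based “augmentation” below are illustrative assumptions, since the paper uses richer augmentations such as back-translation and RandAugment:

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_augmented, lam=1.0):
    # Supervised term: ordinary cross-entropy on the (small) labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency term: predictions on augmented unlabeled examples are
    # pushed toward the (fixed, not backpropagated) predictions on the
    # original unlabeled examples.
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=1)
    log_pred = F.log_softmax(model(x_augmented), dim=1)
    consistency = F.kl_div(log_pred, target, reduction="batchmean")

    return sup_loss + lam * consistency

# Smoke test with a linear classifier and random stand-in data.
model = torch.nn.Linear(8, 3)
x_l, y_l = torch.randn(4, 8), torch.randint(0, 3, (4,))
x_u = torch.randn(16, 8)
x_aug = x_u + 0.1 * torch.randn_like(x_u)  # noise as a stand-in augmentation
print(uda_loss(model, x_l, y_l, x_u, x_aug))
```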


Hands-on Global Model Interpretation

This article is a continuation of my series on Model Interpretability and Explainable Artificial Intelligence. If you haven’t already, I highly recommend checking out the first article of this series, ‘Introduction to Machine Learning Model Interpretation’, which covers the basics of Model Interpretability, ranging from what model interpretability is and why we need it to the underlying distinctions between kinds of model interpretation. In this article, we will pick up where we left off by diving deeper into the ins and outs of global model interpretation. First, we will quickly recap what global model interpretation is and why it is important. Then we will dive into the theory behind two of its most popular methods, feature importance and partial dependence plots, and apply them to get information about the features of the heart disease dataset.
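As a rough preview of the two methods, here is a minimal scikit-learn sketch; it assumes a recent scikit-learn with the `sklearn.inspection` module and substitutes a synthetic dataset for the heart disease data used in the article:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Synthetic stand-in for the heart disease dataset.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Global method 1: permutation feature importance.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance = {result.importances_mean[i]:.3f}")

# Global method 2: partial dependence of the prediction on two features.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()
```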


A Comprehensive Guide to Correlational Neural Network with Keras

Humans, along with many other animals, have five basic senses: sight, hearing, taste, smell, and touch. We also have additional senses, such as a sense of balance and acceleration, a sense of time, and so on. Every single moment, the human brain processes information from all these sources, and each of these senses affects our decision-making process. During any conversation, the movement of the lips and the facial expression, along with the sound produced by the vocal cords, help us fully understand the meaning of the words pronounced by the speaker. We can even understand words only by watching lip movements, without any sound. This visual information is not just supplementary but necessary. This was first exemplified in the McGurk effect (McGurk & MacDonald, 1976), where a visual /ga/ with a voiced /ba/ is perceived as /da/ by most subjects. As we want our machine learning models to achieve human-level performance, it is necessary to enable them to use data from various sources as well.
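To make the “data from various sources” idea concrete, here is a minimal Keras sketch of a two-view model whose branches are summed into a single common representation, echoing the shared hidden layer h = f(Wx + Vy + b) of a Correlational Neural Network; note this omits the reconstruction and correlation terms of the full CorrNet objective, and the shapes and data below are made up:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two views of the same underlying examples, e.g. audio and visual features.
view_a = keras.Input(shape=(20,), name="view_a")
view_b = keras.Input(shape=(10,), name="view_b")

# Encode each view, then sum into one common representation.
h_a = layers.Dense(16, activation="relu")(view_a)
h_b = layers.Dense(16, activation="relu")(view_b)
common = layers.Add()([h_a, h_b])

output = layers.Dense(3, activation="softmax")(common)
model = keras.Model(inputs=[view_a, view_b], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random stand-in data, just to show the two-input fit call.
xa = np.random.rand(64, 20).astype("float32")
xb = np.random.rand(64, 10).astype("float32")
y = np.random.randint(0, 3, size=(64,))
model.fit([xa, xb], y, epochs=1, verbose=0)
```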


How Does Affinity Propagation Work?

Affinity propagation (AP) is a centroid-based clustering algorithm, similar to k-means or k-medoids, that does not require the number of clusters to be estimated before running the algorithm. Affinity propagation finds ‘exemplars’, i.e. members of the input set that are representative of the clusters.
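A minimal sketch with scikit-learn's AffinityPropagation shows both properties on made-up data: the number of clusters is discovered rather than specified, and the exemplars are actual input points:

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Toy data with three blobs; note we never tell AP how many clusters to find.
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

ap = AffinityPropagation(random_state=0).fit(X)

print("clusters found:", len(ap.cluster_centers_indices_))
print("exemplar row indices:", ap.cluster_centers_indices_)
print("labels of the first 10 points:", ap.labels_[:10])
```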