[Overview]: Ensemble Learning made simple

When you want to purchase a mobile phone, do you walk straight into a store (or go online) and pick any phone at random? The more common practice is to browse the internet for reviews and compare different models, specifications, features, and prices. You would probably also ask your peers for suggestions and advice before settling on a purchase. In short, you don't reach a conclusion directly; instead, you weigh input from several other sources. In this article, I will introduce one such technique in machine learning, called 'Ensemble Learning', along with the algorithms that use it.
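To make the analogy concrete, here is a minimal hard-voting ensemble sketch in scikit-learn (my own illustration, not code from the article): several different models each cast a 'vote', and the majority decides.

# A minimal hard-voting ensemble sketch (illustrative only; not from the article).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different "opinions": each base model votes, the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))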


Retinal Damage Detection by Image-Based Deep Learning

In the modern era, we are automating almost everything we can think of, and Machine Learning is solving many real-world problems. One of its major impacts is in the medical field. In this blog, we will walk through a deep learning approach that predicts the class of retinal damage.
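The post's exact model is not reproduced here, but a typical setup for this kind of problem is transfer learning from a pre-trained CNN. A minimal sketch, where the backbone choice and the four-class output are my assumptions rather than the article's specifics:

# Illustrative transfer-learning sketch; the blog's actual architecture may differ.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # assumption: common retinal OCT datasets distinguish four classes

model = models.resnet18(pretrained=True)           # pre-trained backbone
for param in model.parameters():
    param.requires_grad = False                     # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new classification head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# A training loop over a DataLoader of labeled retinal images would go here.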


Why Your Company Needs Reproducible Research

Today’s sciences – especially the social sciences – are in a bit of turmoil. Many of the most important experiments and findings are not reproducible. This ‘reproducibility crisis’ has significant implications not just for the future of academic research and development, but for any business expecting increased returns from investing in innovation, experimentation, and data analysis. Businesses need to learn from science’s mistakes.


Dear Companies: Your Data Science Job Descriptions are Awful…

My corporate brethren. Before you get mad, just a quick reminder that I’m on your side. In my first two blog posts, I did my best to help Data Science applicants craft resumes that won’t make your eyes burn, as well as how they can message you in ways that won’t make you want to give up on humanity… …but y’all aren’t always perfect when it comes to the Data Science hiring process either. In fact, many of you perpetuate one of the worst aspects of it, through what I can only assume is some sort of pathological need to keep the application process shady. Yep, it’s time we finally set a torch to the much maligned Data Science Job Description. And from the ashes of the old world, we shall build anew…


Are We Ready for Machine Learning Explainability?

The European Union’s new General Data Protection Regulation (GDPR) includes rules on how Machine Learning may be used. These rules aim to give users control over their personal data by introducing a ‘right to explanation’.


Paper Summary. Stiffness: A New Perspective on Generalization in Neural Networks

This paper aims to improve our understanding of how neural networks generalize, viewed through the lens of stiffness. The intuition behind stiffness is how a gradient update based on one data point affects another: it characterizes the correlation between the changes in loss on two points when a gradient update computed from one of them is applied.
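As a rough illustration (my own toy sketch, not the paper’s code), the sign version of this idea boils down to checking whether the loss gradients of two examples point in the same direction, i.e. whether a step that helps one also helps the other:

# Toy sketch of gradient-alignment "stiffness" between two examples (not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

def flat_grad(x, y):
    """Loss gradient w.r.t. all parameters for a single example, flattened."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

x1, y1 = torch.randn(1, 10), torch.tensor([0])
x2, y2 = torch.randn(1, 10), torch.tensor([1])

g1, g2 = flat_grad(x1, y1), flat_grad(x2, y2)
sign_stiffness = torch.sign(torch.dot(g1, g2)).item()    # +1: a step on x1 also helps x2
cosine_stiffness = torch.nn.functional.cosine_similarity(g1, g2, dim=0).item()
print(sign_stiffness, cosine_stiffness)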


Better Understanding Negative Log Loss

While working through fast.ai, I decided to test out the ‘3 lines of code’ on a dataset other than the ones used in the course. The fast.ai wiki has some recommended datasets, and I decided to try out as many as possible. The first recommendation under the easy category was Dogs vs. Cats Redux: Kernels Edition. Although this is very similar to the dataset used in the course, I thought I might as well try it.
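For reference, the metric behind the title, the (negative) log loss used to score that Kaggle competition, heavily penalizes confident wrong predictions. A minimal sketch (my own addition, not from the post):

# Minimal binary log-loss sketch (illustrative; not from the post).
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary negative log loss: lower is better; confident mistakes are punished hardest."""
    y_prob = np.clip(y_prob, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
print(log_loss(y_true, np.array([0.9, 0.1, 0.8, 0.7])))   # well-calibrated: small loss
print(log_loss(y_true, np.array([0.1, 0.9, 0.2, 0.3])))   # confidently wrong: large loss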


4 Machine Learning Techniques with Python

While this tutorial is dedicated to Machine Learning techniques with Python, we will move on to algorithms soon enough. But before we focus on techniques and algorithms, let’s find out whether they are the same thing. A technique is a way of solving a problem, and the term is quite generic. When we say we have an algorithm, we mean we have an input, we desire a certain output from it, and we have clearly defined the steps to get there. We would go so far as to say an algorithm may make use of multiple techniques to reach its output. Now that we have distinguished between the two, let’s find out more about Machine Learning techniques.


The Principles of Econometrics

Literally, econometrics means ‘economic measurement’: it is the art of identifying and quantifying the causal relationships among economic phenomena, and the study of how statistical methods can be applied to analyze them. In essence, econometrics is the unification of economic theory, mathematical tools, and statistical methodology. In a broader sense, econometrics is concerned with estimating economic relations, confronting theory with facts, testing hypotheses about economic behavior, and forecasting the behavior of economic variables. In practice, econometrics consists of four steps (see the sketch after this list):
• Specification: constructing the model.
• Estimation: fitting the model to the data.
• Verification: testing the model.
• Prediction: using the model.
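As a minimal illustration of these four steps (a toy sketch of my own, not from the article), consider an ordinary least-squares model fit with Python's statsmodels:

# Illustrative only: the four econometric steps with a toy OLS model (not from the article).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)
consumption = 5 + 0.8 * income + rng.normal(0, 5, 200)

# 1. Specification: consumption = b0 + b1 * income + error
X = sm.add_constant(income)

# 2. Estimation: fit the model to the data
model = sm.OLS(consumption, X).fit()

# 3. Verification: test the model (coefficients, p-values, R^2)
print(model.summary())

# 4. Prediction: use the model on new data
new_income = sm.add_constant(np.array([40.0, 60.0]))
print(model.predict(new_income))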


ToneNet : A Musical Style Transfer

The recent success of Generative Adversarial Networks (GANs) in the vision domain, such as style transfer, inspired us to experiment with these techniques in the musical domain. Music generation mainly rests on two important things: composition and performance. Composition focuses on the building blocks of a song, such as the notation, tone, pitch, and chords, while performance focuses on how the notes are played by the performer. Together, these define the style of a piece of music.


Multi-label Text Classification using BERT – The Mighty Transformer

The past year has ushered in an exciting age for Natural Language Processing using deep neural networks. Research on pre-trained models has produced a massive leap in state-of-the-art results for many NLP tasks, such as text classification, natural language inference, and question answering. Some of the key milestones have been ELMo, ULMFiT and the OpenAI Transformer. All these approaches allow us to pre-train an unsupervised language model on a large corpus of data, such as all Wikipedia articles, and then fine-tune the pre-trained model on downstream tasks. Perhaps the most exciting event of the year in this area has been the release of BERT, a multilingual transformer-based model that has achieved state-of-the-art results on various NLP tasks. BERT is a bidirectional model based on the transformer architecture; it replaces the sequential nature of RNNs (LSTM & GRU) with a much faster attention-based approach. The model is also pre-trained on two unsupervised tasks, masked language modeling and next-sentence prediction. This allows us to take a pre-trained BERT model and fine-tune it on specific downstream tasks such as sentiment classification, intent detection, question answering and more.
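As a rough sketch of the multi-label setup (the label count and all other details below are illustrative assumptions, not the article's code), the usual recipe is one sigmoid output per label with a binary cross-entropy loss on top of the pre-trained encoder:

# Illustrative multi-label head on top of a pre-trained BERT encoder (not the article's code).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

NUM_LABELS = 6  # assumption: e.g. six tags in a typical multi-label comment dataset

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, NUM_LABELS)
loss_fn = nn.BCEWithLogitsLoss()   # one sigmoid per label instead of a single softmax

texts = ["an example comment", "another example"]
labels = torch.zeros(2, NUM_LABELS)  # multi-hot targets

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
pooled = encoder(**batch).pooler_output          # [batch, hidden_size]
logits = classifier(pooled)                      # [batch, NUM_LABELS]
loss = loss_fn(logits, labels)
probs = torch.sigmoid(logits)                    # independent probability per label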


4 Tools That Leverage Big Data To Track Emails Better Than Ever

No matter what your role is, if you work in the technology sector, you likely spend a large portion of your day dealing with email in some way. You’re sending, reading, or reviewing emails, or you’re checking your inbox to see if anything new has come in. By some estimates, the average worker spends 30 hours a week checking their email. Despite email being such a central and frequent job function, most of us are flying blind; we don’t know how much time we spend on it, nor do we have a solid understanding of whether our efforts are productive. Fortunately, several new email tracking tools let employers and employees keep a closer eye on these metrics. The problem is that earlier email monitoring tools lacked the analytics capabilities managers needed to make empirically grounded decisions. Big data is making it easier for companies to get deeper insights.


Describe and understand Bayesian models and posteriors using bayestestR

The Bayesian framework is quickly gaining popularity among scientists, which has led to the growing popularity of packages for fitting Bayesian models, such as rstanarm or brms. However, extracting summary indices from these models to report in a manuscript can be quite challenging, especially for new users. To address this, let us introduce bayestestR!


Discrete Event Simulation (DES) Metamodeling – Splines with R and Arena

Simulation Metamodeling – building and using surrogate models that approximate the results of more complicated simulation models – is an interesting way to analyze complicated, computationally expensive simulations. Metamodels are useful because they can yield good approximations of the original simulation model’s response variables while using fewer computational resources. For an introduction to metamodeling, refer to (Barton 2015). To my knowledge, no Discrete-Event Simulation (DES) software provides metamodeling capabilities, and guidance on how to actually carry out metamodeling is scarce. In this post, I’ll build a spline-based simulation metamodel. This tutorial should be useful to advanced users of Arena Simulation who are willing to give metamodeling a try.
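The post builds its metamodel in R against Arena output; purely to illustrate the idea, here is a sketch in Python with made-up ‘simulation’ results, where a spline is fit to a handful of expensive runs and then queried cheaply in between:

# Conceptual sketch of a spline metamodel (illustrative; the post itself uses R and Arena).
import numpy as np
from scipy.interpolate import UnivariateSpline

# Pretend these are the results of a few expensive simulation runs:
arrival_rates = np.linspace(1, 10, 8)                    # design points (model inputs)
mean_wait = 2.0 / (10.5 - arrival_rates) + np.random.default_rng(0).normal(0, 0.01, 8)

# Fit the metamodel: a smoothing spline over the design points.
metamodel = UnivariateSpline(arrival_rates, mean_wait, k=3, s=0.001)

# Query the surrogate at inputs that were never simulated (cheap to evaluate).
print(metamodel(np.array([2.5, 5.5, 9.2])))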


Google launches end-to-end AI Platform

Google announced the beta launch of the company’s AI Platform. The platform brings together a variety of existing and new products that allow you to build a full data pipeline to pull in data, label it (with the help of a new built-in labeling service), and then use either existing classification, object recognition, or entity extraction models, or existing tools like AutoML or the Cloud Machine Learning Engine to train and deploy custom models.