An evaluation of sentiment analysis for mobile devices

Sentiment analysis has become a key tool for extracting knowledge from data containing opinions and sentiments, particularly data from online social systems. With the increasing use of smartphones to access social media platforms, a new wave of applications that explore sentiment analysis in the mobile environment is beginning to emerge. However, many sentiment analysis methods exist, and it is unclear which of them are deployable in the mobile environment. In this paper, we provide a first-of-its-kind study in which we compare the performance of 14 sentence-level sentiment analysis methods in the mobile environment. To do so, we adapted these methods to run on Android OS and then measured their performance in terms of memory, CPU, and battery consumption. Our findings reveal methods that require almost no adaptation and run relatively fast, as well as methods that could not be deployed due to excessive memory use. We hope our effort provides a guide for developers and researchers interested in exploring sentiment analysis as part of a mobile application and helps new applications run without depending on a server-side API. We also share the Android API that implements all 14 sentiment analysis methods used in this paper.
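Many sentence-level sentiment methods of the kind compared in such studies are lexicon-based. As a rough illustration only (this is not one of the paper's 14 methods, and the word list is made up), a minimal lexicon-based sentence scorer might look like:

```python
# Toy sentiment lexicon -- illustrative only, not taken from the paper.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2}

def sentence_sentiment(sentence):
    """Sum word polarities; a positive total suggests a positive sentence."""
    words = sentence.lower().split()
    return sum(LEXICON.get(w, 0) for w in words)

score = sentence_sentiment("I love this great phone")  # 2 + 2 = 4
```

Lexicon lookups like this are cheap in memory and CPU, which is one reason such methods tend to port easily to mobile devices.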


Featuretools is a Python library for automated feature engineering.
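As a hedged sketch of what automated feature engineering produces, here is a pure-Python example (not using Featuretools itself, and with made-up data) that derives aggregate features across a parent–child table relationship — the kind of transform Featuretools automates:

```python
from collections import defaultdict

# Hypothetical data: customers (parent) and their transactions (child).
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

def aggregate_features(rows, key, value):
    """Build COUNT/SUM/MEAN aggregate features per parent key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v)}
        for k, v in groups.items()
    }

features = aggregate_features(transactions, "customer_id", "amount")
# features[1] -> {"count": 2, "sum": 40.0, "mean": 20.0}
```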

Custom Optimizer in TensorFlow

Neural networks play a very important role when modeling unstructured data, such as in language or image processing. The idea of such networks is to simulate the structure of the brain using nodes and edges with numerical weights processed by activation functions. The output of such networks usually yields a prediction, such as a classification. This is achieved by minimizing a loss function defined on a given target. In a previous post, we discussed the importance of customizing this loss function for the case of gradient boosting trees. In this post, we shall discuss how to customize the optimizer to speed up and improve the process of finding a (local) minimum of the loss function.
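To fix the idea before touching TensorFlow, here is a plain-Python sketch of the update rule a custom optimizer implements — gradient descent with momentum on a toy quadratic loss (parameter values chosen only for illustration):

```python
def minimize(grad, x0, lr=0.1, momentum=0.5, steps=100):
    """Gradient descent with momentum: the kind of update rule a custom
    TensorFlow optimizer would encode in its apply-gradients step."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)  # velocity accumulates past gradients
        x = x + v                        # move the parameter along the velocity
    return x

# Loss f(x) = (x - 3)^2 has gradient 2*(x - 3) and its minimum at x = 3.
x_min = minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The momentum term is what distinguishes this from vanilla gradient descent: it damps oscillations and can substantially speed up convergence, which is exactly the kind of behavior a custom optimizer lets you tune.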

Hierarchical compartmental reserving models

Today, I will sketch out ideas from the Hierarchical Compartmental Models for Loss Reserving paper by Jake Morris, which was published in the summer of 2016 (Morris (2016)). Jake’s model is inspired by PK/PD models (pharmacokinetic/pharmacodynamic models) used in the pharmaceutical industry to describe the time course of effect intensity in response to administration of a drug dose. The hierarchical compartmental model fits outstanding and paid claims simultaneously, combining ideas of Clark (2003), Quarg and Mack (2004), Miranda, Nielsen, and Verrall (2012), Guszcza (2008) and Zhang, Dukic, and Guszcza (2012). You can find the first two models implemented in the ChainLadder package, the third in the DCL package, and the fourth in an earlier blog post of mine and as part of the brms package vignette.
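The compartmental idea — exposure flows into outstanding claims, which flow into paid claims — can be sketched with a naive Euler integration. This is a simplified, deterministic illustration of the flow structure described in Morris (2016); the parameter names and values here are assumptions for the example, not the paper's fitted values:

```python
def simulate_claims(premium, k_er, rlr, k_p, t_max=10.0, dt=0.01):
    """Euler integration of a three-compartment reserving model:
    exposure (ex) decays at reporting rate k_er, feeding outstanding
    claims (os_) scaled by the reported loss ratio rlr, which in turn
    pay out into paid claims (pd_) at rate k_p."""
    ex, os_, pd_ = premium, 0.0, 0.0
    for _ in range(int(t_max / dt)):
        d_ex = -k_er * ex
        d_os = k_er * rlr * ex - k_p * os_
        d_pd = k_p * os_
        ex += d_ex * dt
        os_ += d_os * dt
        pd_ += d_pd * dt
    return ex, os_, pd_

# Illustrative values: paid claims should approach rlr * premium = 80.
ex, os_, pd_ = simulate_claims(premium=100.0, k_er=5.0, rlr=0.8, k_p=1.0)
```

In the hierarchical version, parameters such as k_er, rlr and k_p get accident-year-level random effects, and the paid/outstanding observations are fitted jointly.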

Learn your way around the R ecosystem

One of the most powerful things about R is the ecosystem that has emerged around it. In addition to the R language itself and the many packages that extend it, you have a network of users, developers, governance bodies, software vendors and service providers that provide resources in technical information and support, companion applications, training and implementation services, and more.

How environments work in R and what is lazy evaluation

Knowledge of how R evaluates expressions is crucial to avoid hours of staring at the screen or hitting unexpected and difficult bugs. We’ll start with an example of an issue I came across a few months ago when using the purrr::map function.
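Python has a closely related gotcha that illustrates the same class of surprise: closures created in a loop capture the variable itself, not its value at creation time. This is a Python analogue, not R code, but the debugging experience is much the same:

```python
# Closures created in a comprehension all capture the *variable* i,
# not its value at the moment each lambda was created.
adders = [lambda x: x + i for i in range(3)]
late = [f(10) for f in adders]  # every closure sees the final i == 2

# Fix: bind the current value explicitly via a default argument.
adders_fixed = [lambda x, i=i: x + i for i in range(3)]
early = [f(10) for f in adders_fixed]
```

`late` comes out as `[12, 12, 12]` while `early` is `[10, 11, 12]` — exactly the kind of silent wrong answer that makes understanding the evaluation model worth the effort.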

OSM Nominatim with R: getting Location’s Geo-coordinates by its Address

When scraping data from the web, you often obtain address information but not geo-coordinates, which may be required for further analysis such as clustering. Geocoding is therefore needed to get a location’s coordinates from its address. There are several options, one of the most popular being the Google Geocoding API. It can easily be used from R via the geocode function from the ggmap package, but the free tier is limited to 2,500 requests per day, see details here. To increase the number of free-of-charge geocoding requests, the OpenStreetMap (OSM) Nominatim API can be used instead. OSM allows up to 1 request per second (see the usage policy), which gives about 35 times more API calls than the free Google Geocoding API.
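The article works in R, but the Nominatim search endpoint is plain HTTP and easy to call from any language. A minimal Python sketch (the `fetch` parameter is an assumption of this example, added so the network call can be swapped out):

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

def geocode(address, fetch=None):
    """Return (lat, lon) for an address via OSM Nominatim, or None.

    `fetch` takes a URL and returns the response body; it is injectable
    so the HTTP call can be faked in tests."""
    url = NOMINATIM_URL + "?" + urlencode(
        {"q": address, "format": "json", "limit": 1}
    )
    if fetch is None:
        def fetch(u):
            # Nominatim's usage policy asks for a descriptive User-Agent
            # and at most 1 request per second.
            req = Request(u, headers={"User-Agent": "geocoding-demo"})
            with urlopen(req) as resp:
                return resp.read()
    results = json.loads(fetch(url))
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])
```

Remember to throttle to one request per second when geocoding a batch of addresses, or Nominatim may block you.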

Collect Your Own Fitbit Data with Python

So you’ve got your Fitbit over the Christmas break and you’ve got some New Year’s resolutions. You go online and see the graphs on your dashboard, but you’re still not pleased. You want more data, more graphs, and more information. Well, say no more, because I’m going to teach you how to collect your own Fitbit data using nothing but a little Python code. With this tutorial, you can get your elusive minute-by-minute data (also known as intraday data), which is not readily available when you first get your Fitbit.
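Once the OAuth2 dance is done, the intraday data itself is just JSON. As a hedged sketch (the payload below is a made-up fragment shaped like a Fitbit intraday response, and the real request additionally needs an access token in the Authorization header):

```python
import json

# Hypothetical response fragment from an intraday endpoint such as
#   GET https://api.fitbit.com/1/user/-/activities/steps/date/2024-01-01/1d/1min.json
payload = json.dumps({
    "activities-steps-intraday": {
        "dataset": [
            {"time": "00:00:00", "value": 0},
            {"time": "00:01:00", "value": 12},
        ]
    }
})

def parse_intraday(body, resource="steps"):
    """Extract (time, value) pairs from an intraday response body."""
    dataset = json.loads(body)[f"activities-{resource}-intraday"]["dataset"]
    return [(point["time"], point["value"]) for point in dataset]

minutes = parse_intraday(payload)
```

From here the (time, value) pairs drop straight into pandas or matplotlib for the extra graphs you were after.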

Blockchain Development Services

Blockchain development services are intended to enable corporate partners, clients, and developers to experiment with distributed ledger technology by offering them a cloud-based, single-click blockchain development environment. There is a growing movement among public organizations, businesses, and industries to automatically create and authenticate transactions, thereby reducing costs and decreasing the potential for fraud.

The Art of Learning Data Science

These days, I am sure 90% of LinkedIn traffic contains one of these terms: DS, ML, or DL (acronyms for Data Science, Machine Learning, and Deep Learning). Beware of the cliché, though: “80% of all statistics are made up on the spot”. If you stumbled over these acronyms, perhaps you need to google a bit and then continue reading the rest of this post. This post has two goals. First, it attempts to put all fellow Data Science learners at ease. Second, if you have just begun with Data Science, it may serve as a guide to your next step.

Training Sets, Test Sets, and 10-fold Cross-validation

More generally, in evaluating any data mining algorithm, if our test set is a subset of our training data, the results will be optimistic — and often overly optimistic. So that doesn’t seem like a great idea.
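The remedy is to keep test data out of training: in 10-fold cross-validation the data is split into ten disjoint folds, and each fold takes a turn as the test set while the other nine form the training set. A minimal sketch of building the index splits (plain Python, no ML library assumed):

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k disjoint folds and return a list of
    (train, test) index pairs. Each index appears in exactly one test
    fold, so test data never overlaps its corresponding training data."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        (sorted(set(range(n)) - set(test)), test)
        for test in folds
    ]

splits = k_fold_indices(20, k=10)  # 10 (train, test) pairs over 20 items
```

Averaging the evaluation metric over all ten test folds gives an estimate that uses every example for testing exactly once, without ever testing on data the model was trained on.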