Sentiment analysis and the complex natural language

There is a huge amount of content produced online by amateur authors, covering a wide variety of topics. Sentiment analysis (SA) extracts and aggregates users’ sentiments towards a target entity. Machine learning (ML) techniques are frequently used because natural language data is abundant and exhibits definite patterns; they adapt to domain-specific solutions with high accuracy, depending on the feature set used. Lexicon-based techniques, which rely on an external dictionary, are independent of the data and so avoid overfitting, but they also miss context in specialized domains. Corpus-based statistical techniques require large amounts of data to stabilize. Complex-network-based techniques are highly expressive, preserving order, proximity, context and relationships. Recent applications incorporate platform-specific structural information, i.e. meta-data. New sub-domains have been introduced, such as influence analysis, bias analysis, and data leakage analysis. The nature of the data is also evolving: transcribed customer-agent phone conversations are now used for sentiment analysis as well. This paper reviews sentiment analysis techniques and highlights the need to address open challenges specific to natural language processing (NLP). Without resolving these complex NLP challenges, ML techniques cannot make considerable advancements. The open issues and challenges in the area are discussed, stressing the need for standard datasets and evaluation methodology, and for better language models that can capture context and proximity.
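To make the lexicon-based trade-off concrete, here is a minimal Python sketch (the LEXICON dictionary and the score helper are hypothetical, not from the paper): a pure dictionary lookup needs no training data, but it is blind to context such as negation.

```python
# Minimal lexicon-based sentiment scorer (illustrative sketch, not the
# paper's method). LEXICON is a hypothetical stand-in for an external
# dictionary; real systems use resources such as SentiWordNet.
LEXICON = {"good": 1, "great": 2, "fine": 1, "bad": -1, "terrible": -2}

def score(text: str) -> int:
    """Sum the polarity of each token; words outside the lexicon score 0."""
    return sum(LEXICON.get(token, 0) for token in text.lower().split())

print(score("great camera, good battery"))  # 3 -> positive, as expected
print(score("not good at all"))             # 1 -> wrongly positive: the
                                            # negation "not" is invisible
                                            # to a pure dictionary lookup
```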

Geneva Social Media Index: Twitter Analytics for Digital Diplomacy

The Geneva Social Media Index (GSMI) is based on an analysis of the use of Twitter, the social media tool most frequently used in diplomacy, politics and social developments. The GSMI balances Twitter activity against the impact that activity creates, and aims to promote smart and impactful use of social media. The Index was developed by Dr Goran S. Milovanović, Data Scientist at DiploFoundation, for DiploFoundation and the Geneva Internet Platform.

9 Key Benefits of a Data Lake!

1. Scalability
2. Converge All Data Sources
3. Accommodate High Speed Data
4. Implant the Schema
5. AS-IS Data Format (sketched below)
6. The Schema
7. The favorite SQL
8. Advanced Analytics
9. Administrative Benefits
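Several of these benefits (storing data as-is, applying the schema at read time, and querying with familiar SQL) boil down to schema-on-read. Here is a minimal PySpark sketch, assuming a hypothetical raw file events.json already landed in the lake:

```python
# Schema-on-read sketch (hypothetical paths and field names): the raw
# JSON is stored as-is; structure is imposed only when the data is read.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-demo").getOrCreate()

# Ingest the file in its original format; Spark infers a schema at read time.
events = spark.read.json("s3://my-lake/raw/events.json")

# Query the raw data with familiar SQL.
events.createOrReplaceTempView("events")
spark.sql("SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id").show()
```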

7 Key Ingredients for Knock-out Data Visualizations

Big data analytics will all amount to nothing if you don’t report the results properly to the right people in the right way. After all, what’s the point of investing in big (or small) data analytics if the resulting insights don’t reach the people who need them to make better decisions? Make sure you report the results effectively by following these seven steps.

Simple Distributions for Mixtures?

The idea of GLMs is that, given some covariates X, the response Y has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc.). But that does not mean that Y itself has a similar distribution… so there is no reason to test for a Gamma model for Y before running a Gamma regression, for instance. But are there cases where it might work? Where the non-conditional distribution is the same (the same family, at least) as the conditional ones?
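One sanity-check case where it does work (a standard computation, not from the post): for a Gaussian GLM with a Gaussian covariate, the marginal is Gaussian too, since Y = \beta_0 + \beta_1 X + \varepsilon is then a sum of independent Gaussians:

$$
Y \mid X = x \sim \mathcal{N}(\beta_0 + \beta_1 x,\ \sigma^2),
\qquad X \sim \mathcal{N}(\mu, \tau^2)
\quad\Longrightarrow\quad
Y \sim \mathcal{N}\!\left(\beta_0 + \beta_1 \mu,\ \sigma^2 + \beta_1^2 \tau^2\right).
$$

A Poisson GLM behaves differently: mixing Poisson distributions over the covariates inflates the variance above the mean, so the marginal is overdispersed and, in general, no longer Poisson.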

Apache Spark: RDD, DataFrame or Dataset?

There Are Now 3 Apache Spark APIs. Here’s How to Choose the Right One
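As a quick taste of the difference (a minimal PySpark sketch, not from the article; note that the typed Dataset API is available only in Scala and Java, so Python code effectively chooses between the first two):

```python
# The same word count two ways; in Python only the RDD and DataFrame
# APIs exist (the typed Dataset API is Scala/Java-only).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api-demo").getOrCreate()
lines = ["spark rdd", "spark dataframe"]

# RDD API: low-level, opaque lambdas, no optimizer help.
rdd_counts = (spark.sparkContext.parallelize(lines)
              .flatMap(lambda l: l.split())
              .map(lambda w: (w, 1))
              .reduceByKey(lambda a, b: a + b))
print(rdd_counts.collect())

# DataFrame API: declarative column expressions, optimized by Catalyst.
df = spark.createDataFrame([(l,) for l in lines], ["line"])
df_counts = (df.select(F.explode(F.split("line", " ")).alias("word"))
               .groupBy("word").count())
df_counts.show()
```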

When k-means Clustering Fails

Letting the computer automatically find groupings in data is incredibly powerful and is at the heart of “data mining” and “machine learning”. One of the most widely used methods for clustering data is k-means clustering. Unfortunately, k-means clustering can fail spectacularly, as in the example below.
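The post’s original figure isn’t reproduced here; as a stand-in, here is a minimal scikit-learn sketch of one classic failure mode, clusters with very unequal variances (the parameter values are illustrative):

```python
# k-means failure sketch: three blobs, one far more spread out than the
# others. k-means assumes roughly spherical, equally sized clusters, so
# it splits the wide blob and hands parts of it to the tight ones.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, true_labels = make_blobs(
    n_samples=600,
    centers=[(-5, 0), (0, 0), (5, 0)],
    cluster_std=[0.3, 2.5, 0.3],   # middle cluster has much higher variance
    random_state=0,
)

pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Count how points from each true cluster are distributed over the
# k-means clusters; off-diagonal mass reveals the mis-assignments.
for k in range(3):
    print(f"true cluster {k}:", np.bincount(pred[true_labels == k], minlength=3))
```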

7 Ways to Perplex a Data Scientist

On the heels of a report showing the inefficacy of government-run cyber security, it’s imperative to understand the limitations of your system and model. As that article shows, in addition to bureaucratic risk, the government also needs to worry about gaming-the-bureaucracy risk! Government snafus aside, data science has enjoyed considerable success in the past few years. Despite this success, models can fail in surprising ways. Last year we saw how deep neural nets for image recognition can fail on noisy data.