SIP text log analysis using Pandas

Analysis of SIP application server (AS) text logs can help detect, and in some specific situations predict, different types of issues within a VoIP network. SIP server text logs contain information that is difficult or even impossible to obtain from other sources, such as CDRs or signaling traffic captures.
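
A minimal sketch of the idea, assuming a simplified, hypothetical log format (real AS log layouts vary by vendor, so the regex would need adjusting): parse lines into a DataFrame, then aggregate by message type, a first step toward spotting anomalies such as spikes in 4xx/5xx responses.

```python
import io
import re
import pandas as pd

# Hypothetical AS log lines: timestamp, direction, SIP method/response, Call-ID.
log_text = """\
2019-03-01 10:00:01.123 RX INVITE callid=abc123
2019-03-01 10:00:01.456 TX 100 callid=abc123
2019-03-01 10:00:02.001 TX 180 callid=abc123
2019-03-01 10:00:05.900 TX 486 callid=abc123
2019-03-01 10:00:06.100 RX ACK callid=abc123
"""

pattern = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<dir>RX|TX) (?P<msg>\S+) callid=(?P<callid>\S+)"
)

rows = [m.groupdict() for m in (pattern.match(l) for l in io.StringIO(log_text)) if m]
df = pd.DataFrame(rows)
df["ts"] = pd.to_datetime(df["ts"])

# Message counts per type: the starting point for response-code statistics.
print(df["msg"].value_counts())
```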

A quick guide summarizing many approaches for handling categorical data (both low and high cardinality) when preprocessing data for neural-network-based predictors.
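
Two of the approaches such guides typically cover, sketched with pandas: one-hot encoding for low-cardinality columns and the hashing trick for high-cardinality ones (learned embeddings are another common option for the latter). The column names and bucket count below are illustrative.

```python
import hashlib
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],        # low cardinality
    "user_id": ["u1001", "u2002", "u3003", "u1001"],   # high cardinality
})

# Low cardinality: one-hot encoding is the usual first choice.
one_hot = pd.get_dummies(df["color"], prefix="color")

# High cardinality: the hashing trick maps each value into a fixed number
# of buckets, trading occasional collisions for a bounded feature space.
# hashlib is used instead of the built-in hash(), which is salted per process.
def bucket(value, n_buckets=8):
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % n_buckets

df["user_bucket"] = df["user_id"].apply(bucket)
```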

23 Statistical Concepts Explained in Simple English – Part 7

• Face Validity: Definition and Examples
• Factor analysis: Easy Definition
• False Discovery Rate: Simple Definition, Adjusting for FDR
• False Positive and False Negative: Definition and Examples
• Familywise Error Rate (Alpha Inflation): Definition
• Fat Tail Distribution: Definition, Examples
• 5 Number Summary in Excel: Easy Steps with Video
• Outliers: Finding Them in Data, Formula, Examples. Easy Steps and V…
• How to Find Pearson’s Coefficient of Skewness in Excel
• Pooled Sample Standard Error: How to Calculate it
• Regression Slope Intercept: How to Find it in Easy Steps
• Standard Error Excel 2013 in Easy Steps
• Standard Error of Regression Slope
• How to Find t Critical Value on TI 83
• Variance in Minitab: How to Find it
• Finite Population Correction Factor FPC: Formula, Examples
• Fisher Z-Transformation
• Fleiss’ Kappa
• Fmax / Hartley’s Test: Definition, Step by Step Example, Table
• Fractile Definition Usage and How to Calculate
• Frequency Distribution Table in Excel — Easy Steps!
• Frequency Polygon: Definition and How to Make One
• Friedman’s Test / Two Way Analysis of Variance by Ranks
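
As a taste of one entry in the list, adjusting for the false discovery rate via the Benjamini-Hochberg procedure takes only a few lines of Python (the p-values below are example data):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank i such that p_(i) <= (i/m) * q ...
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    # ... and reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])

pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298,
         0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]
print(benjamini_hochberg(pvals, q=0.05))  # the first four hypotheses are rejected
```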

The AtomSpace: a Typed Graphical Distributed in-RAM Knowledgebase

The OpenCog AtomSpace is a typed graphical distributed in-RAM knowledgebase. The meaning of all those buzzwords is that it tries to fix many different, related problems that computers (and people) have when working with data. This blog post discusses some basic issues of working with knowledge, and the various ways that people have tried to work around them over the decades. It will illuminate why graph databases are explosively popular these days, and will also show that they are not the end of the story. I'm trying to start the next chapter already.

The Penelope Platform

Penelope is a cloud-based, open and modular platform that consists of tools and techniques for mapping landscapes of opinions expressed in online (social) media. The platform is used for analysing the opinions that dominate the debate on certain crucial social issues, such as immigration, climate change and national identity.

Building Big Shiny Apps — A Workflow (1/2)

During rstudio::conf(2019L), I presented an e-poster called 'Building Big Shiny Apps – A Workflow'. You can find the poster here, and this blog post is an attempt to transcribe what I talked about while presenting it. As this is a rather long topic, I've divided the post into two parts: this first post covers the background and motivation, and the second will present a step-by-step workflow and the necessary tools.

Visualising Machine Learning Datasets with Google’s FACETS

There has been much discussion of how the quantity of training data can have a tremendous impact on the results of a machine learning model. However, along with quantity, data quality is also critical to building a powerful and robust ML system. After all, 'garbage in, garbage out': what you get from the system is a representation of what you feed into it. A machine learning dataset may contain thousands to millions of data points, which in turn may have hundreds or thousands of features. Additionally, real-world data is messy, with missing values, unbalanced classes, outliers and so on. It is therefore imperative to clean the data before proceeding with model building. Visualising the data helps locate these irregularities and point out where the data actually needs cleaning. Data visualisation gives an overview of the entire dataset, irrespective of its size, and helps perform EDA quickly and accurately.
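
FACETS itself renders an interactive browser widget; as a rough text-mode stand-in for the same checks, pandas can surface missing values, outliers and class imbalance. The toy data below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, np.nan, 42, 250, 38],             # 250 looks like an outlier
    "label": ["ok", "ok", "ok", "ok", "ok", "spam"],  # unbalanced classes
})

print(df.isna().sum())                            # missing values per column
print(df["age"].describe())                       # min/max quickly expose the outlier
print(df["label"].value_counts(normalize=True))   # class balance as proportions
```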

The 15 most important AI companies in the world

1. Amazon
2. Apple
3. Banjo
4. DJI
5. Facebook
6. Google
7. HiSilicon
8. IBM
9. Intel
10. Microsoft
11. Nvidia
12. OpenAI
13. Qualcomm
14. SenseTime
15. Twitter

Data Augmentation for Natural Language Processing

Lessons learned from a hate speech detection task to improve supervised NLP models. Note: this post is mainly targeted at an audience unfamiliar with Natural Language Processing and will hence cover some basic concepts before moving on to data augmentation.
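
One of the simplest text augmentation techniques is synonym replacement: swap some words for synonyms to generate extra training examples. A minimal sketch; the synonym table is a tiny hypothetical stand-in for a real thesaurus such as WordNet.

```python
import random

# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}

def augment(sentence, rng=None):
    """Replace every word that has synonyms with a randomly chosen one."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    return " ".join(
        rng.choice(SYNONYMS[w]) if w in SYNONYMS else w
        for w in sentence.split()
    )

print(augment("the quick fox looks happy"))
```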

Tensorflow – The core concepts

Like most machine learning libraries, TensorFlow is 'concept-heavy and code-lite'. The syntax is not very difficult to learn, but it is very important to understand its concepts.
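
The central concept is the dataflow graph: you first describe a graph of tensors and operations, and only then execute it (in TensorFlow 1.x, via `Session.run()`). The toy sketch below is a plain-Python analogy of that define-then-run pattern, not the TensorFlow API itself:

```python
# A minimal analogy for TensorFlow's core idea: nodes hold operations,
# and nothing is computed until the graph is explicitly run.
class Node:
    def __init__(self, fn, inputs=()):
        self.fn, self.inputs = fn, inputs

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def run(node):
    """Evaluate the graph recursively, like Session.run()."""
    return node.fn(*(run(i) for i in node.inputs))

# Graph definition: nothing is computed yet.
c = add(constant(2), constant(3))
# Execution: values flow through the graph only now.
print(run(c))  # 5
```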

How to Learn More in Less Time with Natural Language Processing (Part 1)

Imagine you are given an assignment from school or work that involves a lot of research. You spend all night grinding it out so you can acquire the knowledge you need for a high-quality end product. Now imagine you are given the exact same assignment and finish with the same high-quality result, except this time with lots of time to spare. For obvious reasons, the latter scenario is preferable. Time is a valuable asset, so we need a solution to one of the biggest time-wasting problems we face as a society: the modern-day data influx. As technologies advance, the amount of data we collect grows at exponential rates, and it is becoming increasingly difficult to keep up with new information. Fortunately, we can leverage Natural Language Processing (NLP) to help solve this issue. This article will go through what NLP is and how you can create your own extractive text summarizer. In Part 2 we will look at creating a bag-of-words model to classify the subject of the article you chose to summarize. Let's get into it!
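
A frequency-based extractive summarizer of the kind described can be sketched with the standard library alone: score each sentence by the frequencies of its non-stopword words and keep the top scorers. The stopword list here is a tiny illustrative subset.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def summarize(text, n_sentences=1):
    """Pick the n_sentences sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    top = set(scored[:n_sentences])
    # Preserve the original order of the selected sentences.
    return " ".join(s for s in sentences if s in top)

text = ("NLP helps machines read text. "
        "Summarization picks the most informative sentences. "
        "Extractive summarization scores sentences by word frequency "
        "and keeps the top sentences.")
print(summarize(text, n_sentences=1))
```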

Deploy your machine learning models with tensorflow serving and kubernetes

Machine learning applications are booming, and yet there are not many tools available for Data Engineers to integrate those powerful models into production systems. Here I discuss how TensorFlow Serving can help you accelerate delivering models to production. This blog post is about serving machine learning models. What does that mean? 'Serving is how you apply an ML model after you've trained it.' (Noah Fiedel, Software Engineer working on TensorFlow Serving)
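
Once a model is served, clients talk to it over TensorFlow Serving's REST API (POST to `/v1/models/<name>:predict` on port 8501 by default). A sketch of building such a request; the model name `my_model` and the input values are placeholders, and actually sending the request requires a running server.

```python
import json

# TensorFlow Serving's REST predict endpoint has the form:
#   POST http://<host>:8501/v1/models/<model_name>:predict
url = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[1.0, 2.0, 5.0]]}

body = json.dumps(payload)
# With a server running (locally or in a Kubernetes pod) you would send it:
#   import requests
#   response = requests.post(url, data=body)
#   predictions = response.json()["predictions"]
print(body)
```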

Monetizing the Math – are you ready?

Twenty-one things to do or plan in order to fully realize the ROI of your AI/ML projects in 2019:
1. Do you have the infrastructure to deploy, maintain and refresh models in short cycles?
2. Do you have the control mechanisms and resources to monitor, maintain and report on the model (and correct defects promptly)?
3. Can your application be maintained without eating resources that are not committed to you?
4. Does your app have a compelling name that the CEO has already mentioned to his direct reports…in a good way (this is really important)?
5. Are your algorithms so simple and transparent that no one could possibly misinterpret them or screw them up after you and your team have moved on?
6. Are your training data sets in a place where someone will remember?
7. Have you produced a confusion matrix that you are ready to live with and explain?
8. Does your algorithm also explain the ‘why’ of its prediction or recommendation? (I already know the answer to this so let this one be aspirational).
9. Have you made provisions for when IT rebuilds the transaction system or data warehouse from where you get your run-time data?
10. Have you made provisions for when your external data vendor increases their price, changes their endpoints or goes out of business?
11. Do you have active visible executive sponsorship over IT and Change Management?
12. Have you created and communicated widely a scope and release plan for all of the programmatic components (larger than your dev/deploy project alone)?
13. Have you thought through the operational impact and have the full cooperation of line level managers?
14. Have you shared the story with end users early enough to give them ownership in the app?
15. Have you agreed with Finance on how the ‘monetize’ part will be measured?
16. Have you fully considered impact to customers?
17. Are your ethics and data privacy concerning both customers and employees beyond any doubt or skeptical judgement?
18. Are you ready to speak clearly all along the way, defend yourself technically and operationally to skeptics (and the occasional enemy), address the fears, doubts and uncertainties (FUD) of your stakeholders and keep your team from being demoralized?
19. Have you trained your successor to do the same?
20. Are you ready for the very long time it may take, when the original problem seems to be obscured by new problems?
21. Is organizational adoption strong enough that the app will have a life after the current executive sponsor has moved on?
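
For point 7, a confusion matrix for a binary classifier can be tallied in a few lines; the labels below are made up for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Counts of (actual, predicted) pairs for a binary classifier."""
    return Counter(zip(y_true, y_pred))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print("TP:", cm[(1, 1)], "FN:", cm[(1, 0)],
      "FP:", cm[(0, 1)], "TN:", cm[(0, 0)])
```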

A Full Hardware Guide to Deep Learning

Deep Learning is very computationally intensive, so you will need a fast CPU with many cores, right? Or is it maybe wasteful to buy a fast CPU? One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high-performance system. Over the years, I have built a total of 7 different deep learning workstations and, despite careful research and reasoning, I made my fair share of mistakes in selecting hardware parts. In this guide, I want to share the experience I have gained over the years so that you do not make the same mistakes I did. The blog post is ordered by mistake severity: the mistakes where people usually waste the most money come first.

Web Scraping Google Sheets with RSelenium

I love to learn new things, and one of the ways I learn best is by doing. It's also been said that you never fully understand a topic until you are able to explain it; I think blogging is a low-barrier way of explaining things. Someone I met at a local data science meetup in Montréal wanted help web scraping team standings from PuzzledPint. I jumped at the opportunity because I knew this would be my chance to finally learn RSelenium!

One Shot Learning and Other Strategies for Reducing Training Data

A shortage of labeled training data is a huge barrier to realizing the equally large benefits that deep learning applications could deliver. Here are five strategies for getting around the data problem, including the latest in One-Shot Learning.