Data Scientists Must Prioritize Client Confidentiality

Many organizations are reluctant to create data science teams (in-house or outsourced) because of information confidentiality and privacy concerns. Exposing the firm's inner workings to the competition is dangerous: disclosing high-value information may cause significant damage. There is real fear of exposing valuable confidential information to data scientists who may leave the firm and share key knowledge with competitors. Moreover, externally hired data scientists could share critical information with their other clients, who may be direct or indirect competitors.

5 of the Most Viewed SciPy and NumPy Questions with Problems on Stack Overflow

Computational Time of Predictive Models

Tuesday, at the end of my 5-hour crash course on machine learning for actuaries, Pierre asked me an interesting question about the computational time of different techniques. I’ve been presenting the philosophy of various algorithms, but I forgot to mention computational time. So I wanted to try several classification algorithms on the dataset used to illustrate the techniques.
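As a rough illustration of how such a comparison could be set up, here is a minimal Python sketch. It is not the code from the post: the helper `time_call`, the two toy classifiers, and the synthetic dataset are all assumptions made for illustration, and only the standard library is used.

```python
import random
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time (seconds) of fn(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def majority_class(X, y):
    """Toy baseline (hypothetical): predict the most frequent label for every row."""
    label = max(set(y), key=y.count)
    return [label for _ in X]


def one_nearest_neighbour(X, y):
    """Toy 1-NN (hypothetical): label each row from its closest other row, O(n^2)."""
    preds = []
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        preds.append(y[best_j])
    return preds


# Synthetic two-feature dataset, standing in for the course dataset.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(300)]
y = [int(a + b > 1.0) for a, b in X]

timings = {
    "majority_class": time_call(majority_class, X, y),
    "one_nearest_neighbour": time_call(one_nearest_neighbour, X, y),
}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Taking the best of a few repeats reduces noise from other processes. The absolute numbers depend on the machine, so only the relative ordering is worth reading into.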

Visualizing Sort Algorithms With ggplot

Have you read Visualizing Algorithms by Mike Bostock? It’s a pure gold post. In that post Mike shows a static representation of a sort algorithm, and obviously it will be fun to replicate that image with ggplot. So here we go. We need some sort algorithms. In this link you can see some algorithms.
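To make the idea concrete, here is a small Python sketch (the post itself works with ggplot, so this is only an illustration). It assumes insertion sort as the algorithm and uses two hypothetical helpers, `insertion_sort_trace` and `to_long_format`, to record the list after each pass and flatten it into one row per point, which is the kind of long table a plotting library can draw as lines.

```python
def insertion_sort_trace(values):
    """Sort a copy of values and return the list state after each outer pass."""
    items = list(values)
    snapshots = [list(items)]  # state before any pass
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
        snapshots.append(list(items))
    return snapshots


def to_long_format(snapshots):
    """Flatten snapshots into (step, position, value) rows, one per plotted point."""
    return [
        (step, position, value)
        for step, state in enumerate(snapshots)
        for position, value in enumerate(state)
    ]


trace = insertion_sort_trace([3, 1, 4, 2])
rows = to_long_format(trace)
for state in trace:
    print(state)
```

Plotting `position` against `step`, with one line per `value`, gives a static picture of how each element travels to its final place.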

US:China graph of the day

At first it looks like a pie chart, but it isn’t. It’s a set of bar charts warped into a circle, so that the ratio of blue and red areas in a wedge is the square of the ratio of the numbers. Also, the circle format means the longest wedge in each pair must be the same length: 8.6% unemployment rate is the same as 4.6% military expenditure, 104% market capitalisation, and 46 Olympic gold medals.
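The squaring effect follows from the geometry of a circular sector, whose area is one half of the angle times the radius squared. A minimal Python sketch, assuming each wedge's radius is proportional to the number it encodes and using a hypothetical helper `wedge_area`:

```python
import math


def wedge_area(radius, angle_radians):
    """Area of a circular sector: (1/2) * angle * radius^2."""
    return 0.5 * angle_radians * radius ** 2


angle = math.pi / 4  # hypothetical wedge angle, the same for both wedges
big = wedge_area(2.0, angle)    # encodes a value twice as large
small = wedge_area(1.0, angle)  # encodes the baseline value

area_ratio = big / small
print(area_ratio)  # a 2:1 ratio in the numbers becomes 4:1 in area
```

A reader comparing areas therefore sees a gap much larger than the one in the underlying numbers.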

A deeper theory of testing

In some of my recent public talks (for example: here and here) I have mentioned a desire for “a deeper theory of fitting and testing.” The true goal of predictive analytics is always to build a model that works well in production. Training and testing procedures are designed to simulate this unknown future model performance, but they can be expensive and can also fail.