**Fascinating Chaotic Sequences with Cool Applications**

Here we describe well-known chaotic sequences, including new generalizations, with applications to random number generation, highly non-linear auto-regressive models for time series, simulation, and random permutations. We also discuss the use of big-number libraries (available in many programming languages for working with numbers that have hundreds of decimals), because standard computer precision almost always produces completely erroneous results after a few iterations: a fact rarely if ever mentioned in the scientific literature, but illustrated here, together with a solution. It is possible that all scientists who have published on chaotic processes used faulty numbers because of this issue. This article is accessible to non-experts, even though we solve a special stochastic equation for the first time, providing an unexpected exact solution for a new chaotic process that generalizes the logistic map. We also describe a general framework for continuous random number generators, and investigate the interesting auto-correlation structure associated with some of these sequences. References are provided, as well as fast source code to process big numbers accurately, and even an elegant mathematical proof in the last section.
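The precision issue described above is easy to reproduce. The sketch below (in Python, with the standard-library `decimal` module standing in for whatever big-number library the article's own source code uses) iterates the logistic map x → rx(1 − x) in the chaotic regime r = 4, once in 64-bit floats and once with 200 decimal digits. Because rounding errors roughly double at every step, the two orbits share no digits after a few dozen iterations:

```python
from decimal import Decimal, getcontext

def logistic_float(x0, r, n):
    """Iterate the logistic map x -> r*x*(1-x) in ordinary 64-bit floats."""
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
    return x

def logistic_decimal(x0, r, n, digits=200):
    """Same iteration, carried out with `digits` decimal digits of precision."""
    getcontext().prec = digits
    x = Decimal(str(x0))
    r = Decimal(str(r))
    one = Decimal(1)
    for _ in range(n):
        x = r * x * (one - x)
    return x

# With r = 4 the map is fully chaotic: tiny rounding errors grow by a
# factor of about 2 per iteration, so after ~50 steps the float orbit
# is pure noise while the 200-digit orbit is still accurate.
f = logistic_float(0.2, 4.0, 60)
d = logistic_decimal(0.2, 4.0, 60)
print("float:   ", f)
print("200-digit:", float(d))
```

The starting point 0.2 and the iteration count 60 are arbitrary choices for the demonstration; any seed in (0, 1) shows the same divergence.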

**A survey on graphic processing unit computing for large-scale data mining**

General purpose computation using Graphic Processing Units (GPUs) is a well-established research area focusing on high-performance computing solutions for massively parallelizable and time-consuming problems. Classical methodologies in machine learning and data mining cannot handle processing of massive and high-speed volumes of information in the context of the big data era. GPUs have successfully improved the scalability of data mining algorithms to address significantly larger dataset sizes in many application areas. The popularization of distributed computing frameworks for big data mining opens up new opportunities for transformative solutions combining GPUs and distributed frameworks. This survey analyzes current trends in the use of GPU computing for large-scale data mining, discusses GPU architecture advantages for handling volume and velocity of data, identifies limitation factors hampering the scalability of the problems, and discusses open issues and future directions.

This is the first in a series about chatbots. In this first installment we cover the basics, including their brief technological history, uses, basic design choices, and where deep learning comes into play. In subsequent articles we’ll describe in more detail how they are actually programmed, along with best-practice dos and don’ts.

**Creating Reporting Template with Glue in R**

Report generation is a very important part of any organization’s business intelligence and analytics division. The ability to create automated reports from the given data is one of the most desirable capabilities that any innovative team would strive for. That is one area where SAS is considered more mature than R – not because R lacks those features, but primarily because R practitioners are not familiar with them. That is the feeling I had today when I stumbled upon the glue package in R, which is a very good and competitive alternative to reporting-template packages such as whisker and brew.
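The core idea behind glue (and whisker and brew) is string interpolation: the report text is written once as a template with named placeholders, and the code only fills in values. A rough Python analog using the standard library’s `string.Template` (not the R API, and with made-up sales figures purely for illustration) looks like this:

```python
from string import Template

# Hypothetical figures, just to have something to report on.
row = {"region": "North", "units": 1284, "growth": 0.07}

# glue-style template: placeholders are filled from named values,
# keeping the report wording separate from the code that computes numbers.
report = Template(
    "Region $region sold $units units, a growth of $growth over last month."
)
line = report.substitute(
    region=row["region"],
    units=row["units"],
    growth=f"{row['growth']:.0%}",
)
print(line)
```

In R, `glue::glue("Region {region} sold {units} units")` achieves the same effect with expressions embedded directly in the braces.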

**Effective Learning: The Near Future of AI**

There is certainly no doubt that the ultimate future of AI is to reach and surpass human intelligence. But this is a far-fetched feat to achieve. Even the most optimistic among us bet that human-level AI (AGI or ASI) is at least 10-15 years away, with skeptics willing to bet that it will take centuries, if it is possible at all. Well, this is not what this post is about (you should rather read this post if you are interested in learning about super-intelligence). Here we are going to talk about a more tangible, closer future and discuss the emerging and potent AI algorithms and techniques which, in our opinion, are going to shape the near future of AI.

**Better A/B Testing with Firebase**

If you’re like most app developers, you know that small changes can often make a big difference in the long-term success of your app. Whether it’s the wording that goes into your ‘Purchase’ button, the order in which dialogs appear in your sign-up flow, or how difficult you’ve made a particular level of a game, that attention to detail can often make the difference between an app that hits the top charts and one that languishes. But how do you know you’ve made the right changes? You can certainly make some educated guesses, ask friends, or run focus groups. But often, the best way to find out how your users will react to changes within your app is to simply try out those changes and see for yourself. And that’s the idea behind A/B testing; it lets you release two (or more!) versions of your app simultaneously among randomly selected users to find out which version truly is more successful at getting the results you want. And while Firebase Remote Config did allow you to perform some simple A/B testing through its ‘random percentile’ condition, we’ve gone ahead and added an entirely new experiment layer in Firebase that works with Remote Config and notifications to make it quick and easy to set up and measure sophisticated A/B tests. Let’s take a quick tour of how it works!
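The ‘random percentile’ idea mentioned above is worth making concrete: each user is deterministically mapped to a stable percentile, and a variant is chosen by comparing that percentile against the traffic split. The sketch below is a generic illustration of the technique in Python, not Firebase’s actual implementation; the function names and the SHA-256 choice are assumptions for the example:

```python
import hashlib

def percentile_bucket(user_id: str, experiment: str) -> float:
    """Map a user to a stable percentile in [0, 100] for one experiment.

    Hashing (experiment, user_id) together means the same user can land
    in different buckets across experiments, but always gets the same
    variant within a single experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF * 100

def assign_variant(user_id: str, experiment: str, split: float = 50.0) -> str:
    """Users below the split percentile see variant B, the rest see A."""
    return "B" if percentile_bucket(user_id, experiment) < split else "A"
```

Because assignment depends only on the hash, no server-side state is needed to keep a user on the same variant across sessions.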

Have you ever wondered what goes on inside neural networks? Feature visualization is a powerful tool for digging into neural networks and seeing how they work. Our new article, published in Distill, does a deep exploration of feature visualization, introducing a few new tricks along the way! Building on our work in DeepDream, and lots of work by others since, we are able to visualize what every neuron in a strong vision model (GoogLeNet [1]) detects. Over the course of multiple layers, it gradually builds up abstractions: first it detects edges, then it uses those edges to detect textures, the textures to detect patterns, and the patterns to detect parts of objects….

**Tips for Getting Started with Text Mining in R and Python**

This article opens up the world of text mining in a simple and intuitive way and provides great tips to get started with text mining.

**Metropolis-in-Gibbs Sampling and Runtime Analysis with profvis**

First off, here are the previous posts in my Bayesian sampling series:

• Bayesian Simple Linear Regression with Gibbs Sampling in R

• Blocked Gibbs Sampling in R for Bayesian Multiple Linear Regression

In the first post, I illustrated Gibbs Sampling – an algorithm for getting draws from a posterior when conditional posteriors are known. In the second post, I showed that if we can vectorize, then drawing a whole “block” per iteration will increase the speed of the sampler.

For many models, such as logistic models, there are no conjugate priors – so Gibbs is not applicable. And as we saw in the first post, the brute-force grid method is much too slow to scale to real-world settings.

This post shows how we can use Metropolis-Hastings (MH) to sample from non-conjugate conditional posteriors within each blocked Gibbs iteration – a much better alternative than the grid method.
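The structure of Metropolis-within-Gibbs is simple: at each Gibbs iteration, instead of drawing a coordinate exactly from its conditional posterior, we take one random-walk Metropolis step targeting that conditional. The original posts use R; the minimal sketch below illustrates the same idea in Python on a toy two-parameter target whose conditionals have no standard form (the target density and tuning choices are assumptions for the example, not the posts’ model):

```python
import math
import random

def log_post(x, y):
    # Toy non-conjugate target: the conditionals p(x|y) and p(y|x)
    # are non-standard, so each Gibbs update uses a Metropolis step.
    return -(x * x * y * y + x * x + y * y - 8 * x - 8 * y) / 2

def mh_step(current, log_cond, scale=1.0):
    """One random-walk Metropolis update of a single scalar coordinate."""
    proposal = current + random.gauss(0.0, scale)
    log_alpha = log_cond(proposal) - log_cond(current)
    if math.log(random.random()) < log_alpha:
        return proposal   # accept
    return current        # reject: keep the current value

def metropolis_in_gibbs(n_iter=5000, seed=42):
    random.seed(seed)
    x, y = 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        x = mh_step(x, lambda v: log_post(v, y))  # MH update of x | y
        y = mh_step(y, lambda v: log_post(x, v))  # MH update of y | x
        draws.append((x, y))
    return draws
```

Each coordinate’s MH step only needs the conditional log density up to a constant, which is exactly the situation when no conjugate prior exists; the proposal scale would normally be tuned for a reasonable acceptance rate.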


**Automating Summary of Surveys with RMarkdown**

This guide shows how to automate the summary of surveys with R and R Markdown using RStudio. This is great for portions of the document that don’t change (e.g., “the survey shows substantial partisan polarization”). The motivation is really twofold: efficiency (maximize the reusability of code, minimize copy-and-paste errors) and reproducibility (maximize the number of people and computers that can reproduce findings). The basic setup is to write an Rmd file that will serve as a template, and then a short R script that loops over each data file (using library(knitr)). The render function then turns the Rmd into documents or slides (typically in PDF, HTML, or docx) by taking file metadata as a parameter. There are countless ways to summarize a survey in R. This guide shows a few basics with ggplot and questionr, but focuses on the overall workflow (file management, etc.). Following the instructions here, you should be able to reproduce all four reports (and in principle, many more) despite only writing code to clean one survey. Most of the code is displayed in this document, but all code is found in either pewpoliticaltemplate.Rmd or pew_report_generator.R. All code, as well as the output documents, can be found here, and details on obtaining the data are found below.

R is an incredible tool for reproducible research. In the present series of blog posts I want to show how one can easily acquire data within an R session, documenting every step in a fully reproducible way. There are numerous data acquisition options for R users. Of course, I do not attempt to cover all the possibilities, and I focus mostly on demographic data. If your prime interest lies outside human population statistics, it’s worth checking the amazing Open Data Task View.

**7 Super Simple Steps From Idea To Successful Data Science Project**

Ever had a great idea for a data science project or business, but in the end did not pursue it because you did not know how to make it a success? Today I am going to show you how.

Step 1 – Get the data

Step 2 – Select the right tools for the analytics

Step 3 – Prove your theory with science

Step 4 – Figure out your business model

Step 5 – Build a minimum viable product

Step 6 – Automate and Measure everything

Step 7 – Re-iterate

