Big Data Is Not Data Warehousing
Big Data is not Data Warehousing. It is not the evolution of Data Warehousing, and it is not a sensible and coherent alternative to Data Warehousing, no matter what certain vendors put in their marketing brochures or stick up their noses.
Sparse Quadratic Programming with Ipoptr
In this post, I’ll explain how ipoptr can be applied to solve quadratic programs and I’ll compare the performance of this solver to other quadratic program solvers (quadprog, ipop) available in R. We’ll see that ipoptr is very fast and efficient on large sparse quadratic programs, seemingly an order of magnitude faster than quadprog on the demonstration problem considered in my previous post. Because the Ipopt backend is a bit tricky to install, the last section provides a detailed overview of how I successfully built this solver under Ubuntu Linux.
Some More Results on the Theory of Statistical Learning
Yesterday, I mentioned a popular graph that comes up when studying the theoretical foundations of statistical learning. But there is usually another one, which is the following, …
Naive Bayes on Apache Flink
In this blog post we are going to implement a Naive Bayes classifier in Apache Flink. We are going to use it for text classification by applying it to the 20 Newsgroups dataset. To understand what is going on, you should be familiar with Java and know what MapReduce is. If you have seen and understood a word count example in any system, you’re good to go. If you haven’t heard of MapReduce or haven’t seen the word count example, you may want to look at our introductory post “Hadoop and MapReduce” first.