Predictive Analytics Goes to College – to Predict Student Success

Higher education has been a little slow to adopt advanced analytics for improving student success. Now that technology lets us combine and analyze structured and unstructured data, including streaming data, a number of successful projects are underway.

The Future of Data Visualization

From the 2014 Strata + Hadoop World conference in San Jose, a keynote from Jeffrey Heer, Co-Founder of Trifacta: ‘Charting a Path Forward: The Future of Data Visualization’.

Net2Net: Accelerating Learning via Knowledge Transfer

We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net.
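To make the idea concrete, here is a minimal sketch of a function-preserving widening step in the spirit of the Net2Net paper's widening operation, applied to a toy two-layer ReLU network in plain Python. The function names (`forward`, `net2wider`), the network sizes, and the random weights are all illustrative assumptions, not code from the paper.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x, b):
    # Dense layer: W is a list of rows, b holds one bias per row.
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(W1, b1, W2, b2, x):
    # Two-layer network: y = W2 * relu(W1 * x + b1) + b2
    return matvec(W2, relu(matvec(W1, x, b1)), b2)

def net2wider(W1, b1, W2, new_width, rng):
    """Widen the hidden layer to new_width units without changing the output.

    Hypothetical helper: extra hidden units are copies of randomly chosen
    existing units, and the outgoing weights of each copied unit are divided
    by its number of replicas so that the replicas sum to the original value.
    """
    old_width = len(W1)
    assert new_width >= old_width
    # mapping[j] is the original unit that new unit j copies.
    mapping = list(range(old_width)) + [
        rng.randrange(old_width) for _ in range(new_width - old_width)
    ]
    counts = [mapping.count(j) for j in range(old_width)]
    W1_new = [list(W1[j]) for j in mapping]
    b1_new = [b1[j] for j in mapping]
    W2_new = [[row[j] / counts[j] for j in mapping] for row in W2]
    return W1_new, b1_new, W2_new

# Small teacher network: 3 inputs, 4 hidden units, 2 outputs (arbitrary sizes).
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]

# Student network with 7 hidden units, initialized from the teacher.
W1_wide, b1_wide, W2_wide = net2wider(W1, b1, W2, 7, rng)

x = [0.5, -1.2, 2.0]
y_teacher = forward(W1, b1, W2, b2, x)
y_student = forward(W1_wide, b1_wide, W2_wide, b2, x)
```

The larger student network starts out computing the same function as the teacher (up to floating-point rounding), so training can continue from that point instead of from a random initialization.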

DeepDive

DeepDive is a new type of data management system that tackles extraction, integration, and prediction problems in a single system, allowing users to rapidly construct sophisticated end-to-end data pipelines such as dark data BI systems. Because users build their system end-to-end, they can focus on the portion of the system that most directly improves the quality of their application. In contrast, previous pipeline-based systems require developers to build extractors, integration code, and other components without any clear idea of how their changes improve the quality of their data product. This simple insight is the key to how DeepDive systems produce higher-quality data in less time. DeepDive-based systems are used by people without machine learning expertise in domains ranging from paleobiology to genomics to human trafficking; see our showcase for examples.

A Closer Look at RDDs

As any really talented programmer will tell you, the heart of a good program is its data structures rather than its algorithms. RDDs are a good data structure for Big Data because they combine reliability with simplicity. With Spark and RDDs, the only limit will be your imagination, especially when backed by the support of MapR’s Spark distribution.