Navigating Big Data Careers with a Statistics PhD
A quick scan of the science and technology headlines often yields two words: big data. The amount of information we collect continues to increase, and this data can be found in varied sectors, ranging from social media to genomics. Claims are made that big data will solve an array of problems, from understanding devastating diseases to predicting political outcomes. There is substantial “big data” hype in the press, as well as in the business and academic communities, but how do upcoming, current, and recent statistical science PhDs navigate the array of training opportunities and career paths in this new era?

Undergraduate interest in statistics degrees is exploding, bringing new talent to graduate programs and the post-PhD job pipeline. Statistics training is diversifying, with students focusing on theory, methods, computation, and applications, or a blend of these areas. A few years ago, Rafa outlined the academic career options for statistics PhDs in two posts, which cover great background material I will not repeat here. The landscape for statistics PhD careers is also changing quickly, with a variety of companies attracting top statistics students in new roles. As a new faculty member at the intersection of machine learning, causal inference, and health care policy, I already find myself frequently giving career advice to trainees. The choices have become far more nuanced than academia vs. industry vs. government.

So, you find yourself inspired by big data problems and fascinated by statistics. While you are a student, figuring out what you enjoy working on is crucial. This exploration could involve internships or collaborating with multiple faculty on different types of projects. Both positive and negative experiences can help you identify your preferences.
What kind of decision boundaries does Deep Learning (Deep Belief Net) draw? Practice with R and {h2o} package
For a while (at least the several months since many people began implementing it with Python and/or Theano, PyLearn2, or the like), I had nearly given up practicing Deep Learning with R, and I felt I had been left far behind the advanced technology… But now we (not only I!) have a great masterpiece: {h2o}, an R implementation of the H2O framework. I believe {h2o} is the easiest way to apply Deep Learning to our own datasets, because we hardly have to write any code; we only specify some parameters. That is, {h2o} frees us from complicated scripts, so we can focus on the underlying essentials and theories. Using {h2o} in R, we can in principle implement a “Deep Belief Net”, the original version of Deep Learning*1. I know this is no longer the state-of-the-art style of Deep Learning, but it should be helpful for understanding how Deep Learning behaves on actual datasets. If you have read this blog before, please recall a previous post arguing that decision boundaries tell us how each classifier works in terms of overfitting or generalization. 🙂
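To make the idea of a decision boundary concrete, here is a minimal pure-Python sketch (deliberately not {h2o} or a Deep Belief Net, and using made-up toy data): a 1-nearest-neighbour classifier evaluated over a grid of points. The region where the predicted label flips is the decision boundary; a Deep Belief Net draws far more flexible boundaries, but the grid-evaluation idea used to visualize them is the same.

```python
# Toy illustration of a decision boundary via grid evaluation.
# The classifier (1-NN) and the data are hypothetical stand-ins for the
# Deep Belief Net models discussed in the post.
import math

# Two toy classes in 2-D
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((4.0, 4.0), "B"), ((4.5, 3.5), "B")]

def predict(point):
    """Label of the nearest training point (1-nearest-neighbour)."""
    return min(train, key=lambda t: math.dist(point, t[0]))[1]

# Evaluate the classifier on a coarse grid over [0, 5] x [0, 5]; where the
# label changes from "A" to "B" is the decision boundary.
grid = [[predict((x * 0.5, y * 0.5)) for x in range(11)] for y in range(11)]
row = "".join(grid[5])  # one horizontal slice through the plane
```

Plotting `grid` as colors (e.g. with base R's `image()` in the original workflow) is exactly how the boundary figures in such posts are produced.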
Making R Files Executable (under Windows)
Although it is reasonable that R scripts open in edit mode by default, it would be even nicer (once in a while) to be able to run them with a simple double-click. Well, here we go …
Online Courses to Freshen Up Your Knowledge of Big Data
By now you already know how important big data processing and analytics are to running many industries around the world. It is a multi-sector field that helps manage big data, monitor trends, improve business operations, and even fight crime. Big data is used in so many industries that it is considered one of the most in-demand skills of 2015. But just like anything else in modern technology, big data trends are ever changing, so it pays to keep learning how to adapt to changes in this industry. Thankfully, there are online courses that can help. They range from basic big data courses to advanced classes covering applications and trends. Here are some of the best big data classes offered online:
London card clash sensitivity analysis
The data blog of the Daily Mirror reports a problem with ‘card clash’ on the London Underground. You can now pay directly with a debit card instead of buying a ticket — so if you carry both a transport card and a debit card in your wallet, you can enter with one, leave with the other, and get overcharged. Alternatively, you can take one card out of your wallet to touch in, at the risk of dropping it. Auckland Transport has a milder version of the same problem: no-touch credit cards can confuse the AT HOP reader and make it fail to recognise your card, but you won’t be overcharged unless you fail to notice the red light.
Dato Core: The open source core of the GraphLab ML library
Dato Core is the open source piece of GraphLab Create, a Python-based machine learning platform that enables data scientists and app developers to easily create intelligent apps at scale: from prototype to production.
Hierarchical Clustering with R (in Action)
(feat. D3.js and Shiny)
Discovering the relationship of the G20 members using Data Mining
Our goal is to verify whether the number of joint occurrences of G20 countries (specifically with Brazil) in news about the financial market reflects data from the Brazilian Ministry of Development, Industry and Foreign Trade. For those unfamiliar with the term, the G20 (20 major economies) is a group consisting of the finance ministers and central bank governors of 19 major economies plus the European Union.
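The core of the co-occurrence idea can be sketched in a few lines: scan each news item for country names and count how often pairs appear together. The country list and headlines below are hypothetical placeholders, not the actual corpus used in the analysis.

```python
# Count joint occurrences of country names within the same news headline.
# Countries and headlines are illustrative stand-ins for the real data.
from collections import Counter
from itertools import combinations

countries = ["Brazil", "China", "Germany", "United States"]

headlines = [
    "Brazil and China sign new trade agreement",
    "Germany warns on market volatility as Brazil rallies",
    "China growth slows; United States markets steady",
]

pair_counts = Counter()
for text in headlines:
    present = [c for c in countries if c in text]
    for pair in combinations(sorted(present), 2):
        pair_counts[pair] += 1
```

The resulting pair counts (e.g. Brazil–China co-mentions) are the kind of signal one would then compare against official trade figures.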
On Synchronicity and Decision Making
In a previous article I argued that, to ensure the highest performance, decision makers need to balance data-driven actionable insights with intuition-driven insights. In this article a new insight is added to the decision toolbox: an acausal, rare, coincidental event that can have a huge impact on the decision maker’s enterprise, yet costs absolutely nothing. Such an event is called a synchronicity, first introduced, defined, and modeled by the Swiss psychologist Carl Jung. Most of the ideas presented in this article are summaries and adaptations of excerpts from “Synchronicity”, a book by Jessica Satori, and “Synchronicity: Nature and Psyche in an Interconnected Universe” by David H. Rosen. Beyond integrating the concepts from these books into a decision-making aid, the additional contribution here is the derivation of metrics to measure and evaluate whether an event is a synchronicity. Such an evaluation raises awareness of these life- and business-changing events, potentially increasing how often they inform decision making and their resulting beneficial impact.
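One natural ingredient for such a metric (my own illustration, not the article's derivation) is how unlikely the coincidence is under pure chance: the rarer the event would be by accident, the stronger the case for treating it as a synchronicity. A simple Poisson model gives the probability of the event occurring by chance within a short window.

```python
# Hypothetical "rarity" component of a synchronicity metric.
# If an event occurs at background rate `lam` per year, the Poisson
# probability of seeing it at least once in a window of `w` years is
# 1 - exp(-lam * w); the smaller this is, the less likely the coincidence
# is attributable to chance alone.
import math

def chance_probability(lam, w):
    """P(event occurs at least once in window w) under a Poisson model."""
    return 1 - math.exp(-lam * w)

# Example: an event expected about once per decade, observed within one week.
p = chance_probability(lam=0.1, w=1 / 52)  # well under 1%
```

A full metric would also need to weigh the event's meaningfulness and impact, which the probability alone cannot capture.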