Introduction to Git for Data Science

Version control is one of the power tools of programming. It allows you to keep track of what you did when, undo any changes you have decided you don’t want, and collaborate at scale with other people. This course will introduce you to Git, a modern version control tool that is very popular with data scientists and software developers alike, and show you how it can help you get more done in less time and with less pain.

Reality check on AI

It’s too early to worry about a sentient AI apocalypse. The reality is that we know very little about how the human brain works — which means we know even less about how to build a computer that works just like the human brain. For very specific tasks, AI tends to make rapid progress until it matches human-level performance; then progress tends to slow down. So despite fears of an AI dystopia, the technology is still very limited compared to human intelligence. A more practical problem in AI is figuring out good ways for engineers and product managers to communicate a shared vision for how to actually use AI in the enterprise.

22 Great Articles About Neural Networks

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, Hadoop, decision trees, ensembles, correlation, outliers, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, time series, cross-validation, model fitting, dataviz, AI and many more.

Running R on AWS

Many AWS customers already use the popular open-source statistical software R for big data analytics and data science. Other customers have asked for instructions and best practices for running R on AWS. Several months ago, I (Markus) wrote a post showing you how to connect R with Amazon EMR, install RStudio on the Hadoop master node, and use R packages such as rmr2 or plyrmr to analyze a huge public weather dataset. In this post, we show you how to install and run R, RStudio Server, and Shiny Server on Amazon EC2.

Amazon Pumps Out New Database and Machine Learning Services

Amazon’s announcements at AWS re:Invent continue to extend AWS offerings far beyond basic infrastructure services.

AI, Analytics and the Future of Your Enterprise

Tapping into the power of big data, analytics and AI can be daunting. So we’ve looked at the strategic areas your organization can focus on. Our overview of emerging trends revealed three challenges of big data, and opportunities therein:
1. An overwhelming amount of data to analyze
2. A proliferation of powerful, yet complex, tools and technologies
3. A need for new skillsets and infrastructure built around big data
It’s vital that organizations build competency in these areas. The future belongs to those able to get the best out of their data.
We’ve seen industrial revolutions happen every 50-100 years. With each revolution, those slow to transition have become irrelevant. Today, the fourth industrial revolution is upon us, powered by the rapid rise of artificial intelligence (AI) and data analytics.
To thrive in a new, data-centric world, organizations will need the right skills, approach and tools – including a modern data platform and storage infrastructure. Storage is where your data lives. It delivers data when you need it at the speed of your business. And it needs to be built for the new era of big data.

10 Tips for Building Effective Machine Learning Models

1. Look at the data
2. Slice the data
3. Use simple models
4. Detect rare events
5. Combine lots of models
6. Deploy your models
7. Autotune your models
8. Manage change
9. Balance generalization
10. Add features

InfoGAN – Generative Adversarial Networks Part III

In Part I the original GAN paper was presented. Part II gave an overview of DCGAN, which greatly improved the performance and stability of GANs. In this final part, the contributions of InfoGAN will be explored, which apply concepts from Information Theory to transform some of the noise terms into latent codes that have systematic, predictable effects on the outcome.

Evolutionary Algorithms for Feature Selection

Feature selection is a very important technique in machine learning. In this post we discuss one of the most common optimization algorithms for multi-modal fitness landscapes – evolutionary algorithms.
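
To make the idea concrete, here is a minimal sketch of a genetic algorithm searching over boolean feature masks. It is not the method from the post: the function names, the truncation selection, the uniform crossover and the toy fitness function are all assumptions chosen for illustration. In practice the fitness would more likely be a cross-validated model score on the selected columns.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=30, generations=40,
                        mutation_rate=0.05, seed=0):
    """Search for a good boolean feature mask with a simple genetic algorithm."""
    rng = random.Random(seed)
    # Start from random masks: True means the feature is kept.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the better half as parents
        # (truncation selection; the parents survive unchanged).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Bit-flip mutation keeps some diversity in the population.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy fitness (an assumption for illustration only): features 0, 3 and 7
# are "informative"; every other selected feature costs a small penalty.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(selected & INFORMATIVE) - 0.5 * len(selected - INFORMATIVE)


best = evolve_feature_mask(toy_fitness, n_features=10)
selected_features = [i for i, keep in enumerate(best) if keep]
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next. The penalty term in the toy fitness is what pushes the search toward small feature subsets.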

Understanding deep Convolutional Neural Networks with a practical use case in TensorFlow and Keras

Deep learning is one of the most exciting artificial intelligence topics. It’s a family of algorithms, loosely based on a biological interpretation, that have produced astonishing results in many areas: computer vision, natural language processing, speech recognition and more. Over the past five years, deep learning has expanded to a broad range of industries. Many recent technological breakthroughs owe their existence to it. To name a few: Tesla autonomous cars, photo tagging systems at Facebook, virtual assistants such as Siri or Cortana, chatbots, and object recognition cameras. In many of these areas, deep learning has achieved human-level performance on the cognitive tasks of language understanding and image analysis. Here’s an example of what deep learning algorithms are capable of doing: automatically detecting and labeling different objects in a scene.

An introduction to Monte Carlo Tree Search

We recently witnessed one of the biggest game AI events in history – AlphaGo became the first computer program to beat the world champion in a game of Go. The publication can be found here. Different techniques from machine learning and tree search have been combined by developers from DeepMind to achieve this result. One of them is the Monte Carlo Tree Search (MCTS) algorithm. This algorithm is fairly simple to understand and, interestingly, has applications outside of game AI. Below, I will explain the concept behind the MCTS algorithm and briefly tell you about how it was used at the European Space Agency for planning interplanetary flights.
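
As a rough illustration of the four MCTS phases (selection, expansion, simulation, backpropagation), here is a minimal sketch using the common UCT variant on a toy game. The game is an assumption made for this example and has nothing to do with AlphaGo or the ESA application: players alternately take one to three stones, and whoever takes the last stone wins. The class and function names and the exploration constant are likewise illustrative choices.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left on the table
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node


def uct_child(node, c=1.4):
    # UCB1: favour high win rates, but keep exploring rarely visited children.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def rollout(stones, player, rng):
    # Play uniformly random moves to the end; taking the last stone wins.
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player


def mcts(stones, player=0, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        if node.stones == 0:
            winner = 1 - node.player  # the previous mover took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:  # credit the player who moved into node
                node.wins += 1
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


suggested_move = mcts(stones=10)
```

Each node stores wins from the point of view of the player who moved into it, so the same selection rule works for both players as the search descends the tree. Returning the most visited root move, rather than the one with the highest win rate, is a common choice because visit counts are less noisy.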