A Beginner’s Guide to Object Detection

Explore the key concepts in object detection and learn how they are implemented in SSD and Faster RCNN, which are available in the Tensorflow Detection API.

Developing a Deeper Understanding of Apache Kafka Architecture Part 2: Write and Read Scalability

In the previous article, we gained an understanding of the main Kafka components and how Kafka consumers work. Now, we’ll see how these contribute to the ability of Kafka to provide extreme scalability for streaming write and read workloads.

Python Regular Expressions Cheat Sheet

The tough thing about learning data is remembering all the syntax. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help you out!

Performance: when algorithmics meets mathematics

In this post, I talk about performance through an efficient algorithm I developed for finding closest points on a map. This algorithm uses both concepts from mathematics and algorithmics.

Arrow and beyond: Collaborating on next generation tools for open source data science

Two years ago, Wes McKinney and Hadley Wickham got together to discuss some of the systems challenges facing the Python and R communities. Data science teams inevitably work with multiple languages and systems, so it’s critical that data flow seamlessly and efficiently between these environments. Wes and Hadley wanted to explore opportunities to collaborate on tools for improving interoperability between Python, R, and external compute and storage systems. This discussion led to the creation of the feather file format, a very fast on-disk format for storing data frames that can be read and written to by multiple languages.

HyperTools: A Python toolbox for visualizing and manipulating high-dimensional data

Data visualizations can reveal trends and patterns that are not otherwise obvious from the raw data or summary statistics. While visualizing low-dimensional data is relatively straightforward (for example, plotting the change in a variable over time as (x,y) coordinates on a graph), it is not always obvious how to visualize high-dimensional datasets in a similarly intuitive way. Here we present HypeTools, a Python toolbox for visualizing and manipulating large, high-dimensional datasets. Our primary approach is to use dimensionality reduction techniques (Pearson, 1901; Tipping & Bishop, 1999) to embed high-dimensional datasets in a lower-dimensional space, and plot the data using a simple (yet powerful) API with many options for data manipulation [e.g. hyperalignment (Haxby et al., 2011), clustering, normalizing, etc.] and plot styling. The toolbox is designed around the notion of data trajectories and point clouds. Just as the position of an object moving through space can be visualized as a 3D trajectory, HyperTools uses dimensionality reduction algorithms to create similar 2D and 3D trajectories for time series of high-dimensional observations. The trajectories may be plotted as interactive static plots or visualized as animations. These same dimensionality reduction and alignment algorithms can also reveal structure in static datasets (e.g. collections of observations or attributes). We present several examples showcasing how using our toolbox to explore data through trajectories and low-dimensional embeddings can reveal deep insights into datasets across a wide variety of domains.

Titus: Netflix open-sourced their container management platform

Titus is a container management platform that provides scalable and reliable container execution and cloud-native integration with Amazon AWS. Titus was built internally at Netflix and is used in production to power Netflix streaming, recommendation, and content systems.