Trill is a high-performance, one-pass, in-memory streaming analytics engine from Microsoft Research. It can handle both real-time and offline data and is based on a temporal data and query model. Trill can be used as a streaming engine, as a lightweight in-memory relational engine, and as a progressive query processor (returning early query results on partial data).

Introduction to Indexing in SQL

In this tutorial, you will learn about indexing in databases and the different types of indexing techniques.
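As a small illustration of what an index changes, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module (an assumption: the tutorial may use a different database). SQLite's ordinary indexes are B-trees. The sketch compares the query plan for the same lookup before and after creating an index. The table `customers`, its columns, and the index name `idx_customers_last_name` are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (last_name, city) VALUES (?, ?)",
    [("Smith", "Austin"), ("Jones", "Boston"), ("Lee", "Austin")],
)

lookup = "SELECT id FROM customers WHERE last_name = 'Lee'"

# Without an index on last_name, SQLite has to scan every row.
before = query_plan(conn, lookup)

# A B-tree index on last_name lets SQLite jump straight to matching rows.
conn.execute("CREATE INDEX idx_customers_last_name ON customers (last_name)")
after = query_plan(conn, lookup)

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a `SCAN` of the table and the second names the new index.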

Types of Analytics

1. Descriptive (What happened?)
2. Diagnostic (Why did it happen?)
3. Predictive (What will happen?)
4. Prescriptive (What could be done?)

Do you really need a data scientist?

Simply ‘plugging’ a data scientist into your databases won’t deliver the expected results. First, you need to make sure your data is actually valuable.

Simulating Misanthropic Neighbors

Back in 2016 I posted my first interactive Shiny app, based on a simulation program I wrote in R to test my solution to a 538 Riddler puzzle. I’m cross-posting that work here now, since LinkedIn is trash.

A journey into supervised machine learning

Earlier this year, through my MBA program at Cornell Tech, I took a great introductory course on machine learning with a fantastic professor, Lutz Finger. Lutz’s course inspired me to dig even deeper into ML and AI, so I recently started a hands-on Introduction to Machine Learning course on Udacity, which I felt would augment my background well. So far, I’ve completed the sections on supervised learning, and I want to share my beginner’s take on what I’ve learned and reflect on the potential implications of this technology.

Advanced Queries With SQL That Will Save Your Time

Over years of working with telecom data, my folder of code snippets has accumulated a lot of reusable examples. And I don’t mean ‘SELECT * FROM Table1’: I am talking about finding and handling or removing duplicate values, selecting the top N values from each group of data within the same table, shuffling records within groups while keeping the groups sorted, finding the longest left match, expanding numbers by N digits, and so on. Today I would like to share those snippets and explain how they work. I work with Microsoft SQL Server and use T-SQL in my queries, but I am sure it is possible to find alternatives for other DBMSs. If not, reach out to me and we will try to find one together.
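One of the patterns listed above, selecting the top N rows from each group, is commonly written with the `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` window function, which T-SQL supports. The sketch below is not the author's snippet: to stay self-contained it runs the same query shape against SQLite (3.25 or later is needed for window functions) through Python's `sqlite3` module. The `calls` table and its data are hypothetical.

```python
import sqlite3

# Hypothetical call-record table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (subscriber TEXT, call_id INTEGER, duration_sec INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("A", 1, 120), ("A", 2, 300), ("A", 3, 45),
     ("B", 4, 60), ("B", 5, 600), ("B", 6, 200), ("B", 7, 10)],
)

TOP_N_PER_GROUP = """
WITH ranked AS (
    SELECT subscriber, call_id, duration_sec,
           ROW_NUMBER() OVER (
               PARTITION BY subscriber      -- restart numbering for each group
               ORDER BY duration_sec DESC   -- longest calls get the lowest numbers
           ) AS rn
    FROM calls
)
SELECT subscriber, call_id, duration_sec
FROM ranked
WHERE rn <= ?                               -- keep only the top N rows per group
ORDER BY subscriber, rn
"""

top2 = conn.execute(TOP_N_PER_GROUP, (2,)).fetchall()
for row in top2:
    print(row)
```

The same ranking can be reused for duplicates: partition by the columns that define a duplicate, and every row with `rn > 1` is a redundant copy.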

How will automation tools change data science?

Data science is now a major area of technology investment, given its impact on customer experience, revenue, operations, supply chain, risk management and many other business functions. Data science enables a data-centric decision-making process for organizations, accelerating digital transformation and AI initiatives. According to Gartner, Inc., only four percent of CIOs have implemented AI, and only 46 percent have plans to do so. While investments continue to grow, many enterprises find it increasingly challenging to implement and accelerate data science practices. This article provides an overview of recent trends in machine learning and data science automation tools and addresses how those tools will change data science.

Text Generation Using RNNs

Text generation is a popular problem in data science and machine learning, and it is a task well suited to recurrent neural networks (RNNs). This report uses TensorFlow to build an RNN text generator and provides a high-level API in Python 3. The report is inspired by @karpathy (min-char-rnn) and Aurélien Géron (Hands-On Machine Learning with Scikit-Learn and TensorFlow). This is a class project for CST463: Advanced Machine Learning at Cal State Monterey Bay, taught by Dr. Glenn Bruns.
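A character-level generator in the spirit of min-char-rnn is trained to predict the next character from the characters before it. The minimal sketch below, in plain Python, covers only the data-preparation step under that assumption. It maps characters to integer ids and slices the text into (input, target) pairs in which the target is the input shifted by one character. The function names are hypothetical, and the RNN itself is not shown.

```python
def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Map each distinct character to an integer id, and ids back to characters."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_training_pairs(text: str, seq_len: int, char_to_id: dict[str, int]):
    """Cut text into non-overlapping windows; each target is its input shifted by one."""
    ids = [char_to_id[ch] for ch in text]
    pairs = []
    # Stop early enough that every target window still has seq_len characters.
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start : start + seq_len]
        targets = ids[start + 1 : start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs


text = "hello world"
char_to_id, id_to_char = build_vocab(text)
pairs = make_training_pairs(text, seq_len=4, char_to_id=char_to_id)

# Decode the id sequences back to strings to make the one-character shift visible.
decoded = [
    ("".join(id_to_char[i] for i in x), "".join(id_to_char[i] for i in y))
    for x, y in pairs
]
print(decoded)
```

At generation time, the trained model is run the other way: it predicts a distribution over the vocabulary, one character is sampled, and that character is fed back in as the next input.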

Demystifying ‘Confusion Matrix’ Confusion

If you are confused about the confusion matrix, I hope this post helps you understand it. Happy reading! We will use the UCI Banknote Authentication dataset to demystify the confusion matrix. We will make predictions with our model, evaluate it, and develop our conceptual understanding along the way. I will also provide links to further reading wherever relevant.
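For a binary classifier, the confusion matrix counts four outcomes: true positives, false positives, false negatives, and true negatives. The minimal sketch below in plain Python computes those counts and the usual metrics derived from them. The labels are made-up toy values, not data from the banknote dataset, and `1` is assumed to be the positive class.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the actual label is also positive.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: a missed positive is a false negative.
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


def metrics(cm):
    """Accuracy, precision and recall derived from the four counts."""
    total = sum(cm.values())
    accuracy = (cm["TP"] + cm["TN"]) / total
    precision = cm["TP"] / (cm["TP"] + cm["FP"]) if cm["TP"] + cm["FP"] else 0.0
    recall = cm["TP"] / (cm["TP"] + cm["FN"]) if cm["TP"] + cm["FN"] else 0.0
    return accuracy, precision, recall


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
accuracy, precision, recall = metrics(cm)
print(cm)                           # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```

Here precision (3 of 5 positive predictions were right) and recall (3 of 4 actual positives were found) differ. Reading off the four cells makes that difference visible, whereas a single accuracy number would hide it.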

Introducing K-FAC — Training ResNet-50 on ImageNet in 978 iterations

In this article, I summarize Kronecker-factored Approximate Curvature (K-FAC) (James Martens et al., 2015), one of the most efficient second-order optimization methods for deep learning. I also introduce my work, “Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs”, which applies K-FAC to large-scale deep learning (ResNet-50 for ImageNet classification) and converges in 35 epochs with mini-batch sizes ≤ 16,384 and in 978 iterations with a mini-batch size of 131,072.
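The core approximation behind K-FAC can be written out for a single fully connected layer. The following uses the standard formulation in our own notation and omits damping, convolutional layers, and update schedules, all of which practical implementations handle. Let $a_{\ell-1}$ be the layer's input activations and $g_\ell$ the gradient of the loss with respect to its pre-activations $s_\ell = W_\ell a_{\ell-1}$. Then $\nabla_{W_\ell}\mathcal{L} = g_\ell a_{\ell-1}^\top$, and with column-stacking $\mathrm{vec}$, the layer's Fisher block is approximated by a Kronecker product:

$$
F_\ell = \mathbb{E}\!\left[(a_{\ell-1} \otimes g_\ell)(a_{\ell-1} \otimes g_\ell)^\top\right]
\approx A_{\ell-1} \otimes G_\ell,
\qquad
A_{\ell-1} = \mathbb{E}\!\left[a_{\ell-1} a_{\ell-1}^\top\right],
\quad
G_\ell = \mathbb{E}\!\left[g_\ell g_\ell^\top\right]
$$

Since $(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}$ and $\mathrm{vec}(BXC) = (C^\top \otimes B)\,\mathrm{vec}(X)$, with $A_{\ell-1}$ symmetric, the preconditioned gradient never requires forming $F_\ell$:

$$
F_\ell^{-1}\,\mathrm{vec}\!\left(\nabla_{W_\ell}\mathcal{L}\right)
\approx \left(A_{\ell-1}^{-1} \otimes G_\ell^{-1}\right)\mathrm{vec}\!\left(\nabla_{W_\ell}\mathcal{L}\right)
= \mathrm{vec}\!\left(G_\ell^{-1}\,\nabla_{W_\ell}\mathcal{L}\,A_{\ell-1}^{-1}\right)
$$

Only a $d_{\text{in}} \times d_{\text{in}}$ and a $d_{\text{out}} \times d_{\text{out}}$ matrix are inverted per layer, instead of a $(d_{\text{in}} d_{\text{out}}) \times (d_{\text{in}} d_{\text{out}})$ one.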

A line-by-line layman’s guide to Linear Regression using TensorFlow

Linear regression is a great start to the journey of machine learning, given that it is a fairly straightforward problem and can be solved with popular packages such as scikit-learn. In this article, we walk line by line through implementing linear regression using TensorFlow.
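To show what such a training loop computes, here is a minimal sketch of linear regression by batch gradient descent on mean squared error. It is written in plain Python with hand-derived gradients rather than TensorFlow, and it is not the article's code. In TensorFlow, `tf.GradientTape` and an optimizer would take care of the gradient and update steps. The function name, learning rate, epoch count, and data are illustrative assumptions.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step both parameters against their gradients.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(f"w ≈ {w:.3f}, b ≈ {b:.3f}")
```

With noise-free data, the fitted slope and intercept approach 2 and 1. The learning rate has to stay small enough for the updates not to diverge, which is the same tuning concern that comes up when choosing an optimizer's learning rate in TensorFlow.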