How Kubernetes Platform Works at the Fundamental Level

I see a lot of people struggling to understand how the Kubernetes platform works at the fundamental level, e.g. its resiliency and behaviour. If you start thinking about Kubernetes as a fully event-driven system, the answers to many of the ‘why’s fall into place.


Designing Distributed Systems – Hands on Lab

The samples in this lab are written with the reader of this book in mind (https://…/en-us) and will guide you through the steps of designing and deploying distributed systems on Microsoft Azure.


Pizza Ontology

An ontology about pizzas and their toppings. This is an example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by the University of Manchester (see http://…/protg-owl-tutorial ).


Hackathon Winner Interview: Penn State – Kaggle University Club

We believe today’s university students are tomorrow’s leading data scientists. As such, we decided to launch Kaggle University Club – a virtual community and Slack channel for existing data science clubs who want to compete in Kaggle competitions together. As our end-of-year event, we hosted our first-ever University Hackathon! 18 total kernels were submitted and the three top-scoring teams won exclusive Kaggle swag and an opportunity to be featured here, on No Free Hunch. Please enjoy this profile from one of the top-scoring university teams, ‘Team NDL’ from Penn State! To read more about the Hackathon and its grading criteria, see here: Winter ’18 Hackathon and to read this team’s winning kernel, visit: Team NDL: Algorithms and Illnesses.


Improve ML transparency without sacrificing accuracy

O’Reilly begins to shed some light on the accuracy/complexity tradeoff in machine learning, with An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI. Get the ebook now!


Can Users Control and Understand a UI Driven by Machine Learning?

We live in a world flooded by information. It’s harder and harder for us to keep track of it or to manually curate it for others; luckily, modern data science can sort through the vast amounts of information and surface those items that are relevant to us. Machine-learning algorithms rely on user knowledge and patterns observed in the data to make inferences and suggestions about what we may like or be interested in. With machine-learning technologies becoming more and more accessible to developers, there’s a push for companies to take advantage of these algorithms to improve their products and their users’ experience.


What is neural architecture search?

Deep learning offers the promise of bypassing the process of manual feature engineering by learning representations in conjunction with statistical models in an end-to-end fashion. However, neural network architectures themselves are typically designed by experts in a painstaking, ad hoc fashion. Neural architecture search (NAS) has been touted as the path forward for alleviating this pain by automatically identifying architectures that are superior to hand-designed ones. But with the field moving so fast both in terms of research progress and hype, it can be hard to get answers to basic questions: What exactly is NAS and is it fundamentally different from AutoML or hyperparameter optimization? Do specialized NAS methods actually work? Aren’t they prohibitively expensive to use? Should I be using specialized NAS methods?


Spelling 2.0: Improved Markdown and RStudio Support

We have released updates for the rOpenSci text analysis tools. This technote will highlight some of the major improvements in the spelling package and also the underlying hunspell package, which provides the spelling engine for the spelling package.


Would You Recommend?: Customer Analysis With Reviews

In the past, marketing focused solely on how to advertise a product. There were only a few ways a company could promote its product, such as TV ads, newspaper ads, and some online banner ads. Those approaches were generic, massive and costly. Moreover, communication flowed in only one direction, from company to customers. As the world has digitalized, however, connectivity and networking have dramatically shortened the distance between customers and companies. Today’s customers can communicate with other customers as well as with companies, and share their experiences of products dynamically. They participate, directly or indirectly, in production, positioning themselves as ‘prosumers.’


Strong AI: What’s Next?

Ray Kurzweil predicts that machines will be as smart as humans by 2029, part of the path to what he calls ‘The Singularity’. Is this a realistic prediction? We cannot deny that AI techniques such as ‘deep learning’ and ‘convolutional neural networks’ have made stunning advances in image recognition and other difficult tasks. As a result, numerous AI companies have appeared to catch the wave of excitement as funding and acquisitions have accelerated. ‘Strong AI’, however, is the original dream of the field. AI has been around for more than 50 years, and its pioneers were genuinely interested in building a thinking, learning machine: a machine that can think and learn the way humans do. This proved to be extremely difficult, so over the decades researchers shifted their focus to narrow problems. The ultimate goal is still to reach the technological singularity: the point at which the invention of artificial superintelligence/Strong AI abruptly triggers runaway technological growth, resulting in unfathomable changes to human civilization.


Building Open Source Google Analytics from Scratch

From an engineering standpoint, the technology behind Google Analytics was quite sophisticated when it was created. Custom, tailor-made algorithms were implemented for event collection, sampling, aggregation, and storing output for reporting purposes. Back then it required years of engineering time to ship such a piece of software. The big data landscape has changed drastically since then. In this tutorial, we’re going to rebuild an entire Google Analytics pipeline, from data collection to reporting. By using the most recent big data technology available, we’ll see how simple it is to reproduce such software nowadays.


Multi-Class Classification in Text using R

This blog post continues my NLP series. In the previous posts, I discussed data pre-processing steps in R and recognizing the emotions present in TED talks. In this post, I am going to predict the ratings that viewers gave to TED talks. This requires multi-class classification and quite a bit of data cleaning and preprocessing. We will discuss each step in detail below. So, let’s dive in!
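The original walkthrough is in R; purely as an illustrative sketch of the multi-class classification step (written in Python here, with class and variable names of my own invention, not the post's code), a tiny multinomial Naive Bayes text classifier with Laplace smoothing looks like this:

```python
import math
from collections import Counter

class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            score = self.log_prior[c]
            for w in doc.lower().split():
                # Laplace (add-one) smoothing handles unseen words
                score += math.log((self.word_counts[c][w] + 1)
                                  / (self.totals[c] + len(self.vocab)))
            return score
        return max(self.classes, key=log_posterior)
```

A real ratings model would add the cleaning and TF-IDF steps the post describes; this only shows the multi-class scoring idea.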


Deep Learning in practice

Practitioners know that Deep Learning methods are extremely powerful and flexible, but applying them ‘in the wild’, on real-world data sets, is very different from re-running the latest SOTA (State-Of-The-Art) model on a well-known data set such as CIFAR-10/100 or ImageNet. Missing data, erroneous values, mistyping, wrong labels, etc., make things much more complicated, and as a result we are sometimes unsure whether we can trust our results. Let’s have a look at some practical tips about what to check when you’re getting results that seem ‘too good to be true’ or ‘too bad to be useful’.


Progress Bars in Python

Just like a watched pot never boils, a watched for loop never ends. When dealing with large datasets, even the simplest operations can take hours. Progress bars can help make data processing jobs less of a headache because:
1. You get a reliable estimate of how long it will take.
2. You can see immediately if it’s gotten stuck.
The first of these is especially valuable in a business environment, where having a solid delivery estimate can make you look super professional. The best/only way I’ve found to add progress bars to Python code is with tqdm. While it is super easy to use, tqdm can be a bit finicky to set up, especially if you use JupyterLab (which you totally should).
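As a quick illustration (my own minimal sketch, not code from the post), tqdm simply wraps any iterable; the ImportError fallback below is an extra hedge so the function still runs where tqdm isn’t installed:

```python
# Minimal tqdm sketch: wrap any iterable to get a live progress bar
# with rate and ETA. Falls back to a plain pass-through without tqdm.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable

def sum_of_squares(items):
    # Stand-in for a slow per-item job on a large dataset.
    total = 0
    for x in tqdm(items, desc="processing"):
        total += x * x
    return total
```

In JupyterLab, `from tqdm.notebook import tqdm` renders a widget-based bar instead of a text one; that switch is the setup step the post calls finicky.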


Reinforcement Learning from Scratch: Designing and Solving a Task All Within a Python Notebook

In this article, I will introduce a new project that attempts to help those learning Reinforcement Learning by fully defining and solving a simple task all within a Python notebook. The environment and basic methods will be explained within this article and all the code is published on Kaggle in the link below. In addition, I have created a ‘Meta’ notebook that can be forked easily and only contains the defined environment for others to try, adapt and apply their own code to.
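The project’s own environment lives in the linked Kaggle notebooks; as a hedged stand-in for the general pattern, here is a minimal tabular Q-learning loop on a hypothetical five-state corridor (task, names, and parameters are all my own illustration, not the article’s code):

```python
import random

# Toy task: a five-state corridor, start at state 0, reward only at the goal.
N_STATES, GOAL = 5, 4

def step(state, action):
    """action 0 = left, 1 = right; returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q-values
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: Q[state][a])
            nxt, reward, done = step(state, action)
            # one-step Q-learning update
            Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
            state = nxt
    return Q
```

After training, the greedy policy moves right from every non-goal state, which is the optimal behaviour for this corridor.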


NLP with Kotlin

Natural Language Processing (NLP) is at the heart of many applications that fall under the ‘machine learning’ umbrella, though NLP methods are their own group of algorithms and ways of approaching artificial intelligence (take this with a pinch of salt). NLP is used to create chatbots that help you when you ask for support, to deliver search results similar to your query, to translate texts, or to lighten the work other applications must do by offering a representation of the text that is easier to handle. One application of NLP is to generate or guess the next word in a sequence. As with other machine learning tasks, guessing the next word requires training a model and making inferences with it. In this example, you will see how to build a simple word generator that works at the character level. There are different approaches to this task; in recent years, RNNs in various flavours have been beating other mechanisms for generating text. In this project, however, I have decided to drop them in favour of the simpler n-gram representation, as it is a much better fit for the problem at hand.
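To make the n-gram idea concrete, here is a minimal character-level n-gram generator, sketched in Python rather than Kotlin for consistency with the other examples in this digest (the `build_model`/`generate` names are illustrative, not from the post):

```python
import random
from collections import Counter, defaultdict

def build_model(text, n=3):
    """Map each n-character context to a Counter of observed next characters."""
    model = defaultdict(Counter)
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model

def generate(model, seed_text, length=40, rng=None):
    """Extend seed_text by sampling next characters in proportion to counts."""
    rng = rng or random.Random(0)
    n = len(next(iter(model)))
    out = seed_text
    for _ in range(length):
        counts = model.get(out[-n:])
        if not counts:          # unseen context: stop generating
            break
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out
```

Trained on a real corpus the sampling is stochastic; on a perfectly repetitive string each context has a single successor, so generation is deterministic.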


Sentiment & Influencers

A few years ago, we started a debate about whether the loudest customers really are as important as everybody, including the customers themselves, seems to think. Customer care usually reacts fastest to the loudest complainer. Is this the right priority? How do we identify the complainers worth investing time in? Happy and disgruntled users are easily identifiable via sentiment analysis of their forum posts. The degree of influence of each user can also be measured via an influence score. There are many influence scores available; a widely adopted one is the centrality index. The idea of this use case is to combine the sentiment measure with the influence score and thereby identify those disgruntled customers/users with a high degree of influence. Support time and resources should then be redirected to the most influential and unhappy customers or users.
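As a rough sketch of the combination (my own toy example, not the article’s implementation), degree centrality over a reply graph can be multiplied with a sentiment score in [-1, 1] to rank users for support attention:

```python
# Toy forum: nodes are users, edges are reply interactions between them.
def degree_centrality(edges, nodes):
    """Fraction of other users each user has interacted with."""
    deg = {u: 0 for u in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {u: d / (n - 1) for u, d in deg.items()}

def support_priority(sentiment, centrality):
    """sentiment in [-1, 1]; unhappier and more central => higher priority."""
    # (1 - s) / 2 maps sentiment to an 'unhappiness' weight in [0, 1].
    return {u: (1 - sentiment[u]) / 2 * centrality[u] for u in sentiment}
```

An unhappy user who talks to everyone outranks an equally unhappy user on the periphery, which is exactly the triage the use case describes.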


Multimodal Deep Learning

Being highly enthusiastic about research in deep learning, I was always searching for unexplored areas in the field (though it is tough to find one). I had previously worked on maths word problem solving and related topics. The challenge of using deep neural networks as black boxes piqued my interest, so I decided to dive deeper into the topic of ‘interpretability in multimodal deep learning’. Here are some of the results.


XGBoost Mathematics Explained

XGBoost (https://…/xgboost ) is one of the most popular and efficient implementations of the Gradient Boosted Trees algorithm, a supervised learning method that is based on function approximation by optimizing specific loss functions as well as applying several regularization techniques. The original XGBoost paper can be found here: https://…/1603.02754.pdf , and the purpose of this post is to explain the mathematics of some critical parts of the paper as well as to give some insights. We assume that the reader is familiar with the content of the original paper.
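For reference, the key formulas the post walks through (as given in the XGBoost paper) are the regularized objective, its second-order Taylor approximation, and the resulting optimal leaf weights and split gain:

```latex
% Regularized objective at boosting round t
\mathcal{L}^{(t)} = \sum_i l\big(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

% Second-order Taylor approximation, with
% g_i = \partial_{\hat{y}^{(t-1)}} l \quad\text{and}\quad h_i = \partial^2_{\hat{y}^{(t-1)}} l
\mathcal{L}^{(t)} \simeq \sum_i \big[\, g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \,\big] + \Omega(f_t)

% Optimal weight of leaf j, where I_j is the set of instances in leaf j
w_j^* = -\frac{G_j}{H_j + \lambda},
\qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i

% Gain of splitting a leaf into left (L) and right (R) children
\mathrm{Gain} = \tfrac{1}{2}\left[ \frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda}
              - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda} \right] - \gamma
```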


Introduction to PyTorch Model Compression Through Teacher-Student Knowledge Distillation

Serving ML models in resource-constrained mobile and real-time systems can be a real problem. The ML community has been developing solutions to compress the size of the models generated by large clusters of servers. Model compression promises savings in inference time, power efficiency and model size. All of that can let a flying rescue drone cover more land surface on a single battery charge, as well as keep your mobile app from draining its users’ batteries.
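As a small, framework-free sketch of the core distillation objective (a numpy stand-in for the tutorial’s PyTorch code; the temperature-scaled KL term follows Hinton et al.’s formulation, and all names here are my own):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, batch-averaged."""
    p = softmax(teacher_logits, T)           # softened teacher targets
    q = softmax(student_logits, T)           # softened student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(T * T * kl.mean())          # T^2 keeps gradient scale stable
```

In training, this term is usually blended with the ordinary cross-entropy on the hard labels, so the student learns both the ground truth and the teacher’s softened class similarities.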


A quick introduction to Slow Feature Analysis

I recently started PhD studies in machine learning at Ruhr University Bochum. One of the main research topics of the group I joined is called Slow Feature Analysis (SFA). To learn about a new topic, I like seeing examples and intuitive explanations if possible before submerging myself in mathematical rigor. I wrote this blog post for others who like approaching subjects in a similar manner, as I believe that SFA is quite powerful and interesting. In this post I’ll lead with a code example where SFA is applied, to help motivate the method. Then I’ll go into more detail about the math behind the method and finally provide links to other good resources on the material.
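In the same spirit as the post’s lead-with-code approach, here is a tiny linear-SFA sketch in numpy (my own toy version, not the post’s example): whiten the signal, then take the unit-variance direction whose time derivative has the least variance:

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Return the slowest linear features of signal X (shape: time x dims)."""
    X = X - X.mean(axis=0)
    # Whiten: decorrelate and normalize the input components.
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (eigvec / np.sqrt(eigval))
    # Among unit-variance directions, the slowest features are the
    # eigenvectors of cov(dZ/dt) with the smallest eigenvalues.
    dval, dvec = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return Z @ dvec[:, :n_components]
```

On a linear mixture of a slow and a fast sinusoid, the first extracted feature recovers the slow source (up to sign); nonlinear SFA adds a polynomial expansion of X before these same steps.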


Seasonal Adjustment by X-13ARIMA-SEATS in R

Seasonal is a powerful interface between R and X-13ARIMA-SEATS, the seasonal adjustment software developed by the United States Census Bureau. It offers access to almost all features of X-13, including seasonal adjustment via the X-11 and SEATS approaches, automatic ARIMA model search, outlier detection, and support for user-defined holiday variables such as the Chinese New Year or Indian Diwali. The required X-13 binaries are provided by the x13binary package, which facilitates a fully-automatic installation on most platforms. A demo website (at http://www.seasonal.website ) supports interactive modeling of custom data. A graphical user interface in the seasonalview package offers the same functionality locally. X-13 can handle monthly, quarterly or bi-annual time series.


Slow Feature Analysis: Unsupervised Learning of Invariances

Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.