Introduction to Event Streaming with Kafka and Kafdrop

Event sourcing, eventual consistency, microservices, CQRS… These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud? This article examines a fundamental building block – event streaming. Leading the charge will be Apache Kafka – the de facto standard in event streaming platforms, which we’ll observe through Kafdrop – a feature-packed web UI.


Survivorship bias in Data Science and Machine Learning

One of the problems the Applied Mathematics Panel took on was aircraft survivability: back then, a plane going into battle had roughly a 50-50 chance of returning home safe and sound. Improving that probability, even slightly, could make a massive difference in the field. The person in charge of analyzing and trying to improve it was Abraham Wald. To do so, Wald came up with a very simple and logical solution: after a plane came back from battle, he asked people in the field to mark on a card which parts of the plane had been shot.


Auptimizer: A faster, easier way to do hyperparameter optimization for machine learning

Auptimizer is a general-purpose open-source Hyperparameter Optimization (HPO) framework that also allows you to scale your HPO training from CPUs and GPUs to on-prem and EC2 instances. To get started, use ‘pip install auptimizer’. You can find our documentation here and our repo here.


Streamlit: Deploy a Machine Learning Model without learning any web framework

It can be very tiresome to work on an actual data science project and then spend even more time on a web framework, backend, and frontend. For a data scientist or a machine learning engineer, working with these technologies is a secondary task. So the question is: how can you deploy an ML model without learning even Flask, a well-known minimal Python web framework? In this blog, I'll present a very useful tool, Streamlit, which lets you focus on your work as a data scientist while it takes care of deploying your model as a working web application.


Coding habits for data scientists

If you’ve tried your hand at machine learning or data science, you know that code can get messy, quickly. Typically, code to train ML models is written in Jupyter notebooks and it’s full of (i) side effects (e.g. print statements, pretty-printed dataframes, data visualisations) and (ii) glue code without any abstraction, modularisation or automated tests. While this may be fine for notebooks aimed at teaching people about the machine learning process, in real projects it’s a recipe for an unmaintainable mess. The lack of good coding habits makes code hard to understand, so modifying it becomes painful and error-prone. This makes it increasingly difficult for data scientists and developers to evolve their ML solutions. In this article, we’ll share techniques for identifying bad habits that add to complexity in code as well as habits that can help us partition complexity.
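To make the contrast concrete, here is a minimal sketch (not taken from the article) of one such habit: moving a notebook-style cleaning step into a small function with no side effects, plus an automated test. The function name `fill_missing_age` and the row layout are hypothetical, chosen only for illustration.

```python
from statistics import median


def fill_missing_age(rows):
    """Return new rows with missing 'age' values replaced by the median age.

    No printing, no plotting, and the input list is left untouched, so the
    function can be tested and reused outside the notebook.
    Assumes at least one row has a known age.
    """
    known = [r["age"] for r in rows if r["age"] is not None]
    fallback = median(known)
    return [
        {**r, "age": fallback if r["age"] is None else r["age"]}
        for r in rows
    ]


def test_fill_missing_age():
    rows = [{"age": 20}, {"age": None}, {"age": 40}]
    filled = fill_missing_age(rows)
    assert [r["age"] for r in filled] == [20, 30.0, 40]
    assert rows[1]["age"] is None  # the original data was not mutated


test_fill_missing_age()
```

Because the function only depends on its argument and only communicates through its return value, the test needs no notebook state to run.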


Isolation Forest and Spark

Main characteristics and ways to use Isolation Forest in PySpark. Isolation Forest is an algorithm for anomaly / outlier detection, basically a way to spot the odd one out. We go through its main characteristics and explore two ways to use Isolation Forest with PySpark.
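As a rough illustration of how the algorithm spots the odd one out, the following is a minimal pure-Python sketch of the isolation idea, not the PySpark usage the article covers. Points are split at random until they are isolated, and points that need fewer splits on average get a higher anomaly score. All function names here are hypothetical; the sub-sample size of 256 and the height limit follow the defaults commonly used with the algorithm.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Number of splits needed to reach the leaf that holds the point."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    side = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point in (0, 1); higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# A tight cluster around the origin plus one far-away point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
outlier_index = scores.index(max(scores))
```

In this toy data set the far-away point is separated from the cluster after very few random splits, so it ends up with the highest score.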


Uncommon Data Cleaners for your Real-World Machine or Deep Learning Project

Data cleaning is a subject that is only lightly touched on in brick-and-mortar or online classes. However, in your work as a Data Engineer or Data Scientist, you will spend a great deal of your time pre-processing your data so that it can be fed into your model. Data cleaning is critical if you are going to create a service for production.


New Theory Cracks Open the Black Box of Deep Learning

A new idea called the ‘information bottleneck’ is helping to explain the puzzling success of today’s artificial-intelligence algorithms – and might also explain how human brains learn.


Attacks against machine learning – an overview

This blog post surveys the attack techniques that target AI (artificial intelligence) systems and how to protect against them.