Apache Kafka + KSQL + TensorFlow for Data Scientists via Python + Jupyter Notebook

Why would a data scientist use Kafka, Jupyter, Python, KSQL and TensorFlow all together in a single notebook? There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need for data ingestion, preprocessing, model deployment and monitoring at scale, which requires low latency, high throughput, zero data loss and 24/7 availability. This is the main reason I see in the field why companies struggle to bring analytic models into production to add business value. Python in practice is not the most well-known technology for large-scale, performant and reliable environments. However, it is a great tool for data scientists and a great client of a data platform like Apache Kafka.

Handling imbalanced datasets in machine learning

What should and should not be done when facing an imbalanced classes problem?

Hypothesis testing in Machine learning using Python

Probably everyone who is a beginner or at an intermediate level in machine learning, or a statistics student, has heard the buzzword hypothesis testing. Today I will give a brief introduction to this topic, which gave me a headache when I was learning it. I have put all those concepts together with examples using Python. Some questions to keep in mind before I go on to broader things: What is hypothesis testing? Why do we use it? What are the basics of a hypothesis? Which are the important parameters of hypothesis testing?
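
To make the questions above concrete, here is a minimal sketch of one common test, a two-sided one-sample z-test. It is not taken from the article: the function name z_test, the sample values and the assumption of a known population standard deviation are all illustrative.

```python
import math
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test (assumes the population sigma is known).

    Null hypothesis: the population mean equals mu0.
    Returns the z statistic and the two-sided p-value.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability of a statistic at least this extreme under the null.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical data: four measurements, null mean 5.0, known sigma 1.0.
    z, p = z_test([6.0, 6.0, 6.0, 6.0], mu0=5.0, sigma=1.0)
    alpha = 0.05  # significance level, chosen before looking at the data
    print("z =", z, "p =", p, "reject null:", p < alpha)
```

The significance level alpha and the p-value are the two parameters that drive the decision: the null hypothesis is rejected when the p-value falls below alpha.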

Introduction to ResNets

The ‘We need to go deeper’ meme notwithstanding, classical CNNs do not perform well as the depth of the network grows past a certain threshold. ResNets allow for the training of deeper networks. This article is based on Deep Residual Learning for Image Recognition by He et al. [2] (Microsoft Research): https://…/1512.03385.pdf
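
The core idea of a residual block, adding the input back onto the output of a few layers, can be sketched in a few lines. This is a simplified illustration, not the architecture from the paper: it uses two small fully connected layers in place of convolutions, omits batch normalization, and the helper names are made up for the example.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x) with F(x) = W2 * relu(W1 * x).

    The skip connection adds the input x to the learned residual F(x),
    so the layers only need to model the difference from the identity.
    """
    hidden = relu(matvec(W1, x))
    fx = matvec(W2, hidden)
    return relu([f + xi for f, xi in zip(fx, x)])

if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights the residual F(x) vanishes and the block
    # passes a non-negative input through unchanged.
    print(residual_block([1.0, 2.0], zeros, zeros))
```

With zero weights the block reduces to the identity on non-negative inputs, which is why stacking more such blocks need not make the network harder to train.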

How To Fine Tune Your Machine Learning Models To Improve Forecasting Accuracy

We explain how to retrieve estimates of a model’s performance using scoring metrics, before taking a look at finding and diagnosing the potential problems of a machine learning algorithm.

How to bring your Data Science Project into production

Using Azure Databricks with Spark, Azure Machine Learning Service and Azure DevOps.

Monte Carlo Simulations with Python (Part 1)

This is the first of a three-part series on learning to do Monte Carlo simulations with Python. This first tutorial will teach you how to do a basic ‘crude’ Monte Carlo, and it will teach you how to use importance sampling to increase precision. Part 2 will introduce the infamous Metropolis algorithm, and Part 3 will be a specialized piece for budding physicists (we’ll learn how to use Monte Carlo simulations to solve problems in quantum mechanics!). Monte Carlo methods are widely used heuristic techniques which can solve a variety of common problems, including optimization and numerical integration problems. These algorithms work by cleverly sampling from a distribution to simulate the workings of a system. Applications range from solving problems in theoretical physics to predicting trends in financial investments.
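
As a rough illustration of the two techniques named above, the sketch below estimates an integral first with crude Monte Carlo and then with importance sampling. The integrand x^4 on [0, 1] and the proposal density g(x) = 3x^2 are assumptions chosen for the example, not taken from the tutorial.

```python
import random

def crude_mc(f, a, b, n, rng):
    """Crude Monte Carlo: average f at uniform points, scale by interval length."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

def importance_mc(f, sample_g, pdf_g, n, rng):
    """Importance sampling: draw x from density g and average f(x) / g(x)."""
    total = 0.0
    for _ in range(n):
        x = sample_g(rng)
        total += f(x) / pdf_g(x)
    return total / n

# Illustrative integrand (an assumption): x^4 on [0, 1], exact integral 1/5.
def f(x):
    return x ** 4

# Proposal density g(x) = 3x^2 on (0, 1], sampled by inverse transform
# of its CDF G(x) = x^3.
def sample_g(rng):
    u = 1.0 - rng.random()  # lies in (0, 1], avoids x == 0 where g(x) == 0
    return u ** (1.0 / 3.0)

def pdf_g(x):
    return 3.0 * x ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    print("crude     :", crude_mc(f, 0.0, 1.0, 20000, rng))
    print("importance:", importance_mc(f, sample_g, pdf_g, 20000, rng))
```

Both estimators target the same value, 1/5. The importance-sampling estimate has the lower variance here because g places more samples where the integrand is large.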

Text to Image

This article will explain the experiments and theory behind an interesting paper that converts natural language text descriptions such as ‘A small bird has a short, pointy orange beak and white belly’ into 64×64 RGB images. The paper is ‘Generative Adversarial Text to Image Synthesis’ by Reed et al.

A Proposed Model AI Governance Framework

The PDPC presents the first edition of A Proposed Model AI Governance Framework (Model Framework) – an accountability-based framework to help chart the language and frame the discussions around harnessing AI in a responsible way. The Model Framework translates ethical principles into practical measures that can be implemented by organisations deploying AI solutions at scale. Through the Model Framework, we aim to promote AI adoption while building consumer confidence and trust in providing their personal data for AI.

An Rstudio Addin for Network Analysis and Visualization

The ggraph package provides a ggplot-like grammar for plotting graphs and as such you can produce very neat network visualizations. But as with ggplot, it takes a while to get used to the grammar. There are already a few amazing Rstudio Addins that assist you with ggplot (for example ggplotAssist and ggThemeAssist), but there have not been any equivalent tools for ggraph. Until now. This post introduces snahelper, an Rstudio Addin which provides a tiny GUI for visualizing and analysing networks.

What is AI bias?

The AI bias trouble starts – but doesn’t end – with definition. ‘Bias’ is an overloaded term which means remarkably different things in different contexts.

An introduction to Stan with R

Stan is a probabilistic programming language for specifying statistical models. Stan provides full Bayesian inference for continuous-variable models through Markov Chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan can be called through R using the rstan package, and through Python using the pystan package. Both interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. This talk gives a brief overview of the main properties of Stan. It also presents a couple of examples: the first is a simple Bernoulli model and the second is a Lotka-Volterra model based on ordinary differential equations.

Getting Started with Recommender Systems and TensorRec

Recommender systems are used in many products to present users with relevant or personalized items (food, movies, music, books, news, etc). To do this, they learn from users’ previous interactions with items to identify users’ tastes and improve future recommendations. This post will walk us through the prototyping of a new recommender system in Python using TensorRec including input data manipulation, algorithm design, and usage for prediction.

Fitting a Neural Network Using Randomized Optimization in Python

Python’s mlrose package provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains. In this tutorial, we will discuss how mlrose can be used to find the optimal weights for machine learning models, such as neural networks and regression models. That is, to solve the machine learning weight optimization problem.
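
To show what weight optimization by randomized search looks like, here is a minimal sketch of randomized hill climbing fitting a two-parameter linear model. It deliberately does not use mlrose or imitate its API: the function names, the step size and the toy data are assumptions made for the example.

```python
import random

def mse(weights, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def random_hill_climb(loss, init, step, iters, rng):
    """Perturb one randomly chosen weight at a time and keep the
    change only if it lowers the loss."""
    best = list(init)
    best_loss = loss(best)
    for _ in range(iters):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss

if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (hypothetical, noise-free).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    rng = random.Random(0)
    weights, final_loss = random_hill_climb(
        lambda w: mse(w, xs, ys), [0.0, 0.0], 0.1, 5000, rng
    )
    print("weights:", weights, "loss:", final_loss)
```

No gradients are needed, only loss evaluations, which is what makes this family of methods applicable to problems where the loss is not differentiable.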

Monitoring Kubernetes, part 1: the challenges + data sources

Our industry has long been relying on microservice-based architecture to deliver software faster and safer. The advent and ubiquity of microservices naturally paved the way for container technology, empowering us to rethink how we build and deploy our applications. Docker exploded onto the scene in 2013, and, for companies focusing on modernizing their infrastructure and cloud migration, a tool like Docker is critical to shipping applications quickly, at scale.