PixieDust Support of Streaming Data

With the rise of IoT (Internet of Things) devices, being able to analyze and visualize live streams of data is becoming more and more important. For example, you could have sensors such as thermometers in machines, or portable medical devices such as pacemakers, continuously streaming data to a streaming service like Kafka. PixieDust makes it easier to work with live data inside Jupyter Notebooks by providing simple integration APIs to both the PixieApp and display() frameworks. On the visualization level, PixieDust uses Bokeh's support for efficient data source updates to plot streaming data into live charts (note that at the moment, only line charts and scatter plots are supported, but more will be added in the future). The display() framework also supports geospatial visualization of streaming data using the Mapbox rendering engine.

The GDPR Effect on Big Data!

The European Parliament, Council and Commission created a regulation that toughens and unifies data protection for people inside the EU. This regulation is called the GDPR. It is a single set of rules governing how personal data is used, regardless of the source and across all uses. The GDPR protects the personal privacy and data rights of EU citizens. It is not restricted to organizations inside the EU; on the contrary, any organization with customers in the EU will be affected. The way companies handle personal data will change forever with the introduction of the GDPR, which replaced the 1995 Data Protection Directive and marks a huge change in Europe's data protection rules. The internet is growing at a rapid pace, and digital content has increased at an unimaginable rate. This has led to large amounts of personal data being held digitally. With so much personal data out there, the need for enhanced data protection regulation arose, and hence the GDPR. What the GDPR does is empower individuals to gain access to, and control over, the information held about them. While empowering individuals, the GDPR also holds organizations accountable for the way they handle and store personal data. Companies will be required to keep their data protection documentation and communication up to date.

Comparison of the Top Cloud APIs for Computer Vision

There are many different cloud APIs for computer vision on the market, and the field is developing rapidly. In this article, we give a brief overview of the various providers. At first sight, all of them provide fairly similar capabilities, yet some put an emphasis on face recognition, like Kairos, or on building custom models, like IBM and Azure. However, if you need to accomplish some very specific task, you still have to build the model yourself using Deep Learning frameworks.

How Can You Find The Best Machine Learning Frameworks?

A number of machine learning frameworks have emerged for the development and deployment of AI apps. These frameworks dictate the entire flow of development, testing, optimization, and final production. Developers are often bewildered about which framework to pick and which to ditch. Some frameworks focus on ease of use, while others put the emphasis on production deployment and parameter optimization. Every framework has its own highs and lows, with its own areas of excellence and its own shortcomings, which makes the choice even more difficult for developers. The frameworks at the top of the list include MXNet, Keras, PyTorch, and TensorFlow.

Going Deeper: More Insight Into How and What Convolutional Neural Networks Learn

The reason topological analysis is useful in this type of analytical challenge is that it provides a way of compressing complicated data sets into an understandable and potentially actionable form. Here, as in many other data analytic problems, it is crucial to obtain an understanding of the ‘frequently occurring motifs’ within the data. The above observations suggest that topological analysis can be used to obtain control and understanding of the learning and generalization capabilities of CNNs. There are many further ideas along these lines, which we will discuss in future posts.

Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science

An introduction to Project Hydrogen: how it can assist machine learning and AI frameworks on Apache Spark and what distinguishes it from other open source projects.

Make R speak

Ever wanted to make R talk to you? Now you can, with the mscstts package by John Muschelli. It provides an interface to the Microsoft Cognitive Services Text-to-Speech API (hence the name) in Azure, and you can use it to convert any short piece of text to a playable audio file, rendering it as speech using a number of different voices.

Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release

Apache Drill 1.14.0 was recently released, bringing with it many new features and a temporary incompatibility with the current revision of the MapR ODBC drivers. The Drill community expects new ODBC drivers to arrive shortly. The sergeant package is an alternative to ODBC for R users, as it provides a dplyr interface to the REST API along with a JDBC interface and functions to work directly with the REST API in a more programmatic fashion.

Bio7 2.9 Released

A new release of Bio7 is available. The new Bio7 2.9 release comes with a plethora of new R features and bugfixes.

Linear programming in R

Linear programming is a technique to solve optimization problems whose constraints and outcome are represented by linear relationships.
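
To make the definition concrete, here is a minimal sketch in Python (not the R code the article itself uses). It solves a small two-variable problem by enumerating the corner points of the feasible region, which is a simple stand-in for a real solver and only works when the feasible region is bounded. The function name `solve_lp_2d` and the example numbers are illustrative assumptions, not taken from the article.

```python
from itertools import combinations


def solve_lp_2d(objective, constraints, eps=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to a*x + b*y <= c.

    Each constraint is a tuple (a, b, c). Assumes the feasible region is
    bounded, so the optimum lies at an intersection of two constraint lines.
    """
    best_point, best_value = None, None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines never intersect in a single point
        # Cramer's rule for the intersection of the two boundary lines
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best_value is None or value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


# Illustrative problem: maximize 3x + 5y
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0, written as -x <= 0
    (0, -1, 0),   # y >= 0, written as -y <= 0
]
point, value = solve_lp_2d((3, 5), constraints)
print(point, value)  # (2.0, 6.0) 36.0
```

Both the objective and every constraint are linear in x and y, which is what makes this a linear program. For anything beyond a toy problem, a dedicated solver is the better choice.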

Beyond Basic R – Mapping

There are many different R packages for dealing with spatial data. The main distinctions between them involve the types of data they work with – raster or vector – and the sophistication of the analyses they can do. Raster data can be thought of as pixels, similar to an image, while vector data consists of points, lines, or polygons. Spatial data manipulation can be quite complex, but creating some basic plots can be done with just a few commands. In this post, we will show simple examples of raster and vector spatial data for plotting a watershed and gage locations, and link to some other more complex examples.
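
The raster/vector distinction can be sketched with plain data structures. The snippet below is in Python rather than R and uses made-up values throughout; the helper names `cell_value` and `polygon_area` are hypothetical, and the grid is assumed to have unit-sized cells with its origin at the lower-left corner.

```python
# Raster: a regular grid of cells, each holding one value
# (made-up elevations; row 0 is the top of the grid).
raster = [
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]

# Vector: discrete geometries described by coordinates (made-up x, y pairs).
gage_points = [(0.5, 0.5), (2.5, 1.5)]                               # points
stream_line = [(0.0, 3.0), (1.0, 2.0), (2.0, 0.5)]                   # a line
watershed_polygon = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]  # a polygon


def cell_value(grid, x, y):
    """Return the value of the raster cell that contains point (x, y)."""
    row = len(grid) - 1 - int(y)  # flip y because row 0 is the top row
    col = int(x)
    return grid[row][col]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Combine the two kinds of data: sample the raster at each gage location.
gage_elevations = [cell_value(raster, x, y) for x, y in gage_points]
area = polygon_area(watershed_polygon)
print(gage_elevations, area)
```

Sampling a raster at vector point locations, as in the last lines, is a typical operation that spatial packages provide in a far more robust form.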

Remote Python and R in SQL

Did you know that you can execute R and Python code remotely in SQL Server from Jupyter Notebooks or any IDE? Machine Learning Services in SQL Server eliminates the need to move data around. Instead of transferring large and sensitive data over the network, or losing accuracy by training ML models on sample CSV files, you can have your R/Python code execute within your database. You can work in Jupyter Notebooks, RStudio, PyCharm, VSCode, Visual Studio, or wherever you want, and then send function execution to SQL Server, bringing intelligence to where your data lives. This tutorial will show you an example of how you can send your Python code from Jupyter Notebooks to execute within SQL Server. The same principles apply to R and any other IDE as well.