There are many different types of anomalies, and determining which anomalies are good and which are bad is challenging. In Industrial IoT, one of the main objectives is the automatic monitoring and detection of these abnormal events, or of changes and shifts in the collected data. This includes all the techniques aimed at identifying data patterns that deviate from the expected behavior.
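To make "patterns that deviate from the expected behavior" concrete, here is a minimal sketch of one common approach, a modified z-score based on the median absolute deviation (MAD). It is not taken from the post; the function name `mad_anomalies`, the threshold of 3.5, and the sample readings are all illustrative assumptions.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds threshold.

    Hypothetical helper for illustration; the threshold is a common rule of
    thumb, not a value from the original post.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up sensor readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 35.7, 20.2, 19.9, 20.1]
flagged = mad_anomalies(readings)
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single large spike inflates the latter two and can hide itself. Whether a flagged point is a "good" or "bad" anomaly still needs domain knowledge.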
This post describes the tools I currently use for working with data. People often ask me to recommend specific tools, and I always hesitate, because so much boils down to personal preference. I recently added a workshop to the DSS lineup providing an overview of popular tools for working with data. The core idea is that researchers have many options when choosing tools to implement a reproducible workflow. For example, it doesn’t really matter whether you choose to learn R or Python; the important thing is that you write and document code of some kind so that your analysis can be reproduced. Similarly, it doesn’t matter much whether you choose to use RStudio or Jupyter notebooks; the important thing is that you have a development and authoring environment that encourages good research practices. Still, inquiring minds want to know: what do you use?
In recent months, I have increasingly been using Rmd documents for preparing scientific reports, blog posts, etc. While I really like the flexibility offered by the system, one thing that I thought could be improved is the support for easily inserting tables. So, “inspired” also by the recent addition of the excellent insert image addin in blogdown, I decided to give it a go and try to implement some kind of addin to facilitate table insertion in Rmd documents. After struggling a bit due to my rather nonexistent shiny skills, in the end I managed to obtain a “basic but useful” (IMO) addin. Let’s see how it works:
In this blogpost, we will show 6 keyword extraction techniques that make it possible to find keywords in plain text. Keywords are frequently occurring words that tend to occur together in plain text. Common examples are New York, Monte Carlo, Mixed Models, Brussels Hoofdstedelijk Gewest, Public Transport, Central Station, p-values, …
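As a rough illustration of "words that occur together", the sketch below scores adjacent word pairs by pointwise mutual information (PMI). This is a generic collocation measure and not necessarily one of the six techniques the post covers; the function name `bigram_pmi`, the `min_count` cutoff, and the sample text are assumptions made for the example.

```python
import math
import re
from collections import Counter


def bigram_pmi(text, min_count=2):
    """Score adjacent word pairs by PMI, highest first.

    Hypothetical helper for illustration. Pairs seen fewer than min_count
    times are dropped, because PMI is unreliable for rare pairs.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        # PMI compares how often the pair occurs with how often it would
        # occur if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample text.
sample = (
    "The train to New York leaves from Central Station. "
    "Central Station is busy, and New York is far. "
    "The train is late."
)
top_pairs = bigram_pmi(sample)
for pair, score in top_pairs:
    print(" ".join(pair), round(score, 2))
```

On this sample, the pairs that survive the cutoff include "new york" and "central station", but also "the train". A real pipeline would need stop-word or part-of-speech filtering to remove pairs built from function words.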
In early March, the Bay Area useR Group was able to hold an R and TensorFlow mini-conference on Google’s new Sunnyvale campus. Pete Mohanty, a Stanford researcher and frequent BARUG speaker, led off with a talk on his recent kerasformula package, which allows R users to call a keras-based neural net with R formula objects. Pete’s slides show an example of using a regression-style formula with the kerasformula::kms() function to fit a sequential TensorFlow model.
A recent survey of over 16,000 data professionals showed that the most common challenges to data science included dirty data (36%), lack of data science talent (30%) and lack of management support (27%). Also, data professionals reported experiencing around three challenges in the previous year. A principal component analysis of the 20 challenges studied showed that challenges can be grouped into five categories.