In the last post, we talked about how to estimate the coefficients, or weights, of a linear regression. We estimated the weights that give the minimum error. Essentially, it is an optimization problem in which we have to find the minimum error (cost) and the corresponding coefficients. In a way, all supervised learning algorithms have optimization at their crux: we have to find the model parameters that give the minimum error (cost). In this post, I will explain some of the most commonly used methods to find the maxima or minima of a cost function.
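To make the idea concrete, here is a minimal sketch of one such method, batch gradient descent, applied to a one-variable linear regression with a mean squared error cost. The function name, learning rate, step count, and toy data are all assumptions chosen for illustration, not taken from the post.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_steps=5000):
    """Fit y ~ w * x + b by minimizing the mean squared error.

    Hypothetical helper for illustration; learning_rate and n_steps
    are arbitrary choices that happen to work for the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current fit.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost (1/n) * sum(error ** 2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

With a learning rate that is too large the updates diverge instead of settling, so in practice the step size usually has to be tuned to the scale of the data.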
This problem appeared as an assignment in the online Coursera course Convolutional Neural Networks by Prof. Andrew Ng (deeplearning.ai). The description of the problem is taken straight from the assignment.
Digital Twins is a concept rooted in IoT that requires machine learning skills and potentially AI. It’s not completely new, but it is integral to Gartner’s vision of the digital enterprise and makes the Hype Cycle for 2017. It’s a major enabler of event processing, as opposed to traditional request processing.
I once posted about making use of narrative objects. In this blog, I will be discussing an algorithm that supports the creation of these objects. I call it my “Infereferencing Algorithm”: this term is most easily pronounced with a slight pause between “infer” and “referencing.” I consider this a useful and widely applicable algorithm although I don’t believe it operates well in a relational database environment. Instead, I use “mass data files”: these contain unstructured lumps of symbols or tags. Infereferencing is a process of extracting aspects of a larger story (defined by the story itself) using much smaller pieces (defined by my specific needs). Perhaps the most straightforward analogy is in how people do online searches: they present a string of text to the search engine (defined by their own needs); and the engine will create a listing of links that seem applicable – although the underlying articles follow their own sphere of discourse. Keywords might already exist for the resources; or the search engine could pre-compile a list of applicable terms for faster access in the future. But these indexes should not confuse the fundamental object differences separating the submission from the outcomes. I will elaborate.
I had a working, short script that took 3 1/2 minutes to run. While this may be fine if you only need to run it once, I needed to run it hundreds of times for simulations. My first attempt to do so ended about four hours after I started the code, with 400 simulations left to go, and I knew I needed to get some help. This post documents the iterative process of improving the performance of the function, culminating in a runtime of 0.64 seconds for 10,000 iterations, a speed-up of more than 100,000x.
Today, I had the opportunity to help someone over at the R for Data Science Slack group (read more about this group here) and I thought that the question asked could make for an interesting blog post, so here it is! Disclaimer: the way I’m doing things here is far from optimal, but I want to illustrate how to map functions over nested lists. I do show the optimal way at the end, so if you are familiar with purrr, don’t get mad at me.
The objective of this post is to create a method that easily combines loss runs, or listings of insurance claims, into triangles. Using only Excel, the common method is to create links between the Excel files, which must be updated manually for each new evaluation. This is prone to human error and is time-consuming. Using a script to merge the files first and then create a triangle saves time and is more consistent.
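As a rough illustration of the merge-then-triangle idea, here is a small Python sketch. It is not the post's own script: the data layout (one list of claims per evaluation year, each claim an accident year and a paid-to-date amount), the function name, and the figures are all hypothetical, and a real loss run would first have to be read from its files.

```python
from collections import defaultdict


def build_triangle(loss_runs):
    """Combine loss runs into a cumulative paid triangle.

    loss_runs: list of (evaluation_year, claims) pairs, where claims is a
    list of (accident_year, paid_to_date) tuples. This layout is an
    assumption made for the example.
    Returns {accident_year: {development_age: total_paid}}.
    """
    triangle = defaultdict(dict)
    for evaluation_year, claims in loss_runs:
        # Total paid-to-date per accident year within this evaluation.
        totals = defaultdict(float)
        for accident_year, paid_to_date in claims:
            totals[accident_year] += paid_to_date
        for accident_year, total in totals.items():
            # Development age in years, counting the accident year as age 1.
            age = evaluation_year - accident_year + 1
            triangle[accident_year][age] = total
    # Sort rows and columns so the output reads like a triangle.
    return {ay: dict(sorted(ages.items())) for ay, ages in sorted(triangle.items())}


# Three made-up evaluations of the same book of claims.
loss_runs = [
    (2015, [(2015, 100.0), (2015, 50.0)]),
    (2016, [(2015, 180.0), (2015, 70.0), (2016, 120.0)]),
    (2017, [(2015, 200.0), (2015, 75.0), (2016, 210.0), (2017, 90.0)]),
]
triangle = build_triangle(loss_runs)
for accident_year, ages in triangle.items():
    print(accident_year, ages)
```

Adding a new evaluation then means appending one more entry to `loss_runs` and rerunning, rather than updating links between workbooks by hand.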
In this episode of the Data Show, I spoke with Kristian Hammond, chief scientist of Narrative Science and professor of EECS at Northwestern University. He has been at the forefront of helping companies understand the power, limitations, and disruptive potential of AI technologies and tools. In a previous post on machine learning, I listed types of use cases (a taxonomy) for machine learning that could just as well apply to enterprise applications of AI. But how do you identify good use cases to begin with?
We’re happy to announce the third release of the tibbletime package. This is a huge update, mainly due to a complete rewrite of the package. It contains a ton of new functionality and a number of breaking changes that existing users need to be aware of. All of the changes have been well documented in the NEWS file, but it’s worthwhile to touch on a few of them here and discuss the future of the package. We’re super excited so let’s check out the vision for tibbletime and its new functionality!