Structural Equation Modeling with lavaan in R

When working with data, we often want to build models to predict future events, but we also want a deeper understanding of how our data is connected and structured. In this course, you will explore the connectedness of data using structural equation modeling (SEM) in the R programming language with the lavaan package. SEM will introduce you to latent and manifest variables and show you how to create measurement models, assess measurement model accuracy, and fix poorly fitting models. During the course, you will explore classic SEM datasets, such as the Holzinger and Swineford (1939) and Bollen (1989) datasets. You will also work through a multi-factor model case study using the Wechsler Adult Intelligence Scale. Following this course, you will be able to dive into your data and gain a much deeper understanding of how it all fits together.

Experimental Design in R

Experimental design is a crucial part of data analysis in any field, whether you work in business, health, or tech. If you want to use data to answer a question, you need to design an experiment! In this course you will learn about basic experimental design, including block and factorial designs, and commonly used statistical tests, such as t-tests and ANOVAs. You will use built-in R data and real-world datasets including the CDC NHANES survey, SAT Scores from NY Public Schools, and Lending Club Loan Data. Following the course, you will be able to design and analyze your own experiments!
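To make the idea of a two-sample comparison concrete, here is a minimal sketch of Welch's t-statistic computed with only the Python standard library. The outcome numbers are made up for illustration; in practice you would reach for R's t.test() or scipy.stats.ttest_ind rather than computing it by hand.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample Welch t-statistic (unequal variances assumed)."""
    va, vb = variance(a), variance(b)          # sample variances (n-1 denominator)
    se = math.sqrt(va / len(a) + vb / len(b))  # standard error of the mean difference
    return (mean(a) - mean(b)) / se

# Made-up outcomes for a treatment and a control group.
treatment = [5.1, 4.9, 5.6, 5.3, 5.0]
control = [4.2, 4.4, 4.1, 4.5, 4.3]
t = welch_t(treatment, control)  # positive, since the treatment mean is higher
```

A large |t| relative to the t-distribution's tails is what lets you reject the null hypothesis of equal group means.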

Working with Modules in Python

As a beginner, you start working with Python in the interpreter; later, when you need to write longer programs, you start writing scripts. As your program grows in size, you may want to split it into several files for easier maintenance and reusability of the code. The solution to this is modules. You can define your most-used functions in a module and import it, instead of copying their definitions into different programs. A module can be imported by another program to make use of its functionality. This is how you can use the Python standard library as well.
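As a self-contained illustration of that mechanism, the sketch below writes a tiny module to a temporary directory and then imports it; the module name greetings is made up for this example.

```python
import os
import sys
import tempfile

# Write a tiny module (hypothetical name "greetings") into a temp directory.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "greetings.py"), "w") as f:
    f.write("def hello(name):\n    return f'Hello, {name}!'\n")

# Make the directory importable, then import the module like any other.
sys.path.insert(0, tmp)
import greetings

print(greetings.hello("world"))  # prints: Hello, world!
```

Importing your own file and importing the standard library use the same machinery: Python searches the directories on sys.path for a matching module.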

Repository of helpful Pandas/Python code snippets

Search our repository of helpful code snippets for data processing and analysis (using Python/Pandas and SQL)

Distributed Intelligence

The convergence of Artificial Intelligence and blockchain will lead to new possibilities. This article discusses how Artificial Intelligence adds, and can add, value to existing blockchain technology.

Machine Learning in Google BigQuery

Google BigQuery allows interactive analysis of large datasets, making it easy for businesses to share meaningful insights and develop solutions based on customer analytics. However, many of the businesses that use BigQuery aren't using machine learning to better understand the data they are generating. This is because data analysts, proficient in SQL, may not have the traditional data science background needed to apply machine learning techniques. Today we're announcing BigQuery ML, a capability inside BigQuery that allows data scientists and analysts to build and deploy machine learning models on massive structured or semi-structured datasets. BigQuery ML is a set of simple SQL language extensions that enables users to utilize popular ML capabilities, performing predictive analytics like forecasting sales and creating customer segmentations right at the source, where their data is already stored. BigQuery ML additionally sets smart defaults automatically and takes care of data transformation, leading to a seamless and easy-to-use experience with great results.
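For a flavor of what those SQL extensions look like, here is a hedged sketch of a BigQuery ML CREATE MODEL statement composed as a string in Python. The dataset, table, and column names are invented for illustration; in practice you would submit the statement through the BigQuery console or a client library.

```python
# Hypothetical dataset, table, and column names, for illustration only.
query = """
CREATE MODEL `mydataset.purchase_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  did_purchase AS label,
  pages_viewed,
  time_on_site
FROM `mydataset.sessions`
"""
print(query.strip().splitlines()[0])  # prints: CREATE MODEL `mydataset.purchase_model`
```

The appeal is that this is plain SQL: the training data never leaves BigQuery, and the analyst never leaves the query editor.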

A Unified Approach to Analytics with Apache Spark

How your data scientists and engineers can build models and data pipelines rapidly while collaborating with the business – download the ebook now.
Experiencing performance and productivity issues with your data science and data engineering efforts on Apache Spark? Spending too much time managing your infrastructure instead of building machine-learning models, analytic apps, and data products? See how companies such as Viacom and HP connect data science and data engineering using Databricks, founded by the original creators of Apache Spark. The results? Faster performance, scaled data processes, simplified infrastructure, streamlined workflows, and greater collaboration. Ready to discover how unifying your data science, engineering, and analytics can benefit your company?

Why your machine learning project will fail

Okay, so the title of the article is somewhat apocalyptic, and I don't wish ill on anyone. I am rooting for you and hope your project succeeds beyond expectations. The purpose of this article is not to put a voodoo curse on you and assure your project's failure. Rather, it is a tour of the most common reasons why data science projects fail, in the hope of helping you avoid the pitfalls:
1. Asking the wrong question
2. Trying to use it to solve the wrong problem
3. Not having enough data
4. Not having the right data
5. Having too much data
6. Hiring the wrong people
7. Using the wrong tools
8. Not having the right model
9. Not having the right yardstick

How to Build a Data Science Portfolio

How do you get a job in data science? Knowing enough statistics, machine learning, programming, etc. to be able to get a job is difficult. One thing I have found lately is that quite a few people may have the required skills to get a job, but no portfolio. While a resume matters, having a portfolio of public evidence of your data science skills can do wonders for your job prospects. Even if you have a referral, being able to show potential employers what you can do, instead of just telling them you can do something, is important. This post includes links to where various data science professionals (data science managers, data scientists, social media icons, or some combination thereof) and others talk about what to have in a portfolio and how to get noticed. With that, let's get started!

Knowledge discovery for enabling smart Internet of Things: A survey

The use of knowledge discovery in the Internet of Things (IoT) and its allied domains is undeniably indispensable: it results in optimized placement architectures, efficient routing protocols, device energy savings, and enhanced security measures. Without knowledge discovery, an IoT deployment is just a large-scale sensor network that generates a huge amount of data and requires often under-optimized processing to produce actionable outputs. In this survey, we explore various domains of IoT for which knowledge discovery is inseparable from the application, and show how it benefits the overall implementation of the IoT architecture.

9 Ways AI and Intelligent Automation Affect the C-Suite

Today’s organizations are upping their competitive game using various forms of machine intelligence. From chatbots and virtual assistants to robotic process automation (RPA) and deep learning, businesses and entire industries are transforming the way they operate, although the results are mixed. Proceeding responsibly requires members of the C-suite to consider the potential opportunities, challenges, and risks, including the impacts on people and processes.

Web Scraping Using Python

In this tutorial, you’ll learn how to extract data from the web, manipulate and clean data using Python’s Pandas library, and visualize data using Python’s Matplotlib library.
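As a minimal, dependency-free taste of the extraction step, the sketch below pulls table cells out of an inline HTML snippet with Python's standard-library parser; the tutorial itself uses network requests and Pandas instead, and the names and numbers here are invented.

```python
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    """Collect the text content of every <td> cell in an HTML table."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())

# A stand-in for HTML fetched from the web.
html = "<table><tr><td>Alice</td><td>90</td></tr><tr><td>Bob</td><td>85</td></tr></table>"
parser = CellExtractor()
parser.feed(html)
print(parser.cells)  # prints: ['Alice', '90', 'Bob', '85']
```

Once the raw cells are extracted like this, loading them into a Pandas DataFrame for cleaning and plotting is a one-liner.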

A Practitioner’s Guide to Natural Language Processing (Part I) – Processing & Understanding Text

Unstructured data, especially text, images, and videos, contains a wealth of information. However, due to the inherent complexity of processing and analyzing this data, people often refrain from spending the extra time and effort to venture out from structured datasets to analyze these unstructured sources, which can be a potential gold mine.
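To make the "processing text" half concrete, here is a minimal standard-library sketch of the usual first steps: lowercasing, tokenizing, and stopword filtering. The stopword list is a tiny made-up stand-in for the fuller lists shipped with libraries such as NLTK or spaCy.

```python
import re
from collections import Counter

# A toy stopword list; real pipelines use curated lists from NLTK or spaCy.
STOPWORDS = {"the", "a", "of", "and", "in", "is"}

def preprocess(text):
    """Lowercase, tokenize on letters/apostrophes, and drop stopwords."""
    text = text.lower()
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOPWORDS]

counts = Counter(preprocess("The wealth of information in the text is vast."))
print(counts.most_common(2))
```

Token counts like these feed directly into the bag-of-words and TF-IDF representations that most downstream NLP models start from.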

Singularity as a software distribution / deployment tool

In this blog post, I'll explain how someone can take advantage of Singularity to make R or Python packages available as an image file to users. This is a necessity if a specific R or Python package is difficult to install across different operating systems, making the installation process cumbersome. Lately, I've utilized the reticulate package in R (it provides an interface between R and Python), and I realized first-hand how difficult it can be, in some cases, to install R and Python packages and make them work nicely together in the same operating system. This blog post by no means presents the full potential of Singularity or containerization tools such as Docker; it is mainly restricted to package distribution and deployment.

Singularity – Enabling users to have full control of their environment.

Starting a Singularity container ‘swaps’ out the host operating system environment for one that the user controls! Let’s say you are running Ubuntu on your workstation or server, but you have an application which only runs on Red Hat Enterprise Linux 6.3. Singularity can instantly virtualize the operating system, without requiring root access, and allow you to run that application in its native environment!

Explaining Black-Box Machine Learning Models – Code Part 2: Text classification with LIME

This is code that will accompany an article appearing in a special edition of a German IT magazine. The article is about explaining black-box machine learning models. In that article I'm showcasing three practical examples:
1. Explaining supervised classification models built on tabular data using caret and the iml package
2. Explaining image classification models with keras and lime
3. Explaining text classification models with xgboost and lime
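For readers who want the intuition before the code, here is a toy Python sketch of the perturbation idea behind LIME for text: drop each word and see how much the prediction moves. The keyword "classifier" is a made-up stand-in for a real model, and the lime library itself is considerably more sophisticated (locally weighted sampling plus a sparse linear fit), so treat this only as a sketch of the idea.

```python
def classify(text):
    """Toy stand-in for a model: pseudo-probability the text is positive."""
    positive = {"great", "good", "excellent"}
    words = text.lower().split()
    hits = sum(w in positive for w in words)
    return hits / max(len(words), 1)

def explain(text):
    """Score each word by how much removing it changes the prediction."""
    words = text.split()
    base = classify(text)
    contributions = {}
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        contributions[w] = base - classify(perturbed)
    return contributions

weights = explain("great course but slow pace")
# "great" gets the largest positive weight: removing it drops the score most.
```

The same perturb-and-refit logic is what lets LIME attach local, human-readable word weights to an otherwise opaque classifier.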

Data Science For Business: 3 Reasons You Need To Learn The Expected Value Framework

One of the most difficult and most critical parts of implementing data science in business is quantifying the return on investment, or ROI. In this article, we highlight three reasons you need to learn the Expected Value Framework, a framework that connects machine learning classification models to ROI. Further, we'll point you to a new video we released, Expected Value Framework: Modeling Employee Churn With H2O, which was recently taught as part of our flagship course, Data Science For Business (DS4B 201). The video serves as an overview of the steps involved in calculating ROI from reducing employee churn with H2O, tying the key H2O functions to the process. Last, we'll go over some Expected Value Framework FAQs that are commonly asked about applying expected value to machine learning classification problems in business.
Reason #1: Classification Machine Learning Algorithms Often Maximize the Wrong Metric
Reason #2: The Solution is Maximizing for Expected Value
Reason #3: Expected Value can test for Variability in Assumptions
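The core calculation is simple enough to sketch in a few lines of Python. All dollar figures and probabilities below are invented for illustration; the point is that expected value weighs each outcome by its probability and its cost, rather than optimizing a classification metric in isolation.

```python
def expected_value(p_churn, churn_cost, intervention_cost, retention_rate):
    """Expected gain of intervening on one employee versus doing nothing.

    p_churn:           classifier's predicted probability of churn
    churn_cost:        cost incurred if the employee churns
    intervention_cost: cost of the retention intervention
    retention_rate:    fraction of would-be churners the intervention retains
    """
    ev_do_nothing = -p_churn * churn_cost
    ev_intervene = -intervention_cost - p_churn * (1 - retention_rate) * churn_cost
    return ev_intervene - ev_do_nothing  # positive => intervening pays off

# Made-up numbers: 30% churn risk, $50k churn cost, $2k intervention, 60% effective.
gain = expected_value(p_churn=0.30, churn_cost=50_000,
                      intervention_cost=2_000, retention_rate=0.60)
print(gain)  # prints: 7000.0
```

Note how the classifier's probability enters directly: for a low-risk employee (small p_churn) the same intervention has negative expected value, which is exactly the threshold-tuning insight the framework delivers.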