Data Science, Machine Learning, Deep Learning, and Artificial Intelligence are some of the most talked-about buzzwords in the modern analytics ecosystem. The exponential growth of technology in this area has simplified our lives and made us more machine-dependent. The hype surrounding these technologies has prompted professionals from various disciplines to jump on board and consider analytics as a career option. To master Data Science or Artificial Intelligence, one needs a myriad of skills, including Programming, Mathematics, Statistics, Probability, Machine Learning, and Deep Learning. The most sought-after programming languages for Data Science are Python and R, with the former often regarded as the holy grail of the programming world because of its functionality, flexibility, and community. Python is comparatively easy to master, but given its breadth, certain areas demand deeper mastery than others. In this blog, we will learn about virtual environments in Python and how they can be used.
About half a year ago I wrote about dqsample, which provides a bias-free alternative to base::sample(). By now this is mostly of historic interest, since the upcoming R release 3.6.0 will ship an updated sampling algorithm. However, some of the techniques used in dqsample are now part of dqrng.
Breaking down data science with Python, Spark and Optimus. Today: Data Operations for Data Science. ..::Part 1 here::.. Here we’ll learn to set up Git, Travis CI, and DVC for our project.
This post is meant to provide a concise but comprehensive overview of the concept of stationarity and of the different types of stationarity defined in academic literature dealing with time series analysis. Future posts will aim to provide similarly concise overviews of the detection of non-stationarity in time series data and of the different ways to transform non-stationary time series into stationary ones.
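The core idea can be sketched before the formal definitions: a (weakly) stationary series has a mean and variance that do not depend on time, while a non-stationary one does not. A minimal illustration, using only the standard library and synthetic data (white noise as the stationary case, a random walk as the non-stationary one):

```python
import random

random.seed(42)

# White noise: mean and variance do not depend on time (stationary).
noise = [random.gauss(0, 1) for _ in range(2000)]

# Random walk: cumulative sum of the noise; its variance grows with time,
# so the series is non-stationary.
walk, total = [], 0.0
for e in noise:
    total += e
    walk.append(total)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Compare the spread of the first and second halves of each series:
# for the noise they are close; for the walk they differ sharply.
for name, series in [("white noise", noise), ("random walk", walk)]:
    half = len(series) // 2
    v1, v2 = variance(series[:half]), variance(series[half:])
    print(f"{name}: first-half var={v1:.2f}, second-half var={v2:.2f}")
```

In practice one would use a formal test (such as an augmented Dickey–Fuller test) rather than eyeballing variances, but the half-by-half comparison conveys what "depends on time" means.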
Secure multi-party computation and homomorphic encryption add computational overhead, but the results are well worth it! Data privacy and model parameter security are mutually protected with clever encryption schemes.
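One building block behind secure multi-party computation is additive secret sharing. A toy sketch (standard library only, with a toy modulus — not a production-grade implementation): each party splits its input into random shares, and the parties can add shares locally so the sum is computed without anyone seeing the other's plaintext.

```python
import random

P = 2**61 - 1  # toy prime modulus, chosen for illustration only

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two parties' private inputs.
a, b = 1234, 5678

# Each input is secret-shared; no single share reveals anything about it.
a_shares, b_shares = share(a), share(b)

# Shares are added component-wise: the sum of the secrets is computed
# without either plaintext input ever being exposed.
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]

print(reconstruct(sum_shares))  # 6912
```

The computational overhead mentioned above comes from doing all arithmetic on shares (or ciphertexts, in the homomorphic-encryption case) rather than on raw values.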
Averages taken out of context are meaningless. Just like ‘That’s what she said’ jokes. This used to work because we only had one context – the normal distribution. Not anymore. So, don’t fall for the average fallacy. The person in front of you is not the stereotype. They might conform to it, but you’ll have to find out for yourself. And don’t talk about averages without mentioning the distribution. Explicit is better than implicit. If it’s not a normal distribution, the meaning of the average changes: you go from 50% of people being below average to 80% of people being below average.
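The shift in what "below average" means is easy to demonstrate numerically. A quick sketch with synthetic data (standard library only): symmetric normal data versus right-skewed exponential data, the latter standing in for quantities like incomes or wait times.

```python
import random

random.seed(0)
N = 100_000

# Symmetric (normal) data: roughly half the values fall below the mean.
normal = [random.gauss(50, 10) for _ in range(N)]

# Right-skewed (exponential) data with the same mean of 50:
# well over half the values fall below the mean.
skewed = [random.expovariate(1 / 50) for _ in range(N)]

def frac_below_mean(xs):
    m = sum(xs) / len(xs)
    return sum(x < m for x in xs) / len(xs)

print(f"normal: {frac_below_mean(normal):.0%} below average")  # ~50%
print(f"skewed: {frac_below_mean(skewed):.0%} below average")  # ~63%
```

For an exponential distribution the exact fraction below the mean is 1 − 1/e ≈ 63%; heavier skew pushes it higher still.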
Bio7 can be extended with R packages, Eclipse plugins and ImageJ plugins. Another very easy option to extend the Bio7 Graphical User Interface with R actions is dynamic menus, which can be used, e.g., for personalized workflows or repeating tasks. The provision of new menus and nested menus is simple and can be arranged for different tasks. Three different menu locations can be populated with custom scripts. One location is the context menu of the R-Shell view. To extend the R-Shell context menu, scripts have to be copied to a default script location of the Bio7 installation. Folders in these locations become submenus, and nested folders result in nested submenus. It is not necessary to restart the application because the menus are updated automatically and copied scripts are instantly available as executable actions.
A few months ago, I wrote about the differences between data engineers and data scientists. I talked about their skills and common starting points. An interesting thing happened: the data scientists started pushing back, arguing that they are, in fact, as skilled as data engineers at data engineering. That was interesting because the data engineers didn’t push back saying they’re data scientists. So, I’ve spent the past few months gathering data and observing the behaviors of data scientists in their natural habitat. This post will offer more information about why a data scientist is not a data engineer.
There are numerous situations in which one would want to insert parameters in a SQL query, and there are many ways to implement templated SQL queries in Python. Without comparing the different approaches, this post explains a simple and effective method for parameterizing SQL using JinjaSql. Besides many powerful features of Jinja2, such as conditional statements and loops, JinjaSql offers a clean and straightforward way to parameterize not only the values substituted into the where and in clauses, but also SQL statements themselves, including parameterizing table and column names and composing queries by combining whole code blocks.
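The underlying pattern can be sketched without JinjaSql itself, using only the standard library: identifiers (table names) are substituted only after validation against a whitelist, while values become DB-API bind parameters rather than raw strings. The table names and query below are illustrative, not from the post.

```python
from string import Template

# Whitelist of identifiers that may be templated in (illustrative).
ALLOWED_TABLES = {"orders", "customers"}

def build_query(table, status_values):
    """Build a parameterized IN-clause query: identifiers are validated,
    values stay as bind parameters for the DB driver to escape."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table}")
    placeholders = ", ".join("?" for _ in status_values)
    sql = Template(
        "SELECT id, total FROM $table WHERE status IN ($placeholders)"
    ).substitute(table=table, placeholders=placeholders)
    return sql, list(status_values)

sql, params = build_query("orders", ["shipped", "pending"])
print(sql)     # SELECT id, total FROM orders WHERE status IN (?, ?)
print(params)  # ['shipped', 'pending']
```

JinjaSql automates exactly this split — its template rendering returns the SQL string together with the bind-parameter list — while adding Jinja2's loops and conditionals on top.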
Housing price data provides a great introduction to machine learning. Anybody who has bought a house or even rented an apartment can easily understand the features: more space and more rooms generally lead to a higher price. So it ought to be easy to develop a model – but sometimes it isn’t, not because machine learning is hard but because data is messy. Also, the exact same house in different neighborhoods of the same city, even only a mile apart, may have significantly different prices. The best way to deal with this is to engineer the data so that the model can better handle this situation.
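One common way to engineer such a feature is to attach the neighborhood's typical price to each listing, so the model can separate location effects from size and room effects. A minimal sketch with made-up listings (the data and feature names are illustrative):

```python
from collections import defaultdict

# Toy listings: similar size and rooms, different neighborhoods (illustrative).
listings = [
    {"neighborhood": "Riverside", "sqft": 1200, "rooms": 3, "price": 450_000},
    {"neighborhood": "Riverside", "sqft": 1400, "rooms": 3, "price": 510_000},
    {"neighborhood": "Hilltop",   "sqft": 1200, "rooms": 3, "price": 310_000},
    {"neighborhood": "Hilltop",   "sqft": 1300, "rooms": 3, "price": 335_000},
]

def median(xs):
    xs = sorted(xs)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

# Median price per neighborhood, used as an engineered feature.
by_hood = defaultdict(list)
for row in listings:
    by_hood[row["neighborhood"]].append(row["price"])
hood_median = {hood: median(prices) for hood, prices in by_hood.items()}

for row in listings:
    row["hood_median_price"] = hood_median[row["neighborhood"]]
    # Price relative to the local market: comparable across neighborhoods.
    row["price_ratio"] = row["price"] / row["hood_median_price"]

print(hood_median)
```

With the ratio as the target (or the neighborhood median as a feature), the model no longer has to explain the large location gap through square footage alone.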
Faster R-CNN is an object detection architecture presented by Ross Girshick, Shaoqing Ren, Kaiming He, and Jian Sun in 2015, and is one of the well-known object detection architectures that use convolutional neural networks, like YOLO (You Only Look Once) and SSD (Single Shot Detector).
How do companies like Amazon and Netflix know precisely what you want? Whether it’s that new set of speakers that you’ve been eyeballing, or the next Black Mirror episode – their use of predictive algorithms has made the job of selling you stuff ridiculously efficient. But as much as we’d all like a juicy conspiracy theory, no, they don’t employ psychics. They use something far more magical – mathematics. Today, we’ll look at an approach called collaborative filtering.
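The mathematics is less magical than it sounds. A toy sketch of user-based collaborative filtering, using only the standard library: score an item a user has not seen by the similarity-weighted ratings that similar users gave it. The users, items, and ratings are invented for illustration.

```python
from math import sqrt

# Toy user -> {item: rating} data (illustrative).
ratings = {
    "ana":  {"speakers": 5, "headphones": 4, "black_mirror": 1},
    "ben":  {"speakers": 4, "headphones": 5, "camera": 2},
    "cara": {"black_mirror": 5, "camera": 4, "headphones": 1},
}

def cosine(u, v):
    """Cosine similarity over the items two users both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(user):
    """Predict ratings for unseen items as similarity-weighted averages
    of what other users gave those items."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, r in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    return {i: scores[i] / weights[i] for i in scores if weights[i] > 0}

print(recommend("ana"))
```

Here "ana" rated speakers and headphones much like "ben" did, so her predicted rating for the camera leans toward ben's lukewarm 2 rather than cara's enthusiastic 4. Real systems use the same idea at scale, typically via matrix factorization rather than pairwise similarity.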