In a world where each of us is surrounded by data and the insights drawn from it, data science looks like a promising field of work and research to many. Although there are many books and courses that help newcomers dive into the area from scratch, experienced practitioners also need a separate book that collects their commonly used terminology and techniques at a suitable level, so that they don’t have to dig through an entire book or course to find what they need. For those wanting to switch from an R or SAS background to Python, this book, Python Data Science Handbook, is also well worth picking up.
Digitization has changed the way we process and analyze information. The amount of information available online is growing exponentially. Web pages, emails, science journals, e-books, learning content, news and social media are all full of textual data. The goal is to create, analyze and report information fast. This is where automated text classification steps in.
We’re excited to announce the release of a new open-source platform that allows data science teams to easily run and manage models in production at scale. Seldon Core focuses on solving the last step in any machine learning project to help companies put models into production, to solve real-world problems and maximise the return on investment. Data scientists are freed to focus on creating better models while devops teams are able to manage deployments more effectively using tools they understand.
Deep learning offers companies a new set of techniques to solve complex analytical problems and drive rapid innovations in artificial intelligence. By feeding a deep learning algorithm with massive volumes of data, models can be trained to perform complex tasks like speech and image analysis.
The more experienced I become as a data scientist, the more convinced I am that data engineering is one of the most critical and foundational skills in any data scientist’s toolkit. I find this to be true both for evaluating project or job opportunities and for scaling one’s work on the job. In an earlier post, I pointed out that a data scientist’s capability to convert data into value is largely correlated with the stage of her company’s data infrastructure as well as how mature its data warehouse is. This means that a data scientist should know enough about data engineering to carefully evaluate how her skills are aligned with the stage and need of the company. Furthermore, many of the great data scientists I know are not only strong in data science but are also strategic in leveraging data engineering as an adjacent discipline to take on larger and more ambitious projects that would otherwise be out of reach. Despite its importance, education in data engineering has been limited. Because the field is so young, in many ways the only feasible way to get training in data engineering is to learn on the job, and by then it can sometimes be too late. I am very fortunate to have worked with data engineers who patiently taught me this subject, but not everyone has the same opportunity. As a result, I have written up this beginner’s guide to summarize what I learned and help bridge the gap.
I’m a big fan of using R to simulate data. When I’m trying to understand a data set, my first step is sometimes to simulate data from a model and compare the results to the data, before I go down the path of fitting an analytical model directly. Simulations are easy to code in R, but they can sometimes take a while to run, especially if there are a bunch of parameters you want to explore, which in turn requires a bunch of simulations.
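The simulate-and-compare workflow described above can be sketched roughly as follows. This is a minimal illustration written in Python rather than R, and everything in it is an assumption made for the example: the normal model, the `simulate`, `discrepancy` and `grid_search` helpers, the parameter grid, and the synthetic `observed` values that stand in for a real data set.

```python
import itertools
import random
import statistics


def simulate(mu, sigma, n, rng):
    """Draw n values from a normal model (an assumed, illustrative model)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def discrepancy(simulated, observed):
    """Compare simulated and observed data on two summaries: mean and standard deviation."""
    mean_gap = abs(statistics.mean(simulated) - statistics.mean(observed))
    sd_gap = abs(statistics.stdev(simulated) - statistics.stdev(observed))
    return mean_gap + sd_gap


def grid_search(observed, mus, sigmas, n_reps=20, seed=0):
    """Run n_reps simulations per parameter combination and average the discrepancy."""
    rng = random.Random(seed)
    results = {}
    for mu, sigma in itertools.product(mus, sigmas):
        scores = [
            discrepancy(simulate(mu, sigma, len(observed), rng), observed)
            for _ in range(n_reps)
        ]
        results[(mu, sigma)] = statistics.mean(scores)
    return results


# Synthetic stand-in for a real data set (hypothetical values).
data_rng = random.Random(42)
observed = [data_rng.gauss(5.0, 2.0) for _ in range(200)]

# Every extra parameter multiplies the number of simulations to run.
results = grid_search(observed, mus=[0.0, 2.5, 5.0, 7.5], sigmas=[1.0, 2.0, 4.0])
best = min(results, key=results.get)
print("closest parameters:", best)
```

Even this small grid runs 4 × 3 × 20 = 240 simulations, which shows how quickly the run time grows as more parameters are explored.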