The Data Science Pipeline: Introduction

Reading time: ~5 min

In this mini-course, we will introduce a collection of skills commonly applied to solve data problems in industry and science. These skills correspond to stages of a typical data science project: we acquire data, wrangle it into a form conducive to further analysis, visualize the data to better understand it, model the data to gain further insight and make predictions about the process that generated the data, and communicate our results to stakeholders.

We will be using the Python data science ecosystem for developing the computational pipeline skills: Pandas for data wrangling, Plotly for data visualization, and Scikit-Learn for modeling. These packages are popular enough to be a good investment of your time even if you eventually settle into some other toolchain, because the experience will help you in interviews and when collaborating with the Python users you will inevitably encounter.
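To make the stages concrete, here is a minimal sketch of the acquire, wrangle, and model steps using Pandas and Scikit-Learn. The choice of the Iris dataset bundled with Scikit-Learn, the logistic regression model, and the variable names are assumptions made for illustration only, not the course's prescribed workflow. The visualization and communication stages are left out to keep the sketch short.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Acquire: load a small built-in dataset as a pandas DataFrame.
# (Illustrative choice; a real project would read from a file, API, or database.)
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

# Wrangle: attach readable species names and summarize one column by group.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
summary = df.groupby("species")["petal length (cm)"].mean()

# Model: fit a classifier on a training split and score it on held-out rows.
X = df[iris.feature_names]
y = df["species"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(summary)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each commented step maps onto one stage of the pipeline described above; in a larger project, each would typically grow into its own script or notebook section.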
