
Lecture Description
In this part of the Data Analysis with Pandas and Python tutorial series, we show how quickly we can take the dataset held in our Pandas dataframe and convert it into, for example, a NumPy array, which can then be fed to a variety of other Python data analysis modules. The example we use here is scikit-learn, or sklearn. To follow along, you will need to install it:
pip install scikit-learn
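The conversion step itself is small. Here is a minimal sketch, assuming a made-up DataFrame with hypothetical HPI and GDP columns (not the tutorial's actual housing dataset). DataFrame.to_numpy() is the current pandas method for this; older code often uses df.values instead.

```python
import numpy as np
import pandas as pd

# Hypothetical HPI/GDP values, made up purely for illustration
df = pd.DataFrame(
    {"HPI": [100.0, 101.5, 103.2, 102.8], "GDP": [2.1, 2.3, 1.9, 2.0]}
)

# DataFrame.to_numpy() returns the values as a 2-D NumPy array
# (one row per DataFrame row, one column per DataFrame column)
X = df.to_numpy()
print(type(X), X.shape)
```

The resulting array has no column names or index, which is exactly the plain numeric form that most scikit-learn estimators expect.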
From here, we're almost done. For machine learning to take place, at least in its supervised form, we need only a couple of things. First, we need "features." In our case, features are values like the current HPI, perhaps the GDP, and so on. Then we need "labels." Each label is assigned to a feature "set," where a feature set is the collection of GDP, HPI, and other values that corresponds to a given label. Our label, in this case, is either a 1 or a 0: 1 means the HPI increased in the future, and 0 means it did not.
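The feature/label setup described above can be sketched as follows. This is a hedged illustration, not the tutorial's own code: the data is random, the column names HPI, GDP, and HPI_future are hypothetical, "the future" is assumed to mean the next row, and a linear support vector classifier from scikit-learn stands in as one possible model.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, made-up data standing in for the real HPI/GDP dataset
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "HPI": 100 + np.cumsum(rng.normal(0, 1, n)),  # random-walk "index"
    "GDP": rng.normal(2.0, 0.5, n),
})

# Assumption: "the future" means the next row's HPI
df["HPI_future"] = df["HPI"].shift(-1)

# Label: 1 if the HPI increased, 0 if it did not
df["label"] = (df["HPI_future"] > df["HPI"]).astype(int)

# The last row has no future HPI, so its label is meaningless; drop it
df = df.dropna()

# Features exclude HPI_future and label, otherwise the model sees the answer
X = df[["HPI", "GDP"]].to_numpy()
y = df["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling inside the pipeline fits the scaler on the training split only
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

On random data like this, expect accuracy of roughly chance level; the point is the shape of the workflow (features in X, labels in y, fit, then score on held-out rows), not the number.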
Sample code and text-based tutorial: pythonprogramming.net/scikit-learn-sklearn-machine-learning-data-analysis-python-pandas-tutorial/
pythonprogramming.net
twitter.com/sentdex
Course Index
- Introduction to Pandas
- Pandas Basics
- IO Basics
- Building dataset
- Concatenating and Appending dataframes
- Joining and Merging Dataframes
- Pickling
- Percent Change and Correlation Tables
- Resampling
- Handling Missing Data
- Rolling statistics
- Applying Comparison Operators to DataFrame
- Joining 30 year mortgage rate
- Adding other economic indicators
- Rolling Apply and Mapping Functions
- Scikit Learn Incorporation
Course Description
In this 16-video tutorial series from PythonProgramming.net, learn how to use the Pandas library in Python to perform data analysis. Pandas is a Python module, and Python is the programming language we're going to use. Pandas is a high-performance, highly efficient, high-level data analysis library.
At its core, using it is very much like operating a headless version of a spreadsheet, such as Excel. Most of the datasets you work with will be what are called dataframes. You may already be familiar with this term, since it is used in other languages, but if not, a dataframe is most often just like a spreadsheet: columns and rows, that's all there is to it! From here, we can use Pandas to perform operations on our datasets very quickly.
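To make the spreadsheet analogy concrete, here is a small sketch with made-up numbers: rows and columns, a whole-column calculation, and a single-cell lookup. The column names and values are purely illustrative.

```python
import pandas as pd

# A small made-up table: rows are years, columns are values, like a spreadsheet
df = pd.DataFrame(
    {"visitors": [4, 6, 8, 10], "bounce_rate": [65, 72, 62, 64]},
    index=[2015, 2016, 2017, 2018],
)

# One expression computes a whole new column at once, much like filling
# a formula down a spreadsheet column (the first row has no prior value: NaN)
df["visitors_pct_change"] = df["visitors"].pct_change() * 100

print(df)
print(df.loc[2017, "bounce_rate"])  # look up a single "cell" by row and column
```

Where a spreadsheet uses cell references like B3, pandas uses row and column labels, which is what makes the same operations scriptable and repeatable.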