
Lecture Description
Welcome to another tutorial in the data analysis with Python and Pandas series, where we become real estate moguls. In this tutorial, we're going to cover applying various rolling statistics to the data in our dataframes.
One of the more popular rolling statistics is the moving average. This takes a moving window of time and uses the average (the mean) of that window as the current value. In our case, we have monthly data, so a 10-month moving average is the current value plus the previous 9 months of data, averaged. Doing this in Pandas is very fast, since the windowed calculations run in optimized compiled code rather than in a Python loop. Pandas comes with a few pre-made rolling statistical functions, and it also has one called rolling_apply. This lets us write our own function that accepts the window's data and applies any reasonable bit of logic we want. So even if Pandas doesn't officially have a function for what you need, you can write exactly that yourself. Let's start with a basic moving average, or a rolling_mean as Pandas calls it. (Newer Pandas releases expose the same features through the .rolling() method, for example .rolling(10).mean() and .rolling(10).apply().) You can check out all of the moving/rolling statistics in Pandas' documentation.
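As a rough sketch of the two ideas above, the snippet below computes a 10-month moving average and then applies a custom function to each 10-month window. The data, the column names (HPI, HPI_10MA, HPI_10range), and the window_range helper are made up for illustration and are not taken from the lecture. It uses the .rolling() method found in current Pandas releases rather than the older rolling_mean and rolling_apply names mentioned in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: 24 months of a made-up house price index.
index = pd.date_range("2000-01-01", periods=24, freq="MS")
prices = pd.DataFrame({"HPI": np.arange(100.0, 124.0)}, index=index)

# 10-month moving average: the current month plus the previous 9, averaged.
# The first 9 rows are NaN because a full window is not available yet.
prices["HPI_10MA"] = prices["HPI"].rolling(window=10).mean()


def window_range(values):
    """Custom window logic (illustrative): spread between max and min."""
    return values.max() - values.min()


# Apply our own function to each 10-month window.
# raw=True passes each window as a NumPy array instead of a Series.
prices["HPI_10range"] = (
    prices["HPI"].rolling(window=10).apply(window_range, raw=True)
)

print(prices.tail())
```

Any function that takes one window of values and returns a single number can be passed to apply in the same way.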
Text tutorial and sample code: pythonprogramming.net/rolling-statistics-data-analysis-python-pandas-tutorial/
pythonprogramming.net
twitter.com/sentdex
Course Index
- Introduction to Pandas
- Pandas Basics
- IO Basics
- Building dataset
- Concatenating and Appending dataframes
- Joining and Merging Dataframes
- Pickling
- Percent Change and Correlation Tables
- Resampling
- Handling Missing Data
- Rolling statistics
- Applying Comparison Operators to DataFrame
- Joining 30 year mortgage rate
- Adding other economic indicators
- Rolling Apply and Mapping Functions
- Scikit Learn Incorporation
Course Description
In this 16-video tutorial series from PythonProgramming.net, learn how to use the Pandas library in Python to conduct data analysis operations. Pandas is a Python module, and Python is the programming language that we're going to use. The Pandas module is a high-performance, highly efficient, high-level data analysis library.
At its core, it is very much like operating a headless version of a spreadsheet, like Excel. Most of the datasets you work with will be what are called dataframes. You may be familiar with this term already, since it is used across other languages. If not, a dataframe is most often just like a spreadsheet: columns and rows, that's all there is to it! From here, we can use Pandas to perform operations on our datasets very quickly.