Hello everyone and welcome to Part 14 of our Data Analysis with Python and Pandas for Real Estate investing tutorial series. We've come quite a long way, and the next, and final, macro step that we want to take is to look into economic indicators and see their impact on housing prices, or the HPI.
Text-based version of this tutorial and sample code: pythonprogramming.net/economic-factors-data-analysis-python-pandas-tutorial/
There are two major economic indicators that come to mind right out of the gate: the S&P 500 index (stock market) and GDP (Gross Domestic Product). I suspect the S&P 500 will be more correlated with the HPI than GDP is, but GDP is usually a better overall economic indicator, so I may be wrong. Another macro indicator that I suspect might have value here is the unemployment rate. If you're unemployed, you're probably not getting that mortgage. We'll see though.
We've already been through the process for adding more data points, so I do not see much point in dragging you all through it again. There is one new thing to note, however. In the HPI_Benchmark() function, we're changing the "United States" column to be US_HPI. This makes a bit more sense now that we're bringing in other values.
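As a rough sketch of the rename and the correlation check described above, here is a small example. It does not reproduce the real HPI_Benchmark() function or fetch any data: the helper name rename_benchmark, the sp500 column name, and all of the numbers are placeholders made up purely for illustration.

```python
import pandas as pd


def rename_benchmark(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the national HPI column so it stays unambiguous after joins."""
    return df.rename(columns={"United States": "US_HPI"})


# Placeholder values for illustration only; NOT real HPI or S&P 500 data.
dates = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30"])
benchmark = pd.DataFrame({"United States": [100.0, 101.0, 103.0, 102.0]}, index=dates)
sp500 = pd.DataFrame({"sp500": [100.0, 102.0, 105.0, 103.0]}, index=dates)

# Rename first, then join on the shared date index.
benchmark = rename_benchmark(benchmark)
combined = benchmark.join(sp500)

# Pairwise Pearson correlation between every column.
correlations = combined.corr()
print(correlations)
```

With real data, the US_HPI row of that correlation table is where you would look to compare the S&P 500, GDP, and unemployment against housing prices.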
For GDP, I couldn't find a dataset that covered the full time frame. I am sure you can find one somewhere, maybe even on Quandl. Sometimes you have to do some digging. I also had trouble finding a nice long-term monthly unemployment rate. I did find an unemployment level, but we really want a percentage or rate; otherwise we need to divide the unemployment level by the population. We could do that if we decide the unemployment rate is worth having, but we'll work with what we have first.
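If we did decide to turn an unemployment level into a rate, the division might look something like the sketch below. The column names and the numbers are assumptions for illustration, not real statistics. Keep in mind that official unemployment rates divide by the labor force rather than the total population, so a population-based figure is only a rough proxy.

```python
import pandas as pd

# Placeholder numbers, in thousands of people; NOT real statistics.
df = pd.DataFrame(
    {
        "unemployment_level": [6000.0, 6300.0, 5900.0],
        "population": [200000.0, 201000.0, 202000.0],
    },
    index=pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
)

# Level divided by population, expressed as a percentage.
df["unemployment_rate"] = df["unemployment_level"] / df["population"] * 100.0
print(df)
```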
In this 16-video tutorial series from PythonProgramming.net, learn how to use the Pandas library in Python to carry out data analysis. Pandas is a Python module, and Python is the programming language that we're going to use. The Pandas module is a high-performance, highly efficient, and high-level data analysis library.
At its core, it is very much like operating a headless version of a spreadsheet, like Excel. Most of the datasets you work with will be what are called dataframes. You may be familiar with this term already, since it is used in other languages, but if not, a dataframe is most often just like a spreadsheet. Columns and rows, that's all there is to it! From here, we can use Pandas to perform operations on our datasets very quickly.
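To make the spreadsheet comparison concrete, here is a minimal sketch of a dataframe. The column names and values are placeholders chosen for illustration only.

```python
import pandas as pd

# A dataframe is rows and columns, much like a spreadsheet.
# Placeholder values for illustration only; NOT real market data.
df = pd.DataFrame(
    {"US_HPI": [100.0, 101.0, 103.0], "sp500": [100.0, 102.0, 105.0]},
    index=pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
)

# Column-wise operations apply to every row at once, with no explicit loop.
df["spread"] = df["sp500"] - df["US_HPI"]

print(df.head())      # first rows, like glancing at the top of a sheet
print(df.describe())  # summary statistics for each column
```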