Welcome to Part 13 of our Data Analysis with Python and Pandas tutorial series, using real estate investing as an example. At this point, we've learned quite a bit about what Pandas has to offer, and here we'll take on a bit of a challenge! As we've covered so far, we can make relatively low-risk investments based on divergence between highly correlated state pairs and probably do just fine. We'll cover testing this strategy later on, but for now, let's look into acquiring another piece of data that factors into housing values: interest rates. There are many different types of mortgage rates, differing both in the way interest is accrued and in the time frame of the loan. Opinions vary over the years, and with the current market situation, on whether you want a 10-year, 15-year, or 30-year mortgage. Then you have to consider whether you want an adjustable rate, or maybe along the way you decide you want to refinance your home.
At the end of the day, all of this data is finite, but ultimately it will likely be a bit too noisy. For now, let's just keep it simple and look into the 30-year conventional mortgage rate. This data should be very negatively correlated with the House Price Index (HPI). Before even bothering with the code, I would expect that the correlation won't be as strongly negative as the higher-than-90% correlation we were getting between state HPIs was positive: certainly weaker than -0.9, but it should still be stronger than -0.5. The correlation to the overall HPI was so strong because those were very similar statistics. The interest rate is of course important and related, but not as directly as other HPI values or the US HPI.
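To make that expectation concrete, here is a minimal sketch of how the check could look once both series are in hand. The numbers are made-up toy values (not real HPI or mortgage-rate data), and the column names `US_HPI` and `M30` are placeholders chosen for illustration; the real data source is covered in the sample code linked below.

```python
import pandas as pd

# Hypothetical toy values, NOT real market data: they only show the mechanics.
dates = pd.to_datetime(
    ["2000-01-01", "2000-02-01", "2000-03-01",
     "2000-04-01", "2000-05-01", "2000-06-01"]
)

# Placeholder column names: US_HPI for the national index, M30 for the 30-year rate.
hpi = pd.DataFrame({"US_HPI": [100.0, 101.5, 103.0, 104.0, 106.5, 108.0]}, index=dates)
m30 = pd.DataFrame({"M30": [8.2, 8.3, 8.0, 7.9, 7.6, 7.5]}, index=dates)

# Join on the shared date index, then compute the pairwise correlation table.
joined = hpi.join(m30)
corr = joined.corr()

# Pull out the single value we care about: mortgage rate vs. HPI.
m30_vs_hpi = corr.loc["M30", "US_HPI"]
print(corr)
```

With real data, the value in `m30_vs_hpi` is what we would compare against the rough -0.9 to -0.5 range guessed at above.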
Sample code and text-based tutorial: pythonprogramming.net/joining-mortgage-rate-data-analysis-python-pandas-tutorial/
In this 16-video tutorial series from PythonProgramming.net, learn how to employ the Pandas library in Python to conduct data analysis operations. Pandas is a Python module, and Python is the programming language that we're going to use. The Pandas module is a high performance, highly efficient, and high level data analysis library.
At its core, it is very much like operating a headless version of a spreadsheet, like Excel. Most of the datasets you work with will be what are called dataframes. You may be familiar with this term already, since it is used across other languages, but if not, a dataframe is most often just like a spreadsheet. Columns and rows, that's all there is to it! From here, we can utilize Pandas to perform operations on our datasets at lightning speeds.
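As a quick illustration of the "columns and rows" idea, here is a small sketch of a dataframe built by hand. The column names and values are invented purely for the example.

```python
import pandas as pd

# Invented example values; each dictionary key becomes a column, like a spreadsheet header.
df = pd.DataFrame(
    {
        "Day": [1, 2, 3, 4],
        "Visitors": [43, 34, 65, 56],
        "Bounce_Rate": [65, 67, 78, 65],
    }
)

# Use the Day column as the row labels (the index).
df = df.set_index("Day")

# Operations apply to a whole column at once, with no explicit loop.
df["Visitors_x2"] = df["Visitors"] * 2

print(df.head())
```

The column-wide multiplication is the kind of operation that would be a dragged-down formula in a spreadsheet; in Pandas it is a single expression.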