Welcome to another data analysis with Python and Pandas tutorial. In this tutorial, we're going to be talking about smoothing out data by removing noise. There are two main methods to do this. The most popular method used is what is called resampling, though it might take many other names. This is where we have some data that is sampled at a certain rate. For us, we have the Housing Price Index sampled at a one-month rate, but we could sample the HPI every week, every day, every minute, or more, but we could also resample at every year, every 10 years, and so on.
Another environment where resampling almost always occurs is with stock prices, for example. Stock prices are intra-second. What winds up happening though, is usually stock prices are resampled to minute data at the lowest for free data. You can buy access to live data, however. On a long-term scale, usually the data will be sampled daily, or even every 3-5 days. This is often done to keep the size of the data being transferred low. For example, over the course of, say, one year, intra-second data is usually in the multiples of gigabytes, and transferring all of that at once is unreasonable and people would be waiting minutes or hours for pages to load.
Using our current data, which is currently sampled at once a month, how might we sample it instead to once every 6 months, or 2 years? Try to think about how you might personally write a function that might perform that task, it's a fairly challenging one, but it can be done. That said, it's a fairly computationally inefficient job, but Pandas has our backs and does it very fast.
Sample code and text tutorial for this video: pythonprogramming.net/resample-data-analysis-python-pandas-tutorial/
In this 16-video tutorial series from PythonProgramming.net, learn how to employ the Pandas library in Python to conduct data analysis operations. Pandas is a Python module, and Python is the programming language that we're going to use. The Pandas module is a high performance, highly efficient, and high level data analysis library.
At its core, it is very much like operating a headless version of a spreadsheet, like Excel. Most of the datasets you work with will be what are called dataframes. You may be familiar with this term already, it is used across other languages, but, if not, a dataframe is most often just like a spreadsheet. Columns and rows, that's all there is to it! From here, we can utilize Pandas to perform operations on our data sets at lightning speeds.