Machine Learning with Python

Video Lectures

Displaying all 42 video lectures.
Lecture 1
Introduction to Machine Learning
The objective of this course is to give you a holistic understanding of machine learning, covering the theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms.

In this series, we'll be covering linear regression, K Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, and neural networks.

For each major algorithm that we cover, we will discuss the high-level intuitions of the algorithm and how it is logically meant to work. Next, we'll apply the algorithms in code using real-world data sets along with a module such as Scikit-Learn. Finally, we'll dive into the inner workings of each algorithm by recreating it in code, from scratch, ourselves, including all of the math involved. This should give you a complete understanding of exactly how the algorithms work, how they can be tweaked, what their advantages are, and what their disadvantages are.

In order to follow along with the series, I suggest you have at the very least a basic understanding of Python. If you do not, I suggest you at least follow the Python 3 Basics tutorial until the module installation with pip tutorial. If you have a basic understanding of Python, and the willingness to learn/ask questions, you will be able to follow along here with no issues. Most of the machine learning algorithms are actually quite simple, since they need to be in order to scale to large datasets. Math involved is typically linear algebra, but I will do my best to still explain all of the math. If you are confused/lost/curious about anything, ask in the comments section on YouTube, the community here, or by emailing me. You will also need Scikit-Learn and Pandas installed, along with others that we'll grab along the way.

Machine learning was defined in 1959 by Arthur Samuel as the "field of study that gives computers the ability to learn without being explicitly programmed." This means imbuing knowledge to machines without hard-coding it.

https://pythonprogramming.net/machine-learning-tutorial-pyth...
https://twitter.com/sentdex
https://www.facebook.com/pythonprogra...
https://plus.google.com/+sentdex
Lecture 2
Regression Intro
To begin, what is regression in terms of us using it with machine learning? The goal is to take continuous data, find the equation that best fits the data, and be able to forecast a specific value. With simple linear regression, you do this by creating a best-fit line.

From here, we can use the equation of that line to forecast out into the future, where the 'date' is the x-axis, what the price will be.

A popular use with regression is to predict stock prices. This is done because we are considering the fluidity of price over time, and attempting to forecast the next fluid price in the future using a continuous dataset.

Regression is a form of supervised machine learning, which is where the scientist teaches the machine by showing it features and then showing it what the correct answer is, over and over. Once the machine is taught, the scientist will usually "test" the machine on some unseen data, where the scientist still knows what the correct answer is, but the machine doesn't. The machine's answers are compared to the known answers, and the machine's accuracy can be measured. If the accuracy is high enough, the scientist may consider actually employing the algorithm in the real world.

https://pythonprogramming.net
Lecture 3
Regression Features and Labels
We'll be using the numpy module to convert data to numpy arrays, which is what Scikit-learn wants. We will talk more on preprocessing and cross_validation when we get to them in the code, but preprocessing is the module used to do some cleaning/scaling of data prior to machine learning, and cross_validation is used in the testing stages. Finally, we're also importing the LinearRegression algorithm as well as svm from Scikit-learn, which we'll be using as our machine learning algorithms to demonstrate results.

At this point, we've got data that we think is useful. How does the actual machine learning thing work? With supervised learning, you have features and labels. The features are the descriptive attributes, and the label is what you're attempting to predict or forecast. Another common example with regression might be to try to predict the dollar value of an insurance policy premium for someone. The company may collect your age, past driving infractions, public criminal record, and your credit score for example. The company will use past customers, taking this data, and feeding in the amount of the "ideal premium" that they think should have been given to that customer, or they will use the one they actually used if they thought it was a profitable amount.

Thus, for training the machine learning classifier, the features are customer attributes, the label is the premium associated with those attributes.

Lecture 4
Regression Training and Testing
Welcome to part four of the Machine Learning with Python tutorial series. In the previous tutorials, we got our initial data, we transformed and manipulated it a bit to our liking, and then we began to define our features. Scikit-Learn does not fundamentally need to work with Pandas and dataframes, I just prefer to do my data-handling with it, as it is fast and efficient. Instead, Scikit-learn actually fundamentally requires numpy arrays. Pandas dataframes can be easily converted to NumPy arrays, so it just so happens to work out for us!

It is a typical standard with machine learning in code to define X (capital x), as the features, and y (lowercase y) as the label that corresponds to the features. As such, we can define our features and labels like so.
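As a minimal sketch of the X / y convention and of holding out rows for testing, here is a plain-Python version. The sample rows and the `train_test_split_simple` helper are hypothetical stand-ins for illustration; the series itself works with NumPy arrays and Scikit-Learn.

```python
import random

# Hypothetical data: each inner list holds the features for one sample.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
# One label per row of X, in the same order.
y = [3.0, 3.0, 8.0, 7.0, 13.0]

def train_test_split_simple(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, then hold out a fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = train_test_split_simple(X, y)
print(len(X_train), len(X_test))
```

Shuffling indices rather than the rows themselves keeps each label paired with its features, which is the property that matters most here.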

Lecture 5
Regression forecasting and predicting
In this video, make sure you define the X's like so. I flipped the last two lines by mistake:

X = np.array(df.drop(['label'], axis=1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

To forecast out, we need some data. We decided that we're forecasting out 10% of the data, thus we will want to, or at least *can* generate forecasts for each of the final 10% of the dataset. So when can we do this? When would we identify that data? We could call it now, but consider the data we're trying to forecast is not scaled like the training data was. Okay, so then what? Do we just do preprocessing.scale() against the last 10%? The scale method scales based on all of the known data that is fed into it. Ideally, you would scale both the training, testing, AND forecast/predicting data all together. Is this always possible or reasonable? No. If you can do it, you should, however. In our case, right now, we can do it. Our data is small enough and the processing time is low enough, so we'll preprocess and scale the data all at once.

In many cases, you won't be able to do this. Imagine if you were using gigabytes of data to train a classifier. It may take days to train your classifier, and you wouldn't want to be doing this every...single...time you wanted to make a prediction. Thus, you may need to either NOT scale anything, or you may scale the data separately. As usual, you will want to test both options and see which is best in your specific case.
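The "scale everything together, then slice off the forecast rows" idea can be sketched without any libraries. This is only an illustration: the `scale` helper and the `prices` list are made up, and the helper standardizes to zero mean and unit variance using the population standard deviation, which is my assumption about the kind of scaling meant here.

```python
import statistics

def scale(column):
    """Standardize one feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]

# Hypothetical single-feature dataset; the last rows have no label yet.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
forecast_out = 2

scaled = scale(prices)             # scale all rows together first
X_lately = scaled[-forecast_out:]  # rows we will forecast from
X = scaled[:-forecast_out]         # rows that have labels for training

print(len(X), len(X_lately))
```

Because the slice happens after scaling, the forecast rows share the same mean and spread as the training rows.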

With that in mind, let's handle all of the rows from the definition of X onward.
https://pythonprogramming.net/forecasting-predicting-machine...
Lecture 6
Pickling and Scaling
In the previous Machine Learning with Python tutorial we finished up making a forecast of stock prices using regression, and then visualizing the forecast with Matplotlib. In this tutorial, we'll talk about some next steps.

I remember the first time that I was trying to learn about machine learning, and most examples were only covering up to the training and testing part, totally skipping the prediction part. Of the tutorials that did the training, testing, and predicting part, I did not find a single one that explained saving the algorithm. With examples, data is generally pretty small overall, so the training, testing, and prediction process is relatively fast. In the real world, however, data is likely to be larger, and take much longer for processing. Since no one really talked about this important stage, I wanted to definitely include some information on processing time and saving your algorithm.

While our machine learning classifier takes a few seconds to train, there may be cases where it takes hours or even days to train a classifier. Imagine needing to do that every day you wanted to forecast prices, or whatever. This is not necessary, as we can just save the classifier using the Pickle module.
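Here is a small sketch of the save-and-restore round trip with the standard-library `pickle` module. The `trained_model` dictionary and the `predict` helper are hypothetical stand-ins for a real trained classifier; any picklable object is handled the same way.

```python
import pickle

# Stand-in for a trained model. The slope and intercept are made-up values.
trained_model = {"slope": 0.5, "intercept": 2.0}

# Serialize once, right after training.
saved_bytes = pickle.dumps(trained_model)

# Later (for example, in another run), restore it instead of retraining.
restored_model = pickle.loads(saved_bytes)

def predict(model, x):
    """Apply the stored line y = slope * x + intercept."""
    return model["slope"] * x + model["intercept"]

print(predict(restored_model, 10))
```

To persist to disk rather than to a bytes object, `pickle.dump(obj, f)` and `pickle.load(f)` do the same job with a file opened in binary mode. Only unpickle data you trust, since loading a pickle can execute arbitrary code.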

Lecture 7
Regression How it Works
Welcome to the seventh part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. Up to this point, you have been shown the value of linear regression and how to apply it with Scikit Learn and Python, now we're going to dive into how it is calculated. While I do not believe it is necessary to dig into all of the math that goes into every machine learning algorithm (have you dug into the source code of your other favorite modules to see how they do every little thing?), linear algebra is essential to machine learning, and it is useful to understand the true building blocks that machine learning is built upon.

The objective of linear algebra is to calculate relationships of points in vector space. This is used for a variety of things, but one day, someone got the wild idea to do this with features of a dataset. We can too! Remember before when we defined the type of data that linear regression was going to work on was called "continuous" data? This is not so much due to what people just so happen to use linear regression for; it is due to the math that makes it up. Simple linear regression is used to find the best fit line of a dataset. If the data isn't continuous, there really isn't going to be a best fit line.

Lecture 8
How to program the Best Fit Slope
Welcome to the 8th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. Where we left off, we had just realized that we needed to replicate some non-trivial algorithms into Python code in an attempt to calculate a best-fit line for a given dataset.

Before we embark on that, why are we going to bother with all of this? Linear Regression is basically the brick to the machine learning building. It is used in almost every single major machine learning algorithm, so an understanding of it will help you to get the foundation for most major machine learning algorithms. For the enthusiastic among us, understanding linear regression and general linear algebra is the first step towards writing your own custom machine learning algorithms and branching out into the bleeding edge of machine learning, using whatever the best processing is at the time. As processing improves and hardware architecture changes, the methodologies used for machine learning also change. The more recent rise in neural networks has had much to do with general purpose graphics processing units. Ever wonder what's at the heart of an artificial neural network? You guessed it: linear regression.

Lecture 9
How to program the Best Fit Line
Welcome to the 9th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. We've been working on calculating the regression, or best-fit, line for a given dataset in Python. Previously, we wrote a function that will gather the slope, and now we need to calculate the y-intercept.
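As a rough sketch of what such a calculation can look like, the function below uses one standard closed form for the least-squares line, written in terms of means. The function name and the sample points are my own; the video may organize the code differently.

```python
from statistics import fmean

def best_fit_slope_and_intercept(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    mean_xy = fmean(x * y for x, y in zip(xs, ys))
    mean_x_sq = fmean(x * x for x in xs)
    # Slope from the means, then the intercept that makes the line
    # pass through the point (mean_x, mean_y).
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x_sq)
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # these points lie exactly on y = 2x + 1
m, b = best_fit_slope_and_intercept(xs, ys)
print(m, b)
```

With the slope and intercept in hand, a prediction for any new x is just `m * x + b`.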

Lecture 10
R Squared Theory
Welcome to the 10th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. We've just recently finished creating a working linear regression model, and now we're curious what is next. Right now, we can easily look at the data, and decide how "accurate" the regression line is to some degree. What happens, however, when your linear regression model is applied within 20 hierarchical layers in a neural network? Not only this, but your model works in steps, or windows, of say 100 data points at a time, within a dataset of 5 million datapoints. You're going to need some sort of automated way of discovering how good your best fit line actually is.

Lecture 11
Programming R Squared
Now that we know what we're looking for, let's actually program the coefficient of determination in Python.
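A minimal sketch of that calculation, assuming the usual definition of r squared as one minus the ratio of the regression line's squared error to the squared error of simply predicting the mean of y. The function names and sample values are illustrative.

```python
from statistics import fmean

def squared_error(ys_orig, ys_line):
    """Sum of squared vertical distances between the data and a line."""
    return sum((line - orig) ** 2 for orig, line in zip(ys_orig, ys_line))

def coefficient_of_determination(ys_orig, ys_line):
    """r squared: 1 minus (error of the fitted line / error of the mean line)."""
    y_mean = fmean(ys_orig)
    se_line = squared_error(ys_orig, ys_line)
    se_mean = squared_error(ys_orig, [y_mean] * len(ys_orig))
    return 1 - se_line / se_mean

ys = [3, 5, 7, 9, 11]
perfect_line = [3, 5, 7, 9, 11]  # a line that hits every point
mean_line = [7, 7, 7, 7, 7]      # a line that only predicts the mean

print(coefficient_of_determination(ys, perfect_line))
print(coefficient_of_determination(ys, mean_line))
```

A value near 1 means the line explains most of the variation in y; a value near 0 means it does no better than always guessing the mean.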

Lecture 12
Testing Assumptions
We've been learning about regression, and even coded our own very simple linear regression algorithm. Along with that, we've also built a coefficient of determination algorithm to check for the accuracy and reliability of our best-fit line. We've discussed and shown how a best-fit line may not be a great fit, but also explained why our example was correct directionally, even if it was not exact. Now, however, we are at the point where we're using two top-level algorithms, which are themselves composed of a handful of smaller algorithms. As we continue building this hierarchy of algorithms, we might wind up finding ourselves in trouble if just one of them has a tiny error, so we want to test our assumptions.

https://pythonprogramming.net/sample-data-testing-machine-le...
Lecture 13
Classification w/ K Nearest Neighbors Intro
We begin a new section now: Classification. In covering classification, we're going to cover two major classification algorithms: K Nearest Neighbors and the Support Vector Machine (SVM). While these two algorithms are both classification algorithms, they achieve results in different ways.

Lecture 14
K Nearest Neighbors Application
In the last part we introduced Classification, which is a supervised form of machine learning, and explained the K Nearest Neighbors algorithm intuition. In this tutorial, we're actually going to apply a simple example of the algorithm using Scikit-Learn, and then in the subsequent tutorials we'll build our own algorithm to learn more about how it works under the hood.
To exemplify classification, we're going to use a Breast Cancer Dataset, which is a dataset donated to the University of California, Irvine (UCI) collection from the University of Wisconsin-Madison. UCI has a large Machine Learning Repository.

Lecture 15
Euclidean Distance
In the previous tutorial, we covered how to use the K Nearest Neighbors algorithm via Scikit-Learn to achieve 95% accuracy in predicting benign vs malignant tumors based on tumor attributes. Now, we're going to dig into how K Nearest Neighbors works so we have a full understanding of the algorithm itself, to better understand when it will and won't work for us.
We will come back to our breast cancer dataset, using it on our custom-made K Nearest Neighbors algorithm and compare it to Scikit-Learn's, but we're going to start off with some very simple data first. K Nearest Neighbors boils down to proximity, not by group, but by individual points. Thus, all this algorithm is actually doing is computing distance between points, and then picking the most popular class of the top K classes of points nearest to it. There are various ways to compute distance on a plane, many of which you can use here, but the most accepted version is Euclidean Distance, named after Euclid, a famous mathematician who is popularly referred to as the father of Geometry, and he definitely wrote the book (The Elements) on it.
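For a concrete picture, Euclidean distance is the square root of the sum of squared differences between matching coordinates. The sketch below writes that out by hand; the function name and points are illustrative.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences, coordinate by coordinate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance([0, 0], [3, 4]))
print(euclidean_distance([1, 3], [2, 5]))
```

The same function works for any number of dimensions, as long as both points have the same length. On Python 3.8 and later, `math.dist(p, q)` computes the same value.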

Lecture 16
Creating Our K Nearest Neighbors A
Now that we understand the intuition behind how we calculate the distance/proximity between feature sets, we're ready to begin building our own version of K Nearest Neighbors in code from scratch.

Lecture 17
Writing our own K Nearest Neighbors in Code
In the previous tutorial, we began structuring our K Nearest Neighbors example, and here we're going to finish it. The idea of K nearest neighbors is to just take a "vote" of the closest known data featuresets. Whichever class is closest overall, is the class we assign to the unknown data.
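The voting idea can be sketched in a few lines. This is an illustrative version, not necessarily the code from the video: the `k_nearest_neighbors` name, the dictionary layout (class label mapped to a list of featuresets), and the sample points are assumptions. It also returns the share of votes the winner received as a rough confidence.

```python
import math
from collections import Counter

def k_nearest_neighbors(data, predict, k=3):
    """Return (winning label, share of the k votes it received)."""
    distances = []
    for label, points in data.items():
        for point in points:
            distances.append((math.dist(point, predict), label))
    # Keep the labels of the k closest known points and count them.
    votes = [label for _, label in sorted(distances)[:k]]
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / k

dataset = {
    "k": [[1, 2], [2, 3], [3, 1]],
    "r": [[6, 5], [7, 7], [8, 6]],
}
print(k_nearest_neighbors(dataset, [5, 7], k=3))
```

Choosing an odd k avoids ties when there are two classes. Note that every prediction measures the distance to every known point, which is why this approach slows down on large datasets.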

Lecture 18
Applying our K Nearest Neighbors Algorithm
Now that we have our own custom K Nearest Neighbors that we learned how to program ourselves, and we have tested it against some simple data, we're ready to test it on a real dataset.

Lecture 19
Final thoughts on K Nearest Neighbors
We're going to cover a few final thoughts on the K Nearest Neighbors algorithm here, including the value for K, confidence, speed, and the pros and cons of the algorithm now that we understand more about how it works.

Lecture 20
Support Vector Machine Intro and Application
In this tutorial, we introduce the theory of the Support Vector Machine (SVM), which is a classification learning algorithm for machine learning. We also show how to apply the SVM using Scikit-Learn on some familiar data.

Lecture 21
Understanding Vectors
In this tutorial, we cover some basics on vectors, as they are essential with the Support Vector Machine.
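Two vector operations come up constantly with the SVM: the magnitude (length) of a vector and the dot product of two vectors. A small sketch of both, with illustrative function names and values:

```python
import math

def magnitude(v):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

def dot(u, v):
    """Dot product: multiply matching components and add the results."""
    return sum(a * b for a, b in zip(u, v))

print(magnitude([3, 4]))
print(dot([1, 3], [4, 2]))
```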

Lecture 22
Support Vector Assertion
In this tutorial, we cover the assertion for the calculation of a support vector within the Support Vector Machine.

Lecture 23
Support Vector Machine Fundamentals
In this tutorial, we cover some more of the fundamentals of the Support Vector Machine.

Lecture 24
Support Vector Machine Optimization
In this tutorial, we discuss the optimization problem that is the Support Vector Machine, as well as how we intend to solve it ourselves.
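For reference, the hard-margin version of this optimization problem is usually written as below. This is the standard textbook formulation rather than something quoted from the video, and the notation is assumed: w is the weight vector, b is the bias, x_i is a featureset, and y_i is its class, encoded as +1 or -1.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{x}_i \cdot \mathbf{w} + b \right) \ge 1
\quad \text{for all } i
```

Minimizing the magnitude of w widens the margin between the two classes, while the constraint keeps every training point on the correct side of it.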

Lecture 25
Creating an SVM from scratch
Welcome to the 25th part of our machine learning tutorial series and the next part in our Support Vector Machine section. In this tutorial, we're going to begin setting up our own SVM from scratch.

Before we dive in, however, I will draw your attention to a few other options for solving this constraint optimization problem:

First, the topic of constraint optimization is massive, and there is quite a bit of material on the subject. Even just our subsection: Convex Optimization, is massive. A starting place might be: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf. For a starting place for constraint optimization in general, you could also check out http://www.mit.edu/~dimitrib/Constrained-Opt.pdf

Within the realm of Python specifically, the CVXOPT package has various convex optimization methods available, one of which is the quadratic programming problem we have (found @ cvxopt.solvers.qp).

Also, even more specifically there is libsvm's Python interface, or the libsvm package in general. We are opting to not make use of any of these, as the optimization problem for the Support Vector Machine IS basically the entire SVM problem.

Now, to begin our SVM in Python.

Lecture 26
SVM Training
In this support vector machine from scratch video, we talk about the training/optimization problem.

Additional Resources:
Convex Optimization Book: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

Sequential Minimal Optimization book: http://research.microsoft.com/pubs/68391/smo-book.pdf

More SMO: http://research.microsoft.com/pubs/69644/tr-98-14.pdf

CVXOPT (Convex Optimization Module for Python): http://cvxopt.org/

Lecture 27
SVM Optimization
In this support vector machine from scratch video, we talk about the training/optimization problem.

Lecture 28
Completing SVM from Scratch
In this machine learning with the support vector machine (SVM) tutorial, we cover completing our SVM from scratch.

Lecture 29
Kernels Introduction
In this machine learning tutorial, we introduce the concept of Kernels. Kernels can be used with the Support Vector Machine in order to take a new perspective and hopefully allow us to translate into further dimensions in order to find a linearly separable case.

Lecture 30
Why Kernels
Once we've determined that we can use Kernels, the next question is of course why would we bother using kernels when we can use some other function to transform our data into more dimensions. The point of using Kernels is to be able to perform a calculation (inner product in this case) in another dimension without actually needing to work in that dimension.
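To make that concrete, here is a small check using a degree-2 polynomial kernel on 2D points, which is my choice of example and not necessarily the one used in the video. The kernel is computed entirely in the original two dimensions, yet it matches the inner product taken after explicitly mapping both points into six dimensions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2D space."""
    return (1 + dot(x, z)) ** 2

def feature_map(x):
    """Explicit 6-dimensional mapping that the kernel above corresponds to."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]
print(poly_kernel(x, z))
print(dot(feature_map(x), feature_map(z)))
```

Both lines print the same value up to floating-point rounding. The kernel needed one 2D dot product, while the explicit route had to build two 6-dimensional vectors first, and that gap grows quickly with higher degrees and more features.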

Lecture 31
Soft Margin SVM
In reality, you may find that you either cannot find a linearly separable dimension for your dataset, or that your support vector machine is significantly overfitting your data. You know you are overfitting if a large percentage of your dataset ends up as support vectors. The soft-margin SVM allows for some "wiggle room" with separation.

Lecture 32
Soft Margin SVM and Kernels with CVXOPT
In this tutorial, we cover the Soft Margin SVM, along with Kernels and quadratic programming with CVXOPT all in one quick tutorial using some example code from:
http://www.mblondel.org/journal/2010/09/19/support-vector-ma...

Visualizing the conversion of many dimensions back to 2D: https://www.youtube.com/watch?v=3liCbRZPrZA

Quadratic programming with CVXOPT: http://cvxopt.org/userguide/coneprog.html#quadratic-programm...
Docs qp example: http://cvxopt.org/examples/tutorial/qp.html

Another CVXOPT tutorial: https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxo...

Lecture 33
SVM Parameters
In this concluding Support Vector Machine (SVM) tutorial, we cover one last topic, which is how to separate more than 2 classes using either a One-vs-Rest method or One-vs-One. After this, we cover the parameters for the SVM via Scikit-Learn: http://scikit-learn.org/stable/modules/generated/sklearn.svm... as a review of what we've learned so far.

Lecture 34
Clustering Introduction
In this tutorial, we shift gears and introduce the concept of clustering. Clustering is a form of unsupervised machine learning, where the machine automatically determines the grouping for data. There are two major forms of clustering: Flat and Hierarchical. Flat clustering allows the scientist to tell the machine how many clusters to come up with, whereas hierarchical clustering allows the machine to determine the groupings.

Lecture 35
Handling Non-Numeric Data
In this machine learning tutorial, we cover how to work with non-numerical data. This is useful with any form of machine learning, all of which require data to be in numerical form, even when the real world data is not always in numerical form.
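One simple way to do this, sketched here under my own naming, is to give each distinct value in a column its own integer id. The `encode_column` helper and the sample column are hypothetical and are not taken from the Titanic file itself.

```python
def encode_column(values):
    """Map each distinct value to an integer id, in order of first appearance."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)
        encoded.append(ids[value])
    return encoded, ids

sex = ["male", "female", "female", "male"]  # hypothetical column values
encoded, mapping = encode_column(sex)
print(encoded)
print(mapping)
```

Keep in mind that the ids imply an ordering (0 < 1 < 2) that the original categories may not have, which can matter for distance-based algorithms.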

Titanic Dataset: https://pythonprogramming.net/static/downloads/machine-learn...

Lecture 36
K Means with Titanic Dataset
In this machine learning tutorial we cover applying the K Means clustering algorithm to the Titanic Dataset.

Lecture 37
Custom K Means
In this machine learning tutorial, we create our own custom K Means clustering algorithm from scratch in Python.
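As a rough sketch of what a from-scratch K Means can look like: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The function name, the sample data, and the choice to start from the first k points are assumptions for illustration.

```python
import math

def k_means(data, k=2, tol=1e-3, max_iter=300):
    """Flat clustering: returns (centroids, clusters) for a list of points."""
    centroids = [list(p) for p in data[:k]]  # assumption: start from the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in data:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append([sum(dim) / len(cluster) for dim in zip(*cluster)])
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centroids have effectively stopped moving
            break
    return centroids, clusters

data = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]
centroids, clusters = k_means(data, k=2)
print(centroids)
```

The result depends on the starting centroids, so real implementations often try several starting points and keep the best run.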

Lecture 38
K Means from Scratch
In this machine learning tutorial, we improve our custom K Means clustering algorithm from scratch in Python by creating a dynamically weighted bandwidth rather than a single, static bandwidth.

Lecture 39
Mean Shift Intro
Mean Shift is very similar to the K-Means algorithm, except for one very important factor: you do not need to specify the number of groups prior to training. The Mean Shift algorithm finds clusters on its own. For this reason, it is even more of an "unsupervised" machine learning algorithm than K-Means.

The way Mean Shift works is to go through each featureset (a datapoint on a graph), and proceed to do a hill climb operation. Hill Climbing is just as it sounds: the idea is to continually increase, or go up, until you cannot anymore. There is no guarantee of just one local maximum: we might have only one, or we might have ten. Our "hill" in this case will be the number of featuresets/datapoints within a given radius. The radius is also called a bandwidth, and the entire window is your Kernel. The more data within the window, the better. Once we can no longer take another step without decreasing the number of featuresets/datapoints within the radius, we take the mean of all data in that region and we have located a cluster center. We do this starting from each data point. Many data points will lead to the same cluster center, which should be expected, but it is also possible that other data points will take you to a completely separate cluster center.
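A simplified sketch of this procedure follows, using a flat window in which every point inside the radius counts equally. Each candidate center is repeatedly moved to the mean of the data inside its window until nothing changes. The function name, the fixed radius of 4, the rounding used to merge matching centers, and the sample data are all assumptions for illustration.

```python
import math

def mean_shift(data, radius=4.0, max_iter=100):
    """Flat-window mean shift: returns the sorted list of unique cluster centers."""
    centroids = [tuple(p) for p in data]  # every point starts as a candidate center
    for _ in range(max_iter):
        new_centroids = set()
        for c in centroids:
            in_window = [p for p in data if math.dist(p, c) <= radius]
            if not in_window:  # defensive: keep the center if its window is empty
                new_centroids.add(c)
                continue
            # Move the center to the mean of everything inside its window.
            # Rounding lets centers that converge to the same spot merge in the set.
            mean = tuple(round(sum(dim) / len(in_window), 6) for dim in zip(*in_window))
            new_centroids.add(mean)
        new_centroids = sorted(new_centroids)
        if new_centroids == centroids:  # no center moved: converged
            break
        centroids = new_centroids
    return centroids

data = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6],
        [9, 11], [8, 2], [10, 2], [9, 3]]
centers = mean_shift(data)
print(centers)
```

Nothing in the call says how many clusters to find; the count falls out of the radius and the data. A radius that is too large merges everything into one cluster, and one that is too small leaves every point as its own cluster.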

Lecture 40
Mean Shift with Titanic Dataset
We continue the topic of clustering and unsupervised machine learning with Mean Shift, this time applying it to our Titanic dataset.

There is some degree of randomness here, so your results may not be the same. You can probably re-run the program to get similar data if you don't get something similar, however.

We're going to take a look at the Titanic dataset via clustering with Mean Shift. What we're interested to know is whether or not Mean Shift will automatically separate passengers into groups or not. If so, it will be interesting to inspect the groups that are created. The first obvious curiosity will be the survival rates of the groups found, but, then, we will also poke into the attributes of these groups to see if we can understand why the Mean Shift algorithm decided on the specific groups.

Lecture 41
Mean Shift from Scratch
In this machine learning tutorial, we cover how to create our own Mean Shift clustering algorithm from scratch in Python.

Lecture 42
Mean Shift Dynamic Bandwidth
In this machine learning tutorial, we cover the idea of a dynamically weighted bandwidth with our Mean Shift clustering algorithm.
