Andrew Ng 0:00
Welcome to this free online class on machine learning. Machine learning is one of the most exciting recent technologies, and in this class you'll learn about the state of the art and also gain practice implementing and deploying these algorithms yourself. You probably use a learning algorithm dozens of times a day without knowing it. Every time you use a web search engine like Google or Bing for searching the internet, one of the reasons that works so well is because a learning algorithm implemented by Google or Microsoft has learned how to rank web pages. Every time you use Facebook or Apple's photo tagging application and it recognizes your friends' photos, that's also machine learning. Every time you read your email and your spam filter saves you from having to wade through tons of spam email, that's also a learning algorithm. For me, one of the reasons I'm excited is the AI dream of someday building machines as intelligent as you are. We're a long way away from that goal, but many AI researchers believe that the best way towards that goal is through learning algorithms that try to mimic how the human brain learns. I'll tell you a little about that too in this class. In this class you'll learn about state-of-the-art machine learning algorithms, but it turns out just knowing the algorithms and knowing the math isn't that much good if you don't also know how to actually get this stuff to work on problems that you care about. So we've also spent a lot of time developing exercises for you to implement each of these algorithms and see how they work for yourself.
So why is machine learning so prevalent today? It turns out that machine learning is a field that grew out of the field of AI, or artificial intelligence. We wanted to build intelligent machines, and it turns out that there are a few basic things that we could program a machine to do, such as how to find the shortest path from A to B, but for the most part we just did not know how to write AI programs to do the more interesting things such as web search or photo tagging or email anti-spam. There was a realization that the only way to do these things was to have a machine learn to do it by itself. So machine learning was developed as a new capability for computers, and today it touches many segments of industry and basic science. For me, I work on machine learning, and in a typical week I might end up talking to helicopter pilots, biologists, and a bunch of computer systems people, that is, my colleagues here at Stanford. And averaging two or three times a week, I'll get email from people in industry in Silicon Valley contacting me because they have an interest in applying learning algorithms to their own problems. This is a sign of the range of problems that machine learning touches: autonomous robotics, computational biology, tons of things in Silicon Valley that machine learning is having an impact on.
Here are some modern examples of machine learning. There's database mining. One of the reasons machine learning is so prevalent is the growth of the web and the growth of automation. All this means that we have much larger data sets than ever before. So for example, tons of Silicon Valley companies are today collecting web data, also called clickstream data, and they try to use machine learning algorithms to mine this data to understand the users better and to serve the users better. That's a huge segment of Silicon Valley right now. Medical records: with the advent of automation, we now have electronic medical records. So if we can turn medical records into medical knowledge, then we can start to understand disease better. Computational biology: with automation again, biologists are collecting lots of data about gene sequences, DNA sequences, and so on, and machine learning algorithms are giving us a much better understanding of the human genome and what it means to be human. And in engineering as well, in all fields of engineering, we have larger and larger datasets that we're trying to understand using learning algorithms.
A second range of machine learning applications is ones that we cannot program by hand. So for example, I've worked on autonomous helicopters for many years. We just did not know how to write a computer program to make this helicopter fly by itself. The only thing that worked was to have a computer learn by itself how to fly this thing. Handwriting recognition: it turns out one of the reasons it is so inexpensive today to send a piece of mail across the country in the US and internationally is that when you write an envelope like this, a learning algorithm has learned how to read your handwriting so it can automatically route this envelope on its way, so it costs us a few cents to send a letter thousands of miles. And in fact, you've seen the use of natural language processing or computer vision. These are the fields of AI pertaining to understanding language or understanding images. Most of natural language processing, most of computer vision today, is applied machine learning. Learning algorithms are also widely used for self-customizing programs. Every time you go to Amazon or Netflix or iTunes Genius and it recommends new movies or products or music to you, that's a learning algorithm. If you think about it, if you have a million users, there's no way to write a million different programs for your million users. The only way for our software to give these customized recommendations is if it can learn by itself to customize itself to your preferences. Finally, learning algorithms are being used today to understand human learning and to understand the brain. We'll talk about how researchers are using those to make progress towards the big AI dream.
A few months ago, a student showed me an article on the top 12 IT skills that hiring managers can use. It looked like an older article, but at the top of this list of the 12 most desirable IT skills was machine learning. Here at Stanford, the number of recruiters who contact me looking to hire graduating machine learning students is far larger than the number of machine learning students who graduate this year. So I think there's a vast unfulfilled demand for this skill set, and this is a great time to be learning about machine learning. And I hope to teach you a lot about machine learning in this class. In the next video, we'll start to give a more formal definition of what machine learning is, and we'll begin to talk about the main types of machine learning problems and algorithms. You'll pick up some of the main machine learning terminology and get a sense of the different algorithms and when each one might be appropriate.
What is machine learning? In this video, we will try to define what it is and also try to give you a sense of when you want to use machine learning. Even among machine learning practitioners, there isn't a well-accepted definition of what is and what isn't machine learning. But let me show you a couple of examples of the ways that people have tried to define it. Here's a definition of what is machine learning from Arthur Samuel. He defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed. Samuel's claim to fame was that back in the 1950s, he wrote a checkers-playing program. And the amazing thing about this checkers-playing program was that Arthur Samuel himself wasn't a very good checkers player. But what he did was he had the program play tens of thousands of games against itself. And by watching what sort of board positions tended to lead to wins and what sort of board positions tended to lead to losses, the checkers-playing program learned over time what are good board positions and what are bad board positions, and eventually learned to play checkers better than Arthur Samuel himself was able to. This was a remarkable result. Arthur Samuel himself turned out not to be a very good checkers player, but the computer has the patience to play tens of thousands of games against itself, and no human has the patience to play that many games. By doing this, the computer was able to get so much checkers-playing experience that it eventually became a better checkers player than Arthur Samuel himself.
That's a somewhat informal definition and an older one. A slightly more recent definition by Tom Mitchell, who is a friend at Carnegie Mellon, defines machine learning by saying that a well-posed learning problem is defined as follows: He says a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. I think he came up with this definition just to make it rhyme. So in the checkers-playing example, the experience E would be the experience of having the program play tens of thousands of games against itself. The task T would be the task of playing checkers, and the performance measure P would be the probability that it wins the next game of checkers against some new opponent.
Throughout these videos, besides me trying to teach you stuff, I'll occasionally ask you a question to make sure you understand the content. Here's one on the definition of machine learning by Tom Mitchell. Let's say your email program watches which emails you do or do not mark as spam. So in an email client like this, you might click this spam button to report some emails as spam, but not all emails. And based on which emails you mark as spam, your email program learns better how to filter spam email. What is the task T in this setting? In a few seconds, the video will pause, and when it does so, you can use your mouse to select one of these four radio buttons to let me know which answer you think is the right one.
So hopefully you got that this is the right answer: classifying emails is the task T. In fact, this definition defines the task T, the performance measure P, and the experience E. So watching you label emails as spam or non-spam would be the experience E, and the fraction of emails correctly classified might be our performance measure P. And so our performance on the task T, as measured by the performance measure P, will improve after the experience E.
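To make the T, P, E vocabulary concrete, here is a minimal Python sketch of the spam example. Everything in it is an assumption made for illustration: the tiny word-counting filter, the function names `train`, `classify`, and `accuracy`, and the made-up emails. It is not how any real email program works; it only shows performance P (fraction classified correctly) on task T (classifying emails) going up after experience E (emails you have labeled).

```python
def train(labeled_emails):
    """Experience E: count, per word, how many spam and non-spam emails contain it."""
    counts = {}
    for text, is_spam in labeled_emails:
        for word in set(text.lower().split()):
            spam_n, ham_n = counts.get(word, (0, 0))
            counts[word] = (spam_n + 1, ham_n) if is_spam else (spam_n, ham_n + 1)
    return counts


def classify(counts, text):
    """Task T: return True if the email looks like spam, False otherwise."""
    score = 0
    for word in set(text.lower().split()):
        spam_n, ham_n = counts.get(word, (0, 0))
        score += spam_n - ham_n  # words seen mostly in spam push the score up
    return score > 0


def accuracy(counts, test_emails):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(counts, text) == is_spam for text, is_spam in test_emails)
    return correct / len(test_emails)


# Made-up emails; True means the user marked the email as spam.
labeled = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
held_out = [
    ("claim your free money", True),
    ("monday meeting notes", False),
    ("free prize inside", True),
    ("agenda for lunch", False),
]

p_before = accuracy(train([]), held_out)      # no experience yet
p_after = accuracy(train(labeled), held_out)  # after watching four labels
print(p_before, p_after)
```

With no experience the filter never flags anything, so it only gets the non-spam emails right; after seeing the labeled emails it does better on the held-out ones.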
In this class, I hope to teach you about various different types of learning algorithms. The main two types are what we call supervised learning and unsupervised learning. I'll define what these terms mean more in the next couple of videos. But it turns out that in supervised learning, the idea is we're going to teach the computer how to do something, whereas in unsupervised learning, we're going to let it learn by itself. Don't worry if these two terms don't make sense yet. In the next two videos, I'm going to say exactly what these two types of learning are. You will also hear other buzz terms such as reinforcement learning and recommender systems. These are other types of machine learning algorithms that we'll talk about later. But the two most used types of learning algorithms are probably supervised learning and unsupervised learning, and I'll define them in the next two videos. Then we'll spend most of this class talking about these two types of learning algorithms.
It turns out one of the other things we spend a lot of time on in this class is practical advice for applying learning algorithms. Something that I feel pretty strongly about, and it's actually something that I don't know if any other university teaches, is this: teaching about learning algorithms is like giving you a set of tools, and equally important, or more important, than giving you the tools is to teach you how to apply these tools. I like to make an analogy to learning to become a carpenter. Imagine that someone is teaching you how to be a carpenter and they say, 'Here's a hammer, here's a saw, here's a screwdriver, here's a glue gun.' Well, that's no good, right? You have all these tools, but the more important thing is to learn how to use these tools properly. There's a huge difference between people that know how to use these machine learning algorithms versus people that don't know how to use these tools well. Here in Silicon Valley, when I go visit different companies, even at the top Silicon Valley companies, very often I see people trying to apply machine learning algorithms to some problem, and sometimes they'll have been going at it for six months. But sometimes when I look at what they're doing, I say, 'You know, I could have told you six months ago that you should be taking a learning algorithm and applying it in a slightly modified way, and your chance of success would have been much higher.' So what I'll do in this class is actually spend a lot of time talking about how, if you're actually trying to develop a machine learning system, to make those best-practices-type decisions about the way in which you build a system, so that when you're prototyping a learning algorithm, you're less likely to end up as one of those people who end up pursuing some path for six months that someone else could have figured out just by looking at it along the way.
So I'll actually spend a lot of time teaching you those sorts of best practices in machine learning and AI, and how to get this stuff to work, and how the best people do it in Silicon Valley and around the world. I hope to make you one of the best people at knowing how to design and build serious machine learning and AI systems.
So that's machine learning, and these are the main topics I hope to teach. In the next video, I'm going to define what is supervised learning, and after that what is unsupervised learning, and also start to talk about when you will use each of them.
In this video, I'm going to define what is probably the most common type of machine learning problem, which is supervised learning. I'll define supervised learning more formally later, but it's probably best to start with an example of what it is, and we'll do the formal definition later. Let's say you want to predict housing prices. A while back, a student collected a data set from the city of Portland, Oregon, and let's say you plot the data set and it looks like this. Here on the horizontal axis is the size of different houses in square feet, and on the vertical axis is the price of different houses in thousands of dollars. So given this data, let's say you have a friend who owns a house that is, say, 750 square feet, and they're hoping to sell the house and they want to know how much they can get for it. So how can the learning algorithm help you? One thing a learning algorithm might do is put a straight line through the data, fit a straight line to the data, and based on that, it looks like maybe the house can sell for about $150,000. But maybe this isn't the only learning algorithm you can use, and there might be a better one. For example, instead of fitting a straight line to the data, you might decide that it's better to fit a quadratic function, or a second-order polynomial, to this data. And if you do that and make a prediction here, then it looks like, well, maybe we can sell the house for closer to $200,000. One of the things we'll talk about later is how to choose, how to decide: do you want to fit a straight line to the data or do you want to fit a quadratic function to the data? And it's not fair to just pick whichever one gives your friend the better price for the house. But each of these would be a fine example of a learning algorithm.
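As a rough sketch of what "fit a straight line" versus "fit a quadratic" means in code, here is a small least-squares fit in plain Python. The data points are made up for illustration (they are not the Portland data set), sizes are given in thousands of square feet to keep the numbers well scaled, and the helper names `fit_polynomial` and `predict` are hypothetical. Least squares is one common way to fit these curves; the lecture has not yet said which method it will use.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.

    Builds the normal equations and solves them with Gaussian elimination
    (partial pivoting). Returns the list of coefficients c.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A[i][j] = xs[i] ** j.
    aug = [[sum(x ** (r + c) for x in xs) for c in range(n)]
           + [sum(y * x ** r for x, y in zip(xs, ys))]
           for r in range(n)]
    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Made-up data: size in thousands of square feet, price in thousands of dollars.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
prices = [120.0, 190.0, 250.0, 290.0, 320.0, 335.0]

line = fit_polynomial(sizes, prices, degree=1)
quadratic = fit_polynomial(sizes, prices, degree=2)

# Two different answers for the same 750 square foot house.
print("straight line:", predict(line, 0.75))
print("quadratic:    ", predict(quadratic, 0.75))
```

The two models give different predictions for the same house, which is exactly why choosing between them needs a principled method rather than picking the number you like.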
So this is an example of a supervised learning algorithm. And the term supervised learning refers to the fact that we gave the algorithm a data set in which the right answers were given. That is, we gave it a data set of houses in which for every example in this data set, we told it what is the right price, or what is the actual price that that house sold for. And the task of the algorithm was to just produce more of these right answers, such as for this new house that your friend may be trying to sell. To define a bit more terminology, this is also called a regression problem. And by regression problem, I mean we're trying to predict a continuous-valued output, namely the price. So technically, I guess prices can be rounded off to the nearest cent, so maybe prices are actually discrete values, but usually we think of the price of a house as a real number, a scalar value, a continuous-valued number. And the term regression refers to the fact that we're trying to predict this sort of continuous-valued attribute.
Just another supervised learning example: some friends and I were actually working on this earlier. Let's say you want to look at medical records and try to predict whether a breast tumor is malignant or benign. Suppose someone discovers a breast tumor, a lump in their breast. A malignant tumor is a tumor that is harmful and dangerous, and a benign tumor is a tumor that is harmless. So obviously people care a lot about this. Let's collect a data set, and suppose in your data set you have on your horizontal axis the size of the tumor, and on the vertical axis I'm going to plot 1 or 0, yes or no, whether or not these examples of tumors we've seen before are malignant (which is 1) or not malignant, benign (which is 0). So let's say your data set looks like this: we saw a tumor of this size that turned out to be benign, one of this size, one of this size, and so on. And sadly, we also saw a few malignant tumors, one of that size, one of that size, one of that size, and so on. So in this example, I have five examples of benign tumors shown down here, and five examples of malignant tumors shown with the vertical axis value of 1. And let's say you have a friend who tragically has a breast tumor, and let's say her breast tumor size is maybe somewhere around this value. The machine learning question is: can you estimate what is the probability, the chance that the tumor is malignant, for your friend?
So let me introduce a bit more terminology. This is an example of a classification problem. The term classification refers to the fact that here we're trying to predict a discrete-valued output, 0 or 1, benign or malignant. And it turns out that in classification problems, sometimes you can have more than two values for the possible outputs. As a concrete example, maybe there are three types of breast cancers, and so you need to try to predict a discrete-valued output 0, 1, 2, or 3, where 0 may mean a benign tumor, or no cancer, and 1 may mean a type 1 cancer, 2 may mean a second type of cancer, and 3 may mean a third type of cancer. But this would also be a classification problem because we have a discrete set of outputs corresponding to no cancer, or cancer type 1, or type 2, or type 3.
In classification problems, there is another way to plot this data. Let me show you what I mean. I'll use a slightly different set of symbols to plot this data. So if tumor size is going to be the attribute that I'm going to use to predict whether a tumor is malignant or benign, I can also draw my data like this. I'm going to use different symbols to denote my benign and malignant, or my negative and positive examples. So instead of drawing crosses, I'm now going to draw circles for the benign tumors like so, and I'm going to keep using X's to denote my malignant tumors. I hope this figure makes sense. All I did was I took this data set on top and I just mapped it down to this real line like so, and started to use different symbols, circles and crosses, to denote benign versus malignant examples.
Now in this example, we use only one feature, or one attribute, namely the tumor size, in order to predict whether a tumor is malignant or benign. In other machine learning problems, we have more than one feature, more than one attribute. Let's say that instead of knowing just the tumor size, we know both the age of the patient and the tumor size. In that case, maybe your data set will look like this, where I may have a set of patients with those ages and tumor sizes, and they look like this, and a different set of patients, who look a little different, whose tumors turn out to be malignant, as denoted by the crosses. So let's say you have a friend who tragically has a tumor, and maybe their tumor size and age fall around there. So given a data set like this, what a learning algorithm may do is fit a straight line to the data to try to separate out the malignant tumors from the benign ones. The learning algorithm may decide to draw a line like that to separate out the two classes of tumors. And hopefully, with this, we can decide that your friend's tumor is more likely... since it's over there, hopefully the learning algorithm will say that your friend's tumor falls on this benign side, and is therefore more likely to be benign than malignant.
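Here is one way the "draw a straight line between the two classes" idea could look in Python. This is only a sketch under assumptions: the lecture does not say which algorithm finds the line, so this uses the classic perceptron update rule as a stand-in; the patients are made up, with age in decades and tumor size in centimeters; and the names `train_perceptron` and `predict` are hypothetical.

```python
def train_perceptron(points, labels, max_epochs=5000):
    """Find weights (w, b) so that w[0]*x1 + w[1]*x2 + b > 0 means class 1.

    Uses the perceptron rule: whenever a point is misclassified, nudge the
    line toward classifying it correctly. Stops after a full pass with no
    mistakes (this happens when the two classes are linearly separable).
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = y - predicted  # +1, -1, or 0
            if error != 0:
                w[0] += error * x1
                w[1] += error * x2
                b += error
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w, b, point):
    """Return 1 (malignant side of the line) or 0 (benign side)."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0


# Made-up patients: (age in decades, tumor size in cm); 0 = benign, 1 = malignant.
patients = [(3.0, 1.0), (4.0, 1.5), (3.5, 2.0), (5.0, 1.0), (2.5, 1.5),
            (6.0, 4.0), (7.0, 3.5), (5.5, 4.5), (6.5, 5.0), (7.5, 4.0)]
outcomes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = train_perceptron(patients, outcomes)

# A hypothetical new patient whose features fall among the benign examples.
friend = (3.0, 1.5)
print("friend's tumor predicted class:", predict(w, b, friend))
```

The learned line is just the set of points where `w[0]*x1 + w[1]*x2 + b` equals zero; which side a new patient falls on is the prediction.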
In this example, we had two features, namely the age of the patient and the size of the tumor. In other machine learning problems, we will often have more features. And my friends that worked on this problem actually used other features like these: clump thickness of the breast tumor, uniformity of cell size of the tumor, uniformity of cell shape of the tumor, and so on, and other features as well. And it turns out one of the interesting learning algorithms that we'll see in this class is a learning algorithm that can deal with not just two or three or five features, but an infinite number of features. On this slide, I've listed a total of five different features, right? Two on the axes and three up here. But it turns out that for some learning problems, what you really want is not to use like three or five features, but instead you want to use an infinite number of features, an infinite number of attributes, so that your learning algorithm has lots of attributes or features or cues with which to make those predictions. So how do you deal with an infinite number of features? How do you store an infinite number of things on a computer with only finite memory? It turns out that when we talk about an algorithm called the support vector machine, there will be a neat mathematical trick that will allow a computer to deal with an infinite number of features. Imagine that I didn't just write down two features here and three features on the right, but imagine that I wrote down an infinitely long list, I just kept writing more and more features, an infinitely long list of features. It turns out we'll be able to come up with an algorithm that can deal with that.
So just to recap, in this class we'll talk about supervised learning. And the idea is that in supervised learning, for every example in our dataset, we are told what is the correct answer that we would have liked to have predicted for that example, such as the price of a house or whether a tumor is malignant or benign. We also talked about the regression problem, and by regression, that means that our goal is to predict a continuous-valued output. And we talked about the classification problem, where the goal is to predict a discrete-valued output.
Just a quick wrap-up question. Suppose you're running a company and you want to develop learning algorithms to address each of two problems. In the first problem, you have a large inventory of identical items. So imagine that you have thousands of copies of some identical item to sell, and you want to predict how many of these items you will sell in the next month. In the second problem, you have a lot of users, and you want to write software to examine each individual customer's account, and for each account, decide whether or not the account has been compromised. So for each of these problems, should they be treated as a classification problem or as a regression problem? When the video pauses, use your mouse to select whichever of these four options you think is the correct answer.
So hopefully you got that this is the answer. For problem one, I would treat this as a regression problem, because if I have thousands of items, I would probably just treat the number of items I sell as a real value, as a continuous value. And for the second problem, I would treat that as a classification problem, because I might, say, set the value I want to predict to 0 to denote that the account has not been hacked, and set the value to 1 to denote an account that has been hacked. So it's just like with breast cancers, where 0 is benign and 1 is malignant. So I might set this to 0 or 1 depending on whether it's hacked, and have an algorithm try to predict each one of these two discrete values. And because that's a small number of discrete values, it is therefore treated as a classification problem. So that's it for supervised learning. And in the next video, I will talk about unsupervised learning, which is the other major category of learning algorithm.
In this video, we'll talk about the second major type of machine learning problem, called unsupervised learning. In the last video, we talked about supervised learning, where we had data sets that look like this, where each example was labeled either as a positive or a negative example, whether it was a benign or malignant tumor. So for each example in supervised learning, we were told explicitly what is the so-called right answer, whether it's benign or malignant. In unsupervised learning, we're given data that looks different, data that looks like this, that doesn't have any labels, or that all has the same label. So we're given a data set and we're not told what to do with it, and we're not told what each data point is. Instead, we're just told, 'Here is a data set. Can you find some structure in the data?' Given this data set, an unsupervised learning algorithm might decide that the data lives in two different clusters, so there's one cluster and there's a different cluster. And the unsupervised learning algorithm may break this data into these two separate clusters. So this is called a clustering algorithm, and this turns out to be used in many places.
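To give a feel for what a clustering algorithm does with unlabeled data, here is a short Python sketch. The lecture has not named a specific clustering algorithm at this point, so this uses k-means as an assumed example; the points, the starting centroids, and the function name `k_means` are all made up for illustration.

```python
def k_means(points, centroids, max_iters=100):
    """Group points into len(centroids) clusters.

    Repeats two steps until nothing changes:
      1. assign each point to its nearest centroid;
      2. move each centroid to the mean of the points assigned to it.
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Step 1: index of the nearest centroid (squared distance) per point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Step 2: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up, unlabeled 2-D data: note there is no "right answer" column.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (1.0, 2.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5), (8.0, 9.0)]

# Starting guesses for the two cluster centers (an arbitrary choice here).
labels, centers = k_means(points, centroids=[points[0], points[-1]])
print(labels)
print(centers)
```

Nothing in the input says which group a point belongs to; the grouping comes only from the structure of the data itself.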
One example where clustering is used is in Google News. And if you haven't seen this before, you can actually go to the URL news.google.com to take a look. What Google News does is every day it goes and looks at tens of thousands or hundreds of thousands of news stories on the web, and it groups them into cohesive news stories. For example, let's look here. These URLs link to different news stories about the BP oil spill. So let's say I click on one of these URLs; what I'll get to is a web page like this. Here's a Wall Street Journal article about the BP oil spill, a story about BP's Macondo, which is the name of the well. And if you click on a different URL from that group, then you might get a different story, a CNN story about the same thing. And if you click on yet a third link, then you might get a different story, here's the UK Guardian story about the BP oil spill. So what Google News has done is looked at tens of thousands of news stories and automatically clustered them together, so that news stories that are all about the same topic get displayed together.
It turns out that clustering algorithms and unsupervised learning algorithms are used in many other problems as well. Here's one on understanding genomics, just an example of DNA microarray data. The idea is we have a group of different individuals, and for each of them, we measure how much they do or do not have a certain gene type. We measure how much certain genes are expressed. So these colors, red, green, gray, and so on, they show the degree to which different individuals do or do not have a specific gene. And what you can do is then run a clustering algorithm to group individuals into different categories, into different types of people. So this is unsupervised learning because we're not telling the algorithm in advance that these are type 1 people, those are type 2 persons, those are type 3 persons, and so on. Instead, what we're saying is, 'Here's a bunch of data. I don't know what's in this data. I don't know who's in what type. I don't even know what the different types of people are. But can you automatically find structure in the data? Can you automatically cluster the individuals into these types that I don't know in advance?' Because we're not giving the algorithm the right answers for the examples in my data set, this is unsupervised learning.
Unsupervised learning, or clustering, is used for a bunch of other applications. It's used to organize large computer clusters. I had some friends looking at large data centers, large computer clusters, and trying to figure out which machines tend to work together. And if you can put those machines together, you can make your data center work more efficiently. Another application is social network analysis. So given knowledge about which friends you email the most, or given your Facebook friends or your Google+ circles, can we automatically identify which are cohesive groups of friends, or groups of people that all know each other? Market segmentation: many companies have huge databases of customer information. So can you look at this customer data set and automatically discover market segments, and automatically group your customers into different market segments, so that you can more efficiently sell or market to your different market segments? Again, this is unsupervised learning because we have all this customer data, but we don't know in advance what are the market segments for the customers in our data set. We don't know in advance who is in market segment one, who is in market segment two, and so on. But we have to let the algorithm discover all this just from the data. Finally, it turns out that unsupervised learning is also used, surprisingly, for astronomical data analysis, and these clustering algorithms give surprisingly interesting or useful theories of how galaxies are formed.
All these are examples of clustering, which is just one type of unsupervised learning. Let me tell you about another one. I'm going to tell you about the cocktail party problem. So you've been to cocktail parties before, right? You can imagine there's a party room with people all sitting around, all talking at the same time, and there are all these overlapping voices because everyone is talking at the same time, and it's hard to hear the person in front of you. So imagine a small cocktail party with just two people, two people talking at the same time, and we're going to put two microphones in the room. Because these microphones are at two different distances from the speakers, each microphone records a different combination of these two speakers' voices. Maybe speaker 1 is a little louder in microphone 1, and maybe speaker 2 is a little bit louder in microphone 2, because the two microphones are at different positions relative to the two speakers. But each microphone records an overlapping combination of both speakers' voices. So here's an actual recording of two speakers recorded by a researcher. Let me play for you the first one. The first microphone sounds like... [audio plays] All right, maybe not the most interesting cocktail party, two people counting from one to ten in two languages. There you go. What you just heard was the first microphone recording. Here's the second recording. [audio plays] So what we can do is take these two microphone recordings and give them to an unsupervised learning algorithm called the cocktail party algorithm, and tell the algorithm, 'Find structure in this data for me.' And what the algorithm will do is listen to these audio recordings and say, 'You know something, it sounds like two audio sources are being added together, or combined together, to produce these recordings that we had.'
Moreover, what the cocktail party algorithm will do is separate out these two audio sources that were being added or combined together to form our recordings. And in fact, here's the first output of the cocktail party algorithm: [audio plays] So it separated out the English voice in one of the recordings. And here's the second output: [audio plays].
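The "added together" picture can be written down in a few lines of Python. To be clear about the assumptions: this is not the cocktail party algorithm. The signals and mixing weights below are made up, the names `mix` and `unmix` are hypothetical, and the sketch cheats by assuming the mixing weights are known. The real algorithm has to work out how to un-mix the recordings from the recordings alone, which is the hard part.

```python
# Two made-up source signals (think: two speakers), five samples each.
speaker_1 = [1.0, 0.0, -1.0, 0.0, 1.0]
speaker_2 = [0.5, 0.5, -0.5, -0.5, 0.5]

# Made-up mixing weights: row i says how loudly each speaker is heard by
# microphone i. Speaker 1 is louder in microphone 1, speaker 2 in microphone 2.
MIX = [[1.0, 0.5],
       [0.5, 1.0]]


def mix(weights, s1, s2):
    """Each microphone records a weighted sum of both sources."""
    mic_1 = [weights[0][0] * a + weights[0][1] * b for a, b in zip(s1, s2)]
    mic_2 = [weights[1][0] * a + weights[1][1] * b for a, b in zip(s1, s2)]
    return mic_1, mic_2


def unmix(weights, mic_1, mic_2):
    """Recover the sources by inverting the 2x2 mixing matrix.

    This only works here because the weights are assumed to be known.
    """
    (a, b), (c, d) = weights
    det = a * d - b * c
    s1 = [(d * m1 - b * m2) / det for m1, m2 in zip(mic_1, mic_2)]
    s2 = [(-c * m1 + a * m2) / det for m1, m2 in zip(mic_1, mic_2)]
    return s1, s2


mic_1, mic_2 = mix(MIX, speaker_1, speaker_2)
recovered_1, recovered_2 = unmix(MIX, mic_1, mic_2)
print(mic_1, mic_2)
print(recovered_1, recovered_2)
```

Both microphone signals contain both speakers, yet the two sources come back out cleanly once the right un-mixing is applied. Finding that un-mixing without being told the weights is what the unsupervised algorithm does.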
To give you one more example, here's another recording of a similar situation. Here's the first microphone. [audio plays] Okay, so the poor guy's gone home from the cocktail party and he's now sitting in a room by himself talking to his radio. Here's the second microphone recording. [audio plays] When you give these two microphone recordings to the same algorithm, what it does is again say, 'It sounds like there are two audio sources.' And moreover, the algorithm says, 'Here is the first of the audio sources I found.' [audio plays] So that wasn't perfect, it's got the voice but also some music in there. And here's the second output of the algorithm. [audio plays] Not too bad. In that second output, it managed to get rid of the voice entirely and just cleaned up the music and got rid of the counting from one to ten.
You might look at an unsupervised learning algorithm like this and ask how complicated it is to implement. It seems like in order to build this application, to do this audio processing, you'd have to write a ton of code, or maybe link in to a bunch of C++ or Java libraries to process audio. It seems like a really complicated program to do this audio separating. It turns out the algorithm to do what you just heard can be done with one line of code, shown right here. It did take researchers a long time to come up with this line of code, so I'm not saying this is an easy problem. But it turns out that if you use the right programming environment, many learning algorithms can be really short programs. So this is also why in this class we're going to use the Octave programming environment. Octave is free, open-source software. And using a tool like Octave or MATLAB, many learning algorithms become just a few lines of code to implement. Later in the class, I will teach you a little bit about how to use Octave, and you'll be implementing some of these algorithms in Octave. Or if you have MATLAB, you can use that too. It turns out that an important workflow for a lot of machine learning algorithms is that we first prototype our software in Octave, because that often makes it incredibly fast to implement and run algorithms. Each of these functions, like for example the SVD function, which stands for singular value decomposition, is a linear algebra routine that is just built into Octave. If you were trying to do this in C++ or Java, this would be many, many lines of code linking complex C++ or Java libraries. So you can implement this stuff in C++ or Java or Python, but it's just much more complicated to do so in some of those languages. What I've seen after having taught machine learning for about a decade now is that you learn much faster if you use Octave as your prototyping environment.
And if you use Octave as your learning tool and prototyping tool, you can prototype learning algorithms much more quickly. And in fact, what many people will do in Silicon Valley companies is use a tool like Octave to first prototype the learning algorithm, and only after you've gotten it to work, then you might re-implement it in C++ or Java or whatever. It turns out that by doing things this way, you can often get your algorithm to work much faster than if you were starting out in C++. So I know that as an instructor, I get to say 'trust me on this one' only a finite number of times, but for those who have never used these Octave-type prototyping environments before, I'm going to ask you to trust me on this one. Your time, I think, is one of the most valuable resources. And having seen lots of people do this, I think you as a machine learning researcher or machine learning developer will be much more productive if you learn this stuff and prototype in Octave rather than in some other language.
Finally, to wrap up this video, a quick review question for you. We talked about unsupervised learning, which is a learning setting where you give the algorithm a ton of data and just ask it to find structure in the data. For each of the following four examples, which ones do you think would be an unsupervised learning problem as opposed to a supervised learning problem? For each of the four checkboxes on the left, check the ones for which you think an unsupervised learning algorithm would be appropriate, and then click the button on the lower right to check your answer. So when the video pauses, answer the question on this slide.
So hopefully you remember the spam filter problem. If you have labeled data, spam vs. non-spam email, we treat this as a supervised learning problem. The news story example, that's exactly the Google News example that we saw in this video. We saw how you can use a clustering algorithm to cluster news articles together, so that's unsupervised. The market segmentation example I talked about a little bit earlier, you can do that as an unsupervised learning problem, because I'm just going to give my algorithm data and ask it to discover market segments automatically. And the final example, diabetes, well that's actually just like the breast cancer example from the last video, only instead of good and bad cancer tumors, or benign and malignant tumors, we instead have diabetes or not. And so we will solve that as a supervised learning problem, just like we did for the breast tumor data. So that's it for unsupervised learning. And in the next video, we'll delve more into specific learning algorithms and start to talk about just how the algorithms work and how you can go about implementing them.
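As a small illustration of what "give my algorithm data and ask it to discover market segments automatically" can look like, here is a minimal Python sketch, assuming a made-up list of customer spending figures and a hypothetical helper two_means_1d. It runs a bare-bones two-cluster k-means on one feature; it is not the method used in the course exercises, just one simple clustering procedure. Notice that no labels are passed in anywhere, which is what makes this unsupervised.

```python
def two_means_1d(values, iterations=20):
    """Split 1-D values into two clusters with a bare-bones k-means.

    Hypothetical helper for illustration: centers start at the smallest
    and largest value, then alternate between assigning points to the
    nearest center and moving each center to the mean of its points.
    """
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not g0 or not g1:      # all points fell into one group; stop
            break
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
    return (c0, c1), labels

# Made-up monthly spend per customer; there is no "segment" label in the data.
spend = [12.0, 15.0, 11.0, 14.0, 95.0, 102.0, 99.0, 13.0, 101.0]

centers, segments = two_means_1d(spend)
print("segment centers:", centers)
print("segment for each customer:", segments)
```

In the spam and diabetes examples, by contrast, each training example would come with a known answer (spam or not, diabetes or not), and the algorithm would learn to predict that answer, which is the supervised setting.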