A common and introductory unsupervised learning problem is that of clustering. Often, you have large datasets that you wish to organize into smaller groups, or wish to break up into logically similar groups. For instance, you can try to divide census data of household incomes into three groups: low, high, and super rich. If you feed the household income data into a clustering algorithm, you would expect to see three data points as a result, with each corresponding to the average value of your three categories. Even this one-dimensional problem of clustering household incomes may be difficult to do by hand, because you might not know where one group should end and the other should begin. You could use governmental definitions of income brackets, but there's no guarantee that those brackets are geometrically balanced; they were invented by...





















































