Supervised image classification begins by computing statistics for user-selected training sites, each representing a land cover class, and then uses these statistical summaries to classify the image. The following sections describe the IPW software for image classification.
In this implementation of IPW, training sites are represented as masks, so the statistics programs used with images (e.g., hist, mstats) can also be used with masks. We recommend storing the data for each training site in a Unix directory named after the class. All files for that class, including the mask, the mask coordinates file, histograms, and statistics, should be stored in the class directory. This setup takes advantage of the Unix directory structure, keeps file names short, and keeps related files in one place.
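Because a training site is simply a mask, the class statistics can be illustrated by gathering the band values under the mask and computing a mean vector and covariance matrix. This is a minimal sketch, assuming the bands and the mask are in-memory nested lists of equal shape and using the sample (n - 1) covariance; the function name `mask_statistics` is hypothetical, and the code neither reads IPW image files nor reproduces the output of hist or mstats.

```python
def mask_statistics(bands, mask):
    """Pixel count, mean vector and covariance matrix of pixels where mask is nonzero.

    bands: list of 2-D lists, one per spectral band, all the same shape
    mask:  2-D list of 0/1 values marking the training site (hypothetical layout)
    """
    # One feature vector (one value per band) for every pixel inside the mask.
    samples = [
        [band[r][c] for band in bands]
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c]
    ]
    n = len(samples)
    if n < 2:
        raise ValueError("training site needs at least two pixels")
    nb = len(bands)
    mean = [sum(s[k] for s in samples) / n for k in range(nb)]
    # Sample covariance (divides by n - 1).
    cov = [
        [sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
         for j in range(nb)]
        for i in range(nb)
    ]
    return n, mean, cov
```

For example, two 2 x 2 bands with a mask covering three pixels yield a three-pixel sample whose mean and covariance describe that class.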
Classification time and complexity increase with the number of features (spectral bands) used by the classification algorithm. It is therefore preferable to reduce the number of features by choosing the channels that best discriminate between classes. This can be done by computing a statistical separability measure between each pair of classes for candidate subsets of channels. The Jeffries-Matusita (JM) distance is the preferred measure (Richards, 1986).
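For two classes i and j modeled as multivariate normal distributions with mean vectors m_i and m_j and covariance matrices Σ_i and Σ_j, the JM distance is commonly computed from the Bhattacharyya distance B_ij as shown below. In this form J_ij ranges from 0 (identical classes) to 2 (completely separable). Some texts report the square root of this quantity instead (range 0 to √2); which variant jmdist prints is not confirmed here.

```latex
J_{ij} = 2\left(1 - e^{-B_{ij}}\right)

B_{ij} = \frac{1}{8}\,(\mathbf{m}_i - \mathbf{m}_j)^{T}
         \left(\frac{\Sigma_i + \Sigma_j}{2}\right)^{-1}
         (\mathbf{m}_i - \mathbf{m}_j)
       + \frac{1}{2}\ln\frac{\left|(\Sigma_i + \Sigma_j)/2\right|}
                            {\sqrt{\left|\Sigma_i\right|\,\left|\Sigma_j\right|}}
```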
The IPW program jmdist computes the JM distance between each pair of classes, evaluating combinations of up to four bands at a time drawn from any number of input bands.
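The band-selection search can be sketched in Python. This is a minimal illustration, not jmdist itself: it assumes each class is given in memory as a mean vector and covariance matrix, uses the form J = 2(1 - exp(-B)) with B the Bhattacharyya distance, and ranks each band subset by its minimum pairwise JM distance. The function names and the ranking rule are hypothetical.

```python
import itertools
import math


def det_and_inverse(a):
    """Return (determinant, inverse) of a square matrix via Gauss-Jordan elimination."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return det, [row[n:] for row in m]


def jm_distance(mean_i, cov_i, mean_j, cov_j):
    """JM distance 2 * (1 - exp(-B)) between two Gaussian classes (range 0 to 2)."""
    n = len(mean_i)
    avg = [[(cov_i[r][c] + cov_j[r][c]) / 2.0 for c in range(n)] for r in range(n)]
    det_avg, inv_avg = det_and_inverse(avg)
    det_i, _ = det_and_inverse(cov_i)
    det_j, _ = det_and_inverse(cov_j)
    d = [a - b for a, b in zip(mean_i, mean_j)]
    maha = sum(d[r] * inv_avg[r][c] * d[c] for r in range(n) for c in range(n))
    b = maha / 8.0 + 0.5 * math.log(det_avg / math.sqrt(det_i * det_j))
    return 2.0 * (1.0 - math.exp(-b))


def rank_band_subsets(classes, n_bands, max_size=4):
    """Score each combination of up to max_size bands by its minimum pairwise JM distance."""
    ranked = []
    for size in range(1, min(max_size, n_bands) + 1):
        for bands in itertools.combinations(range(n_bands), size):
            pair_scores = []
            for (mi, ci), (mj, cj) in itertools.combinations(classes.values(), 2):
                # Restrict means and covariances to the selected bands.
                sub_ci = [[ci[r][c] for c in bands] for r in bands]
                sub_cj = [[cj[r][c] for c in bands] for r in bands]
                pair_scores.append(jm_distance([mi[k] for k in bands], sub_ci,
                                               [mj[k] for k in bands], sub_cj))
            ranked.append((min(pair_scores), bands))
    ranked.sort(reverse=True)  # most separable subsets first
    return ranked
```

Ranking by the minimum pairwise distance favors subsets that separate even the most easily confused pair of classes; averaging over all pairs is another common choice.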
Bayesian maximum likelihood classification is the most common supervised classification method used with remote sensing image data. Its discriminant function is derived by assuming that each training class follows a multivariate normal distribution.
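Using the same symbols as for the JM distance, with p(ω_i) the a priori probability of class i, the discriminant function for a pixel vector x is commonly written as shown below (constant terms that are the same for every class are dropped). The pixel is assigned to the class with the largest g_i(x).

```latex
g_i(\mathbf{x}) = \ln p(\omega_i)
                - \frac{1}{2}\ln\left|\Sigma_i\right|
                - \frac{1}{2}\,(\mathbf{x} - \mathbf{m}_i)^{T}\,\Sigma_i^{-1}\,(\mathbf{x} - \mathbf{m}_i)
```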
Evaluating the discriminant function is computationally expensive: it involves two matrix multiplications for each pixel and each class, and the matrix dimension grows with every image band added to the classification. The IPW program bayes implements Bayesian maximum likelihood classification and can optionally take a priori probabilities for each class. A single non-classification threshold may be specified; pixels whose discriminant functions fall below the threshold for every class are left unclassified.
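The per-pixel decision, including optional priors and the non-classification threshold, can be sketched as follows. This is an illustration under stated assumptions, not the bayes source: classes are in-memory (mean, covariance) pairs, priors default to equal values, and the names `prepare_classes` and `classify_pixel` are hypothetical. The class-dependent constant ln p(ω_i) - ½ ln|Σ_i| and the inverse covariance are computed once, so only the quadratic form is evaluated per pixel.

```python
import math


def det_and_inverse(a):
    """Return (determinant, inverse) of a square matrix via Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return det, [row[n:] for row in m]


def prepare_classes(classes, priors=None):
    """Precompute, per class, the inverse covariance and ln p(w) - 0.5 ln|S|."""
    if priors is None:  # equal a priori probabilities by default
        priors = {name: 1.0 / len(classes) for name in classes}
    prepared = {}
    for name, (mean, cov) in classes.items():
        det, inv = det_and_inverse(cov)
        prepared[name] = (mean, inv, math.log(priors[name]) - 0.5 * math.log(det))
    return prepared


def classify_pixel(x, prepared, threshold=None):
    """Return the class with the largest discriminant, or None if all fall below threshold."""
    best_name, best_g = None, -math.inf
    for name, (mean, inv, const) in prepared.items():
        d = [xv - mv for xv, mv in zip(x, mean)]
        n = len(d)
        quad = sum(d[r] * inv[r][c] * d[c] for r in range(n) for c in range(n))
        g = const - 0.5 * quad
        if g > best_g:
            best_name, best_g = name, g
    if threshold is not None and best_g < threshold:
        return None  # pixel left unclassified
    return best_name
```

Raising the prior of one class shifts pixels near a decision boundary toward that class, while the threshold rejects pixels that resemble none of the training classes.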
Clustering is a method of unsupervised image classification in which statistically similar pixels are grouped into classes. These clusters replace the training sites used in supervised classification. The IPW program ustats generates an IPW statistics file containing statistics for each cluster, and bayes then uses these class statistics to classify the image.
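The clustering algorithm used by ustats is not described here, so the sketch below swaps in plain k-means purely to illustrate the idea: pixels are grouped by spectral similarity, and each cluster's pixel count, mean vector, and covariance matrix then play the role of training-site statistics. The seeding rule (first k pixels), the function names, and the in-memory list-of-pixel-vectors layout are all hypothetical.

```python
def kmeans(samples, k, iterations=20):
    """Basic k-means; seeds the k centers with the first k samples (a simplistic choice)."""
    centers = [list(s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assign each pixel to the nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centers[c])))
            for s in samples
        ]
        # Move each center to the mean of its member pixels.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_statistics(samples, labels):
    """Pixel count, mean vector and covariance matrix for each cluster."""
    stats = {}
    for c in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        n, nb = len(members), len(members[0])
        mean = [sum(col) / n for col in zip(*members)]
        cov = [[sum((m[i] - mean[i]) * (m[j] - mean[j]) for m in members) / max(n - 1, 1)
                for j in range(nb)] for i in range(nb)]
        stats[c] = (n, mean, cov)
    return stats
```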
Classification accuracy may be determined as follows:
1) Extract test sites for each class from the classified image. A test site is a small area known to belong to a particular class.
2) Compute a histogram of the classified test sites and print the histogram values for each class. This summarizes the number of pixels classified correctly and incorrectly for each class in each test site.
Because test sites are typically small, an alternative method is to outline the test sites with scribe, display the classified image at high magnification, and manually count the pixels that were classified correctly, classified incorrectly, or omitted (left unclassified).