DYCOMS-II Satellite: NOAA POES AVHRR NRL Cloud Classification Product (JPG)

1.0 General Information

The NOAA POES AVHRR NRL Cloud Classification Product (JPG) is one of several satellite products collected as part of the Dynamics and Chemistry of Marine Stratocumulus Phase II: Entrainment Studies (DYCOMS-II) project field catalog operated by the University Corporation for Atmospheric Research/Joint Office for Science Support (UCAR/JOSS; http://www.joss.ucar.edu/dycoms/catalog/). The Naval Research Laboratory (NRL) uses Advanced Very High Resolution Radiometer (AVHRR) data from the National Oceanic and Atmospheric Administration (NOAA) Polar Orbiting Environmental Satellite (POES) NOAA-14 to develop cloud classifications.

Two products are included in this data set. The five-class product groups clouds into five classes labeled low, middle, high, vertical, and clear. The eleven-class product groups clouds into eleven classes based on cloud type. The products cover the period 1-31 July 2001 and cover varying portions of the western United States and eastern Pacific Ocean. The products are available at NOAA-14 satellite overpass times, which during DYCOMS-II were typically 1200-1500 UTC and 2300-0300 UTC. The products were acquired from NRL (http://kauai.nrlmry.navy.mil/sat-bin/clouds/nrl/clouds_west). All images are in JPG format.

2.0 Data Contact

Paul M. Tag (tag@nrlmry.navy.mil)
Rich Bankert (bankert@nrlmry.navy.mil)

3.0 Product Information

3.1 History

Over the past ten years, NRL Monterey has applied several artificial intelligence technologies to Navy meteorological problems. These technologies include expert systems (Peak and Tag, 1989; Fett et al., 1997; Kuciauskas et al., 1998), machine learning (Tag and Peak, 1996), and pattern recognition, the latter now commonly called computer vision. We have applied computer vision primarily to tasks involving the meteorological analysis of satellite imagery: cloud types; cloud systems (Peak and Tag, 1992, 1994; Bankert, 1994b; Bankert and Tag, 1996); and tropical cyclones (Bankert and Tag, 1997). Because of the effects of clouds on Navy tactical air operations, we have expended considerable effort in automatically typing clouds. This kind of effort is particularly important to U.S. Navy forecasters, whose frequent tour rotations preclude the buildup of imagery-interpretation expertise at remote, worldwide locations, particularly aboard ship. Because NOAA AVHRR data were available for our research, this imagery was chosen as the basis for our cloud classification algorithm, whose classifications are presented in this data set. Development of this algorithm was conducted in two phases, corresponding to overwater (Bankert, 1994a; Bankert and Aha, 1996) and overland (Tag and Bankert, 1998) imagery.

3.2 Cloud Databases

WATER AND LAND CLOUD DATABASES

An automated cloud classification procedure is essentially a pattern-matching technique in which an unknown cloud imagery sample is compared to samples of known type. The first step in developing a classifier that uses supervised learning is to create a database of manually classified cloud images. This step is the most important part of the classification process: without a database of accurately typed images, there is little hope of creating an accurate, robust classifier. Eleven standardized cloud classes (including clear) were chosen (see Table 1).
The overwater and overland databases for these classes were developed by Crosiar (1993) and Tag and Bankert (1998), respectively. (A portion of the Crosiar database includes western Atlantic and eastern Pacific cloud samples classified by Professor Chuck Wash and associates at the Naval Postgraduate School (NPS) in Monterey.) For the overwater samples (excluding the NPS samples), a team of experts (Robin Brody, Robert Fett, Ron Miller, and Dennis Perryman, all NRL employees at the time) independently classified 3625 cloud samples from 7 worldwide geographic areas. For the overland samples, the experts (Robin Brody of NRL, and Ron Englebretson and Robert Fett of Science Applications International Corp.) independently classified 7912 cloud samples from 12 worldwide geographic areas (see Tables 2 and 3). In general, three out of four experts for the water samples and two out of three experts for the land samples had to agree on a sample before it was included in the database. The final numbers of developmental/training samples were 1834 and 1605 for water and land surfaces, respectively.

Table 1. Cloud classes used for typing imagery

    Cirrus (Ci)
    Cirrocumulus (Cc)
    Cirrostratus (Cs)
    Altocumulus (Ac)
    Altostratus (As)
    Stratocumulus (Sc)
    Stratus (St)
    Cumulus (Cu)
    Cumulus Congestus (CuC)
    Cumulonimbus (Cb)
    Clear (Cl)

Tables 2 and 3 are not included in this document; please see http://kauai.nrlmry.navy.mil/clouds/background.html.

3.3 Classifiers

OVERWATER/OVERLAND CLASSIFIERS

The classifications shown on these web pages utilize both the overwater and overland classifiers to classify each 16x16-pixel square in an unregistered image. Along a coastline, the choice of classifier depends upon where (land or water) the majority of pixels in a square reside. The 16x16-pixel area is the minimum area that can be used in order to accommodate the texture features in the classification.

CLASSIFICATION

Classifications are performed using a 1-nearest-neighbor (1-NN) algorithm that was "trained" independently for the water and land situations, using various computer vision characteristics, or features, computed from a visible (Channel 1) and an infrared (Channel 4) channel of the AVHRR. The specific features that provide the highest classification accuracy for these data were selected by a feature selection routine developed by Dr. David Aha of the NRL AI Center in Washington, D.C. This routine, which uses backward sequential selection, is embedded with a 1-NN classifier. Features selected using this process are then used with the 1-NN classifier in operations. All features, for both overwater and overland classifications, fall into three types, as defined by the developers: spectral, textural, and physical. From 110 initial features for water and 185 initial features for land, the Aha feature selection program selected ten features for each of the water and land classifiers. These features and their types are listed in Tables 4 and 5. The textural measures, representing the spatial distribution of pixel gray-level values within the image, are GLDV (gray-level difference vector) and SADH (sum and difference histogram) computations.

Table 4. Selected water features

    Feature                        Channel
    ---------------------------------------
    Spectral:
      Pixel value maximum          Visible
      Pixel value median           Visible
      Pixel value minimum          Infrared
      Pixel value median           Infrared
      Pixel value mean             Infrared
    Textural:
      GLDV mean diff. (4x4 mean)   Visible
      GLDV mean diff. (16x16)      Visible
      GLDV mean diff. (4x4 min.)   Infrared
      GLDV cluster shade (16x16)   Visible
    Physical:
      Latitude                     --------
    ---------------------------------------
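For illustration, the GLDV mean difference listed in Table 4 (and in Table 5 below) is, in the standard texture-analysis formulation, the average absolute gray-level difference between pixel pairs at a fixed displacement. The following is a minimal NumPy sketch only, assuming a one-pixel horizontal displacement; the displacement and normalization actually used by NRL are not specified in this document.

    import numpy as np

    def gldv_mean(patch, dx=1, dy=0):
        """Mean of the gray-level difference vector (GLDV): the average
        absolute difference between pixel pairs offset by (dy, dx)."""
        h, w = patch.shape
        a = patch[:h - dy, :w - dx].astype(np.int32)
        b = patch[dy:, dx:].astype(np.int32)
        return float(np.abs(a - b).mean())

    # Example on a synthetic 16x16 visible-channel patch (0-255 gray levels).
    patch = np.random.randint(0, 256, size=(16, 16))
    print(gldv_mean(patch))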
Table 5. Selected land features

    Feature                        Channel
    ---------------------------------------
    Spectral:
      Pixel value standard dev.    Visible
    Textural:
      SADH entropy (4x4 mean)      Infrared
      SADH mean sum (4x4 max.)     Visible
      SADH mean sum (4x4 max.)     Infrared
      SADH mean sum (4x4 mean)     Infrared
      GLDV entropy (4x4 mean)      Visible
    Physical:
      Avg. cloud albedo            Visible
      Latitude                     --------
      Region                       --------
      Month                        --------
    ---------------------------------------

REGISTERED VS. UNREGISTERED

Both the water and land classifiers were developed from unregistered satellite data. As a result, unregistered classifications are one of the output choices. In addition, a registered classification is also available; this option is a west-coast reprojection of the unregistered cloud classifications onto a Mercator surface.

3.4 Accuracy

Using the appropriate selected features to represent a testing sample, in conjunction with the training samples, leave-one-out testing can be used to provide a reasonable measure of how well the algorithms will perform on unseen data (i.e., a new image). A "leave-one-out" test is an extreme version of cross validation. Whereas, for example, 10-fold cross validation removes 10% of the data set on each iteration to serve as testing data (with the remaining 90% used for training), repeating the process 10 times, leave-one-out cross validation repeats the training and testing as many times as there are data samples. For example, if there were 1000 data samples, the training and testing process would be repeated 1000 times, with each sample serving as an independent test against a training set of 999. The average of the 1000 tests provides an estimate of the algorithm's accuracy on new data, independent of the developmental process.
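As a concrete sketch of this testing procedure (not NRL's operational code), the following computes leave-one-out accuracy for a 1-NN classifier, assuming Euclidean distance over the selected feature vectors; the document does not state which distance metric was used.

    import numpy as np

    def loo_accuracy_1nn(features, labels):
        """Leave-one-out accuracy for a 1-nearest-neighbor classifier.
        Each sample is held out in turn and classified by the nearest
        of the remaining samples."""
        n = len(labels)
        correct = 0
        for i in range(n):
            dist = np.linalg.norm(features - features[i], axis=1)
            dist[i] = np.inf                    # exclude the held-out sample
            correct += labels[np.argmin(dist)] == labels[i]
        return correct / n

    # Synthetic example: 1834 samples, 10 features (as in the water classifier).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1834, 10))
    y = rng.integers(0, 11, size=1834)          # 11 cloud classes
    print(loo_accuracy_1nn(X, y))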
ELEVEN-CLASS ACCURACIES

Tables 6 and 7 provide leave-one-out testing accuracies for the overwater and overland databases, respectively, using the 1-NN classifier. The overall eleven-class classification accuracies for the water and land classifiers are 86.4% and 80.8%, respectively.

Table 6. Confusion matrix (leave-one-out test) for the overwater database (overall accuracy 86.4%). This table shows relative accuracies, based on the training data, for each of the cloud classes. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the class order of Table 1 with empty cells omitted. As an illustration, consider the row for Cirrus (Ci): of the 212 training cases, 188 (89%) were analyzed correctly, with 1 misclassified as Cirrocumulus (Cc), 15 misclassified as Cirrostratus (Cs), etc.

    Class   Predicted counts                  %    No
    ---------------------------------------------------
    Ci      188  1  15  3  1  2  2            89   212
    Cc      1  66  2  1  2                    92    72
    Cs      29  3  124  4  5  1               75   166
    Ac      1  88  13  8  6  1  6             72   123
    As      2  2  1  7  131  2  3  1  4  1    85   154
    Sc      2  4  2  217  6  11  8  1         87   251
    St      1  2  4  4  13  197  2  1  1      88   225
    Cu      8  4  107  4                      87   123
    CuC     1  3  2  9  2  3  116  1          85   137
    Cb      3  3  5  1  4  133                89   149
    Cl      1  2  1  218                      98   222
    ---------------------------------------------------
    Total                                         1834

Table 7. Confusion matrix (leave-one-out test) for the overland database (overall accuracy 80.8%). This table shows relative accuracies, based on the training data, for each of the cloud classes. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the class order of Table 1 with empty cells omitted. As an illustration, consider the row for Cirrus (Ci): of the 199 training cases, 148 (74%) were analyzed correctly, with 22 misclassified as Cirrocumulus (Cc), 13 as Cirrostratus (Cs), etc.

    Class   Predicted counts                  %    No
    ---------------------------------------------------
    Ci      148  22  13  6  5  3  2           74   199
    Cc      18  53  4  3                      68    78
    Cs      15  4  93  3  1                   80   116
    Ac      5  53  18  10  1  1  2            59    90
    As      5  3  16  71  3  2  1             70   101
    Sc      6  3  131  10  7  4  1            81   162
    St      6  11  83  2  1  2                79   105
    Cu      1  6  156  17  2                  86   182
    CuC     1  5  19  83  2                   76   110
    Cb      5  4  6  2  6  67                 74    90
    Cl      1  1  4  6  1  359                97   372
    ---------------------------------------------------
    Total                                         1605

FIVE-CLASS ACCURACIES

The classifications depicted in the web display are shown in both eleven-class and five-class displays. For many users, a classification based primarily upon height provides a sufficient depiction without the meteorological detail provided by the Latin-based cloud types. For this reason, the classifications have been additionally grouped into classes labeled low, middle, high, vertical, and clear. The vertical class was chosen to represent cumulus congestus and cumulonimbus, two classes that indicate convection of some significance. Because many of the misclassifications shown in Tables 6 and 7 above are for different types of clouds at the same level, the five-class accuracies necessarily increase. Tables 8 and 9 summarize these accuracies, with overall accuracies of 92.7% and 90.4%, respectively, for the water and land classifications.

Table 8. Five-class confusion matrix (leave-one-out test) for the overwater database (overall accuracy 92.7%). This table shows relative accuracies, based on the training data, for each of the cloud groupings: low, middle, high, vertical, and clear. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the order low, middle, high, vertical, clear with empty cells omitted. As an illustration, consider the row for low clouds: of the 599 training cases, 565 (94%) were analyzed correctly, with 14 misclassified as middle clouds, 5 as high clouds, etc.

    Grouping   Predicted counts      %    No
    ------------------------------------------
    Low        565  14  5  14  1     94   599
    Middle     20  239  6  11  1     86   277
    High       3  8  429  9  1       95   450
    Vertical   14  10  12  249  1    87   286
    Clear      2  1  1  218          98   222
    ------------------------------------------
    Total                                1834

Table 9. Five-class confusion matrix (leave-one-out test) for the overland database (overall accuracy 90.4%). This table shows relative accuracies, based on the training data, for each of the cloud groupings: low, middle, high, vertical, and clear. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the order low, middle, high, vertical, clear with empty cells omitted. As an illustration, consider the row for low clouds: of the 449 training cases, 406 (90%) were analyzed correctly, with 16 misclassified as middle clouds, etc.

    Grouping   Predicted counts      %    No
    ------------------------------------------
    Low        406  16  22  5        90   449
    Middle     16  158  13  3  1     83   191
    High       14  370  7  2         94   393
    Vertical   24  3  15  158        79   200
    Clear      10  1  1  1  359      97   372
    ------------------------------------------
    Total                                1605

3.5 Display

WEB WINDOWS

The classifications and corresponding satellite images shown on this web page allow several display options. Both the water and land classifiers were developed from unregistered satellite data; both unregistered and registered satellite imagery (visible and infrared) and classifications are available for display. In the case of unregistered imagery, the entire satellite pass is shown. For the registered image, a specific west-coast subsection is isolated and redisplayed in Mercator coordinates. For the classifications, both eleven-class and five-class displays are available (see Section 3.4). Cloud types are color-coded, with the general color scheme as follows: yellow, high clouds; green, mid-level; blue, low-level; and red, vertical (CuC and Cb).
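The document does not spell out which of the eleven classes falls into each of the five groupings, but the row totals of Tables 8 and 9 match the sums of the corresponding eleven-class rows in Tables 6 and 7 (e.g., water low: Sc 251 + St 225 + Cu 123 = 599), implying the standard height-based assignment. A minimal sketch of that inferred mapping:

    # Inferred grouping of the eleven cloud classes into the five display
    # classes (consistent with the row totals of Tables 6-9).
    FIVE_CLASS = {
        "Ci": "high", "Cc": "high", "Cs": "high",
        "Ac": "middle", "As": "middle",
        "Sc": "low", "St": "low", "Cu": "low",
        "CuC": "vertical", "Cb": "vertical",
        "Cl": "clear",
    }

    def to_five_class(label):
        """Collapse an eleven-class label (Table 1) to its five-class grouping."""
        return FIVE_CLASS[label]

    print(to_five_class("CuC"))   # -> "vertical"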
PROCESSING

The first procedure that we used for computing the classifications involved starting with a 16x16-pixel area, making a classification that represents that entire area, and then moving over 16 pixels to start a new classification. We had originally considered a pixel-by-pixel progression, with each pixel being classified as the center of a 16x16 box. This approach was not considered seriously, however, because of the considerable additional computation it would require. Recently, however, Mr. Jonathan Allen of Computer Sciences Corp. has suggested an alternative which, although about three times more expensive computationally, provides an additional refinement. In Allen's method, the advancement across the grid is based upon a four-pixel progression. In contrast to the pixel-by-pixel method, where only the center pixel gets classified, the cloud classification for every pixel within each 16x16 domain gets saved. Allen's idea is to tally the classification choices for each pixel and, later, to assign the dominant class as that pixel's cloud class. This method has two distinct advantages. First, the edges of the cloud masses are smoothed (because of the 16-times-greater precision in the classifications). Second, by utilizing a maximum-likelihood process in determining cloud class, fine-scale differences (and aberrant classifications that do not agree with their surroundings) are smoothed out. Both of these effects lead to a classification product that is more realistic in appearance and more precise as well.
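A minimal sketch of Allen's tallying refinement follows, assuming a hypothetical classify_patch function that stands in for the trained 1-NN classifier (mapping a 16x16 patch to an integer class label, 0-10):

    import numpy as np

    def classify_majority(image, classify_patch, win=16, step=4, n_classes=11):
        """Slide a win x win window across the image in steps of `step`
        pixels, give every pixel covered by the window a vote for the
        window's class, then assign each pixel its most-voted class."""
        h, w = image.shape
        votes = np.zeros((h, w, n_classes), dtype=np.int32)
        for i in range(0, h - win + 1, step):
            for j in range(0, w - win + 1, step):
                c = classify_patch(image[i:i + win, j:j + win])
                votes[i:i + win, j:j + win, c] += 1   # tally for covered pixels
        return votes.argmax(axis=2)                   # dominant class per pixel

With step=4 and win=16, each interior pixel is covered by 16 windows, which is the "16 times greater precision" noted above; the per-pixel majority then smooths cloud edges and suppresses isolated aberrant classifications.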
POST PROCESSING

The classifications for the 16x16-pixel squares represent a single cloud type for that area. This method of typing is necessary because of the textural measures that require this minimum area. Comparing classification output to expert opinion, based upon the accompanying visible and infrared imagery, has been part of the evaluation process over the last half of 1997 and early 1998. One obvious observation is that these single-color designations dominate areas in which we know clouds are not continuous, particularly those involving cumuliform clouds such as cumulus, stratocumulus, cumulus congestus, and cumulonimbus. For these cloud types, there can be significant clear areas between the clouds, particularly at the AVHRR resolution of 1 km. For this reason, one of our colleagues (James E. Peak, Computer Sciences Corp.) suggested that a post-processing step might be in order: after all classifications are performed for the 16x16-pixel areas, we examine each individual pixel to determine whether it represents clear conditions. This process, although more complicated over land than over water, can be a simple test of whether the pixel albedo (visible channel) lies below a certain threshold; for example, we found that an albedo of 15% is satisfactory. If a pixel is deemed clear, it is set to black, representing clear conditions. This post-processing step has the significant benefit of making the classification pattern appear, visually, much more realistic, particularly over water. Future work may involve enhancing this post-processing step to include pixel temperature as well.

WEB DEVELOPMENT

The displays for the cloud classifications were developed by Mr. Jonathan Allen of Computer Sciences Corp. Mr. John Kent, of Science Applications International Corp., is responsible for the overall web page design and the integration of the cloud classifications into the page.

4.0 Quality Control Procedures

UCAR/JOSS conducted no quality checks on these data.

5.0 File Naming Convention

The file names are structured as follows:

    noaa-14.yyyymmddhhmm.cloud_class_11.jpg
    noaa-14.yyyymmddhhmm.cloud_class_5.jpg

where:

    noaa-14          the satellite used
    yyyy             the four-digit year
    mm               the two-digit month
    dd               the two-digit day
    hh               the two-digit hour (UTC)
    mm               the two-digit minute
    cloud_class_11   the name of the 11-class cloud product
    cloud_class_5    the name of the 5-class cloud product

6.0 References

Bankert, R. L., 1994a: Cloud classification of AVHRR imagery in maritime regions using a probabilistic neural network. J. Appl. Meteor., 33, 909-918.

Bankert, R. L., 1994b: Cloud pattern identification as part of an automated image analysis. Preprints, 7th Conference on Satellite Meteorology and Oceanography, American Meteorological Society, Boston, MA, 441-443.

Bankert, R. L., and D. W. Aha, 1996: Improvement to a neural network classifier. J. Appl. Meteor., 35, 2036-2039.

Bankert, R. L., and P. M. Tag, 1996: Automated extraction and identification of cloud systems in satellite imagery. Preprints, 8th Conference on Satellite Meteorology, Atlanta, GA, American Meteorological Society, Boston, MA, 373-376.

Bankert, R. L., and P. M. Tag, 1997: Automating the subjective analysis of satellite-observed tropical cyclone intensity. Proceedings, 22nd Conference on Hurricanes and Tropical Meteorology, American Meteorological Society, Boston, MA, 37-38.

Crosiar, C. L., 1993: An AVHRR cloud classification database typed by experts. Naval Research Laboratory, NRL/MR/7531-93-7207, Monterey, CA, 31 pp.

Fett, R. W., M. E. White, J. E. Peak, S. Brand, and P. M. Tag, 1997: Application of hypermedia and expert system technology to Navy environmental satellite imagery analysis. Bull. Amer. Meteor. Soc., 78, 1905-1915.

Kuciauskas, A. P., L. R. Brody, M. Hadjimichael, R. L. Bankert, and P. M. Tag, 1998: A fuzzy expert system to assist in the prediction of hazardous wind conditions within the Mediterranean basin. Accepted for publication in Meteor. Appl.

Peak, J. E., and P. M. Tag, 1989: An expert system approach for prediction of maritime visibility obscuration. Mon. Wea. Rev., 117, 2641-2653.

Peak, J. E., and P. M. Tag, 1992: Toward automated interpretation of satellite imagery for Navy shipboard applications. Bull. Amer. Meteor. Soc., 73, 995-1008.

Peak, J. E., and P. M. Tag, 1994: Segmentation of satellite imagery using hierarchical thresholding and neural networks. J. Appl. Meteor., 33, 605-616.

Tag, P. M., and J. E. Peak, 1996: Machine learning of maritime fog forecast rules. J. Appl. Meteor., 35, 714-724.

Tag, P. M., and R. L. Bankert, 1998: A day/night ocean/land AVHRR cloud classification package. To be submitted to J. Appl. Meteor.