DYCOMS-II Satellite: NOAA POES AVHRR NRL Cloud Classification Product (JPG)

1.0 General Information

The NOAA POES AVHRR NRL Cloud Classification Product (JPG) is one of several satellite products collected as part of the Dynamics and Chemistry of Marine Stratocumulus Phase II: Entrainment Studies (DYCOMS-II) project field catalog operated by the University Corporation for Atmospheric Research/Joint Office for Science Support (UCAR/JOSS; http://www.joss.ucar.edu/dycoms/catalog/). The Naval Research Laboratory (NRL) uses Advanced Very High Resolution Radiometer (AVHRR) data from the National Oceanic and Atmospheric Administration (NOAA) Polar Orbiting Environmental Satellite (POES) NOAA-14 to develop cloud classifications.

Two products are included in this data set. The five-class product groups clouds into five classes labeled low, middle, high, vertical, and clear. The eleven-class product groups clouds into eleven classes based on cloud type. The products cover the period 1-31 July 2001 and cover varying portions of the western United States and eastern Pacific Ocean. The products are available at NOAA-14 satellite overpass times, which during DYCOMS-II were typically 1200-1500 UTC and 2300-0300 UTC. The products were acquired from NRL (http://kauai.nrlmry.navy.mil/sat-bin/clouds/nrl/clouds_west). All images are in JPG format.

2.0 Data Contact

Paul M. Tag (tag@nrlmry.navy.mil)
Rich Bankert (bankert@nrlmry.navy.mil)

3.0 Product Information

3.1 History

Over the past ten years, NRL Monterey has applied several artificial intelligence technologies to Navy meteorological problems. These technologies include expert systems (Peak and Tag, 1989; Fett et al., 1997; Kuciauskas et al., 1998), machine learning (Tag and Peak, 1996), and pattern recognition, the latter now commonly called computer vision. We have applied computer vision primarily to tasks involving the meteorological analysis of satellite imagery: cloud types; cloud systems (Peak and Tag, 1992, 1994; Bankert, 1994b; Bankert and Tag, 1996); and tropical cyclones (Bankert and Tag, 1997). Because of the effects of clouds on Navy tactical air operations, we have expended considerable effort in automatically typing clouds. This kind of effort is particularly important to U.S. Navy forecasters, whose frequent tour rotations preclude the buildup of imagery-interpretation expertise at remote, worldwide locations, particularly aboard ship. Because NOAA AVHRR data were available for our research, this imagery was chosen as the basis for our cloud classification algorithm, whose classifications are presented in this data set. Development of this algorithm was conducted in two phases, corresponding to overwater (Bankert, 1994a; Bankert and Aha, 1996) and overland (Tag and Bankert, 1998) imagery.

3.2 Cloud Databases

WATER AND LAND CLOUD DATABASES

An automated cloud classification procedure is essentially a pattern-matching technique in which an unknown cloud imagery sample is compared to samples of known type. The first step in developing a classifier that uses supervised learning is to create a database of manually classified cloud images. This step is the most important part of the classification process: without a database of accurately typed images, there is little hope of creating an accurate, robust classifier. Eleven standardized cloud classes (including clear) were chosen (see Table 1).
The overwater and overland databases for these classes were developed by Crosiar (1993) and Tag and Bankert (1998), respectively. (A portion of the Crosiar database includes western Atlantic and eastern Pacific cloud samples classified by Professor Chuck Wash and associates at the Naval Postgraduate School (NPS) in Monterey.) For the overwater samples (excluding the NPS samples), a team of experts (Robin Brody, Robert Fett, Ron Miller, and Dennis Perryman, all NRL employees at the time) independently classified 3625 cloud samples from 7 worldwide geographic areas. For the overland samples, the experts (Robin Brody of NRL, and Ron Englebretson and Robert Fett of Science Applications International Corp.) independently classified 7912 cloud samples from 12 worldwide geographic areas (see Tables 2 and 3). In general, three out of four experts for the water samples and two out of three experts for the land samples had to agree on a sample before it was included in the database. The final numbers of developmental/training samples were 1834 and 1605 for water and land surfaces, respectively.

Table 1. Cloud classes used for typing imagery

    Cirrus (Ci)
    Cirrocumulus (Cc)
    Cirrostratus (Cs)
    Altocumulus (Ac)
    Altostratus (As)
    Stratocumulus (Sc)
    Stratus (St)
    Cumulus (Cu)
    Cumulus Congestus (CuC)
    Cumulonimbus (Cb)
    Clear (Cl)

Tables 2 and 3 are not included in this document; please see http://kauai.nrlmry.navy.mil/clouds/background.html.

3.3 Classifiers

OVERWATER/OVERLAND CLASSIFIERS

The classifications shown on these web pages utilize both the overwater and overland classifiers to classify each 16x16-pixel square in an unregistered image. Along a coastline, the choice of classifier depends upon where (land or water) the majority of pixels in a square reside. The 16x16-pixel area is the minimum area that can be used in order to accommodate the texture features in the classification.

CLASSIFICATION

Classifications are performed using a 1-nearest-neighbor (1-NN) algorithm that was "trained" independently for the water and land situations, using various computer vision characteristics, or features, computed from a visible (Channel 1) and an infrared (Channel 4) channel of the AVHRR. The specific features that provide the highest classification accuracy for these data were selected by a feature selection routine developed by Dr. David Aha of the NRL AI Center in Washington, D.C. This routine, which uses backward sequential selection, is embedded with a 1-NN classifier. Features selected using this process are then used with the 1-NN classifier in operations. All features, for both overwater and overland classifications, fall into three types, as defined by the developers: spectral, textural, and physical. From 110 initial features for water and 185 initial features for land, the Aha feature selection program selected ten features for each of the water and land classifiers. These features and their types are listed in Tables 4 and 5. The textural measures, representing the spatial distribution of pixel gray-level values within the image, are GLDV (gray-level difference vector) and SADH (sum and difference histogram) computations.

Table 4. Selected water features

    Feature                        Channel
    ---------------------------------------
    Spectral:
      Pixel value maximum          Visible
      Pixel value median           Visible
      Pixel value minimum          Infrared
      Pixel value median           Infrared
      Pixel value mean             Infrared
    Textural:
      GLDV mean diff. (4x4 mean)   Visible
      GLDV mean diff. (16x16)      Visible
      GLDV mean diff. (4x4 min.)   Infrared
      GLDV cluster shade (16x16)   Visible
    Physical:
      Latitude                     --------
    ---------------------------------------
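For illustration, the GLDV mean difference listed in Table 4 (and in Table 5 below) is, in the standard texture-analysis formulation, the average absolute gray-level difference between pixel pairs at a fixed displacement. The following is a minimal NumPy sketch only, assuming a one-pixel horizontal displacement; the displacement and normalization actually used by NRL are not specified in this document.

    import numpy as np

    def gldv_mean(patch, dx=1, dy=0):
        """Mean of the gray-level difference vector (GLDV): the average
        absolute difference between pixel pairs offset by (dy, dx)."""
        h, w = patch.shape
        a = patch[:h - dy, :w - dx].astype(np.int32)
        b = patch[dy:, dx:].astype(np.int32)
        return float(np.abs(a - b).mean())

    # Example on a synthetic 16x16 visible-channel patch (0-255 gray levels).
    patch = np.random.randint(0, 256, size=(16, 16))
    print(gldv_mean(patch))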
Table 5. Selected land features

    Feature                        Channel
    ---------------------------------------
    Spectral:
      Pixel value standard dev.    Visible
    Textural:
      SADH entropy (4x4 mean)      Infrared
      SADH mean sum (4x4 max.)     Visible
      SADH mean sum (4x4 max.)     Infrared
      SADH mean sum (4x4 mean)     Infrared
      GLDV entropy (4x4 mean)      Visible
    Physical:
      Avg. cloud albedo            Visible
      Latitude                     --------
      Region                       --------
      Month                        --------
    ---------------------------------------

REGISTERED VS. UNREGISTERED

Both the water and land classifiers were developed from unregistered satellite data. As a result, unregistered classifications are one of the output choices. In addition, a registered classification is also available; this option is a west-coast reprojection of the unregistered cloud classifications onto a Mercator surface.

3.4 Accuracy

Using the appropriate selected features to represent a testing sample, in conjunction with the training samples, leave-one-out testing can be used to provide a reasonable measure of how well the algorithms will perform on unseen data (i.e., a new image). A "leave-one-out" test is an extreme version of cross validation. Whereas, for example, 10-fold cross validation removes 10% of the data set on each iteration to serve as testing data (with the remaining 90% used for training), repeating the process 10 times, leave-one-out cross validation repeats the training and testing as many times as there are data samples. For example, if there were 1000 data samples, the training and testing process would be repeated 1000 times, with each sample serving as an independent test against a training set of 999. The average of the 1000 tests provides an estimate of the algorithm's accuracy on new data, independent of the developmental process.
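As a concrete sketch of this testing procedure (not NRL's operational code), the following computes leave-one-out accuracy for a 1-NN classifier, assuming Euclidean distance over the selected feature vectors; the document does not state which distance metric was used.

    import numpy as np

    def loo_accuracy_1nn(features, labels):
        """Leave-one-out accuracy for a 1-nearest-neighbor classifier.
        Each sample is held out in turn and classified by the nearest
        of the remaining samples."""
        n = len(labels)
        correct = 0
        for i in range(n):
            dist = np.linalg.norm(features - features[i], axis=1)
            dist[i] = np.inf                    # exclude the held-out sample
            correct += labels[np.argmin(dist)] == labels[i]
        return correct / n

    # Synthetic example: 1834 samples, 10 features (as in the water classifier).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1834, 10))
    y = rng.integers(0, 11, size=1834)          # 11 cloud classes
    print(loo_accuracy_1nn(X, y))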
ELEVEN-CLASS ACCURACIES

Tables 6 and 7 provide leave-one-out testing accuracies for the overwater and overland databases, respectively, using the 1-NN classifier. The overall eleven-class classification accuracies for the water and land classifiers are 86.4% and 80.8%, respectively.

Table 6. Confusion matrix (leave-one-out test) for the overwater database (overall accuracy 86.4%). This table shows relative accuracies, based on the training data, for each of the cloud classes. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the class order of Table 1 with empty cells omitted. As an illustration, consider the row for Cirrus (Ci): of the 212 training cases, 188 (89%) were analyzed correctly, with 1 misclassified as Cirrocumulus (Cc), 15 misclassified as Cirrostratus (Cs), etc.

    Class   Predicted counts                  %    No
    ---------------------------------------------------
    Ci      188  1  15  3  1  2  2            89   212
    Cc      1  66  2  1  2                    92    72
    Cs      29  3  124  4  5  1               75   166
    Ac      1  88  13  8  6  1  6             72   123
    As      2  2  1  7  131  2  3  1  4  1    85   154
    Sc      2  4  2  217  6  11  8  1         87   251
    St      1  2  4  4  13  197  2  1  1      88   225
    Cu      8  4  107  4                      87   123
    CuC     1  3  2  9  2  3  116  1          85   137
    Cb      3  3  5  1  4  133                89   149
    Cl      1  2  1  218                      98   222
    ---------------------------------------------------
    Total                                         1834

Table 7. Confusion matrix (leave-one-out test) for the overland database (overall accuracy 80.8%). This table shows relative accuracies, based on the training data, for each of the cloud classes. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the class order of Table 1 with empty cells omitted. As an illustration, consider the row for Cirrus (Ci): of the 199 training cases, 148 (74%) were analyzed correctly, with 22 misclassified as Cirrocumulus (Cc), 13 as Cirrostratus (Cs), etc.

    Class   Predicted counts                  %    No
    ---------------------------------------------------
    Ci      148  22  13  6  5  3  2           74   199
    Cc      18  53  4  3                      68    78
    Cs      15  4  93  3  1                   80   116
    Ac      5  53  18  10  1  1  2            59    90
    As      5  3  16  71  3  2  1             70   101
    Sc      6  3  131  10  7  4  1            81   162
    St      6  11  83  2  1  2                79   105
    Cu      1  6  156  17  2                  86   182
    CuC     1  5  19  83  2                   76   110
    Cb      5  4  6  2  6  67                 74    90
    Cl      1  1  4  6  1  359                97   372
    ---------------------------------------------------
    Total                                         1605

FIVE-CLASS ACCURACIES

The classifications depicted in the web display are shown in both eleven-class and five-class displays. For many users, a classification based primarily upon height provides a sufficient depiction without the meteorological detail provided by the Latin-based cloud types. For this reason, the classifications have been additionally grouped into classes labeled low, middle, high, vertical, and clear. The vertical class was chosen to represent cumulus congestus and cumulonimbus, two classes that indicate convection of some significance. Because many of the misclassifications shown in Tables 6 and 7 above are for different types of clouds at the same level, the five-class accuracies necessarily increase. Tables 8 and 9 summarize these accuracies, with overall accuracies of 92.7% and 90.4%, respectively, for the water and land classifications.

Table 8. Five-class confusion matrix (leave-one-out test) for the overwater database (overall accuracy 92.7%). This table shows relative accuracies, based on the training data, for each of the cloud groupings: low, middle, high, vertical, and clear. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the order low, middle, high, vertical, clear with empty cells omitted. As an illustration, consider the row for low clouds: of the 599 training cases, 565 (94%) were analyzed correctly, with 14 misclassified as middle clouds, 5 as high clouds, etc.

    Grouping   Predicted counts      %    No
    ------------------------------------------
    Low        565  14  5  14  1     94   599
    Middle     20  239  6  11  1     86   277
    High       3  8  429  9  1       95   450
    Vertical   14  10  12  249  1    87   286
    Clear      2  1  1  218          98   222
    ------------------------------------------
    Total                                1834

Table 9. Five-class confusion matrix (leave-one-out test) for the overland database (overall accuracy 90.4%). This table shows relative accuracies, based on the training data, for each of the cloud groupings: low, middle, high, vertical, and clear. Rows represent actual classes; the counts in each row are the predicted-class totals, listed in the order low, middle, high, vertical, clear with empty cells omitted. As an illustration, consider the row for low clouds: of the 449 training cases, 406 (90%) were analyzed correctly, with 16 misclassified as middle clouds, etc.

    Grouping   Predicted counts      %    No
    ------------------------------------------
    Low        406  16  22  5        90   449
    Middle     16  158  13  3  1     83   191
    High       14  370  7  2         94   393
    Vertical   24  3  15  158        79   200
    Clear      10  1  1  1  359      97   372
    ------------------------------------------
    Total                                1605

3.5 Display

WEB WINDOWS

The classifications and corresponding satellite images shown on this web page allow several display options. Both the water and land classifiers were developed from unregistered satellite data; both unregistered and registered satellite imagery (visible and infrared) and classifications are available for display. In the case of unregistered imagery, the entire satellite pass is shown. For the registered image, a specific west-coast subsection is isolated and redisplayed in Mercator coordinates. For the classifications, both eleven-class and five-class displays are available (see Section 3.4). Cloud types are color-coded, with the general color scheme as follows: yellow, high clouds; green, mid-level; blue, low-level; and red, vertical (CuC and Cb).
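The document does not spell out which of the eleven classes falls into each of the five groupings, but the row totals of Tables 8 and 9 match the sums of the corresponding eleven-class rows in Tables 6 and 7 (e.g., water low: Sc 251 + St 225 + Cu 123 = 599), implying the standard height-based assignment. A minimal sketch of that inferred mapping:

    # Inferred grouping of the eleven cloud classes into the five display
    # classes (consistent with the row totals of Tables 6-9).
    FIVE_CLASS = {
        "Ci": "high", "Cc": "high", "Cs": "high",
        "Ac": "middle", "As": "middle",
        "Sc": "low", "St": "low", "Cu": "low",
        "CuC": "vertical", "Cb": "vertical",
        "Cl": "clear",
    }

    def to_five_class(label):
        """Collapse an eleven-class label (Table 1) to its five-class grouping."""
        return FIVE_CLASS[label]

    print(to_five_class("CuC"))   # -> "vertical"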
PROCESSING

The first procedure that we used for computing the classifications involved starting with a 16x16-pixel area, making a classification that represents that entire area, and then moving over 16 pixels to start a new classification. We had originally considered a pixel-by-pixel progression, with each pixel being classified as the center of a 16x16 box. This approach was not considered seriously, however, because of the considerable additional computation it would require. Recently, however, Mr. Jonathan Allen of Computer Sciences Corp. has suggested an alternative which, although about three times more expensive computationally, provides an additional refinement. In Allen's method, the advancement across the grid is based upon a four-pixel progression. In contrast to the pixel-by-pixel method, where only the center pixel gets classified, the cloud classification for every pixel within each 16x16 domain gets saved. Allen's idea is to tally the classification choices for each pixel and, later, to assign the dominant class as that pixel's cloud class. This method has two distinct advantages. First, the edges of the cloud masses are smoothed (because of the 16-times-greater precision in the classifications). Second, by utilizing a maximum-likelihood process in determining cloud class, fine-scale differences (and aberrant classifications that do not agree with their surroundings) are smoothed out. Both of these effects lead to a classification product that is more realistic in appearance and more precise as well.
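A minimal sketch of Allen's tallying refinement follows, assuming a hypothetical classify_patch function that stands in for the trained 1-NN classifier (mapping a 16x16 patch to an integer class label, 0-10):

    import numpy as np

    def classify_majority(image, classify_patch, win=16, step=4, n_classes=11):
        """Slide a win x win window across the image in steps of `step`
        pixels, give every pixel covered by the window a vote for the
        window's class, then assign each pixel its most-voted class."""
        h, w = image.shape
        votes = np.zeros((h, w, n_classes), dtype=np.int32)
        for i in range(0, h - win + 1, step):
            for j in range(0, w - win + 1, step):
                c = classify_patch(image[i:i + win, j:j + win])
                votes[i:i + win, j:j + win, c] += 1   # tally for covered pixels
        return votes.argmax(axis=2)                   # dominant class per pixel

With step=4 and win=16, each interior pixel is covered by 16 windows, which is the "16 times greater precision" noted above; the per-pixel majority then smooths cloud edges and suppresses isolated aberrant classifications.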
POST PROCESSING

The classifications for the 16x16-pixel squares represent a single cloud type for that area. This method of typing is necessary because of the textural measures that require this minimum area. Comparing classification output to expert opinion, based upon the accompanying visible and infrared imagery, has been part of the evaluation process over the last half of 1997 and early 1998. One obvious observation is that these single-color designations dominate areas in which we know clouds are not continuous, particularly those involving cumuliform clouds such as cumulus, stratocumulus, cumulus congestus, and cumulonimbus. For these cloud types, there can be significant clear areas between the clouds, particularly at the AVHRR resolution of 1 km. For this reason, one of our colleagues (James E. Peak, Computer Sciences Corp.) suggested that a post-processing step might be in order: after all classifications are performed for the 16x16-pixel areas, we examine each individual pixel to determine whether it represents clear conditions. This process, although more complicated over land than over water, can be a simple test of whether the pixel albedo (visible channel) lies below a certain threshold; for example, we found that an albedo of 15% is satisfactory. If a pixel is deemed clear, it is set to black, representing clear conditions. This post-processing step has the significant benefit of making the classification pattern appear, visually, much more realistic, particularly over water. Future work may involve enhancing this post-processing step to include pixel temperature as well.

WEB DEVELOPMENT

The displays for the cloud classifications were developed by Mr. Jonathan Allen of Computer Sciences Corp. Mr. John Kent, of Science Applications International Corp., is responsible for the overall web page design and the integration of the cloud classifications into the page.

4.0 Quality Control Procedures

UCAR/JOSS conducted no quality checks on these data.

5.0 File Naming Convention

The file names are structured as follows:

    noaa-14.yyyymmddhhmm.cloud_class_11.jpg
    noaa-14.yyyymmddhhmm.cloud_class_5.jpg

where:

    noaa-14          the satellite used
    yyyy             the four-digit year
    mm               the two-digit month
    dd               the two-digit day
    hh               the two-digit hour (UTC)
    mm               the two-digit minute
    cloud_class_11   the name of the 11-class cloud product
    cloud_class_5    the name of the 5-class cloud product

6.0 References

Bankert, R. L., 1994a: Cloud classification of AVHRR imagery in maritime regions using a probabilistic neural network. J. Appl. Meteor., 33, 909-918.

Bankert, R. L., 1994b: Cloud pattern identification as part of an automated image analysis. Preprints, 7th Conference on Satellite Meteorology and Oceanography, American Meteorological Society, Boston, MA, 441-443.

Bankert, R. L., and D. W. Aha, 1996: Improvement to a neural network classifier. J. Appl. Meteor., 35, 2036-2039.

Bankert, R. L., and P. M. Tag, 1996: Automated extraction and identification of cloud systems in satellite imagery. Preprints, 8th Conference on Satellite Meteorology, Atlanta, GA, American Meteorological Society, Boston, MA, 373-376.

Bankert, R. L., and P. M. Tag, 1997: Automating the subjective analysis of satellite-observed tropical cyclone intensity. Proceedings, 22nd Conference on Hurricanes and Tropical Meteorology, American Meteorological Society, Boston, MA, 37-38.

Crosiar, C. L., 1993: An AVHRR cloud classification database typed by experts. Naval Research Laboratory, NRL/MR/7531-93-7207, Monterey, CA, 31 pp.

Fett, R. W., M. E. White, J. E. Peak, S. Brand, and P. M. Tag, 1997: Application of hypermedia and expert system technology to Navy environmental satellite imagery analysis. Bull. Amer. Meteor. Soc., 78, 1905-1915.

Kuciauskas, A. P., L. R. Brody, M. Hadjimichael, R. L. Bankert, and P. M. Tag, 1998: A fuzzy expert system to assist in the prediction of hazardous wind conditions within the Mediterranean basin. Accepted for publication in Meteor. Appl.

Peak, J. E., and P. M. Tag, 1989: An expert system approach for prediction of maritime visibility obscuration. Mon. Wea. Rev., 117, 2641-2653.

Peak, J. E., and P. M. Tag, 1992: Toward automated interpretation of satellite imagery for Navy shipboard applications. Bull. Amer. Meteor. Soc., 73, 995-1008.

Peak, J. E., and P. M. Tag, 1994: Segmentation of satellite imagery using hierarchical thresholding and neural networks. J. Appl. Meteor., 33, 605-616.

Tag, P. M., and J. E. Peak, 1996: Machine learning of maritime fog forecast rules. J. Appl. Meteor., 35, 714-724.

Tag, P. M., and R. L. Bankert, 1998: A day/night ocean/land AVHRR cloud classification package. To be submitted to J. Appl. Meteor.