STORM Precipitation Composite Datasets

This document contains the following sections:

     I.   General Information
     II.  How to Access the Data
     III. Composite Format Description
     IV.  Processing of Data Included in the Composite Datasets
     V.   Quality Control
     VI.  Instrumentation
     VII. References

I. GENERAL INFORMATION
   -------------------

The STORM-FEST composite datasets for precipitation data are now
available. The composites are available as hourly and 15-minute
datasets separated into daily files. Each record holds 1 day of
precipitation values for one station, and records are sorted by time,
station latitude and station longitude.

The hourly surface composite dataset contains data from:
--------------------------------------------------------
     PAM Network
     ASOS Network
     AWOS Network
     Nebraska High Plains Climate Network
     Illinois State Water Survey Network
     PROFS Network
     Wind Profiler Demonstration Network
     NWS Cooperative Network
     Oklahoma Agricultural Research Stations
     US Geological Survey Network

The 15-minute surface composite dataset contains data from:
-----------------------------------------------------------
     PAM Network
     ASOS Network
     AWOS Network
     Illinois State Water Survey Network
     PROFS Network
     National Climatic Data Center Precipitation Observations
     Oklahoma Agricultural Research Stations
     US Geological Survey Network

Data were subject to the following quality control procedures:
--------------------------------------------------------------
Quality control (qc) was done automatically by computer. The only
check was a comparison against gross limits. Values that exceeded
these limits were flagged either "questionable" or "unlikely"
depending on measured accumulation. All other values were flagged as
"good". Any quality control done by the source agency for the data
was retained.
Limitations of the composites include:
--------------------------------------
     - No Canadian data is included.
     - Quality control was entirely automatic and, therefore,
       observations with "unlikely" or "questionable" qc flags may
       require further inspection to determine their usefulness.

II. HOW TO ACCESS THE DATA
    ----------------------

These datasets and appropriate documentation may be accessed online
via the STORM Data Management System over the Internet at:

     http://www.joss.ucar.edu/codiac/

III. COMPOSITE FORMAT DESCRIPTION
     ----------------------------

The composite dataset is archived in the standard WMO FM-94 BUFR code
format, enhanced with a fixed-size header preceding each BUFR data
record. This enhanced BUFR format is referred to as Enhanced-BUFR, or
E-BUFR. The headers on BUFR records consist of date/time and location
information to allow for easy sorting and extraction of BUFR records
without the need to decode the binary BUFR data. BUFR (and E-BUFR)
has been designed to be machine independent.

The dataset may be requested in a variety of formats through the
STORM Data Management System. Supported formats include a tabular
ASCII format designed for easy readability, NetCDF, CMF and E-BUFR.
The following is a description of the precipitation composite with
specifics for the E-BUFR and ASCII formats.

A. Parameters in the Composite datasets.

The Composite Format contains the parameters listed below. This
format applies to both the hourly and 15-minute composites with one
variation. The hourly composite dataset contains the nominal date and
time of observation, whereas the 5-minute composite does not. The
nominal date and time is the top of the hour nearest to the actual
time of the observation. Most networks actually take the observation
about 5 minutes before the hour, but this varies from network to
network and from station to station within a network.
     Parameter                      Units

     Nominal Date of Observation    UTC (YYYY/MM/DD)
     Nominal Time of Observation    UTC (always 00:00:00; time is
                                    implied in the precip amount
                                    buckets)
     Network Identifier             Abbreviation of platform name
     Station Identifier             Network Dependent
     Latitude                       Decimal degrees, South is negative
     Longitude                      Decimal degrees, West is negative
     Station Occurrence             Unitless
     Precipitation Amount           mm
     Qualification Code             Code Table
     Quality Control Flag           Code Table

The code values for the qualification flag and quality-control flag
are given later in this document.

Parameter notes:

  1. The nominal date and time differs from the actual time. It is
     the actual time rounded to the nearest hour for the hourly
     dataset and rounded to the nearest quarter hour for the
     15-minute dataset. Actual times may vary a few minutes before or
     after nominal time. Actual times are not included in the
     dataset.

  2. The station occurrence is a uniqueness code to separate two
     different stations that may be co-located at the same
     latitude/longitude point.

  3. The meaning of the precip amounts is determined by the
     qualification code as explained in section E. below.

B. E-BUFR File Structure

The following describes the BUFR descriptors used in the
precipitation composite and summarizes the E-BUFR format of the
precipitation data records. Every E-BUFR file contains header
information at the beginning of the file and additional details not
listed here. For a full description of BUFR and E-BUFR see the BUFR
and E-BUFR description manuals.
The BUFR Descriptors:
---------------------

     BUFR Code
     F XX YYY   Description
     - -- ---   -----------
     0 01 254   Network Identifier
     0 01 253   Station Identifier
     1 04 000   Repeat next 4 descriptors X times
     0 31 001   X
     0 08 022   Number of time interval replications
     0 13 011   Precipitation Amount
     0 08 255   Precipitation Measurement Qualification Code
     0 33 255   Quality Control Flag

Note that E-BUFR precip uses two kinds of replication:

     1) BUFR delayed replication (1 04 000 and 0 31 001)
     2) "Manual" time-based replication (0 08 022)

1. Delayed replication (1 04 000 and 0 31 001)

   The descriptor 1 04 000 says we are using delayed replication. The
   04 says that we are replicating 4 descriptors, i.e. a group of 4
   descriptors will be repeated X times. The X is found in the
   descriptor immediately following 1 04 000, which is 0 31 001. So
   the value for 0 31 001 will be X. Then, there will be 4*X values
   in the data record representing the 4 descriptors after the
   0 31 001.

2. Manual time-based replication (0 08 022)

   If the manual time-based replication descriptor was not used then
   X in the delayed replication descriptor would always be either 24
   for the hourly data or 96 for the 15 minute data. The manual
   time-based replication is meant to reduce X in order to save some
   space. Here's how it works: if there are a bunch of consecutive
   data values that are exactly the same, then we only record them
   once, along with a count of how many times that value is repeated.
   For example, if there are 15 consecutive time intervals all with a
   value of 0.00 for the precip, 0 for the qualifier, and G for the
   QC flag, then we would record the equivalent to saying: "repeat
   the following 15 times: 0.00 0 G" instead of saying "0.00 0 G
   0.00 0 G 0.00 ...". This repetition count goes in 0 08 022. It
   follows then that if the entire record of hourly data has
   0.00 0 G, then X=1 and 0 08 022 gets 24.
   As another example, if the hourly composite contains 12 hours of
   data with 0.00 0 G, followed by 2 hours with 2.50 0 G, followed by
   10 more hours of data with 0.00 0 G, then X=3 and 0 08 022 would
   be 12 in the first set, 2 in the second set and 10 in the third
   set.

E-BUFR encoded data record:
---------------------------

     Header portion:
          Nominal Date
          Nominal Time
          Latitude
          Longitude
          Station Occurrence

     BUFR encoded portion:
          Network Identifier
          Station Identifier
          Number of time interval replications
          Precipitation Amount
          Precipitation Measurement Qualification Code
          Quality Control Flag

The last 4 fields will be repeated X times where X is defined in the
dataset header by descriptor 0 31 001. The last 3 descriptors have 24
values for hourly data and 96 values for 15-minute data after
expansion using the replication factors.

All times are ending times for the time period. For the hourly data
the first precip value represents the accumulation from the preceding
hour (and day) and ending on 00Z. For the 15-minute data the first
precip value represents the accumulation from the preceding 15
minutes (of the previous day) and ending on 00Z. Each time bucket
increments from there.
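The time-based replication described above amounts to run-length
encoding of the (amount, qualification code, QC flag) triples. The
following Python sketch illustrates the idea using decoded values
only; it does not read or write binary BUFR. The function names
expand_groups and compress_values and the tuple layout are
illustrative assumptions, not part of any E-BUFR software.

```python
from typing import List, Tuple

# One replication group, as decoded values (hypothetical layout):
# (count from 0 08 022, precip in mm, qualification code, QC flag)
Group = Tuple[int, float, int, str]
Value = Tuple[float, int, str]


def expand_groups(groups: List[Group], expected: int) -> List[Value]:
    """Expand run-length groups into one value per time interval."""
    values: List[Value] = []
    for count, amount, qual, qc in groups:
        values.extend([(amount, qual, qc)] * count)
    # 24 intervals for hourly data, 96 for 15-minute data.
    if len(values) != expected:
        raise ValueError(f"expected {expected} intervals, got {len(values)}")
    return values


def compress_values(values: List[Value]) -> List[Group]:
    """Collapse consecutive identical values into run-length groups."""
    groups: List[Group] = []
    for value in values:
        if groups and groups[-1][1:] == value:
            # Same triple as the previous interval: bump its count.
            groups[-1] = (groups[-1][0] + 1,) + value
        else:
            groups.append((1,) + value)
    return groups


# Second example from the text: 12 hours of 0.00, 2 hours of 2.50,
# then 10 hours of 0.00, all with qualifier 0 and QC flag G.
groups = [(12, 0.00, 0, "G"), (2, 2.50, 0, "G"), (10, 0.00, 0, "G")]
hourly = expand_groups(groups, expected=24)
print(len(groups), len(hourly))  # X, and the number of expanded intervals
```

Here len(groups) plays the role of X in descriptor 0 31 001, and the
first element of each tuple is the value carried by 0 08 022.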
Here is an example for the hourly composite dataset:
----------------------------------------------------

     Date:               1992/02/01
     Network:            ASOS
     Station ID:         AKO
     Latitude:           31.77917
     Longitude:          -95.71333
     Station Occurrence: 0

     Hour     Precip    Qualification   QC
     GMT      Amt       Code            Flag
     00         0.00    0               G
     01         0.00    0               G
     02         0.00    0               G
     03         0.10    0               G
     04         0.10    0               G
     05         0.10    0               G
     06         0.10    0               G
     07         0.10    0               G
     08         5.00    0               B
     09      -999.99    7               M   (Missing Data)
     10         0.00    0               G
     11         0.00    0               G
     12         0.00    0               G
     13         0.00    0               G
     14         0.00    0               G
     15         0.00    0               G
     16         0.00    0               G
     17         0.00    0               G
     18         0.00    0               G
     19         0.00    0               G
     20         0.00    0               G
     21         0.00    0               G
     22         0.00    0               G
     23         0.00    0               G

The E-BUFR header section of the data record will be:

     1992 02 01 00 00 00 31.77917 -95.71333 0

The BUFR data section of the data record will be:

      3    0.00 0 G
      5    0.10 0 G
      1    5.00 0 B
      1 -999.99 7 M
     14    0.00 0 G

The value of BUFR descriptor 0 31 001 in the dataset header section
will be 5.

Note that the E-BUFR record is shown here in ASCII with space
delimiters for readability; the actual data in the file is binary and
contains no delimiters.

C. ASCII File Structure

The ASCII composite file is a sequential tabular ASCII file
containing fixed-length records. It consists of header records and
data records. It is created upon request from the archived E-BUFR
file. It contains fields in the following order:

     Nominal Date         from E-BUFR record header
     Nominal Time         from E-BUFR record header
     Network              descriptor 0 01 254
     Station              descriptor 0 01 253
     Latitude             from E-BUFR record header
     Longitude            from E-BUFR record header
     Station Occurrence   from E-BUFR record header
     Precip Amount        descriptor 0 13 011
     Precip Qualifier     descriptor 0 08 255
     QC Flag              descriptor 0 33 255

The last 3 items are repeated as a group 24 times for hourly data and
96 times for 15-minute data.

Following are the READ and FORMAT statements used to read the ASCII
precipitation format data.
MAXTIM is 24 or 96 for hourly or 15-minute data and is determined by
examining the length of the character array BUFFER, which contains
one line of data:

      READ (BUFFER,1002) TIME,NETSTR,STNSTR,LATLON,CLIBYT,
     &     (PRECIP(K), QUAL(K), QCF(K), K=1,MAXTIM)
 1002 FORMAT (I2,'/',I2,'/',I2,X,I2,':',I2,':',I2,
     &        X,A10,X,A10,
     &        X,F10.5,X,F11.5,X,I3,X,I3,96(: X,F7.2,X,I1,X,A1))

D. Special Values

   1. Missing Values

      A value that is missing, or is not observed by the given
      network, is indicated by the value '-999.99', a qualification
      code of 7 and a QC flag of 15 (E-BUFR) or 'M' (ASCII).

   2. Qualification Code Values

      The qualification code is a coded value. Codes and definitions
      are listed briefly here and are expanded in the section on file
      format.

      BUFR/ASCII
      Code         Definition
      0            None. Amount is the accumulation for the time
                   period.
      1            Accumulating Precipitation. Amount is zero.
      2            Accumulation period ended. Amount is the total
                   accumulation for this period and all previous
                   periods with a qualification code of 1.
      3            Deleted value, original data unreadable. Amount is
                   -999.99
      4-6          Reserved (unused)
      7            Missing Value

   3. Quality-Control Flag Values

      The quality-control flag is a coded value. Codes and
      definitions are listed briefly here and are expanded in the
      section on quality control.

      BUFR    ASCII
      Code    Code     Definition
      0       U        Unchecked
      1       G        Checked and Good
      2       B        Checked and Unlikely (Bad)
      3       D        Checked and Questionable (Dubious)
      4       N        Parameter not measured at this station or is
                       not applicable
      5       X        Glitch
      6       E        Estimated Value
      7-14             Reserved (unused)
      15      M        Missing Value

      Upon initial conversion of the data all QC flags were
      initialized to 0 or 'U' to indicate that the data have not been
      checked, except for data that are missing (15 or 'M') or
      unobserved (4 or 'N').

   4.
      Network Identifiers

      ID        Network
      PAM1      PAM Network for 1-minute stations
      PAM5      PAM Network for 5-minute stations
      ASOSH     ASOS Network for hourly composite
      ASOS5     ASOS Network for 5-minute composite
      AWOSQ     AWOS Qualimetrics Network (20-minute stations)
      AWOSH     AWOS Handar Network (20-minute stations)
      AWOS1     AWOS for Iowa 1-minute stations
      HPLAIN    Nebraska High Plains Climate Network
      ISWS      Illinois State Water Survey Network
      PROFS5    PROFS Network
      WDPN      Wind Profiler Demonstration Network
      NCDC      National Climatic Data Center Precipitation
                Observations
      ARS       Oklahoma Agricultural Research Stations
      USGS      US Geological Survey Network

   5. Station Identifiers

      The stations and their names and identifiers are listed in a
      separate file called fest_sites.

E. Qualification Codes

It is necessary to understand the meaning of the qualification code
to know exactly what the precipitation amount means. The codes are
defined in section D.2. above. Note that a precipitation amount in
the hourly composite is not necessarily the amount for 1 hour.
Likewise, an amount in the 15-minute composite is not necessarily the
amount for 15 minutes. The qualification code must be checked to
determine the actual period of accumulation. An example will
demonstrate the use of the qualification code.
Assume the hourly precipitation composite contains the following
data:

     Date:               1992/02/01
     Network:            ASOS
     Station ID:         AKO
     Latitude:           31.77917
     Longitude:          -95.71333
     Station Occurrence: 0

     Hour     Precip    Qualification   QC
     GMT      Amt       Code            Flag
     00       0.10      0               G
     01       0.20      0               G
     02       0.20      1               G     Accumulating
     03       0.30      1               G     Accumulating
     04       0.50      1               G     Accumulating
     05       1.20      1               G     Accumulating
     06       2.70      1               G     Accumulating
     07       4.10      1               G     Accumulating
     08       5.00      1               G     Accumulating
     09       3.20      1               G     Accumulating
     10       1.90      2               G     Done accumulating
     11       1.30      2               G     Done accumulating
     12       1.10      0               G
     13       0.90      2               G     Done accumulating
     14       0.60      0               G
     15       0.20      1               G     Accumulating
     16       0.30      1               G     Accumulating
     17       0.20      1               G     Accumulating
     18       0.10      1               G     Accumulating
     19       0.00      1               G     Accumulating
     20       0.00      2               G     Done accumulating
     21       0.00      0               G
     22       0.00      1               G     Accumulating
     23       0.00      2               G     Done accumulating

The precip amounts for this station on 2/1/92 would be:

     Hour      Precip    Length of time
     Ending    Amount    for accumulation
     00         0.10     1 hour
     01         0.20     1 hour
     10        19.10     9 hours
     11         1.30     1 hour
     12         1.10     1 hour
     13         0.90     1 hour
     14         0.60     1 hour
     20         0.80     6 hours
     21         0.00     1 hour
     23         0.00     2 hours

The qualification code works the same way in the 15-minute composite
dataset.

IV. PROCESSING OF DATA INCLUDED IN THE COMPOSITE DATASETS
    -----------------------------------------------------

NCDC Cooperative Network Precipitation Observations
---------------------------------------------------
This data was QC'd to a limited extent by NCDC and further QC'd by
the STORM Project Office.

Oklahoma ARS
------------
The ARS data exists for only 3 days during the STORM-FEST time
period: 2/22/92, 2/24/92 and 3/9/92. Data was converted from local
date and time to GMT and the precip amounts from inches to
millimeters. Note that the precip amount was calculated to 2 decimal
places and therefore contains some false precision.

US Geological Survey Network
----------------------------
There are some tricks and oddities in the USGS data that are worth
knowing.
First, we don't know what is 15-minute and what is hourly until we've
seen all the data for that station. If there is a substantial number
of precip values at times xx:15 or xx:45, then we assumed a 15-minute
station. Otherwise, hourly. It is theoretically possible that we
could also get 30-minute data, but we shouldn't. (That's why we don't
count xx:30 as a 15-minute value.)

Times that are not close to a 15-minute interval are ignored.
Presently the interval is +-2 minutes (e.g. 13-17 is regarded as 15).

There can (and will) be quite a few precip values at times that do
not fit into a 15-minute bucket. These are "random" transmissions.
They occur when the station decides to report data at an unscheduled
moment, e.g. because it is raining really hard. These times are
generally ignored since the values are included in the next 15-minute
(or hourly) bucket.

Now, a word about the algorithm for computing accumulations. If the
value falls in a 15-minute time slot, then it is put into a big array
for that day. If, at the end of the day, there are enough xx:15 and
xx:45 values, then the station is output as 15-minute PQCF. Otherwise
it is output as hourly PQCF. But the hourly data may have some xx:15,
xx:30, and xx:45 slots filled. The next slot will still be the
accumulation since the previous 15-minute slot. So, when outputting
hourly data, we have to sum up the last 3 slots in the big 15-minute
array (ignoring missing values).

The USGS source data can have missing/accumulation ranges. These
appear as a range of times with no data, e.g. data appears for 02:00
but doesn't appear again until 09:00. If the precip values at the
ends of this range are equal, then the middle of the range is filled
with 0.00 precip and a 0 qualifier flag. If the precip total rises,
then the values in the middle are set to 0.00 with a 1 qualifier
flag, which indicates that an accumulation is being recorded. The
last value (e.g.
09:00) will have the precip amount between 02:00 and 09:00 with a 2
qualifier flag, indicating the end of the accumulation period. If the
precip total falls, then the entire range is left as "missing".

All other Platforms/Networks
----------------------------
These are an extraction of precipitation from the surface composite
datasets. Quality Control codes were also extracted so quality
control for data in these networks follows the quality control
procedures described in the surface composite documentation.

The hourly precipitation composite contains a one for one extraction
of data records from the hourly surface composite.

The 15-minute precipitation composite contains summarized data from
the 5-minute surface composite dataset. The three 5-minute precip
observations prior to the 15-minute precip nominal time were added to
get the 15 minute amounts. For example, the 14:45 precip amounts
would be the sum of the 14:35, 14:40 and 14:45 precip amounts from
the 5 minute surface composite. Quality control flags were ignored
when making the 15 minute amounts unless the data was 'missing' or
not measured in ANY of the 5 minute periods. In this case, the 15
minute precip amount was set to -999.99 which represents 'missing' or
'not measured'. Quality control flags were set to the codes in the
surface composite. If all three codes were not identical then the
resultant quality control code was set with the following precedence:

     Highest Precedence     Not Measured
                            Missing
                            Not QC'd
                            Glitch
                            Unlikely
                            Estimated
                            Questionable
     Lowest Precedence      Good

V. QUALITY CONTROL
   ---------------

A. Description of QC method
   ------------------------

Observations contained within the STORM-FEST precipitation composite
datasets (hourly and 15-minute composites) were automatically checked
by computer in the STORM Project Office. The check was a simple
comparison of the accumulated amounts to gross limits.
Any observation that exceeded the gross limit in the hourly or
15-minute time period was flagged either "unlikely" or
"questionable". Note that only amounts with the qualification
code = 0 were checked this way. The following table gives the gross
limits used:

                    Gross limit
          unlikely             questionable
          75.00 mm/hr          25.00 mm/hr
          40.00 mm/15-min      20.00 mm/15-min

B. Quality control flags
   ---------------------

Quality control flags are stored with each parameter in the composite
datasets. The codes have the following meanings:

     U - Unchecked. No qc has been done for this observation.
         (E-BUFR code 0).

     G - Checked and found "Good". This qc flag means the precip
         value fell within limits. (E-BUFR code 1).

     B - Checked and found "Unlikely". This qc flag means the precip
         amount fell outside the likely limit for precip for this
         time period. Observations with this qc flag should only be
         used after careful analysis of the validity of the value.
         (E-BUFR code 2).

     D - Checked and found "Questionable". This qc flag means the
         precip amount fell outside the questionable limit for precip
         for this time period. Observations with this qc flag should
         be used with caution. (E-BUFR code 3).

     N - Not observed or not applicable. This parameter is not
         observed at this station, or this parameter is optionally
         reported and is not applicable for this station and time.
         (E-BUFR code 4).

     X - "Glitch" flag. Either the instrumentation was faulty, or
         some other problem was known to exist which invalidates the
         value of this observation. Observations with this qc flag
         should not be used. (E-BUFR code 5).

     E - Estimated value. This parameter's value was estimated and is
         not a directly observed value. This flag overrides the qc
         flags "G" ("good") and "D" ("questionable"), but the qc flag
         "B" ("unlikely") overrides this flag. Use values with this
         qc flag with caution. (E-BUFR code 6).

     M - Missing. This value is normally reported but for some
         unknown reason is missing for this station and time.
         (E-BUFR code 15).
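The gross-limit check described in section A above can be sketched in
Python as follows. This is an illustration only, not the STORM
Project Office code. The function name gross_limit_flag is
hypothetical, and three details are assumptions: "exceeded" is read
as strictly greater than, only amounts whose flag is still 'U' are
checked (so that flags set by the source agency are retained), and
amounts that are not checked keep whatever flag they already carry.

```python
# Gross limits in mm per interval, from the table in section V.A.
GROSS_LIMITS = {
    "hourly": {"unlikely": 75.00, "questionable": 25.00},
    "15min": {"unlikely": 40.00, "questionable": 20.00},
}


def gross_limit_flag(amount_mm, qualification, current_flag,
                     composite="hourly"):
    """Return the ASCII QC flag after the gross-limit check."""
    # Only amounts with qualification code 0 are checked (per the text).
    # Assumption: only unchecked ('U') values are overwritten, so that
    # source-agency flags and 'M'/'N' values pass through unchanged.
    if qualification != 0 or current_flag != "U":
        return current_flag
    limits = GROSS_LIMITS[composite]
    # Assumption: "exceeded" means strictly greater than the limit.
    if amount_mm > limits["unlikely"]:
        return "B"  # Checked and Unlikely
    if amount_mm > limits["questionable"]:
        return "D"  # Checked and Questionable
    return "G"      # Checked and Good


print(gross_limit_flag(30.0, 0, "U"))           # hourly amount
print(gross_limit_flag(45.0, 0, "U", "15min"))  # 15-minute amount
```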
Quality Control Results
-----------------------

The following table summarizes some of the qc test results:

     qc'd       good       unlikely    questionable
     99.6 %     99.9 %     <.1 %       0.1 %

Miscellaneous QC Notes
----------------------

  1. Observations with the "X" flag (glitch) may have values that
     look realistic. This is only coincidental. Do not trust these
     values.

  2. Any parameter not mentioned in the checks performed section
     above had no qc test performed on it and contains the "U" qc
     flag ("unchecked").

  3. The data set sometimes contains false precision. Real numbers
     were calculated to two decimals. This does not necessarily
     represent the precision of the instrumentation. For example,
     precipitation may have been reported in hundredths of an inch.
     We converted this to hundredths of a millimeter.

VI. INSTRUMENTATION
    ---------------

To be added at a later date for:

     PAM Network
     ASOS Network
     AWOS Network
     Nebraska High Plains Climate Network
     Illinois State Water Survey Network
     PROFS Network
     Wind Profiler Demonstration Network
     National Climatic Data Center Precipitation Observations
     Oklahoma Agricultural Research Stations
     US Geological Survey Network

VII. REFERENCES
     ----------

"E-BUFR Version 0 Technical Reference Manual", December 1991,
     UCAR/OFPS, Boulder, CO. This document explains the details of
     the E-BUFR format. A digital version of this document may be
     retrieved via anonymous FTP. Telnet to 128.117.90.53. The
     document is named "ebufr.doc" and is in directory
     "documentation".

Manual on Codes, 1988, World Meteorological Organization, Geneva,
     Switzerland. WMO Publication 306, Supplement No. 3 (VIII.1991)
     Section FM 94-IX Ext., pages I-Bi--43 to I-Bi--174.