PacMARS

Pacific Marine Arctic Regional Synthesis Data Archive

Documentation and Format Guidelines

Documentation Guidelines

The documentation (i.e. the "Readme" file) that accompanies each project data set is as important as the data themselves. This information lets collaborators and other analysts learn of the data and understand any limitations or special characteristics that may affect their use elsewhere. Documentation should accompany every data set submission and contain the information listed in the outline below. Not every data set will have information for every documentation category, but the following outline (and content) should be followed as closely as possible so that documentation is consistent across all data sets. A documentation file should also accompany each preliminary and final data set.

---TITLE: This should match the data set name

---AUTHOR(S):
-Name(s) of PI and all co-PIs
-Complete mailing address, telephone/facsimile Nos., web pages and E-mail address of PI
-Similar contact information for data questions (if different than above)

---FUNDING SOURCE AND GRANT NUMBER:

---DATA SET OVERVIEW:
-Introduction or abstract
-Time period covered by the data
-Physical location of the measurement or platform (latitude/longitude/elevation)
-Data source, if applicable (e.g. for operational data include agency)
-Any World Wide Web address references (i.e. additional documentation such as Project WWW site)

---INSTRUMENT DESCRIPTION:
-Brief text (i.e. 1-2 paragraphs) describing the instrument with references
-Figures (or links), if applicable
-Table of specifications (i.e. accuracy, precision, frequency, etc.)

---DATA COLLECTION and PROCESSING:
-Description of data collection
-Description of derived parameters and processing techniques used
-Description of quality control procedures
-Data intercomparisons, if applicable

---DATA FORMAT:
-Data file structure, format and file naming conventions (e.g. column delimited ASCII, NetCDF, GIF, JPEG, etc.)
-Data format and layout (i.e. description of header/data records, sample records)
-List of parameters with units, sampling intervals, frequency, range
-Description of flags and codes used in the data, with definitions (e.g. good, questionable, missing, estimated)
-Data version number and date

---DATA REMARKS:
-PI's assessment of the data (e.g. disclaimers, instrument problems, quality issues)
-Missing data periods
-Software compatibility (i.e. list of existing software to view/manipulate the data)

---REFERENCES:
-List of documents cited in this data set description
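
As a rough aid, the outline above can be checked mechanically before submission. The sketch below is only an illustration: it assumes the Readme is a plain-text file in which each section heading is written exactly as in the outline (three hyphens, the heading, then a colon), and the function name `missing_sections` is hypothetical. Because not every category applies to every data set, the result is a reminder list rather than an error report.

```python
# Section headings taken from the outline above.
REQUIRED_SECTIONS = [
    "TITLE",
    "AUTHOR(S)",
    "FUNDING SOURCE AND GRANT NUMBER",
    "DATA SET OVERVIEW",
    "INSTRUMENT DESCRIPTION",
    "DATA COLLECTION and PROCESSING",
    "DATA FORMAT",
    "DATA REMARKS",
    "REFERENCES",
]


def missing_sections(readme_text):
    """Return the outline headings that do not appear in the Readme text."""
    found = set()
    for line in readme_text.splitlines():
        line = line.strip()
        # Assumption: headings look like "---HEADING:" as in the outline.
        if line.startswith("---") and ":" in line:
            heading = line[3:].split(":", 1)[0].strip().upper()
            found.add(heading)
    return [s for s in REQUIRED_SECTIONS if s.upper() not in found]


# Hypothetical, deliberately incomplete Readme used only for illustration.
example = "---TITLE: Example data set\n---AUTHOR(S):\n-Jane Doe\n---DATA FORMAT:\n"
for section in missing_sections(example):
    print("No section found for:", section)
```

Headings are compared case-insensitively so that "DATA COLLECTION AND PROCESSING" and "DATA COLLECTION and PROCESSING" are treated as the same section.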

Data Format Guidelines

The EOL data management system is flexible enough that data in any format can be submitted to, stored in and retrieved from CODIAC. EOL is prepared to work with participants to bring their data into the archive and to make sure the data are presented, with proper documentation, for exchange with other project participants. Because we expect many data sets from the field sites to arrive in ASCII format, the guidelines below are provided to aid in the submission, integration and retrieval of these data. EOL will also work with participants submitting other formats, including NetCDF, AREA, HDF and GRIB, to ensure access and retrieval capabilities within CODIAC.

The following ASCII data format guidelines are intended to help standardize the information provided with any data archived for the project. These guidelines are based on EOL experience in handling thousands of data files in differing formats. Specific suggestions are provided for naming a data file and for the content and layout of the header records and data records in each file. This information is important when data are shared with other project participants, because it minimizes confusion and aids analysis. An example of the layout of an ASCII file using the guidelines is provided below.

Keep in mind that it is not mandatory that the data be received in this format. However, if project participants are willing to implement the data format guidelines described below, CODIAC offers improved capabilities for integration, extraction, compositing and display.

Naming Convention

A) All data files should be uniquely named. For example, it is very helpful to include the date in any image file name so that the file can be easily time registered. Also include an extension indicating the type of file, for example:

.gif = GIF image format file

.jpg = JPEG image format file

.txt = Text or ASCII format file

.cdf = NetCDF format file

.tar = archival format

If compressed, the file name should have an additional extension indicating the type of compression (e.g. .gz, .z).
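
One way to follow this convention consistently is to build file names from their parts. The sketch below is an illustration only, not a required scheme: the function name `build_filename`, the underscore separator, the YYYYMMDD date layout and the example stem `HLY0601_ctd` are all assumptions made for the example.

```python
from datetime import date
from typing import Optional


def build_filename(stem: str, day: date, ext: str,
                   compression: Optional[str] = None) -> str:
    """Join a descriptive stem, a YYYYMMDD date, a file-type extension
    and an optional compression extension into one file name."""
    name = f"{stem}_{day:%Y%m%d}.{ext.lstrip('.')}"
    if compression:
        # The compression extension goes last, after the file-type extension.
        name += f".{compression.lstrip('.')}"
    return name


# Hypothetical ASCII data file from one day of a cruise, gzip-compressed.
print(build_filename("HLY0601_ctd", date(2006, 5, 7), "txt", "gz"))
# prints: HLY0601_ctd_20060507.txt.gz
```

Putting the date in year-month-day order means that an alphabetical listing of the files is also a chronological one.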

Spreadsheet and Columnar ASCII Data

For spreadsheet and columnar ASCII data to be converted into shape files, please use the following template:

Field       Type / units      Notes
CruiseID    text field        blank if not applicable (examples: HLY0601, PSEA0902)
StationNum  text field        blank if not applicable
StationNme  text field        blank if not applicable
DataDate    text field        YYYY-MM-DD; other formats acceptable, e.g. YYYYMMDD
DataYear    text field        blank if not applicable
DataTime    text field        HH:mm:SS; other formats acceptable, e.g. HHmm or HHmmSS
TimeZone    text field        UTC, MST, AKST, etc.; blank if not applicable
UTCOffset   text field        number of hours from UTC, + or -; used if TimeZone is not UTC; blank if not applicable
Latitude    decimal degrees   use negative values for south latitude; set to -999.99 if missing
Longitude   decimal degrees   use negative values for west longitude; set to -999.99 if missing
Depth       meters            set to -999.99 if not applicable
Data...                       as many columns as needed

The order of the fields should be as they appear in the list. Any other variables can be included after Depth in any order. The number -999.99 should be used for missing numeric values and a blank field for missing text values, or "unknown" if applicable.
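
The template and missing-value rules can be sketched in code as follows. This is a minimal illustration under stated assumptions: it writes comma-separated text with Python's standard `csv` module, it treats every additional data column as numeric (so a missing value becomes -999.99), and the function name `write_template`, the column `WaterTemp` and the sample values are hypothetical.

```python
import csv
import io

# Fixed fields, in the order given by the template.
TEMPLATE_FIELDS = ["CruiseID", "StationNum", "StationNme", "DataDate",
                   "DataYear", "DataTime", "TimeZone", "UTCOffset",
                   "Latitude", "Longitude", "Depth"]
NUMERIC_FIELDS = {"Latitude", "Longitude", "Depth"}
MISSING_NUMBER = -999.99


def write_template(records, data_fields, out):
    """Write records as delimited text: template fields first, data columns after Depth."""
    header = TEMPLATE_FIELDS + list(data_fields)
    writer = csv.writer(out)
    writer.writerow(header)
    for record in records:
        row = []
        for name in header:
            value = record.get(name)
            if value is None:
                # Assumption: all extra data columns are numeric in this sketch.
                numeric = name in NUMERIC_FIELDS or name in data_fields
                value = MISSING_NUMBER if numeric else ""
            row.append(value)
        writer.writerow(row)


# Hypothetical station record with no station name, depth or measurement.
sample = {"CruiseID": "HLY0601", "StationNum": "12", "DataDate": "2006-05-07",
          "DataYear": "2006", "DataTime": "14:30:00", "TimeZone": "UTC",
          "Latitude": 62.5, "Longitude": -175.25}
buffer = io.StringIO()
write_template([sample], ["WaterTemp"], buffer)
print(buffer.getvalue())
```

In the printed row, the missing station name and UTC offset appear as blank fields, while the missing depth and data value appear as -999.99.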

Note: Shape files limit field names to 10 characters. Certain words reserved by shape files, such as Date, Year and Time, cannot be used as field names in the data files.
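
Column names can be checked against these two constraints before submission. The sketch below is illustrative only: the reserved-word set contains just the three words named in the note (the full list may be longer), the case-insensitive comparison is an assumption, and `check_field_names` and `ChlorophyllA` are hypothetical names.

```python
# Only the reserved words named in the note; the real list may be longer.
RESERVED_WORDS = {"date", "year", "time"}
MAX_FIELD_NAME_LENGTH = 10


def check_field_names(names):
    """Return a list of human-readable problems found in the field names."""
    problems = []
    for name in names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems.append(
                f"{name}: longer than {MAX_FIELD_NAME_LENGTH} characters")
        # Assumption: reserved words are matched regardless of letter case.
        if name.lower() in RESERVED_WORDS:
            problems.append(f"{name}: reserved word")
    return problems


for problem in check_field_names(["CruiseID", "StationNme", "Date", "ChlorophyllA"]):
    print(problem)
# prints:
# Date: reserved word
# ChlorophyllA: longer than 10 characters
```

The template names such as DataDate, DataYear and DataTime pass this check, since they avoid the reserved words and fit within 10 characters.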

Further Information

Please direct any questions regarding data format(s) and documentation guidelines to the PacMARS data management support team (pacmars at eol dot ucar dot edu).