Project data - The Galeta Oil Spill Project

Introduction

The project was divided into 8 subprojects: one to study the chemistry of the oil and 7 to study the different environments (listed below) affected by the spill. Each subproject was headed by a scientist-in-charge and produced 4 to 16 different data sets. The data were originally kept in Dbase files but have been converted to comma-delimited files (described below).

Each subproject has a three-letter code, called the study ID (SID), which forms the first three letters of the name of every file pertaining to it. Each subproject has up to 16 data sets, and each data set is assigned a single letter, which is used as the fourth letter of the file name. In most file names the next two characters are ‘_M’, indicating the main data file for that data set. Files whose names begin with the three-letter SID and end with ‘_S’ contain the species list for that subproject.
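The naming rules above can be checked mechanically. The Python sketch below is a hypothetical helper (`describe_file` is not part of the project's files) that splits a file name into its SID and data-set letter. It assumes that any extension such as `.DBF` can be ignored and that letter case does not matter, and it returns `None` for names that follow neither documented pattern. The file names in the example are made up.

```python
import re


def describe_file(filename):
    """Decode a project file name using the SID / data-set naming rules.

    Hypothetical helper: assumes the extension (if any) can be ignored and
    that letter case does not matter.
    """
    stem = filename.rsplit(".", 1)[0].upper()  # drop any extension

    # Main data file: 3-letter SID, 1-letter data-set code, then "_M".
    main = re.fullmatch(r"([A-Z]{3})([A-Z])_M", stem)
    if main:
        return {"sid": main.group(1), "dataset": main.group(2), "kind": "main"}

    # Species list: begins with the 3-letter SID and ends with "_S".
    if re.match(r"[A-Z]{3}", stem) and stem.endswith("_S"):
        return {"sid": stem[:3], "dataset": None, "kind": "species"}

    return None  # follows neither documented pattern


# Hypothetical file names:
print(describe_file("GRSO_M.DBF"))  # {'sid': 'GRS', 'dataset': 'O', 'kind': 'main'}
print(describe_file("GRS_S"))       # {'sid': 'GRS', 'dataset': None, 'kind': 'species'}
```

Files that are neither main data files nor species lists (the "most" in the rule leaves room for them) fall through to `None` and would need their own pattern.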

In the following discussion, columns in a table are called “fields” and rows are called “records”, in accordance with the convention used by Dbase.

Much of the data was collected on a regular schedule, ranging from monthly to yearly. Each collection, corresponding to a particular month, quarter, or year, was assigned a collection ID (CID) that is unique within that data set. For example, if quarterly sampling took a week to complete, all the samples taken during that week were given the same CID. CIDs are sequential, but they do not necessarily start at 1, and some numbers may have been skipped for various reasons.
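Because CIDs may start above 1 and may skip numbers, code that treats a CID as a position in a sequence can go wrong. A minimal Python sketch for listing the skipped numbers in one data set, so they can be checked against field records; `skipped_cids` is a hypothetical name, and the input is assumed to be the CID values already read from a single data file.

```python
def skipped_cids(cids):
    """List CIDs missing between the first and last CID present in a data set.

    Because CIDs need not start at 1, the range begins at the smallest CID
    actually present rather than at 1.
    """
    present = set(cids)
    if not present:
        return []
    return [c for c in range(min(present), max(present) + 1) if c not in present]


# Hypothetical CID column from one data file (repeats are normal: every
# sample in a collection carries that collection's CID).
print(skipped_cids([3, 3, 4, 6, 6, 9]))  # [5, 7, 8]
```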

Structure of data files

The tables are arranged so that crosstabulations can be used to reorganize the data in any manner desired. We avoided using different columns for the same kind of data, such as a separate column for each species containing the count for that species. Instead, there is one column for the species name, one for the count, one for the CID (or date), and one for the site. A table with the CID (or date) across the top and species on separate lines (for time-series analysis), or with species across the top and dates on separate lines (for graphing), can be produced by a crosstabulation on the appropriate fields. All species abbreviations and site names start with a letter, contain only letters, digits, or underscores (‘_’), and are 8 or fewer characters long, so that they are legal Dbase field names.
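The crosstabulation described above turns one-row-per-observation records into a two-way table. A minimal Python sketch, assuming records have already been read into dictionaries; the field names `SITE`, `CID`, `SPECIES` and `COUNT` and the abbreviations `SP_A`, `SP_B` are hypothetical, since real field names differ between files. It also includes a check of the naming rule for species abbreviations and site names.

```python
import re
from collections import defaultdict


def crosstab(records, row_field, col_field, value_field):
    """Sum value_field over long-format records into {row: {column: total}}.

    Values of -1 (the missing-data code for numeric fields) are skipped.
    Cells with no records at all are simply absent from the result.
    """
    table = defaultdict(dict)
    for rec in records:
        value = rec[value_field]
        if value == -1:
            continue
        cell = table[rec[row_field]]
        cell[rec[col_field]] = cell.get(rec[col_field], 0) + value
    return dict(table)


def is_legal_name(name):
    """True if name starts with a letter, uses only letters, digits or '_',
    and is at most 8 characters long (the rule stated for abbreviations)."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{0,7}", name) is not None


# Hypothetical long-format records:
records = [
    {"SITE": "S1", "CID": 1, "SPECIES": "SP_A", "COUNT": 3},
    {"SITE": "S1", "CID": 2, "SPECIES": "SP_A", "COUNT": 5},
    {"SITE": "S2", "CID": 1, "SPECIES": "SP_B", "COUNT": 2},
    {"SITE": "S2", "CID": 1, "SPECIES": "SP_A", "COUNT": -1},  # missing count
]
# Species down the side, CID across the top (time-series layout):
print(crosstab(records, "SPECIES", "CID", "COUNT"))
# {'SP_A': {1: 3, 2: 5}, 'SP_B': {1: 2}}
```

Swapping `row_field` and `col_field` gives the other layout (CID down the side, species across the top).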

Missing values are indicated by -1 in a numeric field or a blank in a character field, unless otherwise noted. Logical fields are false by default. If an animal was specifically looked for and not found, it is entered as 0 in the appropriate place. Usually, however, animals that were censused but not found were not written on the data sheet and so were not entered in the file. Whether such an animal was actually absent from the sample depends on the thoroughness and purpose of the sampling method. However, if none of the target organisms were recorded in a given sample, at least one record is still entered, even if only with a name such as ‘empty’, to show that the quadrat, core, root, etc. was sampled.
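A minimal Python sketch of how these conventions might be applied after reading a file: missing-data codes become `None`, and placeholder names such as ‘empty’ can be recognized so they are counted as sampling units but not as organisms. The function names and the set of placeholder names are hypothetical; since the -1 convention holds "unless otherwise indicated", `normalize` should only be applied to fields where -1 really is the missing-data code.

```python
# Hypothetical: individual files may use other placeholder names.
PLACEHOLDER_NAMES = {"EMPTY"}


def normalize(value):
    """Convert the file's missing-data codes to None.

    -1 in a numeric field and a blank character field both mean "missing".
    Logical values pass through unchanged (they default to false).
    """
    if isinstance(value, str):
        return value if value.strip() else None
    if value == -1:
        return None
    return value


def is_placeholder(name):
    """True for names such as 'empty' that only record that a unit was sampled."""
    return name.strip().upper() in PLACEHOLDER_NAMES


print([normalize(v) for v in [-1, 0, 4, "", "SP_A"]])  # [None, 0, 4, None, 'SP_A']
```

Note that 0 is kept as 0: it means the animal was looked for and not found, which is different from missing.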

Care must be taken in a crosstabulation to distinguish between cells with no data and cells whose total count is 0. For example, if a crosstabulation gives the total number of corals of a given species found at each of 12 sites in each of 5 years, a cell with a count of 0 could mean either that no corals of that species were found or that the site was not sampled that year, and some other analysis is needed to tell which. Generally, a crosstab of the number of records in each cell will reveal whether a site was sampled, since there is always at least one entry for each sampling unit.
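The check described above can be built into the crosstab itself. A minimal Python sketch that totals one species by site and year but returns `None` for cells with no records of any kind (not sampled) and 0 for cells that were sampled but had none of that species. It relies on the rule that every sampling unit has at least one record, including placeholders such as ‘empty’. The function name and the field names `SITE`, `YEAR`, `SPECIES` and `COUNT` are hypothetical.

```python
def totals_with_coverage(records, species, row_field, col_field,
                         species_field="SPECIES", count_field="COUNT"):
    """Crosstab one species' counts, separating 0 (sampled, none found)
    from None (no record at all for that row/column, i.e. not sampled).

    Any record, including placeholders such as 'empty', marks a cell as
    sampled. Counts of -1 (missing) are skipped.
    """
    sampled = set()
    totals = {}
    for rec in records:
        key = (rec[row_field], rec[col_field])
        sampled.add(key)
        if rec[species_field] == species and rec[count_field] != -1:
            totals[key] = totals.get(key, 0) + rec[count_field]
    rows = sorted({r for r, _ in sampled})
    cols = sorted({c for _, c in sampled})
    return {(r, c): (totals.get((r, c), 0) if (r, c) in sampled else None)
            for r in rows for c in cols}


# Hypothetical records: S2 was not sampled in 1988.
records = [
    {"SITE": "S1", "YEAR": 1987, "SPECIES": "SP_A", "COUNT": 3},
    {"SITE": "S1", "YEAR": 1988, "SPECIES": "EMPTY", "COUNT": 0},
    {"SITE": "S2", "YEAR": 1987, "SPECIES": "SP_B", "COUNT": 4},
]
print(totals_with_coverage(records, "SP_A", "SITE", "YEAR"))
# {('S1', 1987): 3, ('S1', 1988): 0, ('S2', 1987): 0, ('S2', 1988): None}
```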

Species names

Names of species, higher taxa, or other categories counted as species are given abbreviations of up to 8 characters (except during data entry, when a 1- to 3-letter code is typed and then expanded by the computer to the longer abbreviation). The data abbreviation (DATAABBR) is the abbreviation actually written on the data sheet. Because in this sort of work it is not uncommon for the name applied to an organism to change as more information is gained, a second name, the present abbreviation (PRESABBR), indicates the name currently in use. The DATAABBR is never changed; it always matches what is written on the data sheet. The PRESABBR is updated periodically from the current name in the species list. The species lists also give the genus and species and/or another full taxonomic description. A species list may therefore have more than one entry for a given PRESABBR, one for each name used in the past. Only one of these entries, however, has its DATAABBR equal to its PRESABBR, and that entry also carries the most recent genus, species, and description.
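The two-name scheme amounts to a lookup table from DATAABBR to PRESABBR, plus one "current" entry per PRESABBR. A minimal Python sketch, assuming species-list entries and data records have been read into dictionaries keyed by field name. DATAABBR and PRESABBR come from the text above; the `GENUS` field, the abbreviations `OLD_NAME` and `NEW_NAME`, and both function names are hypothetical.

```python
def build_name_tables(species_list):
    """From species-list entries, map every DATAABBR to its PRESABBR and find
    the current entry (the one whose DATAABBR equals its PRESABBR)."""
    to_present = {}
    current = {}
    for entry in species_list:
        to_present[entry["DATAABBR"]] = entry["PRESABBR"]
        if entry["DATAABBR"] == entry["PRESABBR"]:
            current[entry["PRESABBR"]] = entry
    return to_present, current


def update_presabbr(records, to_present):
    """Return copies of data records with PRESABBR refreshed from the species
    list. DATAABBR is never modified; abbreviations missing from the list keep
    their existing PRESABBR (or the DATAABBR itself if there is none)."""
    updated = []
    for rec in records:
        fallback = rec.get("PRESABBR", rec["DATAABBR"])
        updated.append({**rec, "PRESABBR": to_present.get(rec["DATAABBR"], fallback)})
    return updated


# Hypothetical species list: OLD_NAME was later renamed NEW_NAME.
species_list = [
    {"DATAABBR": "OLD_NAME", "PRESABBR": "NEW_NAME", "GENUS": "old genus"},
    {"DATAABBR": "NEW_NAME", "PRESABBR": "NEW_NAME", "GENUS": "current genus"},
]
to_present, current = build_name_tables(species_list)
print(update_presabbr([{"DATAABBR": "OLD_NAME", "COUNT": 2}], to_present))
# [{'DATAABBR': 'OLD_NAME', 'COUNT': 2, 'PRESABBR': 'NEW_NAME'}]
print(current["NEW_NAME"]["GENUS"])  # current genus
```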

All of the data files here have been updated to the names in use when the final report was written. In some cases where the species are few and well known (e.g., urchins and mangroves), this system is not used. Many files have fields named TAX1, TAX2, GROUP, etc. These were used during analysis to classify the taxa into various groups, and their contents changed with the needs of the analysis. The values were filled in by looking up each taxon's entry for the corresponding field in the species list, where taxa are classified according to one or more criteria. The criteria used can be inferred by examining which taxa are grouped under each code. Do not assume that a classification called TAX1 in a data file corresponds to the classification called TAX1 in the species list.
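Since a TAX1 field in a data file may not match TAX1 in the species list, one option is to regroup the data directly from the species list. A minimal Python sketch, assuming data records already carry an up-to-date PRESABBR and that each taxon's classification is read from its current species-list entry (the one whose DATAABBR equals its PRESABBR). The function name, the `COUNT` field, the abbreviations and the group codes are hypothetical.

```python
def totals_by_group(records, species_list, group_field, count_field="COUNT"):
    """Sum counts by a classification taken from the species list (not from
    any TAX1/TAX2/GROUP field already stored in the data file).

    Each PRESABBR is classified using its current entry, i.e. the entry
    whose DATAABBR equals its PRESABBR.
    """
    group_of = {e["PRESABBR"]: e[group_field]
                for e in species_list if e["DATAABBR"] == e["PRESABBR"]}
    totals = {}
    for rec in records:
        if rec[count_field] == -1:  # missing count
            continue
        group = group_of.get(rec["PRESABBR"], "UNCLASSIFIED")
        totals[group] = totals.get(group, 0) + rec[count_field]
    return totals


# Hypothetical species list and records:
species_list = [
    {"DATAABBR": "SP_A", "PRESABBR": "SP_A", "TAX1": "CORAL"},
    {"DATAABBR": "SP_B", "PRESABBR": "SP_B", "TAX1": "SPONGE"},
    {"DATAABBR": "SP_C", "PRESABBR": "SP_C", "TAX1": "CORAL"},
]
records = [
    {"PRESABBR": "SP_A", "COUNT": 3},
    {"PRESABBR": "SP_B", "COUNT": 2},
    {"PRESABBR": "SP_C", "COUNT": 4},
    {"PRESABBR": "SP_X", "COUNT": 1},
]
print(totals_by_group(records, species_list, "TAX1"))
# {'CORAL': 7, 'SPONGE': 2, 'UNCLASSIFIED': 1}
```

Listing which PRESABBR values end up under each code is also a quick way to infer what criterion a classification field encodes.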

Format of data files

The data are available either in Dbase III format or as comma-delimited ASCII files. The ASCII files are formatted as follows:

Each record ends with a CR/LF
Each field is delimited with a comma
Text fields are enclosed in double quotes (").
Numbers have no quotes; decimal points are included where needed.
Blank text fields consist of just two double quotes ("") between the comma delimiters.
Blank numeric fields are empty, leaving two adjacent comma delimiters.
Logical fields are a single letter: T for true and F for false.
Date fields are written as 8 digits, yyyymmdd.

Example:
The following two records each contain two text fields followed by an integer field, a text field, a date field (October 8, 1987), a text field, a blank numeric field, a blank text field, two decimal fields, and a logical field (false).

"GRS","O",17,"MH",19871008,"lren",,"",7.0255,8.4105,F
"GRS","O",217,"MH",19871008,"dont",,"",17.0431,16.6789,F
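A small Python reader for these ASCII files, sketched under several assumptions. The caller supplies a string of Dbase-style type letters (C character, N numeric, D date, L logical), one per field, because a bare 8-digit number cannot otherwise be told apart from a date. Dates are assumed to be unquoted, as the format rules state, and text fields are assumed never to contain double quotes themselves. The file encoding (latin-1 here) is a guess. `split_fields`, `parse_record` and `read_records` are hypothetical names.

```python
from datetime import date


def split_fields(line):
    """Split one record on commas that are not inside double quotes."""
    fields, buf, in_quotes = [], [], False
    for ch in line.rstrip("\r\n"):
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit text; they are not kept
        elif ch == "," and not in_quotes:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


def parse_record(line, types):
    """Convert one record using a per-field type string (C, N, D, L).

    Blank text fields stay as ""; blank numeric, date or logical fields
    become None.
    """
    fields = split_fields(line)
    if len(fields) != len(types):
        raise ValueError(f"expected {len(types)} fields, got {len(fields)}")
    values = []
    for text, kind in zip(fields, types):
        if kind == "C":
            values.append(text)
        elif text == "":
            values.append(None)
        elif kind == "N":
            values.append(float(text) if "." in text else int(text))
        elif kind == "D":  # yyyymmdd
            values.append(date(int(text[:4]), int(text[4:6]), int(text[6:8])))
        elif kind == "L":
            values.append(text.upper() == "T")
        else:
            raise ValueError(f"unknown field type {kind!r}")
    return values


def read_records(path, types, encoding="latin-1"):
    """Yield parsed records from one comma-delimited file (encoding assumed)."""
    with open(path, encoding=encoding) as fh:
        for line in fh:
            if line.strip():
                yield parse_record(line, types)


record = '"GRS","O",17,"MH",19871008,"lren",,"",7.0255,8.4105,F'
print(parse_record(record, "CCNCDCNCNNL"))
# ['GRS', 'O', 17, 'MH', datetime.date(1987, 10, 8), 'lren', None, '', 7.0255, 8.4105, False]
```

Numeric -1 values are returned as -1; converting them to "missing" is a separate, field-by-field decision.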