Data - GSAC Structure and Data Exchange Formats

Version 1.1 Draft proposal 1 August 2002

1. Introduction
2. GSAC Hierarchy
3. GSAC Meta-Data Exchange
4. User Interface and Requesting Data
Table 1. Monument Catalog DHR
Table 2. Data Holdings File DHR
Table 3. Field Specifications for Version 1.1 Data Types

1. Introduction

There are many GPS data archiving centers in the U.S. and worldwide. Each typically has an interest in at most a few particular types of data, for instance, data from a regional network, data from the world-wide IGS network, or data which were collected under sponsorship of a particular agency. These archives have operated in an independent manner to this point, requiring data users to be fairly sophisticated in order to find information of interest. From the user's point of view, it would be much easier if they could get any piece of data simply by contacting any one of these archive centers, instead of contacting each one separately. For such a system to work, the data archive centers would need to know what data are contained not only in their own holdings but also in the holdings of all other archives; this will require a large degree of coordination which does not currently exist. This document proposes a data-holdings exchange mechanism which can create this "seamless" data environment; we call the resulting multi-archive system the GPS Seamless Archive Centers (GSAC).

Participation in GSAC denotes a willingness on the part of scientists and surveyors who collect GPS data to organize their holdings according to standards agreed to by the GSAC Working Group, and to provide knowledge of these holdings to the international community. Although participants are encouraged to use GSAC as an efficient means of distributing data, this is not a requirement: information "about" the data can be displayed on GSAC while the data themselves are still under distribution restrictions that might be imposed by governments, sponsoring agencies, or the collectors themselves.

2. GSAC Hierarchy

The GSAC are structured in three levels to reflect existing community functions:

Data Provider: An individual or agency that provides information to a Wholesaler (defined below) to be archived and published to the GSAC. There must be only a single pathway into the GSAC for each piece of information so that there is a well-defined original copy. To accomplish this the Data Provider must make each piece of information available to one Data Wholesaler. Data are defined as raw and RINEX GPS observations, network solutions, orbit information, site information, and any other information useful to the analysis of GPS observations. Currently allowed data types are defined in Table 3. The data must be distributable in electronic form (though some meta data may be scanned images of paper records), preferably via the Internet.

Data Wholesaler: The Data Wholesalers are the warehousers of information and operate a data archive the contents of which they agree to make available to users of the GSAC as well as other GSAC participants (e.g., other Wholesalers and Retailers). Publishing information to the GSAC occurs when a Data Wholesaler creates a Data Holdings Record (defined in Section 3) describing a data file held in the Wholesaler's archive. For each piece of information known to the GSAC, there is one Data Wholesaler who is responsible for the original copy of that information, though other Wholesalers may keep and provide duplicate copies, as described in Section 3. Each Data Wholesaler must provide a unique identifier (the unique_info_id field of the DHR), which is an integer number, for each piece of information that it publishes to the GSAC. Each Data Wholesaler must also provide a unique name (the unique_site_id field of the DHR) for all of its published site-specific information. Wholesalers may assign unique names in any way they wish (e.g. DOMES numbers or an internal scheme keyed to geographic area as is done for the Southern California Earthquake Center). Hence, the unique_site_id for a given site will in general not be the same for all Wholesalers.

A Data Wholesaler performs at least the following three functions:

  • Accepts information from a Data Provider.
  • Maintains a Monument Catalog which includes entries for each individual site in the Wholesaler's archive. Assigns a unique site name to each entry in the Monument Catalog.
  • Develops, with the other Data Wholesalers, a cross-index to sites for which more than one Data Wholesaler holds data (and may use different names). If only one Wholesaler holds all data for a site, the only name that matters is the one they assign; if more than one Wholesaler holds data for a site the Wholesalers must be able to verify that the differently-named monuments are, in fact, the same. This could be done using precise site coordinates, site descriptions, station stampings, etc.
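The coordinate-based check mentioned in the last item could be sketched roughly as follows. This is only an illustration: the function name and the 0.5 m tolerance are assumptions made here, not values agreed by the GSAC, and a real cross-index would likely also weigh site descriptions and stampings.

```python
import math

# Hypothetical tolerance in meters; the GSAC text does not specify one.
DEFAULT_TOLERANCE_M = 0.5

def same_monument(xyz_a, xyz_b, tolerance_m=DEFAULT_TOLERANCE_M):
    """Return True when two geocentric X, Y, Z positions (meters) lie
    within tolerance_m of each other."""
    return math.dist(xyz_a, xyz_b) <= tolerance_m
```

Two Wholesalers that name a site differently could run a check like this over their Monument Catalog coordinates to shortlist entries for manual confirmation.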

A fully participating Wholesaler also:

  • Performs some sort of quality control (and possibly conversion, for instance, from raw to RINEX) of the data if appropriate.
  • Places the data in a permanent archive, with accessibility made as easy as possible consistent with distribution restrictions.
  • Assigns and maintains a unique integer number to refer to each piece of its published information.
  • Creates a Data Holdings Record for each piece of published information in the Wholesaler's archive and makes these available to all other GSAC participants via ftp. The Wholesaler must maintain an up-to-date record of all its published information as well as a 30-day incremental record of all changes to its published information.
  • May provide a public interface to their own data holdings if they wish, outside of the GSAC.
  • May choose to provide a distributed backup mechanism for any or all data holdings of other Data Wholesalers. If a Wholesaler chooses to provide this function then it must create an appropriate Data Holdings Record for the backup data and make this information available to the other GSAC participants so that they know of the existence of the backup. A Wholesaler will not modify any data file sent to it from another Wholesaler; if an error or inconsistency is discovered with the file the original Wholesaler will be notified and will fix the problem on the original version of the file; this fixed version will then be propagated into the GSAC in the same way as would an entirely new file.

Data Retailer: The Data Retailers are the point of entry to the GSAC for the user community. Data Retailers do not archive data, though a single institution may act as both a Retailer and a Wholesaler.

A Data Retailer performs at least the following four functions:

  • Collects Data Holdings Files and Monument Catalogs from all Data Wholesalers.
  • Provides a publicly-accessible, searchable, user interface to the complete GSAC data holdings.
  • Handles user requests for data and provides either the data files themselves (with identification of the source) or information (hyper-link, ftp command, mail-to address) with which the user can obtain the files from the Wholesalers.
  • Keeps track of access statistics for all published GSAC information and forwards these statistics to the original Data Wholesaler on a monthly basis.

3. GSAC Meta-Data Exchange

Meta-data describing a Data Wholesaler's holdings are essential for identifying and exchanging information between Wholesalers and Retailers, and for local querying of data that reside at remote Data Wholesalers' archives. There must be a few well-defined, computer-parsable files that can be exchanged over the Internet for this purpose. The Data Holdings Record, Data Holdings Files, and Monument Catalog accomplish this.

Data Holdings Record (DHR): Describes a piece of information published to the GSAC by a Data Wholesaler. There are two types of DHRs, one describing a file of time-dependent data (RINEX, SINEX, etc.), the other describing a monument in the Wholesaler's catalog of sites. For each site described in a Wholesaler's data DHRs, there must be a corresponding monument DHR. The DHRs are collected by Data Retailers so that they can accurately inform users what data are available from the entire GSAC. DHRs describe the data of interest to the user, but are not the actual data that would be sent to the user. Each DHR is a single line of ASCII text comprising fields of information separated by a semi-colon delimiter and terminated by a newline. The exact fields of the two types of DHR are defined in Tables 1 and 2 at the end of this document.

Data Holdings File (DHF): A DHF is an ASCII flat-file containing multiple DHRs describing information published by a single Data Wholesaler. A "Full DHF" is one which contains information for all of the published DHRs available from a given Data Wholesaler, where the data pertain to a given year and day; that is, there are many full DHFs, each of which contains information about data for a single date. An "Incremental DHF" is one which only contains information on new, updated, or deleted DHRs of a given Data Wholesaler for a given year and day; that is, an incremental DHF records the changes to a Data Wholesaler's archive on a given date. This is analogous to the difference between a "full data backup" and an "incremental data backup."

    Filename used: wholesaler_name.yyyy.ddd.[full, inc].dhf

Monument Catalog (MC): The MC is an ASCII flat-file containing multiple DHRs describing GPS monuments, including their locations, for which information exists from a single Data Wholesaler. A "Full MC" is one which contains information for all of the GPS monuments available from a given Data Wholesaler. An "Incremental MC" is one which only contains information on new, updated, or deleted GPS monuments available from a given Data Wholesaler. This is analogous to the difference between a "full data backup" and an "incremental data backup." Publication of this file by a Data Wholesaler is the minimum level of participation allowed within the GSAC. Note that this file only tells of the existence of data from particular monuments and the location of these monuments, but gives few details about the data or monuments. In the future the GSAC may provide additional information (e.g. site descriptions, photos, 'get-to' instructions) via new data types, not yet defined.

    Filename used: wholesaler_name.yyyy.ddd.[full, inc].mc
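The two filename conventions can be checked mechanically. The following is a minimal sketch, not part of the GSAC proposal; the names parse_holdings_name and HoldingsName are hypothetical. The date part is treated as optional only because Section 3.2 notes that the single full MC carries no yyyy.ddd in its name.

```python
import re
from typing import NamedTuple, Optional

# Matches names such as "sopac.1998.317.inc.dhf" or "sopac.full.mc".
_NAME_RE = re.compile(
    r"^(?P<wholesaler>[^.]+)"
    r"(?:\.(?P<year>\d{4})\.(?P<doy>\d{3}))?"
    r"\.(?P<scope>full|inc)\.(?P<kind>dhf|mc)$"
)

class HoldingsName(NamedTuple):
    wholesaler: str
    year: Optional[int]   # None when the name carries no date
    doy: Optional[int]    # day of year, 1..366
    scope: str            # "full" or "inc"
    kind: str             # "dhf" or "mc"

def parse_holdings_name(filename: str) -> HoldingsName:
    """Split a DHF/MC filename into its components."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError("not a DHF/MC filename: %r" % filename)
    year = int(m["year"]) if m["year"] else None
    doy = int(m["doy"]) if m["doy"] else None
    return HoldingsName(m["wholesaler"], year, doy, m["scope"], m["kind"])
```

For example, parse_holdings_name("sopac.1998.317.inc.dhf") yields the wholesaler "sopac", year 1998, day 317, scope "inc", and kind "dhf".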

3.1 DHF and MC Format

A DHF/MC is an ASCII flat-file consisting of a series of DHRs concatenated together. The format of each DHR is a single line of text with fields of information separated by semi-colons (;) and terminated by a newline character. In cases where a semi-colon is needed for the information in a field it will be escaped with a back-slash (\;). If a back-slash is needed it must be escaped by itself (\\). Some fields of a DHR are allowed to contain multiple entries (see tables below); these multiple entries are separated by commas (,). In cases where a comma is needed for the information in a field it will be escaped by a back-slash (\,).
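A reader of these files has to split on delimiters while honoring the back-slash escapes. The sketch below shows one way to do that; the function names are hypothetical, and for simplicity it splits every field on commas, whereas the specification permits multiple entries only in certain fields (commas elsewhere are expected to arrive escaped). A Null field is returned as an empty list.

```python
from typing import List

def split_escaped(text: str, sep: str) -> List[str]:
    """Split text on unescaped sep, leaving back-slash escape pairs intact."""
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])   # keep the escape and the escaped char
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return parts

def unescape(text: str) -> str:
    """Remove one level of back-slash escaping."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def parse_dhr(line: str) -> List[List[str]]:
    """Return one list of entries per field; a Null field becomes []."""
    fields = split_escaped(line.rstrip("\n"), ";")
    return [
        [unescape(entry) for entry in split_escaped(field, ",")] if field else []
        for field in fields
    ]
```

Splitting first and unescaping last matters: unescaping before the split would turn an escaped semi-colon back into a field separator.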

The maximum number of characters allowed on a line in a DHF/MC is 2048 including the trailing newline (this is the POSIX standard for text files). It may happen that a DHR needs more than 2048 characters, for instance when multiple entries are used for a field that allows them. In these cases the DHR will be split into multiple lines, each of which must contain 2048 or fewer characters; the line before the split must contain 2048 characters with the remaining characters on the line-continuation. These split-lines will include a dollar sign ($) as the last printable character (just before the newline: character position 2047) before the line is split and another dollar sign as the first character of the line-continuation. Thus, any line in a DHF that has a dollar sign as the first character is a continuation from the previous line and any line that has a dollar sign as the last printable character is continued on the next line. In cases where a dollar sign is needed for the information in a field it will be escaped by a back-slash (\$).
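Reassembling split records before parsing could look like the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, a trailing dollar sign counts as a continuation marker only when it is preceded by an even number of back-slashes, and the 2048-character limit itself is not enforced.

```python
from typing import Iterable, List

def ends_with_unescaped_dollar(line: str) -> bool:
    """True when the last character is a '$' that is not back-slash escaped."""
    if not line.endswith("$"):
        return False
    body = line[:-1]
    trailing_backslashes = len(body) - len(body.rstrip("\\"))
    return trailing_backslashes % 2 == 0

def join_continuations(lines: Iterable[str]) -> List[str]:
    """Merge '$'-continued physical lines into logical DHR lines."""
    records: List[str] = []
    pending = None  # text of a record still waiting for its continuation
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            if not line.startswith("$"):
                raise ValueError("expected a continuation line starting with '$'")
            line = pending + line[1:]
            pending = None
        elif line.startswith("$"):
            raise ValueError("continuation line without a preceding split line")
        if ends_with_unescaped_dollar(line):
            pending = line[:-1]   # drop the marker and wait for the next line
        else:
            records.append(line)
    if pending is not None:
        raise ValueError("file ends in the middle of a continued record")
    return records
```

A record split across three or more lines is handled by the same loop, since each joined line is checked again for a trailing marker.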

All fields in a DHR must exist, but some fields are allowed to be Null (empty). Thus, when a certain field is Null the DHR will contain a pair of semi-colons with no characters between (;;).

The top three lines of every DHF/MC contain header information detailing the name of the Wholesaler who created it, the format version of the DHRs, and the individual fields of information included in each DHR. These lines are distinguished by a # character as the first character of the line.

Headers for a DHF are made up of three lines of information in the following order:

    Line 1: # Wholesaler_name [cddis, ncedc, ngs, panga, scec, sopac, unavco]
    Line 2: # DHF_format_version [1.1]
    Line 3: # DHF_fields [unique_info_id; wholesaler; data_type; unique_site_id; start_time; end_time; dhr_create_time; info_url; file_size; file_create_time; file_checksum; provider; file_grouping; file_compression]

Headers for an MC are made up of three lines of information in the following order:

    Line 1: # Wholesaler_name [cddis, ncedc, ngs, panga, scec, sopac, unavco]
    Line 2: # MC_format_version [1.1]
    Line 3: # MC_fields [unique_site_id; wholesaler; 4_char_id; descriptive_id; dhr_create_time; x; y; z; coord_accuracy]

There are thus a total of five ASCII characters that have a special meaning in a DHF/MC:

    ; field separator
    , multiple entry separator within a single field
    $ line continuation indicator
    # header information indicator
    \ escape character to allow any of these five as part of a field entry

3.2 Use of DHFs and MCs in the GSAC System.

Incremental DHF/MC Area.

Each Data Wholesaler will make its Incremental DHFs and MCs available via ftp to all other GSAC participants. These incremental files are organized into sub-directories named by the UTC year/day on which the files were created. This sub-directory structure may exist anywhere that a Data Wholesaler prefers as long as it is "mapped" (for instance, through use of the Unix "link" concept) to the standard directory name ~ftp/pub/GSAC/inc. The directories in the incremental area keep track of the work being done on a daily basis by the Wholesaler. They can thus be thought of as incremental information that was published on a given day. A DHR for information with a start_time (see tables below) on 1998:001, for example, will be kept in the incremental DHF for that year and day, called "wholesaler_name.1998.001.inc.dhf." Once a Retailer becomes synchronized with a Wholesaler it need only access the incremental DHFs and MC in order to stay synchronized. On a daily basis, after the UTC day boundary, each Retailer will be responsible for collecting the incremental DHFs and MC from each Wholesaler and incorporating them into the Retailer's Relational Data Base Management System (RDBMS).

For example, assume the current day is 1998:320, and examine the ~ftp/pub/GSAC/inc/1998/320 directory at SOPAC. A hypothetical listing of files there might be:

sopac.1998.317.inc.dhf
sopac.1998.318.inc.dhf
sopac.1998.319.inc.dhf
sopac.1998.320.inc.mc
sopac.1998.320.inc.list

As the above listing notes, several incremental DHFs and an incremental MC exist in this directory. The DHFs are an incremental update of DHRs that were published by the SOPAC wholesale archive on day 1998:320. The incremental DHF for day 1998:317 includes DHRs for information published "today" (1998:320) but where the information pertains to (that is, has a start time) three days ago (1998:317) and similarly for the DHFs for days 1998:318 and 1998:319. The MC would contain any DHRs for new or updated monument information that SOPAC wants to make known to the other GSAC participants "today."
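The relationship between the publication day (which selects the directory) and the data start day (which selects the filename) can be made concrete with a short sketch. The function name is hypothetical; the path layout follows the SOPAC example above.

```python
from datetime import date

def incremental_dhf_path(wholesaler: str, data_start: date, published: date) -> str:
    """Build the incremental DHF path for data starting on data_start
    whose DHR was published on the UTC day published."""
    return "~ftp/pub/GSAC/inc/{:04d}/{:03d}/{}.{:04d}.{:03d}.inc.dhf".format(
        published.year,
        published.timetuple().tm_yday,    # directory: day of publication
        wholesaler,
        data_start.year,
        data_start.timetuple().tm_yday,   # filename: day the data pertain to
    )
```

With data starting on 13 November 1998 (day 317) and published on 16 November 1998 (day 320), the function returns ~ftp/pub/GSAC/inc/1998/320/sopac.1998.317.inc.dhf.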

In each of the incremental directories the Wholesaler will also keep a file containing the filenames and times of modification of the DHFs and MC in that directory. This additional file is named, in the above example, wholesaler_name.1998.320.inc.list. The format of this "listing" file is:

filename1;time_stamp1
filename2;time_stamp2
etc.

Where the time_stamps must be in UTC and in the ISO standard format (see tables below for details). So in the above example, the file sopac.1998.320.inc.list would look like the following.

sopac.1998.317.inc.dhf;1998-320T20:01:01Z
sopac.1998.318.inc.dhf;1998-320T20:02:32Z
sopac.1998.319.inc.dhf;1998-320T19:05:30Z
sopac.1998.320.inc.mc;1998-320T23:01:01Z

If a data Retailer wants to get an up-to-the-minute listing of data holdings on a Wholesaler's archive it can get this listing-file first to see if anything in that directory has changed in the course of the UTC day; if the filesize and modification time of the DHF and MC are unchanged from an earlier check by the Retailer, there is no reason to copy the DHF or MC to the Data Retailer's system. This minimizes the amount of information exchange needed to check the current state of each Data Wholesale archive.
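A Retailer's change check against the listing file might be sketched as follows. The function names are hypothetical, and the comparison uses only the time stamps, since those are what the listing format shown above carries.

```python
from datetime import datetime, timezone
from typing import Dict, List

def parse_listing(text: str) -> Dict[str, datetime]:
    """Parse 'filename;time_stamp' lines into a filename -> UTC time map."""
    listing = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, stamp = line.split(";", 1)
        # yyyy-dddThh:mm:ssZ, where ddd is the day of year (%j)
        when = datetime.strptime(stamp.strip(), "%Y-%jT%H:%M:%SZ")
        listing[name.strip()] = when.replace(tzinfo=timezone.utc)
    return listing

def files_to_fetch(current: Dict[str, datetime],
                   previous: Dict[str, datetime]) -> List[str]:
    """Names that are new, or whose time stamp differs from the last check."""
    return sorted(name for name, when in current.items()
                  if previous.get(name) != when)
```

The Retailer would keep the result of its last parse_listing call per Wholesaler and directory, and download only the names returned by files_to_fetch.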

At the end of the UTC day the Data Wholesaler will stop adding any DHRs to the DHFs and MC in that day's incremental directory and begin placing new DHRs into the DHFs and MC in the "next" day's incremental directory. That is, once the UTC day boundary is passed the Data Wholesaler will not modify anything in that day's incremental directory, but will move on to the next day's directory. Since each Wholesaler's computers will keep their own time, and often the clocks in computers are wrong by many minutes or more, each Wholesaler will think that UTC midnight occurs at a different time. Retailers will need to take this into account when probing each Wholesaler's computer system.

The sub-directories in the incremental storage area will not be saved past a certain date; on a daily basis, as a new sub-directory is created the oldest one will be deleted. This will provide a ring-buffer of incremental holdings information that will allow the Data Retailers to collect the DHFs and MCs late if necessary. This ring-buffer will be 30 days in length. It is, of course, the Data Retailer's responsibility to keep track of which incremental DHFs and MCs it has collected in order to stay in synchronization with each Data Wholesaler. It is important for the Retailers to stay in synchronization with the Wholesalers so that their RDBMS is an accurate representation of the holdings of the GSAC. If for some reason synchronization is lost the Retailer will have to start over from the Wholesaler's full DHFs and MC records (a "full restore").
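The bookkeeping a Retailer needs for the ring-buffer can be sketched briefly. This is an interpretation, not part of the proposal: it assumes the buffer holds the 30 most recent day-directories, and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

RING_BUFFER_DAYS = 30  # length of the incremental ring-buffer

def directories_to_collect(last_collected: date,
                           newest: date) -> Optional[List[Tuple[int, int]]]:
    """Return the (year, day-of-year) directories still to be collected,
    or None when the gap exceeds the ring-buffer and a full restore
    from the full DHFs and MC is required."""
    gap = (newest - last_collected).days
    if gap > RING_BUFFER_DAYS:
        return None
    days = [last_collected + timedelta(days=n) for n in range(1, gap + 1)]
    return [(d.year, d.timetuple().tm_yday) for d in days]
```

A Retailer last synchronized on 1998:317 and checking when 1998:320 is the newest directory would collect directories 318, 319, and 320; a gap of more than 30 days signals a "full restore."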

Data Wholesalers dealing primarily with daily-download continuous GPS data from Data Providers would almost always have a set of incremental directories that contain Incremental DHFs and MCs which hold DHRs pertaining to information from one day to a few days in the past if the Provider's data downloads take place after the UTC day boundary. If the Provider downloads data during the UTC day, for instance hourly, the Wholesaler's incremental directories might also contain Incremental DHFs for the present day.

Data Wholesalers dealing primarily with survey mode GPS data from Data Providers would almost always have a set of incremental directories that contain Incremental DHFs and MCs which hold DHRs pertaining to information from many days in the past (perhaps data that were collected many years in the past) as they work through their backlog of data files supplied by the Data Providers.

Full DHF/MC Area.

Each Data Wholesaler will also keep a copy of its full DHFs and full MC available through ftp to all GSAC participants. These files can be stored in whatever directory the Data Wholesaler prefers as long as that directory is "mapped" to the standard directory name ~ftp/pub/GSAC/full. There is one full MC which contains information on all monuments in a Wholesaler's archive, while there are many full DHFs split by year and day. The DHFs are split to keep filesizes to a manageable level, and also to make it easier for a Wholesaler to keep them up to date. As with the incremental DHFs, a DHR for information with a start_time (see tables below) on 1998:001, for example, will be kept in the full DHF for that year and day, called "wholesaler_name.1998.001.full.dhf." These files serve as a permanent record for each Wholesaler of what data they have published from their archive and should be kept up to date at all times; that is, at any given time they should be a complete representation of all information published by the Wholesaler. The full DHFs and MC are not a description at a single (frozen) point in time; they are always kept up to date. These files are made available to other GSAC participants so that they may re-synchronize themselves with the Wholesaler if necessary. Under normal circumstances this should not be required.

Using the same example as above, examine the ~ftp/pub/GSAC/full directory at SOPAC. A hypothetical listing of files there might be:

sopac.1998.317.full.dhf
sopac.1998.318.full.dhf
sopac.1998.319.full.dhf
sopac.full.mc
sopac.full.list

Since there is only one full MC there is no yyyy.ddd added to its name. The "listing" file is the same format as above, with the filename and modification time (in UTC and ISO format) of each file listed one per line.

In summary, there are two file storage areas at a Data Wholesaler's archive where DHFs and MCs are kept:

  1. An incremental storage area which keeps all Incremental DHFs and MCs as the Wholesaler does its work of publishing information to the GSAC; this area is organized under directory ~ftp/pub/GSAC/inc/ and is further subdivided by year and day. Incremental directories thirty days into the past will be kept by the Wholesaler at any given time. Incremental DHFs and MC created on a given day will be stored in that day's directory.

Example incremental storage area using the same SOPAC example as above:

~ftp/pub/GSAC/inc/1998/320/sopac.1998.317.inc.dhf
~ftp/pub/GSAC/inc/1998/320/sopac.1998.318.inc.dhf
~ftp/pub/GSAC/inc/1998/320/sopac.1998.319.inc.dhf
~ftp/pub/GSAC/inc/1998/320/sopac.1998.320.inc.mc
~ftp/pub/GSAC/inc/1998/320/sopac.1998.320.inc.list

  2. A full storage area which keeps the primary copy of the DHFs and MC for the Wholesaler; these files are stored in directory ~ftp/pub/GSAC/full/.

Example full storage area using the same SOPAC example as above:

~ftp/pub/GSAC/full/sopac.1998.317.full.dhf
~ftp/pub/GSAC/full/sopac.1998.318.full.dhf
~ftp/pub/GSAC/full/sopac.1998.319.full.dhf
~ftp/pub/GSAC/full/sopac.full.mc
~ftp/pub/GSAC/full/sopac.full.list

When to Generate a New DHR

A Data Wholesaler will generate a new DHR for a piece of information when it first publishes the information to the GSAC. At that time, the Wholesaler must provide, in field 0 of the DHR, a unique integer number that will be used to track this published information within the GSAC (see the DHR specification in Table 2). This number is unique only within an individual Wholesaler's holdings, not across the entire GSAC. The combination of the Wholesaler's name and this unique number does provide a unique identification within the entire GSAC.

How to Update Previously Published Information

If a change occurs to a piece of previously published information which causes any of the fields of the previously generated DHR (whether part of a DHF or MC) to change, then the Wholesaler must publish an updated DHR which reflects the correct values for all DHR fields. That is, the Wholesaler generates a new DHR with all mandatory fields filled with the newly correct values. Naturally, if the Wholesaler is updating information in a DHF, they must use the same unique_info_id value in field 0 of the DHR which was supplied when the information was first published. Similarly, if the Wholesaler is updating information in an MC, they must use the same unique_site_id value in field 0 of the DHR which was supplied when the information was first published.

How to Delete Previously Published Information

A Wholesaler might need to delete published information if, for instance, it is found to have been wrong in some way that is not repairable; if the information could be repaired then the Wholesaler would repair it and then publish an update as discussed in the previous paragraph. When a Wholesaler wishes to completely remove a previously published piece of information from the GSAC, the Wholesaler will generate a DHR that has values in only three fields: field 0, the unique integer identifier; field 1, the Wholesaler's name; field 6, the date and time at which this DHR was written. All other fields must be Null (empty). When the new DHR is published to the appropriate incremental table, the original DHR is simultaneously dropped from the full table. Once a unique identifier number has been removed from the GSAC it should not be re-used by the Wholesaler; it simply disappears from the system.
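The update and delete rules amount to an upsert-or-drop keyed on the Wholesaler name and unique identifier. The sketch below shows how a Retailer might apply incremental DHF records to its copy of the full table; the function name and the in-memory dictionary are assumptions for illustration (the text describes an RDBMS), and each record is taken to be a list of field strings already split from a DHR line.

```python
from typing import Dict, Iterable, List, Tuple

# Fields that keep their values in a deletion DHR:
# 0 = unique_info_id, 1 = wholesaler, 6 = dhr_create_time
DELETE_KEPT_FIELDS = (0, 1, 6)

def apply_incremental(full_table: Dict[Tuple[str, str], List[str]],
                      incremental_records: Iterable[List[str]]
                      ) -> Dict[Tuple[str, str], List[str]]:
    """Apply new, updated, and deleted DHRs to full_table in place."""
    for fields in incremental_records:
        key = (fields[1], fields[0])   # (wholesaler, unique_info_id)
        others = [v for i, v in enumerate(fields)
                  if i not in DELETE_KEPT_FIELDS]
        if all(v == "" for v in others):
            full_table.pop(key, None)  # deletion: every other field is Null
        else:
            full_table[key] = fields   # first publication or update
    return full_table
```

Because an update reuses the original unique_info_id, it simply overwrites the earlier record under the same key.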

Providing Distributed Data Backup

Some Data Wholesalers may choose to copy published information from other Wholesalers and make these backup copies available to the GSAC. This is desirable on several accounts. First, it provides a distributed backup mechanism at geographically separate locations for the data holdings of GSAC participants. Second, it provides multiple access to the duplicated files for users; if the original Wholesaler's archive is off-line for any reason, then the information will still be available to users through the backup Wholesaler. Also, if one wholesale archive is closer to a user then the data can be transferred from that archive instead of from a different Wholesaler farther away; this speeds delivery and allows the user to avoid Internet bottlenecks.

When a Data Wholesaler mirrors information available to the GSAC through another Wholesaler, it must, naturally, publish its own DHR describing the copy and how to access it so that all Data Retailers will know about the copy. The Wholesaler must also notify the other GSAC participants that this is a backup and not an original copy (so that the backup is not backed-up yet again by some other Wholesaler; a possible infinite loop problem). The Wholesaler does this two ways. First, by keeping the name of the original Wholesaler in field 1 of the DHR. The mismatch between this field and both the Wholesaler name on the first header line of the DHF and the filename of the DHF (which contains the publishing Wholesaler's name as a part) signifies that the DHR pertains to a backup. Second, the backup Wholesaler creates its own unique number to refer to the published information in field 0, and also places the original unique number from the original Wholesaler after their own in the field, separated by a comma (as for all multiple-entry fields). This establishes a one-to-one correspondence between the original published information and the backup published information. Without this correspondence other GSAC participants would have no way of knowing exactly which piece of information is being mirrored.

In a DHR pointing to a mirrored file, field entries describing the data (start_time and provider) will be the same as the DHR for the original Wholesaler, whereas the entries describing the physical file (dhr_create_time, info_url, file_size, file_create_time, file_checksum, file_grouping, and file_compression) will be that of the local (i.e., mirroring) Wholesaler. The unique_site_id must also point to the Monument Catalog of the local Wholesaler in order for the Wholesaler to check that the monument exists.
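The two backup signals described above (a Wholesaler name in field 1 that differs from the publishing Wholesaler, and a second identifier in field 0) can be checked together. The following sketch uses hypothetical names and takes the raw field strings of one DHR plus the Wholesaler name from the first header line of the DHF.

```python
from typing import Dict, List, Optional

def describe_backup(fields: List[str],
                    publishing_wholesaler: str) -> Optional[Dict[str, str]]:
    """Return details of the mirrored original, or None when the DHR
    describes an original copy."""
    original_wholesaler = fields[1]
    if original_wholesaler == publishing_wholesaler:
        return None
    ids = fields[0].split(",")
    if len(ids) != 2:
        raise ValueError("backup DHR must carry two ids in field 0")
    return {
        "backup_id": ids[0].strip(),      # number assigned by the mirror
        "original_wholesaler": original_wholesaler,
        "original_id": ids[1].strip(),    # number assigned by the original
    }
```

A Wholesaler looking for files to mirror could skip any DHR for which this function returns a value, which avoids backing up a backup.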

DHF flat-file storage requirements

For each piece of information published by a Data Wholesaler there will be a DHR stored in a flat-file on their system. An estimate of the number of bytes needed for one DHR is about 300 including the semi-colon delimiters between field values. Thus, for the entire SCEC survey mode archive, which contains about 10,000 raw and 10,000 RINEX observation files, the total storage needed to keep these DHFs on-line is about 6 MB. For an archive which deals with 250 continuous sites every day (both raw and RINEX) this would require about 150 KB per day or 55 MB per year. In either case this is small compared to the storage required for the actual data files themselves.
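The estimates above follow from simple multiplication, reproduced here as a short check. The variable names are illustrative, and decimal units (1 KB = 1000 bytes) are assumed.

```python
BYTES_PER_DHR = 300  # estimate from the text, delimiters included

# Survey mode archive: 10,000 raw plus 10,000 RINEX files
survey_bytes = (10_000 + 10_000) * BYTES_PER_DHR

# Continuous archive: 250 sites, one raw and one RINEX file per day
continuous_daily_bytes = 250 * 2 * BYTES_PER_DHR
continuous_yearly_bytes = continuous_daily_bytes * 365

print(survey_bytes / 1e6, "MB")             # 6.0 MB
print(continuous_daily_bytes / 1e3, "KB")   # 150.0 KB
print(continuous_yearly_bytes / 1e6, "MB")  # 54.75 MB, about 55 MB
```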

4. User Interface and Requesting Data

The main purpose of the GSAC is to make data requests simple, and to allow users to access multiple archives from one web site. The following describes one model for how to accomplish this, but is not the only possible approach. The GSAC does not place any restrictions on how the user interface will work as long as it provides the basic functionality of allowing users to locate information of interest to them and to retrieve that information in a reasonably simple manner. Beyond this it is up to the individual Retailer; that is, we encourage individual creativity on the part of the Retailers.

Deciphering the request: The user contacts a Data Retailer using a web interface and determines what type(s) of data the user wants and for which sites (if it is site-specific data). For site-specific data the user could do this by supplying the specific names of desired sites or by selecting all sites within some geographic region. In the first case the Retailer will have to resolve the problem of nonunique site names; if the user requests data for a site called PEAK the Retailer could reasonably ask, "which PEAK do you mean out of the following possible sites?" and then present the user with the coordinates of several locations with this name. Since each MC includes site coordinates it should be possible for the Retailer to sort this out, but determining exactly what the user wants from the Retailer will be a difficult question to answer and it is up to the imagination of the Retailer to figure out how they wish to deal with it. Note that this confusion about naming is going to be a problem even though the GSAC participants have renamed sites (internally to the GSAC) to get uniqueness within each Wholesaler's holdings and have a naming cross-reference (because all MCs are available to each Retailer), because each user will likely have his own favorite name for a site and will be unaware of what the GSAC participants have done with respect to unique site-names.
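One possible way for a Retailer to build the "which PEAK do you mean?" list is a case-insensitive lookup over all collected Monument Catalog records. The sketch below is only one approach among many; the function name is hypothetical, and each record is assumed to be a list of field strings in the MC field order (unique_site_id, wholesaler, 4_char_id, descriptive_id, dhr_create_time, x, y, z, ...).

```python
from typing import Dict, Iterable, List

def candidate_sites(mc_records: Iterable[List[str]],
                    short_name: str) -> List[Dict[str, object]]:
    """Return every monument whose 4_char_id matches short_name,
    ignoring case, with the details needed to ask the user to choose."""
    wanted = short_name.lower()
    matches = []
    for rec in mc_records:
        if rec[2].lower() == wanted:
            matches.append({
                "wholesaler": rec[1],
                "unique_site_id": rec[0],
                "descriptive_id": rec[3],
                "xyz": (float(rec[5]), float(rec[6]), float(rec[7])),
            })
    return matches
```

When more than one candidate comes back, the Retailer can present the coordinates and descriptive names and let the user pick.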

Notify User of Data Availability: Once the request is understood, the Retailer notifies the user of the volume of data involved and whether the entire request can be serviced at this time (i.e., some data may be off-line). The user is presented with (at least) two choices:

  1. Allow the user to select which data they want right now through a point-and-click interface (this is possible because all Retailers know the URL of all information published by all Data Wholesale archives).
  2. Let the Retailer service the whole request as a single unit to be transmitted to the user at a future time.

Collecting Data: If the user chooses for the Data Retailer to bundle the entire request (option 2 above), then the Retailer could either assemble the requested data files from the various sources onto the Retailer's computers, or they could supply the user with, for instance, an ftp script to be executed by the user from the user's computer. If the requested information is accumulated onto the Retailer's computers then once it is all available the Retailer would notify the user and the information could be transmitted by ftp, or perhaps by some non-electronic means like mailing data tapes (for instance, for very large requests). If the user requested any data that are not on-line, the Data Retailer will note which files are off-line and supply the user with contact information for the Wholesaler in possession of each of these files.

NOTE: The Retailer should place a limit on the amount of information it will transmit at one time (e.g., to deal with confused or "prank" requests).
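
One way a Retailer might prepare the availability notice and enforce the transfer limit is sketched below. The dictionary layout, the `plan_request` function, the host names, and the 10 MB limit are assumptions made for illustration; the proposal does not prescribe any of them.

```python
ONLINE_SCHEMES = ("ftp://", "http://")


def plan_request(holdings, max_bytes):
    """Split the requested holdings into on-line URLs and off-line contacts.

    Each holding is a dict with 'wholesaler', 'info_url' (a list of URLs)
    and 'file_size' (bytes, or None when only off-line publication exists).
    """
    online, offline, total_bytes = [], [], 0
    for h in holdings:
        urls = [u for u in h["info_url"] if u.startswith(ONLINE_SCHEMES)]
        if urls:
            # Use the first on-line copy; a Retailer could pick any mirror.
            online.append(urls[0])
            total_bytes += h["file_size"]
        else:
            # Off-line only: pass the Wholesaler contact on to the user.
            offline.append((h["wholesaler"], h["info_url"]))
    return {
        "online": online,
        "offline": offline,
        "total_bytes": total_bytes,
        "within_limit": total_bytes <= max_bytes,
    }


# Hypothetical request covering one on-line and one off-line holding.
holdings = [
    {"wholesaler": "unavco",
     "info_url": ["ftp://archive.example/pub/a.tar"],
     "file_size": 3461234},
    {"wholesaler": "sopac",
     "info_url": ["mailto:operator@archive.example"],
     "file_size": None},
]

plan = plan_request(holdings, max_bytes=10_000_000)
print(plan["total_bytes"], "bytes on-line;", len(plan["offline"]), "file(s) off-line")
```

The list in `plan["online"]` is what a Retailer could turn into an ftp script for the user, while `plan["offline"]` carries the contact information for files that are not on-line.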

Table 1. Monument Catalog DHR (Version 1.1, unchanged from Version 1.0)

field | field position | comments [allowed values / example] | description
unique_site_id

0

unique name
[LMCCJMVA.2207, E1C7P4X7+0012, etc.]

Created by each Wholesaler and tagged with the Wholesaler's unique site ID, which could, if the Wholesaler desires, be assigned by any of several outside agencies such as DOMES, etc.

This field, combined with field 1, provides a unique name for all sites across the entire GSAC.
Wholesaler

1

GSAC participant [cddis, ncedc, ngs, panga, scec, sopac, unavco]
4_char_id

2

short site name
[VNDP, asyt, etc.]

Typically assigned by the Data Provider. May be case-sensitive from the point of view of the Data Provider and/or Wholesaler, but the Retailer interface may perform case-insensitive searches. Does not need to be unique, even within an individual Wholesaler's archive system.
descriptive_id

3

long site name
[HPGN-CA SDGPS 01 1990, Palomar airport southeast corner, etc.]

Site name such as monument stamping or descriptive location.
dhr_create_time

4

DHR creation time
[yyyy-dddThh:mm:ssZ]

ISO standard; must be given in UTC.

This is the time at which the DHR information was created and made available ("published") to the GSAC.
x

5

in meters using a geocentric coordinate system with the X-axis through the 0° meridian.
[-2456670.641, etc.]

X-axis Cartesian coordinate. Value must be fully written out; scientific notation not allowed.
y

6

in meters using a geocentric coordinate system with the Y-axis through the 90° east meridian.
[-3529987.342, etc.]

Y-axis Cartesian coordinate. Value must be fully written out; scientific notation not allowed.
z

7

in meters using a geocentric coordinate system with the Z-axis along Earth's mean rotation axis.
[3425612.879, etc.]

Z-axis Cartesian coordinate. Value must be fully written out; scientific notation not allowed.
coord_accuracy

8

rounded to nearest power of 10 (meters)
[0.001, 0.01, 0.1, 1, 10, 100, 1000, etc., Null]

Accuracy of coordinate values.

Null (empty) means accuracy unknown.
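
The value formats in Table 1 can be checked mechanically. The sketch below assumes the record has already been split into individual field strings (the record delimiter is defined elsewhere in this document); the function names are hypothetical, and rejecting anything other than plain digits with an optional decimal point is one interpretation of "fully written out".

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# Plain decimal notation only: optional sign, digits, optional fraction.
PLAIN_DECIMAL = re.compile(r"-?\d+(\.\d+)?")


def parse_gsac_time(text):
    """Parse a yyyy-dddThh:mm:ssZ time stamp (day-of-year form, UTC)."""
    parsed = datetime.strptime(text, "%Y-%jT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_coordinate(text):
    """Parse x, y or z in meters; scientific notation is rejected."""
    if not PLAIN_DECIMAL.fullmatch(text):
        raise ValueError(f"coordinate must be fully written out: {text!r}")
    return float(text)


def parse_accuracy(text):
    """Return coord_accuracy as a Decimal, or None for Null (empty)."""
    if text == "":
        return None
    value = Decimal(text)
    sign, digits, _ = value.normalize().as_tuple()
    # A positive power of ten normalizes to the single digit 1.
    if sign != 0 or digits != (1,):
        raise ValueError(f"accuracy must be a power of 10: {text!r}")
    return value


print(parse_gsac_time("2002-213T00:00:00Z").isoformat())
print(parse_coordinate("-2456670.641"), parse_accuracy("0.001"))
```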


Table 2. Data Holdings File DHR (Version 1.1, modified from Version 1.0)

field | field position | comments [allowed values / example] | description
unique_info_id

0

unique information identifier

A numeric integer value with no commas, periods, or other punctuation.

For publications of backup information this field requires two entries (separated by a comma); the first entry is the unique number from the backup Wholesaler and the second entry is the unique number from the original Wholesaler.
[3415287, etc.]

Created by each Wholesaler to serve as the unique identification for the published information referred to by this DHR. For many Wholesalers "published information" will have the exact same meaning as "electronic data file". However, for Wholesalers who store locally mirrored backup copies of an electronic data file, and make both copies available to the GSAC under field 7 below, the same unique_info_id will apply to both files (that is, the unique_info_id really applies to the information).

This field, combined with field 1, provides a unique identification for all published information across the entire GSAC.

For remote backup copies this field will contain two values: the unique numbers from the backup Wholesaler and the original Wholesaler. This one-to-one mapping is necessary to notify other GSAC participants what information has been backed-up and the backup made available.
wholesaler

1

GSAC participant
[cddis, ncedc, ngs, panga, scec, sopac, unavco]

This is the name of the original Wholesaler: the Wholesaler who was the point of entry into the GSAC for this information. There is only one original Wholesaler for any given information. When one Wholesaler chooses to mirror a backup copy of information from another Wholesaler, the second Wholesaler must create their own DHR for the mirrored copy in order to let the rest of the GSAC know of the existence of the backup. However, the backup Wholesaler will keep the name of the original Wholesaler in this field. Other Wholesalers or Retailers who read the DHR for the backup will recognize that it is a mirror-copy by the fact that the entry in this field will not match the Wholesaler-name listed on the first header line of the DHF, nor will it match the Wholesaler-name built into the filename of the DHF in which the DHR resides. These other Wholesalers will know not to make a backup-copy of the backup-copy, thus avoiding an infinite loop situation.
data_type

2

data types acceptable under Version 1.1.

See note below.
[raw_gps, rinex_obs, rinex_nav, rinex_met, site_log_igs, orbit_sp3, sinex]

The data type determines which of fields 3, 4, and 5 are mandatory. In general, if the data type is site-specific then a value in field 3 is mandatory. If the data type is global in nature then field 3 must be Null (empty). If the data type is multiple-site-specific, but not global, then field 3 must contain a list of all unique_site_ids to which it applies. If a data type is start-time-specific then a value in field 4 is mandatory, otherwise it is Null (empty), and if the data type is end-time-specific then a value in field 5 is mandatory, otherwise it is Null (empty). See the note below for additional information on each accepted data_type for version 1.1.

Related raw information must be packaged into a single electronic file using one of the allowable file groupings given in field 12. For example, Ashtech raw data are often split into three files when transferred from the receiver (the "b," "e," and "s" files). All three files must be packaged into a single compound file for publication in the GSAC.
unique_site_id

3

unique name

Multiple entries are necessary for data types (field 2) that apply to multiple sites; multiple entries must be separated by a comma (","). The escape sequence "\," allows an actual comma in a name.
[LMCCJMVA.2207, E1C7P4X7+0012, etc., Null]

Created by each Wholesaler and tagged with the Wholesaler's unique site ID, which could, if the Wholesaler desires, be assigned by any of several outside agencies such as DOMES, etc.

Null (empty) means field 2 refers to a global data type.
start_time

4

beginning of data
[yyyy-dddThh:mm:ssZ, Null]

ISO standard; must be given in UTC.

Null (empty) means field 2 refers to a non-start-time-specific data type.
end_time

5

end of data
[yyyy-dddThh:mm:ssZ, Null]

ISO standard; must be given in UTC.

Null (empty) means field 2 refers to a non-end-time-specific data type.
dhr_create_time

6

DHR creation time
[yyyy-dddThh:mm:ssZ]

ISO standard; must be given in UTC.

This is the time at which the DHR information was created and made available ("published") to the GSAC. It might be close-in-time to the file_create_time, field 9, but not necessarily. For instance, a Wholesaler might place a data file into its wholesale archive, but not publish it to the GSAC until some time later for any number of reasons.
info_url

7

information retrieval mode.

Multiple entries are allowed if separated by a comma (","). The escape sequence "\," allows an actual comma in the URL.
on-line publication:
[ftp://computer.name.ext/full/path/filename, http://computer.name.ext/full/path/filename]

off-line publication:
[mailto:user.name@computer.name.ext, phone:user.name@full.international.phone.number]

Methods by which information is available from a Wholesaler. We strongly encourage on-line publication of information so that it is immediately available to users. However, we allow off-line publication on the understanding that it is better to know about the existence of information even if it is not immediately available.

Multiple methods of publication are allowed for the same information and multiple local mirror-copies of a data file are also allowed. These multiple entries must be separated by commas (",") within the field.

If an on-line specification appears, then fields 8, 9, and 10 below are required; if only off-line specifications appear then fields 8, 9, and 10 may be Null (empty).

If both on-line and off-line specifications appear, then the on-line information is used to satisfy user data requests, and the off-line information may (but not "must") be passed to GSAC users as well.
file_size

8

in bytes.

Mandatory value if field 7 includes any on-line publication.

Null (empty) if field 7 includes only off-line publication.
[3461234, etc., Null]

File size in bytes without commas.

Null (empty) means file size not provided.
file_create_time

9

Mandatory value if field 7 includes any on-line publication.

Null (empty) if field 7 includes only off-line publication.
[yyyy-dddThh:mm:ssZ, Null]

ISO standard; must be given in UTC.

This is the time at which a file was created; the time when it was placed on a Wholesaler's archive. This is not necessarily the operating system time-stamp associated with the file. For instance, it is possible within most computer operating systems to change the time-stamp of a file without changing any of the contents of the file (the Unix touch command does this); there is no need for a Wholesaler to publish an updated DHR in this case because the creation time of the information in the file has not changed. In general, there is no way for the value of this field to be verified by any GSAC participant except for the publishing Wholesaler.

Null (empty) means time not provided.
file_checksum

10

MD5 algorithm

Mandatory value if field 7 includes any on-line publication.

Null (empty) if field 7 includes only off-line publication.
[ea12ab27adc30e869d42036408be5b2d, etc., Null]

A 32-character hexadecimal string representing 128 total bits.

Null (empty) means checksum not provided.
provider

11

 
[John Doe at UCLA, etc., Null]

Included by some Wholesalers so that appropriate credit can be given to the agency that provided the data to the GSAC.

Null (empty) means provider not supplied.
file_grouping

12

file packaging
[tar, pkzip, Null]

Method by which multiple files are combined into a single file. If grouping and compression are applied to the same information, grouping must be applied first and compression afterwards.

Null (empty) means no grouping.
file_compression

13

Multiple entries are allowed if separated by a comma (","). The order of entries signifies the order in which the Wholesaler applied the multiple compression schemes.
[unix_compress, gzip, hatanaka, Null]

Method by which files are compressed. If grouping and compression are applied to the same information, grouping must be applied first and compression afterwards.

Null (empty) means no compression.
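
Two of the Table 2 rules lend themselves to a short sketch: splitting a multi-entry field on commas while honoring the "\," escape (fields 3 and 7), and checking a retrieved file against fields 8 and 10. The function names and the example host are hypothetical, and stripping spaces around each entry is an assumption that the table does not state.

```python
import hashlib


def split_escaped(field):
    """Split a field on commas, treating the sequence '\\,' as a literal comma."""
    if field == "":
        return []  # Null (empty) field: no entries
    parts, current, i = [], [], 0
    while i < len(field):
        if field.startswith("\\,", i):
            current.append(",")  # escaped comma stays inside the entry
            i += 2
        elif field[i] == ",":
            parts.append("".join(current))  # unescaped comma ends the entry
            current = []
            i += 1
        else:
            current.append(field[i])
            i += 1
    parts.append("".join(current))
    # Assumption: spaces around an entry are not significant.
    return [p.strip() for p in parts]


def is_online(url):
    """On-line publication uses ftp or http; anything else is off-line."""
    return url.startswith(("ftp://", "http://"))


def verify_file(data, file_size, file_checksum):
    """Compare retrieved bytes with file_size (field 8) and the MD5 in field 10."""
    digest = hashlib.md5(data).hexdigest()
    return len(data) == file_size and digest == file_checksum.lower()


urls = split_escaped(r"ftp://host.example/pub/a\,b.tar,mailto:someone@host.example")
print([u for u in urls if is_online(u)])
print(verify_file(b"", 0, hashlib.md5(b"").hexdigest()))
```

Because the time stamp in field 9 cannot be verified by anyone but the publishing Wholesaler, the size and checksum are the only fields a Retailer can check independently after a transfer.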


Table 3. Field Specifications for Version 1.1 Data Types

data_type | Field 3 | Field 4 | Field 5 | Comment
raw_gps | unique_site_id | start_time | end_time | values required in all fields
rinex_obs | unique_site_id | start_time | end_time | values required in all fields
rinex_nav | Null (empty) if created by merging the navigation messages from two or more sites (preferred); unique_site_id if from a single site | start_time | end_time | values required in Fields 4 and 5
rinex_met | unique_site_id | start_time | end_time | values required in all fields
site_log_igs | unique_site_id | start_time | end_time | values required in all fields. The start_time is defined to be the earliest time-stamp in the file; often this will be the time when the first GPS equipment was installed at the site. The end_time is defined to be the most recent time-stamp in the file; usually this will be the time when the last documented equipment change took place. When there is only one time-stamp in the file it will be used for both start_time and end_time fields.
orbit_sp3 | Null (empty) | start_time | end_time | orbit_sp3 information is specific to a range of times (start_time to end_time), but applies on a global scale instead of to a single location, therefore field 3 is Null (empty).
sinex | unique_site_id, unique_site_id, etc. | start_time | end_time | sinex information is specific to a range of times (start_time to end_time), and can apply to either a single location, or to multiple locations.
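
The per-type requirements in Table 3 can be expressed as a small lookup. The sketch below is one reading of the table, not an official validator: the rule labels and the `check_fields` function are hypothetical, and `site_ids` is assumed to be the already-split list of entries from field 3.

```python
# How many unique_site_id entries field 3 may hold for each data_type.
SITE_RULES = {
    "raw_gps": "one",
    "rinex_obs": "one",
    "rinex_nav": "zero_or_one",   # Null when merged from several sites
    "rinex_met": "one",
    "site_log_igs": "one",
    "orbit_sp3": "zero",          # global data type
    "sinex": "one_or_more",
}


def check_fields(data_type, site_ids, start_time, end_time):
    """Return a list of problems; an empty list means the record fits Table 3."""
    rule = SITE_RULES.get(data_type)
    if rule is None:
        return [f"unknown data_type {data_type!r}"]
    count = len(site_ids)
    allowed = {
        "zero": count == 0,
        "one": count == 1,
        "zero_or_one": count <= 1,
        "one_or_more": count >= 1,
    }[rule]
    problems = []
    if not allowed:
        problems.append(f"field 3 holds {count} site(s); {data_type} needs {rule}")
    # Every Version 1.1 data type requires both time fields.
    if not start_time:
        problems.append("start_time (field 4) is required")
    if not end_time:
        problems.append("end_time (field 5) is required")
    return problems


print(check_fields("orbit_sp3", [], "2002-213T00:00:00Z", "2002-213T23:59:59Z"))
print(check_fields("rinex_obs", [], "", ""))
```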

Comments or questions about this page? Send e-mail to Lou Estey (lou@unavco.org).

Last modified Wednesday, 16-Nov-2005 21:21:00 MST

 

© 2008 UNAVCO, Inc.