GSAC Structure and Data Exchange Formats
Draft
Draft Submitted to the "Seamless Archive Workshop" Participants for Review and Comments. Please email all commentary/suggested changes to gsac-wg

1. Introduction

A distributed and cooperative data environment is needed between current GPS data centers to provide the user community with seamless access to all GPS data and metadata. Each participant in this proposed GPS Seamless Archive Centers (GSAC) will maintain its individuality and bring its own strengths into play, yet give the user community a familiar and consistent look-and-feel across all archive holdings by providing standardized data access. The seamless archive starts by defining specific data holdings and standard mechanisms, proposed below, for accessing data from the GSACs. The seamless archive will evolve to meet research and GSAC needs.

2. Who Is a GSAC Participant?

A GSAC participant provides data and/or access to data according to agreed-upon GSAC Working Group standards and levels of participation. Differing levels of participation are defined to match existing community functions and are intended to standardize data exploration and retrieval methods. GSAC participant functions will include supplying collected data, assembling and organizing data in a "warehouse", and providing user interfaces to search and access data and metadata.

Data Originators collect or generate data and then supply it in its most original form. There should be only one Data Originator for any particular piece of data. For continuous GPS sites the Data Originator is whoever actually downloads the raw data from the receiver. For campaign mode sites the Data Originator is whoever conducts the field operation and then downloads the data from the receiver.

Data Wholesalers combine data and metadata from multiple originators and catalog them using GSAC-defined Data Holdings Files (defined below), intended primarily for access by other Data Wholesalers.
Data Retailers provide at least a minimal, standard, agreed-upon user interface for researchers to explore and access data and metadata. User interfaces include FTP and Web/forms methods. "Users" in this GSAC document means anyone who requests data from the GSAC, but users are expected to be research-oriented. These categories, and the relationships between them, are discussed more fully in Appendix B.

3. Definitions

We define some standard terms so that each participant in the GSAC communicates in the same language:

Seamless Archive - The GPS Seamless Archive community.
GSAC - A GPS Seamless Archive Center, which provides one or more specific functions.
DATA - Raw and RINEX GPS/GLONASS observations, and other information useful to a seamless archive user (anything needed to analyze the observations), such as orbit, coordinate, site, receiver, and antenna information. There are multiple data types and categories, which are discussed later in this document. The data must be distributable over the Internet (this rules out paper!) and "should" be maintained on-line at all times, though off-line data is also acceptable.
GSAC Meta-Data - Anything that describes the data holdings of an archive, such as file size, file location, site latitude and longitude, etc.
GPS Meta-Data - Anything that describes additional information for a GPS observation, such as antenna height, receiver type, etc.

4. GSAC Meta-Data Exchange

Meta-data describing a Data Wholesaler's holdings is essential for identifying and exchanging data between Wholesalers and Retailers, and for local querying of data that resides at remote Data Wholesalers' archives. There must be a well-defined, computer-parsable file (or files) that can be exchanged over the Internet for this purpose. The concept of Data Holdings Records and Data Holdings Files accomplishes this.
The definitions follow:

Data Holdings Record (DHR): information about a data holding of a Data Wholesaler, which is collected by Data Retailers so that they can accurately inform Users what data is available from the GSAC. It describes the data of interest to the User, but is NOT the actual data that will be sent to the User. There will be several different types of DHR depending on the type of data described. Each DHR is a single line of ASCII text made up of fields separated by a semi-colon delimiter and terminated by a newline character. The exact fields of each type of DHR are defined later in this section.

Data Holdings File (DHF): an ASCII flat file containing multiple Data Holdings Records, all of the same type. There will be several types of DHF depending on the data types available within the GSAC. A "Full DHF" contains information for all of the DHRs of a specific data type available from a Data Wholesaler. An "Incremental DHF" contains information only on data holdings of a Data Wholesaler that are new or updated since the last (or first) Incremental DHF was made available. This is analogous to the difference between a "full backup" and an "incremental backup."

We currently define four types of DHF within the GSAC, but this list may be refined or expanded as requirements evolve. Together they tell where data exist, when they exist, and what type of data exist:

Monument File: A listing of each monument for which data exists on a Wholesaler's archive, with information describing its geographic location. Provision of this file is the "minimum entry point" for agencies wishing to be a part of the GSAC. Note that this file only reports the existence of data from a particular monument and its location; it gives few other details. Filename used: wholesalers_name.[full, inc].mf

Observation File: A file containing information on all available data that are both time-specific and location-specific.
Types of data that fall into this category include, but are not limited to, raw, RINEX, and quality control files. Filename used: wholesalers_name.yyyy.ddd.[full, inc].of

Product File: A file containing information on all available data that are time-specific only (and thus not location-specific). Types of data that fall into this category include, but are not limited to, satellite orbits, earth rotation parameters, etc. Filename used: wholesalers_name.yyyy.ddd.[full, inc].pf

Site File: A file containing information on all available data that are location-specific only (and thus not time-specific). By "not time-specific" we mean that the data type does not necessarily change each time a set of observations is made, as is the case for data types described by the Observation File. This data type does carry a time stamp to signify the date for which it is valid, but any changes over time are expected to be very slow. Types of data that fall into this category include, but are not limited to, scanned photos of sites, to-reach information, etc. Filename used: wholesalers_name.[full, inc].sf

The use of the DHFs in the GSAC system

Each Data Wholesaler will keep track of its "full" time-specific DHFs (that is, Observation Files and Product Files) by UTC day. The DHRs for all time-specific and location-specific data pertaining to a given UTC day are kept in a single Full Observation File; there are 365/366 such DHFs per year. Similarly, the DHRs for all time-specific (only) data pertaining to a given UTC day are kept in a single Full Product File; there are 365/366 such DHFs per year. The DHF files can be stored in whatever directory structure the Data Wholesaler prefers, as long as it is accessible to all Data Retailers. An obvious structure would be one directory per year holding all of the OFs and PFs for that year (see example later in document). This area serves as a permanent record for each Wholesaler of what data they have.
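The naming rules above are regular enough to generate mechanically. The Python sketch below is illustrative only and not part of the proposal: it assumes the day-of-year field `ddd` is zero-padded to three digits (day 5 becomes `005`) and that the full Monument and Site Files sit at the top of the permanent area next to the per-year directories. The function names and root path are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Suffixes of the four DHF types defined above.
TIME_SPECIFIC = {"of", "pf"}      # Observation and Product Files: one per UTC day
NON_TIME_SPECIFIC = {"mf", "sf"}  # Monument and Site Files: one per Wholesaler


def dhf_filename(wholesaler: str, kind: str, full: bool, day: Optional[date] = None) -> str:
    """Build a DHF name such as 'sopac.1997.320.full.of' or 'sopac.inc.mf'."""
    mode = "full" if full else "inc"
    if kind in TIME_SPECIFIC:
        if day is None:
            raise ValueError(f"'{kind}' files are time-specific and need a UTC day")
        doy = day.timetuple().tm_yday  # UTC day of year, 1..366
        # Assumption: ddd is zero-padded to three digits.
        return f"{wholesaler}.{day.year:04d}.{doy:03d}.{mode}.{kind}"
    if kind in NON_TIME_SPECIFIC:
        return f"{wholesaler}.{mode}.{kind}"
    raise ValueError(f"unknown DHF type: {kind!r}")


def permanent_path(root: str, wholesaler: str, kind: str, day: Optional[date] = None) -> PurePosixPath:
    """Location of a Full DHF in the permanent holding area.

    OF/PF files go in one directory per year, as suggested above; placing the
    MF/SF files directly under `root` is an assumption of this sketch.
    """
    name = dhf_filename(wholesaler, kind, full=True, day=day)  # validates `day`
    if kind in TIME_SPECIFIC:
        return PurePosixPath(root) / f"{day.year:04d}" / name
    return PurePosixPath(root) / name
```

For example, `permanent_path("/pub/GSAC/permanent", "sopac", "of", date(1997, 11, 16))` gives `/pub/GSAC/permanent/1997/sopac.1997.320.full.of`, since 16 November 1997 is day 320.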
This permanent DHF holding area is made available to Data Retailers so that a Retailer may re-sync itself with the Wholesaler if necessary. Under normal circumstances this should not be required. A Data Wholesaler will also keep track of its "Full" non-time-specific DHFs (that is, the Monument File and Site File) in the permanent holding area. There will be only one of each of these files for each Wholesaler.

There will also be a separate set of "day-directories" at each Data Wholesaler, named by UTC day, that contain all new "Incremental" DHFs created on that day by the Data Wholesaler. The day-directories keep track of the work being done on a daily basis by the Wholesalers. They can thus be thought of as incremental information that was added to the permanent DHF holding area on a given day. Once a Retailer has synced itself with a Wholesaler, it need only access the incremental DHFs in the day-directory area in order to stay synced. On a daily basis, after the UTC day boundary, each Retailer will be responsible for collecting all incremental DHFs from each Wholesaler and either storing them on the Retailer's system for future reference or incorporating the information into the Retailer's RDBMS.

For example, take day 320 of 1997 (assume this is the current day) in the day-directories area of SOPAC's Data Wholesaler archive. The listing of files in ~ftp/pub/GSAC/day-directories/1997/320 might be:

As the above listing shows, there are several DHFs in the directory. Each DHF is an incremental update of DHRs for the year and day given in its name. There is no DHF for day 1997.320 since SOPAC (acting in its Wholesaler role) deals only in daily downloads of continuous GPS sites. If SOPAC used hourly (or more frequent) collection intervals, the DHF for day 320 would exist and would contain DHRs for these partial downloads. Note that there are DHFs for data collected on days other than "today" and "yesterday".
This is because new or updated data has come into SOPAC's archive. At the end of the UTC day the Data Wholesaler will stop adding DHF files to the "current" day-directory and begin placing DHFs into the "next" day-directory. That is, once the UTC day boundary is passed, the Data Wholesaler will not modify anything in that day-directory, but will move on to the next one. Data Wholesalers dealing primarily with campaign mode data would almost always have a set of day-directories that contain only Incremental DHFs for days in the past (perhaps data collected many years earlier) as they work through their backlog of data files supplied by the Data Providers.

In each of the day-directories the Wholesaler also keeps a file listing the filenames and modification times of all DHFs in the directory (e.g., the output of a long listing piped to this file). If a Data Retailer wants an up-to-the-minute listing of data holdings on the Wholesaler's archive, it can get this "listing" file first to see whether anything in that directory has changed during the UTC day; if the dates of all DHF files in the day-directory are unchanged since an earlier check by the Retailer, there is no reason to copy any of the DHFs to the Data Retailer's system. This minimizes the amount of traffic needed to check the current state of each Data Wholesaler's archive.

The sub-directories in the day-directory area will not be kept indefinitely; on a daily basis, as a new one is created the oldest one is deleted. This provides a ring buffer of incremental holdings information, 30 days in length, that allows the Data Retailer to collect the DHFs late if necessary. It is up to the Data Retailer to keep track of which incremental DHFs it has collected in order to stay in sync with the Data Wholesalers. It is important for the Retailers to stay synced with the Wholesalers.
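A Retailer's daily polling logic can be sketched in Python as below. It is a minimal illustration under stated assumptions, not a prescribed procedure: the listing file has already been parsed into a mapping from DHF name to modification-time string (its exact layout is not fixed here), the ring buffer is taken to hold the current day-directory plus the 29 before it, and the Retailer records the last closed day it fully collected. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List

RING_BUFFER_DAYS = 30  # day-directories retained by each Wholesaler


def changed_dhfs(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """DHF names that are new, or whose modification time differs, since the last check.

    Both arguments map DHF filename -> modification time as read from a
    day-directory's listing file. An empty result means nothing needs copying.
    """
    return sorted(name for name, mtime in current.items() if previous.get(name) != mtime)


def days_to_collect(last_collected: date, today: date) -> List[date]:
    """Closed UTC day-directories the Retailer has not yet collected, oldest first.

    Today's directory is excluded because the Wholesaler may still add to it.
    Raises RuntimeError when a needed directory has already rotated out of the
    ring buffer, i.e. when only a full restore can re-sync the Retailer.
    """
    # Assumption: the buffer holds today plus the 29 preceding days.
    oldest_available = today - timedelta(days=RING_BUFFER_DAYS - 1)
    first_needed = last_collected + timedelta(days=1)
    if first_needed < oldest_available:
        raise RuntimeError(f"day-directory {first_needed} already deleted; full restore required")
    return [first_needed + timedelta(days=i) for i in range((today - first_needed).days)]
```

Copying only the names returned by `changed_dhfs` is a slight refinement of the all-or-nothing check described above; both rely only on the listing file.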
If for some reason synchronization is lost, the Retailer will have to start over from the Wholesaler's permanent DHF records: a "full restore." In summary, there are two file storage areas at a Data Wholesaler's archive where DHFs are stored:
Example Permanent storage area:
Example Day-directories storage area:

The format of each type of DHF needs to be standardized. A DHF will be an ASCII flat file consisting of a series of Data Holdings Records concatenated together. Each Data Holdings Record is a single line of text with fields separated by semi-colons (;). Where a semi-colon is needed within the information in a field, it is "escaped" with a back-slash (\;). Likewise, a back-slash within a field must itself be escaped (\\).

A DHR for a Monument File will contain the following fields in the order given:
Note: If WGS-84 coordinates are not given, adjust horz_accuracy to compensate for any known differences between the actual coordinates and WGS-84. If the difference between the coordinate systems is not known, leave the horz_accuracy field NULL.

A DHR for an Observation File will contain the following fields in the order given:
A DHR for a Product File will contain the following fields in the order given:
A DHR for a Site File will contain the following fields in the order given:
The start time and end time of a data file may not coincide with UTC day boundaries (especially for campaign mode data). Since the DHFs are organized by day of year, each Data Wholesaler should include a Data Holdings Record in the DHF for every day on which the data file contains data. For example, suppose we have a RINEX observation file that starts at 1996-209T22:30:00Z and ends at 1996-210T23:59:00Z; the Data Holdings Record for this file will appear in both Full Observation Files, wholesaler.1996.209.full.of and wholesaler.1996.210.full.of. If there were a scanned logsheet or file of GPS meta-data to go with this, its Data Holdings Record would also appear in the same two Full Observation Files. This duplication makes it easy for the Retailer to identify all data that might be of interest at a given time.

For each datum available from a Data Wholesaler there would be a DHR of one of these four types (or other types if new ones are needed someday) stored in a flat file on its system. One Observation File record is estimated at about 220 bytes, including the semi-colon delimiters between field values. Thus, for the entire SCEC campaign mode archive, which contains about 10,000 raw and 10,000 RINEX observation files, the total storage needed to keep these DHFs on-line is about 4 MB (20,000 records × 220 bytes ≈ 4.4 MB). An archive handling 250 continuous sites every day would need about 110 KB per day, or about 40 MB per year, assuming one raw and one RINEX file per site per day (500 records × 220 bytes). In either case this is small compared to the storage required for the actual data files.

We assume that the Data Retailers will usually take the DHFs from the Data Wholesalers and read them into the Data Retailer's RDBMS, so that when they get data requests from Users it is quick and easy for them to know whether the data exists and how to get it.
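When a Retailer ingests DHFs, it has to split each record on unescaped semi-colons and, when building or checking daily files, work out every UTC day a data file touches. The Python sketch below shows both under simplifying assumptions: timestamps are naive `datetime` values already in UTC, and any back-slash escape other than `\;` and `\\` is simply taken literally. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_dhr(line: str) -> List[str]:
    """Split one DHR line into fields, honouring the '\\;' and '\\\\' escapes."""
    fields: List[str] = []
    buf: List[str] = []
    chars = iter(line.rstrip("\n"))
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError("record ends with a dangling back-slash")
            buf.append(escaped)  # '\;' -> ';' and '\\' -> '\'
        elif ch == ";":
            fields.append("".join(buf))  # an unescaped delimiter ends the field
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


def join_dhr(fields: List[str]) -> str:
    """Inverse of split_dhr: escape back-slashes first, then semi-colons."""
    return ";".join(f.replace("\\", "\\\\").replace(";", "\\;") for f in fields)


def days_spanned(start: datetime, end: datetime) -> List[Tuple[int, int]]:
    """(year, day-of-year) of every UTC day from start to end, inclusive.

    Note: an end time of exactly 00:00:00 also counts that final day.
    """
    day, last = start.date(), end.date()
    days = []
    while day <= last:
        days.append((day.year, day.timetuple().tm_yday))
        day += timedelta(days=1)
    return days
```

For the RINEX example above, `days_spanned(datetime(1996, 7, 27, 22, 30), datetime(1996, 7, 28, 23, 59))` returns `[(1996, 209), (1996, 210)]`, i.e. the record belongs in both the day-209 and day-210 Full Observation Files.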
The Data Retailers would then have a daily procedure for accumulating the new Incremental DHFs from all of the Wholesalers, parsing these files, and adding them to their RDBMS; this should be easy to automate. Using this scheme, the Data Retailers would never look at the day-files from the Wholesalers except when the DHFs are retrieved (n times a day), and they could delete them after processing them into their RDBMS.

5. User Interface and Requesting Data

The efforts of the seamless archive are aimed at making data requests simple and at allowing users to access multiple archives from one web site. The following describes a model for how to accomplish this.

Deciphering the request: The User contacts a Data Retailer through some sort of web interface and specifies what type(s) of data they want and for which sites (if it is site-specific data). The sites could be chosen by the User supplying the specific names of desired sites or by selecting all sites within some region. In the first case the Retailer will have to resolve the problem of non-unique IDs; if the User requests data for a site called PEAK, the Retailer could reasonably ask, "Which PEAK do you mean out of the following possible sites?" and then present the User with the coordinates of several locations with this name. Since each DHR includes good coordinates, we can hope that the Retailer can sort this out, but determining exactly what the User wants will be difficult and deserves much thought. Note that this will be a problem even though the GSAC participants have renamed sites to get uniqueness within each Wholesaler's holdings and have a naming cross-reference, because each User will likely have a favorite name for a site and will be unaware of anything the archive participants have done.
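A Retailer might first narrow a request to candidate monuments before asking the User to choose. The Python sketch below is illustrative only: the `Monument` record and its fields (`site_id`, `wholesaler`, `lat`, `lon`) are hypothetical stand-ins, not the Monument File layout, and the region search is a plain latitude/longitude box that ignores regions crossing the 180° meridian.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Monument:
    site_id: str     # user-facing ID such as "PEAK"; not unique across Wholesalers
    wholesaler: str  # archive that holds the data
    lat: float       # degrees north
    lon: float       # degrees east


def candidates_by_id(monuments: Iterable[Monument], site_id: str) -> List[Monument]:
    """All monuments sharing the requested ID (case-insensitive)."""
    wanted = site_id.strip().upper()
    return [m for m in monuments if m.site_id.upper() == wanted]


def in_region(monuments: Iterable[Monument], min_lat: float, max_lat: float,
              min_lon: float, max_lon: float) -> List[Monument]:
    """Monuments inside a simple lat/lon box (no dateline wrap-around)."""
    return [m for m in monuments
            if min_lat <= m.lat <= max_lat and min_lon <= m.lon <= max_lon]


def describe(candidates: Iterable[Monument]) -> List[str]:
    """One line per candidate, so the User can pick 'which PEAK' by location."""
    return [f"{m.site_id} ({m.wholesaler}): {m.lat:.4f}, {m.lon:.4f}" for m in candidates]
```

If `candidates_by_id` returns more than one entry, the Retailer would show the `describe` lines and ask the User to choose; with exactly one match it could proceed directly.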
Notify User of Data Availability: Once the request is understood, notify the User of the volume of data involved and whether the entire request can be serviced at this time (i.e., some data may be off-line). The user is presented with a few choices:
Collecting Data: If the user chooses to have the Data Retailer bundle the entire request (option 2 above), the Data Retailer begins to assemble the requested data files from various sources (since all Data Retailers know how to get to the data holdings of all Wholesalers). If the User requested data that is not on-line, the Data Retailer will note which files are off-line and supply the name and address of the Wholesaler in possession of each of these files. Once all the data has been accumulated, the Data Retailer either places the files somewhere for the User to retrieve via ftp or puts them directly on the User's machine, also via ftp. A size limit should probably be placed on what the GSAC will deliver.

NOTE: For large data requests we should probably think about some sort of limit, since otherwise we could waste lots of time assembling data (to deal with "prank" requests). Also, large requests may need to be sent to the User via some medium other than the Internet.

APPENDIX A

The GSAC needs to establish exactly what data will be available to a user. This list must not be so exhaustive as to make archiving and serving data requests too difficult to manage, but must be large enough and malleable enough to serve the GPS community's immediate needs as well as future ones. (The following is one way to describe data types and categories.)

Data Types

Data can be categorized by their level of primacy, with original measurements at the lowest level and results derived from them at higher levels. Thus, site velocities and displacements are derived from site position estimates, which are derived from GPS observations and orbit information, which are derived from raw GPS observation files.

1. Raw observations

Raw data are the original measurement files created by the GPS/GLONASS receivers, such as download files (e.g., Ashtech R-files, Trimble DAT, MES, EPH-files), RS-232 serial stream files, JPL/AOA ConanBinary and TurboBinary, Leica DS, etc.
These files are typically binary and receiver-dependent.

2. Exchange-format observations

RINEX is the internationally adopted exchange format for GPS/GLONASS data. It is a receiver-independent translation of the raw data files that may include additional site, receiver, and antenna information. RINEX is currently defined for three types of data:
Non-standard extensions of RINEX include: observation summary files created by translator programs (e.g., TEQC), auxiliary files for other measurements collected simultaneously at the site (e.g., tilt and power), and extensions to the observation standard to allow additional data types (e.g., LC, S1, S2). RINEX is an ASCII format; proposed modifications to this format include Compact RINEX, based on the Hatanaka compression algorithm, and BINEX, a purely binary extensible exchange format.

3. Observation session and site characteristics

Additional information specific to the observation session and the site where the measurements are obtained includes monument names, receiver and antenna types, receiver firmware, antenna heights, etc. For permanent site installations, a standardized site log file defined by the IGS is used by many data providers. For episodic, campaign mode observations, handwritten log sheets are typically used, and they are either scanned or transcribed into digital format for distribution on the Internet. Often this information is also contained in the header of the RINEX file itself.

4. General model characteristics

Other information that is not specific to a site or session but is necessary to analyze the data includes antenna phase centers and patterns, lunar and planetary ephemerides, nutation, precession, and polar motion models, geopotential field models, solid earth and ocean tide models, tropospheric and ionospheric models, radiation pressure models, etc. (e.g., IERS Standards, 1992). Standard exchange formats for most of these data have not been defined. Published model characteristics are typically translated into formats suitable for each analysis package by their maintainers.
5. Estimated parameters

Parameters estimated from Level 1 data (and examples of standardized exchange formats) include: site coordinates and covariance (SINEX), satellite orbit and clock information (SP3), earth orientation (rotation and polar motion) parameters (ERP), and tropospheric and ionospheric variations.

All of the above data types can also be divided into three categories based on their time-specific and location-specific characteristics. We have defined three types of DHR to cover the possibilities (and a fourth DHR, the Monument File, to help Wholesalers create a naming cross-reference).

APPENDIX B

GSAC Hierarchy

We propose that there be four tiers to the seamless archive, the middle two being the actual GSAC members. These tiers are:

Data Originator: Whoever produces the data in its original form. There should be only one Originator for any particular piece of data. For continuous GPS sites the Data Originator is whoever actually downloads the raw data from the receiver. For campaign mode sites this is whoever conducts the field operation and then downloads the data from the receiver. For continuous sites there is the possibility of more than one agency downloading the data from the receiver (though we think this doesn't happen right now); this must be avoided. Data Originators providing continuous data are often members of the GSAC, while Data Providers of campaign mode data are usually not members of the GSAC and pass their data to a Data Wholesaler.

Data Wholesaler: The Data Wholesalers are the warehousers of data and operate a data archive whose contents they agree to make available to Users of the GSAC. For any piece of data, there is one Data Wholesaler who is responsible for contact with the Data Originator. Each Data Wholesaler must have a unique identifier for each piece of data in its holdings. Although these identifiers need not be unique across the entire GSAC, we suggest that one or a few naming conventions be adopted.
A Data Wholesaler is defined in more detail by the following:
Data Retailer: A member of the GSAC which:
A Data Retailer will not modify any data file sent to it from a Data Wholesaler. If an error or inconsistency is discovered in the file, the Wholesaler will be notified and will fix the problem in the original version of the file; this fixed version will then be propagated into the GSAC in the same way as an entirely new file.

Note: Any participant of the GSAC can be both a Data Retailer and a Data Wholesaler (and a Data Provider, for that matter), though as a Data Wholesaler it would be concerned only with an assigned set of stations: that is, there should be only one primary Data Wholesaler for a given piece of data (say, one day of data from a GPS site). Data may in fact be held redundantly by the Data Retailers to speed access and to provide a distributed backup mechanism, but there is always a well-defined place to go to get the official/primary version of a data file.

User: Anyone who requests data from a GSAC Data Retailer.

Comments or questions about this page? Send e-mail to Lou Estey (lou
Last modified Thursday, 17-Nov-2005 04:21:00 UTC