Data - GSAC Strawman Document
GSAC Strawman Document

6 November 1997

Distributed Access to GPS Data Centers

This paper discusses mechanisms for distributing a catalog of GPS data and for implementing a common method of accessing the data. Feedback is requested! The intended audience is participants at the Seamless Archive Workshop at the UNAVCO Boulder Facility, Nov 11. Areas needing more work or group input may be identified at the meeting. Our goal is to leave the meeting with a plan to provide users with a mechanism for (1) identical data request and (2) identical data delivery, regardless of which data center is contacted. Of course, data centers will have extensions and enhancements to the basic method according to their own needs and capabilities.

Concept of Seamless Access for GPS Data Centers

A distributed and cooperative data environment is needed between current GPS data centers and existing or newly-forming networks, to provide the user community with seamless access to GPS data and metadata. Each participant in the proposed GPS Seamless Archive Data Centers (GSADC) will maintain its individuality and continue bringing its own strengths into play, yet give the user community a familiar, consistent data access look and feel by providing standardized data and data products.

Issues related to seamless GPS data exchange:
Each group or organization currently provides some method to access and retrieve data, in some cases frustratingly so, because the methods are different yet similar enough to be confusing. Implicit in our goals is taking advantage of existing methods: keep the parts that work, converge on the methods that are already similar, and make them identical.

Seamless Process Discussion

Identifying the primary types of data users and their requirements allows us to outline the core functions of a seamless archive. The resulting definition of a minimum set of metadata enables efficient "data discovery" by users, and exchanges of data to users and between data and analysis centers. Increased efficiency is required to handle the influx of new data types and growth in data networks and to handle specific processing requirements.

Real time data distribution obviously can be best provided by the center closest to collection of the data, should it wish to do so. This same center might wish to defer distribution of legacy data to avoid the impact on staff, computer storage resources and network capacity, especially when the data are available elsewhere. We discuss below some mechanisms to handle each case.

One stop shopping for data products should be provided where possible so that researchers don't waste time seeking combinations of data from various centers, with a different access mechanism to each. However, it is impossible to package every product that some users may want. Initial complete packages - such as providing RINEX, orbits and polar motion parameters - should be defined.

Two methods of accessing GPS data centers in use today are anonymous FTP and query methods (mostly Web-based form tools). FTP provides the simplest approach to making pre-identified data available, for example from permanent stations, where a hierarchical directory structure provides a known path to locate data by site name and time.
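As a minimal sketch of how a "known path" might be derived from site name and time, the example below keys the path by GPS week, as the summary at the end of this paper suggests. The GPS week count from 6 January 1980 is standard; the directory layout and the function names `gps_week_and_day` and `ftp_path` are hypothetical and do not describe any data center's actual structure.

```python
from datetime import date

# GPS week 0 began on Sunday, 6 January 1980.
GPS_EPOCH = date(1980, 1, 6)


def gps_week_and_day(d: date) -> tuple[int, int]:
    """Return (GPS week, day of week), where day 0 is Sunday."""
    days = (d - GPS_EPOCH).days
    if days < 0:
        raise ValueError("date precedes the GPS epoch")
    return divmod(days, 7)


def ftp_path(site: str, d: date) -> str:
    """Build a path under a HYPOTHETICAL layout: <week>/<SITE>/<day>."""
    week, day = gps_week_and_day(d)
    return f"{week:04d}/{site.upper()}/{day}"


print(gps_week_and_day(date(1996, 11, 24)))
print(ftp_path("pol2", date(1996, 11, 24)))
```

Because the week and day follow from the date alone, a user or script can compute the path without first querying the data center, provided every center agrees on the same layout.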
Limitations of FTP include: non-uniform directory structures between data centers; potential for non-unique site names; and difficulty in maintaining and distributing automated updates. Anonymous FTP is in wide use for distributing data soon after collection, however.

A query method manages requests for accessing a wider range of GPS data files, where various data suppliers provide their own naming convention and where delays in arrival of data may approach weeks or months. Queries can request specific data to be delivered, or request an inventory or catalog of information about data availability. The query can be interactive, such as with a Web form, or from an automated or "batch" background process.

Examples of batch queries in English:
One problem with the query method is immediately obvious: what query language can be used, and will the user community adopt it? A batch query requires a formatted method for entering specific information and for delivering the request to the data center through an established communications channel. There is no standard mechanism for handling GPS data in this manner at this time. Requests for various data types and delivery methods (such as FTP, email, and media type) must be handled, as well as returning an acknowledgment. We suggest adopting an existing query mechanism, described below: the IRIS NETDC EMAIL request, which evolved from the BREQ_FAST mechanism for requesting and delivering seismic data.

An example of an interactive query that demonstrates the Seamless concept:
In a different example using a batch method instead of a web browser, the researcher may request an inventory of data holdings using a query language (discussed below) and may specify selection criteria to restrict the data returned. The GSADC will then perform identically to the first example.

Development Plan

Each GSADC will create and maintain an index of specific core information about the GPS sites for which it is responsible and make it available to other interested GSADC participants in a timely manner. Each GSADC may therefore extend its catalog to include information from other GSADCs. Among the GSADC sites supporting this option, a query to a GSADC may result in that GSADC returning data from its own storage or retrieving the data from a participant GSADC. In some cases, then, regional GSADCs will maintain only data which they collect themselves, while more comprehensive GSADCs may maintain lists and retrieve data from multiple regional GSADCs. This coordinated index will exist on each GSADC system, will be regularly updated, and will be the primary information accessible to the user community as the first step in providing access to the data in the seamless archive.

A similar look-and-feel user interface for basic queries and retrieval of data and information will be provided by each GSADC participant. GSADCs will also coordinate to provide delivery of identical information and data in response to a user's queries. To enable this process:
GSADC Exchange

GSADC participants will therefore exchange information on data holdings consisting of the following two types of files. This will coordinate data holdings and metadata:
Examples: GSADC Requests
The GSADC requests will be EMAIL based, with a specific format modeled after IRIS' NETDC format but adapted for GPS data. A properly formatted request would be emailed to a fixed email address at each GSADC participant, for example:

Mail gsadc

The requests are parsable by computers yet human-readable, and consist of:
The HEADER identifies the entity making the request, and is described later. The INFO_TYPE request line is an extendable method, which initially will return three types of information: 1) a catalog (inventory of data holdings, .INV); 2) response information (.RESP); and 3) data (.DATA). Generally a request looks like:

.<INFO_TYPE> <DATA_CENTER> <NETWORK> <STATION_ID> <LOCATION> <START> <END>

Applying this format to examine the catalog of holdings, to determine whether data exist for a particular IGS station held at a specific data center:

.INV CDDIS IGS IGS.POL2 * * *

The result of this request yields the basic location and time information from the panoramic file:

POL2 42.6798 74.6943 1725 1995-05-25 1997-08-22

Similarly, requesting detailed occupation information within a specific time range:

.INV UNAVCO_DMG IGS IGS.POL2 * "1996 11 24 00 00 00" "1996 12 08 23 59 59"

generates the detailed list of daily data availability by searching through detail-listing files:

The missing data for days 4-6 on the 1st line are real. On the 3rd line, only the first day of data was requested, but since the query returns information in one-week quantities all days are presented.

As a last example, searching for any station data for May 21, 1989:

.INV UNAVCO * * "1989 05 21" "1989 05 21"

would return:

Apparently you can get REST for 3 days, HEBG for 4 days, and the other two stations for the entire GPS week.

A number of HEADER lines must precede any INFO_TYPE requests. The HEADER lines appear as follows:

The line labels are relatively self-explanatory. GSADC_REQUEST is the necessary start of the message to identify the email document. The .HUB_ID is assigned to the request by the receiving data center and tracks the time of the request and the responding entity. It is never modified once set. The HUB_ID will look like:

.HUB_ID UNAVCO_DMG;Nov_11,01:00:05

SUMMARY & ACTIONS:
This is a strawman, to be more fully detailed at, and after, the meeting.
Additional handouts will be provided. We should be able to state a specific
goal, like:
(A) a definition or a specific path to locate data files based on their names as above, i.e. by GPS week and site name. For example, if the comprehensive file list from UNAVCO identifies an FTP transport method and has this line:

POL2 IGS 12348M001 42.6798 74.6943 1725 881 0 1 2 3

(B) data access via the NETDC-style mechanism as described previously.
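
The NETDC-style request line described earlier is simple enough to parse mechanically. The sketch below is one possible reading of the `.<INFO_TYPE> <DATA_CENTER> <NETWORK> <STATION_ID> <LOCATION> <START> <END>` format, not a specification. The names `Request`, `parse_request` and `parse_time` are hypothetical, and two choices are our own assumptions: all six fields after the INFO_TYPE are treated as required, and `*` is read as "no constraint".

```python
import shlex
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Time layouts seen in the examples in this paper (with and without a time of day).
TIME_FORMATS = ("%Y %m %d %H %M %S", "%Y %m %d")


def parse_time(token: str) -> Optional[datetime]:
    """Convert a quoted time field to a datetime; '*' means unconstrained."""
    if token == "*":
        return None
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(token, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time field: {token!r}")


@dataclass
class Request:
    info_type: str
    data_center: str
    network: str
    station_id: str
    location: str
    start: Optional[datetime]
    end: Optional[datetime]


def parse_request(line: str) -> Request:
    """Parse one request line; shlex keeps each quoted time as a single token."""
    tokens = shlex.split(line)
    if len(tokens) != 7 or not tokens[0].startswith("."):
        raise ValueError(f"malformed request line: {line!r}")
    center, network, station, location, start, end = tokens[1:]
    return Request(tokens[0][1:], center, network, station, location,
                   parse_time(start), parse_time(end))


req = parse_request(
    '.INV UNAVCO_DMG IGS IGS.POL2 * "1996 11 24 00 00 00" "1996 12 08 23 59 59"'
)
print(req.info_type, req.station_id, req.start, req.end)
```

A stricter or looser rule on the number of fields is a design decision for the workshop; the parser rejects short lines only because that is the simplest behavior to state.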
Comments or questions about this page? Send e-mail to Lou Estey.

Last modified Wednesday, 16-Nov-2005 21:21:00 MST