Chapter 6
From The SDI Cookbook
Chapter Six: Geospatial Data Access and Delivery: Open access to data
Editor: David Bitner, dbSpatial. bitner {[at]} dbspatial.com
Context and Rationale
Access to geospatial data, from the consumer's point of view, is part of a process that goes from discovery to evaluation, to access and finally to exploitation. Discovery (find, locate) involves the use of services such as metadata catalogues to find data of particular interest over a specific geographic region. Evaluation involves detailed reports, sample data and visualisation (e.g., in the recent form of web mapping through GIFs or simple vector representations of the data) to help the consumer determine whether the data is of interest. Access involves the order, packaging and delivery, offline or online, of the data (coordinates and attributes according to the form of the data) specified. Finally, exploitation (use, employ) is what the consumer does with the data for their own purpose.
Typically in the past, the focus of geospatial data access was on the supplier side, with a strong emphasis on technology and community-based standards and specifications. With the growth of the Internet, and in particular of Web-based technologies, access has become a demand-driven operation. Consumers expect simple discovery of, and access to, cheap (or free) data in simple standard formats that can be used in desktop applications. Increasingly, non-traditional suppliers are offering geospatial services, an example being Terraserver (http://terraserver.microsoft.com). The ability to leverage other major developments such as the World Wide Web, and in some cases electronic commerce, has allowed broader participation in the industry. The further democratisation of access to geospatial data thus enables value-added suppliers to create new data products and services.
The range of issues from an organisational point of view can be categorised two ways: 1) how broad is the client group; 2) how broad is the supplier group. In both cases issues tend to appear and grow as the groups become broader. In general issues revolve around copyright, licences (end user vs. reseller), cost, privacy, data formats and standards.
For example, if the client group is only internal staff then issues such as cost and copyright might not play a factor. As the scope of the client group grows to a limited number of known clients then there are straightforward mechanisms to control access. However, providing broad access to a large group of potentially anonymous clients makes issues such as cost, copyright and access control much harder to manage.
Similarly, as the size of the supplier group grows then issues appear. It is easier to establish a common policy for one or two organisations than it is for many. Typically each organisation has a business model (or non-business model!) that reflects its mandate and environment. The types of data and services it provides, the form and representation of the data, the quality and standards for the data all reflect this business model. Trying to bridge these issues between disparate organisations is an exponential problem.
The overlap between information managed by subject-specific communities in possibly parallel infrastructures can compound problems of data discovery and access. This can be viewed from either the consumer or supplier perspective. For example, as individuals in communities such as biodiversity or geosciences attempt to leverage a combined spatial data infrastructure to support their own goals they introduce new factors. These could be new standards or conventions that they commonly require, a new attribution requirement on the data not previously realised, or the need to provide common access to data not otherwise visible from a spatial data infrastructure.
Several trends can be noted in the treatment and handling of geospatial data. Typically in the past the first concern of a data custodian has been what format the data is stored or managed in. Increasingly the trend is to move one level up and only worry about the interfaces to the data. This allows the data to be managed in the best manner possible, while providing open, standards-based access. A consequence of this, however, is that the content of the data must be of a sufficient quality to support these interfaces. Often existing data is not accurate enough, is out of date, or is lacking in attribution.
Another trend is in the organisation of the data itself. There is an evolution that starts back with traditional paper products. These migrated into discrete digital files that were typically stored offline, e.g., on a tape rack. As mass storage became more affordable these files found themselves living on online media (magnetic or optical) for easier access. This last step is an important one when you couple it with the development of ubiquitous, wide area internetworking, i.e. the Internet. At this point a supplier was empowered to deliver data online.
More recently the trend has been to merge all the discrete data sets together into single, seamless data warehouses, which have spawned the development of direct data access services. This has been enabled by developments in mass storage and spatial database technology. This step is also proving to be hard on the data, revealing inconsistencies in data accuracy and quality. Recent infrastructure developments allow the creation of virtual data warehouses that federate multiple data warehouse instances into a single logical entity.
Organisational Approach
As in any development it is important to understand who the stakeholders are and what roles each will play. For example in most national infrastructures government suppliers are key stakeholders. How they will play in the development and operation of the data access component of the infrastructure depends strongly on government policies regarding data distribution, cost recovery, etc.
Commercial entities will generally play a strong role as providers of tools and services but may also be suppliers of primary and value-added data. It is important to understand the relationship between the commercial sector and the infrastructure as a whole, e.g. will the commercial sector have a role in planning the infrastructure? What types of business arrangements will be supported in the infrastructure?
The final category of stakeholder is the consumer or end-user. Their use of the data access element of the infrastructure is dependent on a number of factors including: the functionality of the infrastructure tools, the amount and quality of the content accessible, operating policies, infrastructure business model (will consumers be charged for access?), etc.
In the early stages of the development it is important to specify and review the long-term vision for the entire infrastructure to determine where the access component fits and how it ties into other infrastructure elements. At this stage it is helpful to develop some scenarios and use cases that can be presented to the stakeholders and refined as required.
The importance of developing a supportive policy/organisational environment should not be underestimated. Potential stakeholders will only become active participants if they see advantages for their organisations and if they do not feel threatened by the infrastructure. This policy/organisation environment will vary from country to country and will need to be worked out closely with the stakeholder community. The buy-in and commitment from senior management of all stakeholders is critical to the success of the infrastructure as a whole and to that of the access element in particular. The Canadian Geospatial Data Infrastructure (http://www.geoconnections.org/) is an example of an infrastructure implementation that has developed an organisation based on broad stakeholder participation.
Some of the issues that need to be considered in the development of the supportive policy/organisational environment are:
- Distributed/autonomous suppliers
- The management of the data should be done as close as possible to source. This ensures the accuracy and quality of the data.
- Non threatening to mandates
- Commercial and government stakeholders need to feel comfortable as active participants in the infrastructure. They should not feel threatened by infrastructure business models or policies.
- Multiple levels of “buy-in”; low barrier to entry
- The access component of the infrastructure must provide multiple levels of buy-in, from a low-cost option with limited benefits, e.g. basic advertising of products and services, to higher-cost options that offer increased benefits, e.g. distributed search connections to the supplier's inventory. This allows suppliers to choose a level of participation that best meets their business and operational objectives. This is especially important in the early operation of the access component, as many suppliers will want to "try" it out and hence may not be prepared to expend much effort until they see how it works.
- Sustainable long term business models
- The access component of an infrastructure must provide an environment that supports a variety of supplier business models. The development of a sustainable business model for the operation of the access component is critical to the long term success of the entire infrastructure.
- Role of the private sector
- The role of the private sector as suppliers of data, services, and technology and as potential operators of the access infrastructure must be clearly defined.
- Marketing and promotion
- The access component of an infrastructure must develop a marketing and promotion plan to build up the level of awareness and participation as quickly as possible. It is important to get a critical mass of suppliers so potential participants will see the benefits of joining the infrastructure. Potential benefits to suppliers include:
- Economies of data collection, closest to the source
- Reduced operational costs
- New clients (national and international)
- Data reuse (reuse vs recollection or conversion)
- Common tool and service reuse
- Advertising
- Benefits of “free” portrayal
- Enabling/supporting broad new applications, e.g. disaster management, value added
Implementation Approach
Definitions and Overview
Data Sets
Data sets are described by metadata and maintained within a data store. Foundation and Framework data sets represent fundamental or core data that may be present within a spatial data infrastructure (See Chapter 2). Data sets are composed of collections of features (e.g. roads, rivers, political boundaries, etc.) and/or coverages (e.g. satellite/airborne imagery, digital elevation models, etc.).
Data Stores
Data stores are used to manage data sets. Data stores may be offline or online repositories. Traditional online data stores are file-based repositories, set up for the delivery of pre-defined data sets. Data stores also contain text and attribute data related to a data set. Data warehouses are data stores that provide seamless access to and management of data sets.
Spatial Data Warehouse
A spatial data warehouse provides storage, management and direct access mechanisms. Typically, data warehouses ingest data from legacy file-based or data production systems.
Key characteristics of a spatial data warehouse include:
- the access and delivery of arbitrary features, layers, etc.
- seamless repository
- common data model
- application neutral, supporting a heterogeneous application environment
- support of large volumes of data
- multi-temporal support
- common repository for spatial and non-spatial data
- efficient access to large volumes of data
Examples of commercial data warehousing and service solutions for geospatial data include: Cubestore from Cubewerx (http://www.cubewerx.com/), the Oracle Spatial solution (http://otn.oracle.com/products/oracle9i/datasheets/spatial/spatial.html) and ESRI Spatial Data Engine (http://www.esri.com/).
Data Access Service
Implementations of data access services include the following:
- Offline (e.g. packaging and physical delivery of data sets in either hardcopy or softcopy)
- Direct to datastore (e.g. softgoods delivery via ftp, specified via e-commerce order request)
- Brokered - provide specification of data access request to secondary (online or offline) access service
- Online data service (e.g. stateful request/response access protocol to data warehouse) supporting online operations such as:
- Drill down
- Aggregation
- Generalisation
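As an illustration of the generalisation operation listed above, the following minimal sketch reduces the number of vertices in a line before delivery. It uses the Douglas-Peucker algorithm, which is one common technique and not one prescribed by this chapter; the function names, the sample coordinates and the tolerance are assumptions made for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def generalise(line, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) vertices."""
    if len(line) < 3:
        return list(line)
    # Find the interior vertex farthest from the chord joining the end points.
    index, farthest = 0, 0.0
    for i in range(1, len(line) - 1):
        d = point_segment_distance(line[i], line[0], line[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        # Every interior vertex is within tolerance: keep only the end points.
        return [line[0], line[-1]]
    # Otherwise keep the farthest vertex and simplify both halves.
    left = generalise(line[: index + 1], tolerance)
    right = generalise(line[index:], tolerance)
    return left[:-1] + right


# Made-up road centreline in arbitrary planar units.
road = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = generalise(road, tolerance=1.0)
```

A larger tolerance removes more vertices; the end points are always preserved, so adjacent generalised lines still connect.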
In Open Geospatial Consortium (http://www.opengeospatial.org/) Project Document 98-060, "User Interaction with Geospatial Data", the Portrayal model is described. Figure 6.1 describes this model, which illustrates a simple features-based access and portrayal services pipeline.
Data Access Client
Online implementations of data access clients include:
- “thin” Internet/Web – client is provided by standard Internet/Web tools (no Java – e.g. Web browser, e-mail, ftp client, etc.)
- “medium” client provided by Web browser with Java, or ActiveX controls
- “thick” client provided by a Web browser plugin, or standalone application (network access via a distributed computing platform such as CORBA, DCOM, Java RMI, etc.)
- Traditional GIS type client - access to previously downloaded data set, and direct network access to data warehouse
- “middleware” client – transparent access to consumer via a middleware infrastructure or applications service
- Geoprocessing service – direct access to data for use by a geoprocessing service (e.g. Web mapping in Chapter 5 with interactive portrayal service)
Data Formats
Common spatial data formats include the following:
GIS proprietary (e.g. ESRI, MapInfo, Intergraph, etc.): A good overview of GIS formats can be found at http://www.gisdatadepot.com/helpdesk/formats.html
International and community: Efforts have recently been made to minimise the number of geodata formats and to converge towards a reduced set. The Spatial Data Transfer Standard (SDTS), ISO/TC211 and the DIgital Geographic Exchange STandard (DIGEST) are examples of this trend. There are also exchange formats that allow the use of data outside of closed environments (e.g. Geography Markup Language - http://www.opengeospatial.org/docs/02-023r4.pdf).
Typical native data formats for most GIS applications contain only enough information for the originating GIS application to be able to use it properly. The data formats usually carry the features and maybe some basic projection information. Data Exchange formats are usually more robust. They usually carry information that would allow the use of the data in a variety of systems. Exchange formats usually also carry some minimum metadata to describe the data set as well as data quality statements. Data exchange formats are typically used by producers of data. Due to lack of consensus on specific format standards, spatial data infrastructures often support access to multiple spatial data formats through data access services. However, if it is feasible, the definition of a single community format based on ISO and OGC specifications is ideal to promote information exchange (See Chapter 2).
In the past, supporting a multitude of GIS data formats was very problematic. Currently, most GIS and related access systems support format translation. Examples of commercial support for format translation include the Feature Manipulation Engine from Safe Software (http://www.safe.com/) and Geogateway from PCI (http://www.pci.com/). An online data access service that combines data access with format translation is the Open Geospatial Datastore Interface (http://ogdi.sourceforge.net).
Unfortunately, format translation systems do little to support translation of semantics. The real problem for interoperable data access services and formats is the lack of common semantics. Semantic translation and multi-use feature coding catalogues (e.g. DIGEST) attempt to address the cross-domain semantic support issue (See Chapter 2).
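To make the distinction between format translation and semantic translation concrete, here is a minimal sketch of a feature-code crosswalk. All supplier names and codes are hypothetical and are not drawn from DIGEST or any other real catalogue; the point is only that two data sets in the same format can still need a lookup to agree on what a feature is.

```python
# Hypothetical crosswalk from each supplier's local feature code to a
# shared catalogue code. None of these codes come from a real catalogue.
CROSSWALK = {
    ("supplier_a", "RD_MAJOR"): "ROAD",
    ("supplier_a", "RD_MINOR"): "ROAD",
    ("supplier_b", "highway"): "ROAD",
    ("supplier_b", "stream"): "WATERCOURSE",
}


def to_common_code(supplier, local_code):
    """Return the shared code, or None when no mapping is known."""
    return CROSSWALK.get((supplier, local_code))


def harmonise(features):
    """Split features into those that could and could not be translated."""
    translated, unmapped = [], []
    for feature in features:
        code = to_common_code(feature["supplier"], feature["code"])
        if code is None:
            unmapped.append(feature)
        else:
            translated.append({**feature, "common_code": code})
    return translated, unmapped


features = [
    {"supplier": "supplier_a", "code": "RD_MAJOR", "id": 1},
    {"supplier": "supplier_b", "code": "highway", "id": 2},
    {"supplier": "supplier_b", "code": "canal", "id": 3},
]
translated, unmapped = harmonise(features)
```

Reporting the unmapped features separately, rather than dropping them, keeps the gaps in the crosswalk visible to the data custodians.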
Web Implementation formats
Vector Files: A vector file has many advantages that will prove useful for WWW spatial interfaces:
A vector file can be delivered to the client, where it can be zoomed and panned without the need to conduct every operation, expensively, on a WWW server. It is composed of layers that might represent roads, rivers, or boundaries. The layers can be switched on or off. A vector file often allows a mechanism to limit the level of zoom so that spatial data is not displayed as accurate beyond its level of reliability. The size and efficiency of a simple vector file will help with network services and response times. Fortunately, most GIS software programmes can directly produce vector files. A vector file supports functions such as interactive mapping, symbolization, and coordinate transformation.
There are three candidate file formats for encoding vector information on the WWW: Scalable Vector Graphics (http://www.w3.org/Graphics/SVG/), Web Computer Graphics Metafile (http://www.cgmopen.org/webcgmintro/paper.htm) and XML-based encoding formats (e.g. Geography Markup Language, GML) that allow for Web-based transfer of feature information, for subsequent styling and rendering via a Web client or client plug-ins. Only GML is specifically designed for the encoding of vector geographic information; the other formats are designed for the exchange of vector graphic information but may have little or no reference to real world or mapped coordinate systems or feature content.
Raster Files: Web/Internet delivery of GIS raster formats such as ADRG, BIL and DEM (http://www.gisdatadepot.com/helpdesk/formats.html) is often problematic due to the large size of such files, combined with a general lack of Internet bandwidth. Typically, compressed raster files predominate in Web-based portrayals of both vector and raster data. Common compressed Web formats include GIF, JPEG and PNG (http://www.w3.org/Graphics/PNG/), which are used to move single-variable panchromatic or colour images as raster files.
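Because GML is XML, a client can read feature geometry with a general-purpose XML parser. The sketch below is illustrative only: the `ex` namespace, the `Roads`, `Road`, `name` and `centreLine` elements and the coordinates are invented for the example, and the `gml` namespace URI shown is the one used by GML 2 and 3.1 (GML 3.2 uses a different URI). A real application schema would define its own feature types.

```python
import xml.etree.ElementTree as ET

# Namespace used by GML 2 / 3.1 documents (an assumption for this sample).
GML_URI = "http://www.opengis.net/gml"
GML_NS = {"gml": GML_URI}

# Invented sample document; only the gml:* elements are standard GML names.
SAMPLE = """
<ex:Roads xmlns:gml="http://www.opengis.net/gml"
          xmlns:ex="http://example.org/roads">
  <gml:featureMember>
    <ex:Road>
      <ex:name>Main Street</ex:name>
      <ex:centreLine>
        <gml:LineString srsName="EPSG:4326">
          <gml:coordinates>-75.70,45.42 -75.69,45.43 -75.68,45.43</gml:coordinates>
        </gml:LineString>
      </ex:centreLine>
    </ex:Road>
  </gml:featureMember>
</ex:Roads>
"""


def read_line_strings(gml_text):
    """Return the reference system and vertices of every gml:LineString."""
    root = ET.fromstring(gml_text)
    lines = []
    for line in root.iter("{%s}LineString" % GML_URI):
        coords = line.find("gml:coordinates", GML_NS)
        # Tuples are separated by whitespace, ordinates by commas.
        vertices = [
            tuple(float(value) for value in pair.split(","))
            for pair in coords.text.split()
        ]
        lines.append({"srs": line.get("srsName"), "vertices": vertices})
    return lines


line_strings = read_line_strings(SAMPLE)
```

Unlike SVG or WebCGM output, the parsed result keeps the coordinate reference system alongside the geometry, which is the property the paragraph above attributes to GML.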
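A rough calculation shows why uncompressed rasters are awkward to deliver online. The sketch below assumes a simple uncompressed layout with no header or padding; the image dimensions and link speed are illustrative figures, not measurements.

```python
def raster_size_bytes(columns, rows, bands=1, bits_per_sample=8):
    """Size of an uncompressed raster, ignoring any header or padding."""
    return columns * rows * bands * bits_per_sample // 8


def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


# Illustrative case: a 10,000 x 10,000 single-band, 8-bit image
# sent over a 1 Mbit/s link.
size = raster_size_bytes(10_000, 10_000)
seconds = transfer_seconds(size, 1_000_000)
```

Under these assumptions the image occupies 100,000,000 bytes and needs 800 seconds to transfer, which is why compressed portrayals or subsetting are usually preferred for Web delivery.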
Relationship to other spatial data infrastructure services
Figure 6.2 illustrates the role of data access in an end-to-end resource discovery, evaluation and access paradigm. Successive iterations of resource discovery via a metadata catalogue, followed by resource evaluation (such as Web mapping), lead to data access either directly, as a data set, or indirectly, via a data access service.
Mature spatial data infrastructure will allow both application and human exploitation of the resource access paradigm. A key element of future spatial data infrastructures is the ability to broker requests for services, based on discovery and real-time access to online geoprocessing and related services. Future capability for chaining of distributed geoprocessing services is also expected.
A system context for data access is given in Figure 6.3. A data access service provides network access to a data set stored within a data store. Data sets are discovered (and later accessed) via metadata queries from a catalogue client to a data catalogue service (See Chapter 4).
Data sets can be visualised (and later accessed) via Web Mapping services [See Chapter 5], which are complementary to the data catalogue service.

Figure 6.3 – System context for Geospatial data access services
Standards
In general, standards related to geospatial data access are still in their infancy. The standards of most relevance to access components of spatial data infrastructures include those from ISO/TC211, the Open GIS Consortium (OGC) and Internet-related bodies including the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF).
ISO/TC211
The primary mandate of ISO/TC211 (http://www.isotc211.org) is international standardisation in the field of digital geographic information.
“This work aims to establish a structured set of standards for information concerning objects or phenomena that are directly or indirectly associated with a location relative to the Earth.
These standards may specify, for geographic information, methods, tools and services for data management (including definition and description), acquiring, processing, analyzing, accessing, presenting and transferring such data in digital/electronic form between different users, systems and locations.
The work shall link to appropriate standards for information technology and data where possible, and provide a framework for the development of sector-specific applications using geographic data.”
Work on services is currently underway in both ISO/TC211 and the OGC. The definition of service interfaces will allow a wide range of applications to access and use geospatial resources. The OGC Simple Features Access model for SQL has been submitted to ISO for standardisation.
ISO SQL/MM
The purpose of the Draft Spatial Database Standard SQL/MultiMedia (SQL/MM) Part Three Spatial is to define multimedia and application specific objects and their associated methods (object packages) using the object-oriented features in SQL3 (ISO/IEC Project 1.21.3.4).
SQL/MM is structured as a multi-part standard. It consists of the following parts:
Part 1: Framework
Part 2: Full-Text
Part 3: Spatial
Part 4: General Purpose Facilities
Part 5: Still Image
SQL/MM Part 3: Spatial is aimed at providing database capabilities to facilitate increased interoperability and more robust management of spatial data.
Open GIS Consortium (OGC)
The Open GIS Consortium has achieved consensus on several families of interfaces, and some of these have now been implemented in off-the-shelf software. All OGC consensus interface specifications carry a pledge of commercial or community implementation by their submitting teams. Phase 1 of the initial OGC-sponsored Web Mapping Testbed (WMT) initiative [ref: Chapter 5] was successful in “Web mapping” portrayal of spatial data. An XML-based encoding scheme (Geography Markup Language or GML) for OGC Simple Features was also an important output of the Testbed process.
The publication of the OGC Web Feature Service (WFS) Specification in 2002 provided a solution for the standardised request and delivery of vector data. Supporting the OGC “Feature Model” shown in Figure 6.4, the WFS specification (http://www.opengeospatial.org/docs/02-058.pdf) defines the dialogue required to interact with geographic features via a vector data service. GML is used as the primary encoding for vector information returned from the OGC WFS. The use of WFS with various GML application schemas allows for the publication and exchange of spatial data in full vector detail. A detailed OGC Cookbook is published on the OGC website to help the interpretation and implementation of the WFS specification.
Whereas the WFS provides access to vector information, the request and service of raster information requires a separate specification. The OGC Web Coverage Service (WCS) specification was published in 2003. It extends the Web Map Server (WMS) interface to allow access to geospatial "coverages" that represent values or properties of geographic locations, rather than WMS-generated maps (pictures). Thus one would receive an array or surface of data values instead of colour values. This is useful for keeping the data values behind raw or interpreted imagery, other remotely sensed information, or other more or less continuously varying surfaces of data (e.g. elevation, temperature, constituent concentration). The WCS document is available at: http://www.opengeospatial.org/docs/03-065r6.pdf.
Three Open GIS Simple Feature Access (SFA) interface specifications have also been released to support feature access in relational database environments: one each for SQL, COM-based, and CORBA distributed computing platforms. The SFA interfaces provide access to and control over GIS features. At the primitive level, the interfaces provide for the establishment of linear and angular units, spheroids, datums, prime meridians, and map projections that give semantics to coordinates. At the intermediate level, they enable the construction and manipulation of geometric elements such as points, lines, curves, strings, rings, polygons, and surfaces, as well as the topological, geometric and other relationships between them. Included is support for common geometric and topological constructs, such as convex hull, symmetric difference, closure, intersection, buffer, length, distance, and dozens of others. At the GIS feature level, the interfaces provide for access to feature collections using geometry or attributes for selection.
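A WFS request can be expressed as a set of key-value pairs appended to the service URL. The following sketch only builds such a URL and does not contact a server; the endpoint `http://example.org/wfs` and the feature type `ex:Road` are placeholders, and version 1.1.0 is assumed here. Individual servers may support further parameters or require different ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_get_feature_url(endpoint, type_name, bbox=None, max_features=None):
    """Build a key-value-pair GetFeature request (WFS 1.1.0 assumed)."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
    }
    if bbox is not None:
        # Bounding box as minx,miny,maxx,maxy in the layer's coordinates.
        params["bbox"] = ",".join(str(value) for value in bbox)
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and feature type, for illustration only.
url = wfs_get_feature_url(
    "http://example.org/wfs",
    "ex:Road",
    bbox=(-76.0, 45.0, -75.0, 46.0),
    max_features=10,
)
query = parse_qs(urlsplit(url).query)
```

The response to such a request would normally be a GML document containing the matching features.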
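The difference between a coverage and a map picture can be shown with a small grid. In this sketch the elevation values are made up and the grey-scale stretch is just one possible portrayal; it is not how any particular WMS or WCS implementation works.

```python
# A tiny "coverage": elevation in metres on a 3 x 3 grid (made-up values).
elevation = [
    [120.0, 135.5, 150.0],
    [110.0, 128.0, 142.5],
    [98.5, 115.0, 130.0],
]


def portray(grid):
    """Render the grid as 8-bit grey levels, as a map picture might."""
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid dividing by zero on a flat grid
    return [[round(255 * (value - low) / span) for value in row] for row in grid]


picture = portray(elevation)
```

The `picture` grid is what a map-style portrayal delivers: pixel intensities whose link to metres depends on the stretch that was applied. The `elevation` grid is what a coverage service is meant to deliver: the measured values themselves, ready for analysis.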
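Two of the geometric constructs named above, length and convex hull, can be sketched in a few lines for planar coordinates. This is an illustration of what such operations compute, not the SFA interfaces themselves; the function names are chosen for the example, and the hull uses Andrew's monotone chain algorithm, which is one of several standard methods.

```python
import math


def length(line):
    """Planar length of a line string given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
path_length = length([(0, 0), (3, 4), (3, 10)])
```

In a Simple Features implementation these operations would be exposed as methods or SQL functions on geometry types rather than as free functions on coordinate lists.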
Web and Internet related
The Internet Engineering Task Force (http://www.ietf.org/) develops and maintains specifications for many Internet-related application, transport, routing and security standards (Requests for Comments, or RFCs), many of which are related to data access (e.g. HTTP, FTP, SMTP).
The World Wide Web Consortium, or W3C (http://www.w3.org/), is responsible for the development of common protocols and specifications to further the evolution of the World Wide Web. Activities of the W3C that relate to spatial data access include work on Web graphic file formats, XML and metadata.
Related Services
Many services are related to data access. A brief listing follows:
- Discovery and catalogue services [ref Chapter 4]
- Webmapping [ref Chapter 5]
- Electronic commerce related (e.g. http://www.commerce.net/)
- Authentication
- Payment
- Confidentiality (e.g. Secure Socket Layer)
- Public Key Infrastructure
- Delivery and Packaging
- Compression
- Subsetting and subselection
- Container-based delivery systems (e.g. http://www.paradata.com/)
- Data subscription services
- Data and file transport
- HTTP
- FTP
- SMTP/MIME
- Geoprocessing services (e.g. as defined by OGC)
- Distributed Computing Platforms
- CORBA (http://www.omg.org/)
- COM (http://www.microsoft.com/)
- Web/Java/XML
Best Practice Application
GeoGratis (http://geogratis.cgdi.gc.ca/)
One common problem with online access to data through a single infrastructure is the variety of policies and practices in place among the different data custodians. In order to support these different access policies, one approach is to develop services to support different basic paradigms. These cases include:
- Custodians who restrict access to particular users would benefit from common user authentication/authorisation services;
- Custodians who charge for data or services would benefit from electronic commerce services;
- Custodians who distribute data free of charge would benefit from an inexpensive mechanism (both time and money) to distribute data.
One example of services to support the third paradigm is GeoGratis that provides common services to support the distribution of freely available geospatial data. GeoGratis provides a single ftp/web access point where consumers can discover and download freely available data sets. As a common online service GeoGratis can be viewed from different perspectives:
- The types of data it makes available;
- The services it provides;
- The distribution model it offers.
GeoGratis makes many types of geospatial data available to the consumer. These data may be national or local in scope, raster or vector, or current or legacy data.
Small-scale national data sets are commonly made publicly available. In the case of GeoGratis, base map data from the National Atlas of Canada is available for download. Additionally many national scale framework data sets are available through GeoGratis. At the other end of the spectrum are data from local test studies/sites that are nominally available free of charge. By offering basic download capabilities GeoGratis supports a wide variety of data types, including raster, vector and tabular. The only restriction is on any value-added service above the basic download capability. A final characteristic of the data available through GeoGratis is the availability of many legacy data sets such as the Canada Land Inventory. These data are typically data sets that suffered through some measure of cost cutting or program termination and as a result are no longer supported. GeoGratis provides a facility to make these data available albeit without background support.
In addition to freely available data, GeoGratis provides value-added services. As a basic service, GeoGratis provides the download of freely available data. Other basic services that GeoGratis provides are the discovery of available data through a search interface and the evaluation of data sets through detailed metadata and visualisation. Additionally, extra services are provided in support of data download; these include data subsetting, reprojection and reformatting for all types of data available through GeoGratis. More advanced services include the provision of data warehousing capabilities that support seamless access to large-area data sets available through GeoGratis.
Finally, GeoGratis offers a cost-avoidance data distribution model. Since GeoGratis is provided as one of many common services supporting data access, this distribution model does not preclude other models, i.e., private access or fee-based access. Similarly, GeoGratis does not assert that all data should be freely available, but provides an effective service for data that is freely available.
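Of the download support services mentioned above, subsetting is the simplest to illustrate. The sketch below filters point features by a bounding box; the feature records, field names and coordinates are invented for the example and do not reflect how GeoGratis implements the service.

```python
def in_bbox(point, bbox):
    """True when an (x, y) point lies inside or on the edge of the box."""
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def subset(features, bbox):
    """Keep only the features whose location falls within the box."""
    return [f for f in features if in_bbox(f["location"], bbox)]


# Invented point features with (longitude, latitude) locations.
stations = [
    {"name": "A", "location": (-75.7, 45.4)},
    {"name": "B", "location": (-79.4, 43.7)},
    {"name": "C", "location": (-73.6, 45.5)},
]
selected = subset(stations, bbox=(-76.0, 45.0, -73.0, 46.0))
```

Delivering only the selected features keeps the download small, which matters most for the large-area data sets that the warehousing service is intended to serve.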
One example of this is the National Atlas of Canada digital data. Originally these data were sold for a nominal fee. However it did not prove cost effective to continue this strategy due to the costs of selling and supporting the data compared to the limited return. Therefore a strategy of cost avoidance was adopted where the data was placed on GeoGratis for free download and support was removed. Access by any other means (such as distribution of the data on CD) was left to the value added private sector community. The result was a dramatic increase in the access and use of these data.
From an implementation and standards perspective, GeoGratis provides an excellent “data rich” environment in which to implement emerging spatial data infrastructure standards in an operational environment. GeoGratis currently supports catalogue-based discovery services via the Z39.50 Geo profile, and is expected to provide future online OGC Web mapping and direct-access spatial data warehouse services. The new reprojection and reformatting services provided by GeoGratis will also be used to exercise the emerging OGC service specifications within an Intranet environment.
Summary and Readiness Analysis
Key organisational issues, related to data access in development of a spatial data infrastructure include:
- Ensuring key government, commercial, and value-added data/related service providers are represented as key stakeholders in the development and implementation of a national spatial data infrastructure
- Collaboration of government data suppliers on coordinated, supportive policies that relate to spatial data access and distribution including: availability of free data, pricing, copyright, and use/integration of electronic commerce
- An access infrastructure and policy that is non threatening to stakeholder mandates
- Support for multiple levels of “buy-in” to the data access infrastructure with a low barrier to entry
- Sustainable long term business models
- Early and clear indication of the role of the private sector
- Early marketing and promotion of the entire spatial data infrastructure program
- Awareness and adoption of international standards
Recommendations
The matrix below illustrates the evolution of data access and related spatial data services. Migration from “classic” towards “infrastructure enabled; standards based; and full functioned” is required to bootstrap a national spatial data infrastructure. Both “top-down” and “bottom-up” implementation strategies are suggested. Early adoption and “best practices” should be followed by key government data providers.
- The Cookbook authors recommend the development and publishing of data schemas using the OGC Geography Markup Language (GML) Version 3.2.1 for common re-use data themes.
- The Cookbook authors recommend the deployment of OGC Web Coverage Service (WCS), Version 1.0, and Web Feature Service (WFS), Version 1.1, for raster and vector data publication, respectively.
- The Cookbook authors recommend that participants register their data services with the GEOSS Component and Service Registry (CSR).
The Group on Earth Observations (GEO) hosts a global service registry that acts as a directory of all known web services in the Earth Observation and SDI communities. By listing data access services in such a system, publishers can ensure that they can be discovered in a trans-national context.
References and Linkages
GeoGratis (http://geogratis.cgdi.gc.ca/)
International Organization for Standardization, ISO/TC211 (http://www.isotc211.org)
Internet Engineering Task Force (http://www.ietf.org/)
World Wide Web Consortium, or W3C (http://www.w3.org/)




