Chapter 2

From The SDI Cookbook

Chapter Two: Geospatial Data Development: Building data for multiple uses

Editor: open

Context and Rationale

In the times of traditional ‘mapping’, collection and distribution of geographic information was highly centralised, or controlled by powerful government monopolies. This pattern dates from the beginning of the history of mapping and lasted for centuries, until very recent times. It was a necessity that had never been challenged, owing to the heavy costs and technology associated with traditional mapping and to the long time-scales of mapping projects, which often extended over several decades. Also, maps were not necessarily a consumer product, but were considered part of the national/local assets – data mainly used by the government, for defense, taxes, planning and development.

Thus the governments determined the collection of the information in specific types and formats required for its intended applications. Applications did not vary much across borders, and therefore a similar range of products was developed in many countries. These include:

  • Cadastre, cadastral maps (scales from 1:100 to 1:5 000)
  • Large scale topographic maps for urban planning and development (scale from 1:500 to 1:10 000)
  • National ‘base maps’ (medium scale, 1:20 000 to 1:100 000)
  • Small scale maps (1:100 000 and smaller)

Most, if not all, other mapping products and projects would use these main ‘basic maps’ as a template and a common reference, building upon this ‘basic information’ the thematic data and applications that were required. Thus national interoperability was achieved. Moreover, because needs across borders were very comparable, national products across borders were also quite similar, and even if edge-matching was not always evident, anyone from country 'A' would be able to read and use a paper map from country 'B' with no special effort. Thus tacit cross-border interoperability also existed.

GIS technology has changed all that, particularly with the development of desktop GIS. Usage and type of applications is now incredibly diverse. GI has become a mass-market product on its own or is found integrated in hard- and software solutions. Nearly anyone can create their own maps, thanks to the use of desktop mapping, GIS, GPS surveying, satellite imagery, scanning and intelligent software. The old monopoly is shaken.

GIS technology is being employed in many different areas and in newer fields of application, as computer hardware and GIS software provide improved capabilities at reduced cost. However, the overall cost of developing the geospatial data required to support GIS applications remains relatively high compared with the hardware and software required for GIS. In addition, GIS users tend to develop their own data sets, even if there are existing geospatial data sets available to them, because:

  • they may not know of existing data sets that could appropriately be used for their applications, or access to these data sets is difficult;
  • they are not used to sharing data sets with other sectors and/or organisations; and
  • existing geospatial data sets stored in a certain GIS system may not be easily exported to another system.

These problems arise from the fact that existing geospatial data sets have rarely been documented in a standardised manner. Consequently, there has been duplicated effort in geospatial data development, which sometimes hinders further dissemination of GIS applications at local, national, regional and global levels.

As a result, the new era of GIS is still characterised by:

  • many actors involved in data collection and distribution
  • a proliferation of GI applications, product types, and formats
  • duplication as a consequence of the difficulty of accessing existing data, and of the highly specific quality of the data collected
  • increasing difficulty in the exchange and use of data that came from different organisations

Core-, Reference-, Base-, Fundamental-data, and other similar terms are often used, and generally understood … until one tries to define what concept(s) they cover, or until one tries to define the related specifications.

Most GIS applications employ a limited number of common geospatial data items, including geodetic control points, transportation networks, hydrological networks, contour lines and so forth. These items are common to many GIS applications and provide keys for the integration of other and more specialized thematic information. They represent the content found in most traditional base-maps, or in modern technology and terminology, in most GI databases and products. Does that mean that these items are the ‘core’? What about postal addresses? What about cadastral parcels?

The concepts of ‘core-data’ and of ‘reference-data’ relate to two quite different perspectives. Fortunately, they may result in the definition of very similar specifications. Let’s start with ‘reference’. The primary reference for cartographers is the geodetic and levelling networks that give surveyors the physical links to a co-ordinate system. Of course, this has recently and dramatically changed with satellite positioning technologies, but the principle remains that the primary reference is what gives access to geodetic coordinates. We are not really concerned with this type of reference here, because it is generally not a part of the Geographic Information that is used in GIS applications, but rather its background. Very often it is not even visible.

If geodesy is the reference for the cartographer and the surveyor, the ‘reference’ of the GI user is generally more closely related to the real world. It includes concrete themes, such as infrastructure – roads, railways, power-lines, settlements, etc, or physical features – terrain elevation, hydrography, etc. It includes also less tangible features that have nonetheless a significant role in human life: administrative boundaries, cadastral parcels, gazetteer, postal addresses, etc. All these features are keys that allow one to relate, to ‘refer’, external information to the real world, through the media of its GI representation. Therefore they may be considered as comprising a reference for the GI user -- the ‘reference data’.

A different perspective underlies the concept of ‘core data’. The core being the heart, the central and fundamental part, it may also be considered the common denominator of all GI data sets, because it is used by most applications. Seen this way, the specifications of the core turn out to be very compatible with those derived from the concept of ‘reference data’. Therefore, let’s not lose ourselves in academic debates, and let’s keep here a simple practical view and terminology.

‘Core data’, when used here, will mean “a set of Geographic Information that is necessary for optimal use of most GIS applications, i.e. that is a sufficient reference for most geo-located data.” The relevance of this definition can of course be questioned, and it will need to be improved. Let’s adopt it only for the sake of understanding the following chapters. One obvious and necessary accommodation to the above definition is that the specifications might be scale-dependent. Core, then, may refer to the fewest number of features and characteristics required to represent a given data theme.

We have seen that the GIS revolution has resulted in a democratisation of GI, but also in a key problem: the non-interoperability of the GI produced with the new technologies. We propose that the concept of ‘core data’ is one instrument for improving interoperability, thus increasing GI usability and reducing the expense resulting from the current duplication.

Interoperability complications exist at different levels, and they can be found in four main types:

  • cross-border : edge matching between different data sets
  • cross-sector : data sets created for different sector-based applications
  • cross-type : e.g. raster- vs. vector-data
  • overlap : same features coming from different sources and processes

Resolving the related issues will need a mix of three ingredients -- the technology, the adoption of a common concept of ‘core data’, and of course the political support that will help resourcing the necessary key implementations.

The concept of the core aims at sharing the core data sets between users in order to facilitate the development of GIS. Each data item may be provided by a different data provider. Such data providers produce data through their daily business, including road management, urban planning, land management, tax collection, and so forth. Although there may be many data providers, the data sets they provide must be integrated to develop core data sets. Once these core data sets are shared between data users, no user has to develop the core data alone, and duplicated effort in core data development is avoided. Consequently, the cost of developing the core data is spread across users and the cost borne by each is minimised.

Much more than at the time of data set creation, the benefits of the ‘core data’ concept will be revealed when updating. Since these core data sets are developed by those who produce the data through their daily business, they are updated most frequently. Therefore, users are assured of using up-to-date core data sets. In addition, these data producers develop the most detailed, high-quality geospatial data, based on their business requirements. Another benefit of using core data sets is that these commonly used data sets make it easy for users to share other geospatial data with one another.

Achieving Benefits

In order to achieve the benefits described in the previous section, the data producers who develop and maintain geospatial data sets through their daily business need to distribute their data to the public. Once the data are distributed, GIS users can collect and integrate them in their own GIS applications. Such data sets would provide GIS users with the most up-to-date and highest quality data publicly available. Hence users need spend only a minimal amount on the core data in their GIS applications.

Global Map is one illustration of ‘core’ data sets conceived in a global or at least multi-national environment. The Japanese Geographical Survey Institute took an initiative in 1992 to develop a suite of global geospatial data (Global Map) to help address global environmental problems. The goal is to involve national mapping organisations in collaboratively developing global geospatial data sets. By incorporating the national mapping organisations of the world, the collected information would be most up-to-date and assured of being free of national security issues. The Global Map could be considered an initial implementation of the concept of a suite of ‘core data’ for GSDI, in concert with similar framework data sets at regional and national levels.

It is important to recognize that Core data, as represented by Global Map and other national initiatives, do not comprise the only data available within a national or global SDI. SDI capabilities enable the documentation and service of all types of geospatial data, such as local scientific or engineering projects, regional or global remote sensing activities, and environmental monitoring. Although SDIs as infrastructure enable access to all these types of information, special consideration is given in this chapter to documenting issues associated with data of high reuse potential that may be served by SDIs at local, national, or global levels as traditional base map themes.

Organisational Approach

At the national level, common spatial data are often defined through community and/or national agreements on content, known as "framework" or "fundamental" data in various national SDIs. In the Australian Spatial Data Infrastructure (ASDI), Fundamental describes a dataset for which several government agencies, regional groups and/or industry groups require a comparable national coverage in order to achieve their corporate objectives and responsibilities. In other words, fundamental data are a subset of framework data. Similar concepts exist in other countries with similar terms, and most identify general themes of interest as "framework" information, for they provide a framework of base, common-use geospatial information onto which thematic information can be portrayed. An organisation interested in implementing spatial data that will be compatible with local, regional, national, and global data sets, must identify, and potentially reconcile different framework designations across their geographic area of interest.

The framework is a collaborative effort to create a common source of basic geographic data. It provides the most common data themes geographic data users need, as well as an environment to support the development and use of these data. The framework’s key aspects are:

  • specific layers of digital geographic data with content specifications
  • procedures, technology, and guidelines that provide for integration, sharing, and use of these data; and
  • institutional relationships and business practices that encourage the maintenance and use of data.

The framework represents a foundation on which organisations can build by adding their own detail and compiling other data sets. Existing data content may be enhanced, adjusted, or even simplified to match a national or global framework specification. This is helpful for the purpose of exchange.

Framework Leverages the Development of Needed Data

Thousands of organisations spend billions of dollars each year producing and using geographic data. Yet, they still do not have the information they need to solve critical problems. There are several aspects to this problem:

  • Most organisations need more data than they can afford. Frequently, large amounts of money are spent on basic geographic data, leaving little for applications data and development.
  • Some organisations cannot afford to collect base information at all. Organisations often need data outside their jurisdictions or operational areas. They do not collect these data themselves, but other organisations do.
  • Data collected by different organisations are often incompatible. The data may cover the same geographic area but use different geographic bases and standards. Information needed to solve cross-jurisdictional problems is often unavailable.
  • Many of the resources organisations spend on geographic information systems (GIS) go toward duplicating other organisations’ data collection efforts. The same geographic data themes for an area are collected again and again, at great expense. Most organisations cannot afford to continue to operate this way.

Framework initiatives will greatly improve this situation by leveraging individual geographic data efforts so data can be exchanged at reasonable cost by government, commercial, and nongovernmental contributors. They provide basic geographic data in a common encoding and make them discoverable through a catalogue (see Chapter 4) in which anyone can participate. Using Web mapping and advanced, distributed GIS technology in the future, users can perform visual cross-jurisdictional and cross-organisational analyses and operations, and organisations can funnel their resources into applications, rather than duplicating data production efforts.

There are many situations in which the framework will help users. A regional transportation planning project can use base data supplied by the localities it spans. Government agencies can respond quickly to a natural disaster by combining data. A jurisdiction can use watershed data from beyond its boundaries to plan its water resources. Organisations can better track the ownership of publicly held lands by working with parcel data.

Geographic data users from many disciplines have a recurring need for a few themes of basic data. While these layers may vary from place to place, some common themes include: geodetic control, orthoimagery, elevation, transportation, official geographic names (gazetteer), hydrography, governmental units, and cadastral information. Many organisations produce and use such data every day. The framework provides basic content for these data themes, and by defining a common schema, it can also provide a common means of information exchange and value-adding.

By attaching their own geographic data, which can cover innumerable subjects and themes, to the common data in the framework, users can build their applications more easily and at less cost. The common data themes provide basic data that can be used in applications, a base to which users can add or attach geographic details and attributes, a reference source for accurately registering and compiling participants' own data sets, and a reference map for displaying the locations and the results of an analysis of other data.

National and global frameworks are a growing data resource to which geographic data producers can contribute. They will continually evolve and improve. In practice, the content model of many framework layers may be simple enough that, as a collection target, at certain scales, it could be made available at virtually no cost. Content providers already exist in the United States that take free government data and extend it with valuable additional attributes, e.g. marketing and demographic information. The core information itself may be given away for free, but extended information that is anchored to the geometry may have a high current value that declines over time, and may re-enter the public domain after its proprietary status expires. Thus commercial providers of information benefit through anchoring to a common framework system and cross-referencing with other attributes held by other organisations; consumers benefit in acquiring the framework core geometry, feature definitions, and base attributes as a by-product of the more advanced data set.

Who are the actors in framework data development?

  • users and producers of detailed data, such as utilities;
  • users of small-scale, limited geographic data, such as street networks, statistical areas, and administrative units;
  • data producers who create detailed data as a product or a service;
  • data producers who create low-resolution, small-scale, limited themes for large areas;
  • product providers who offer software, hardware, and related systems; and
  • service providers who offer system development, database development, operations support, and consulting services.

Non-profit and educational institutions also create and use a variety of geographic data and provide GIS-related services. They cover the full spectrum of data content, resolution, and geographic coverage. Depending on the organisation’s activities, data use may range from high-resolution data over small areas, as in facility management, to low-resolution data over wide areas, as in regional or national environmental studies.

Organisations build national and regional framework efforts by coordinating their data collection and development activities based on intersecting interests within a community. The bounds of this community, however, given the diversity of the types of organisations and individuals involved, need to be non-exclusive and open to innovative contributions, exchanges, and partnerships. The framework should be developed by the entire community, with organisations from all areas playing roles. For some, the framework will supply the data they need to build applications. Others will contribute data, and some may provide services to maintain and distribute data.

Some organisations will play several roles in framework development, operation, and use. The framework will take many years to develop fully, but useful components are being developed continuously.

Implementation Approach

The ISO TC 211 Geomatics standardisation activity is working on two related areas of endeavour that will greatly assist in the global specification of content models and feature models for framework and non-framework data. These include ISO 19109 - Rules for application schema, and 19110 - Feature cataloguing methodology. In the networked world, the ability for software to interact with geographic information outside an organisation is virtually non-existent except where public agreements exist for data structures (also known as a content model or schema) and the features being mapped. The ISO standards mentioned above provide a basis for the description of these packages of information that would enable access to a distributed network of framework data services. With these descriptions implemented through specific encoding methods such as Geography Markup Language (GML, ISO 19136), and coupled with a catalogue for discovery (see Chapter 4) populated with metadata (see Chapter 3), the ingredients are coming together for a configurable deployed architecture.

The scope of ISO 19109 is defined as "… the rules for defining an application schema, including the principles for classification of geographic objects and their relationships to an application schema." In principle, using the Unified Modeling Language (UML), software applications that provide access to geospatial data, such as framework, would be defined in a consistent way so as to improve sharing of data between applications and even allow for real-time interaction between applications. Expressing the encoding of an application schema using GML is a new technique to formalise the packages of information being exchanged between providers and users of spatial data.
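As a rough illustration of what such an exchanged package might look like, the sketch below builds a single GML-encoded feature with Python's standard library. The application namespace `http://example.org/framework`, the feature type `RoadSegment`, and its `name` and `centreLine` properties are hypothetical; a real application schema would define its own names following ISO 19109. Only the GML 3.2 namespace and the `LineString`, `posList`, `gml:id` and `srsName` constructs are taken from GML itself.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"   # GML 3.2 namespace
APP_NS = "http://example.org/framework"     # hypothetical application schema namespace

ET.register_namespace("gml", GML_NS)
ET.register_namespace("fw", APP_NS)


def encode_road_segment(feature_id, name, coords,
                        srs_name="urn:ogc:def:crs:EPSG::4326"):
    """Encode one hypothetical RoadSegment feature as a GML element.

    coords is a sequence of (latitude, longitude) pairs; that axis order
    is an assumption tied to the CRS identifier used here.
    """
    feature = ET.Element(f"{{{APP_NS}}}RoadSegment",
                         {f"{{{GML_NS}}}id": feature_id})
    ET.SubElement(feature, f"{{{APP_NS}}}name").text = name

    # Geometry property defined by the (hypothetical) application schema.
    geom_prop = ET.SubElement(feature, f"{{{APP_NS}}}centreLine")
    line = ET.SubElement(geom_prop, f"{{{GML_NS}}}LineString",
                         {f"{{{GML_NS}}}id": feature_id + ".geom",
                          "srsName": srs_name})
    pos_list = ET.SubElement(line, f"{{{GML_NS}}}posList")
    pos_list.text = " ".join(f"{a} {b}" for a, b in coords)
    return feature


segment = encode_road_segment("road.1001", "Main Street",
                              [(46.95, 7.44), (46.96, 7.45)])
print(ET.tostring(segment, encoding="unicode"))
```

A consumer that knows the published schema can parse the same element back and find the geometry and attributes by name, which is the point of agreeing on the schema in advance.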

Before one can allow software to reliably access mapped features stored in remote data systems, there must first be a common understanding about the nature and composition of the objects being managed. ISO 19109 includes guidance principles for classifying geographic objects. The usefulness of any information is reduced when the meaning is unclear, especially and commonly across different application domains. If different classifications are defined using a consistent set of rules, the ability to map one classification to another and retain the meaning will be greatly increased. This is also known as the semantic translation of the representation of an object in one system, for example a road or river segment, to that in another.

These rules will be used by geographic information users when classifying geographic objects within their applications and when interpreting geographic data from other applications. The rules and principles could also be used by geographic information system and software developers to design tools for the creation and maintenance of classification schemes.

Very closely related to the schema definition of ISO 19109 is the standard proposing a feature cataloguing methodology, ISO 19110. It is intended to define the approach and structures used for an information provider to store the identity, meaning, representation, and relationships of concepts or things in the real world as they are managed in online systems. A feature catalogue, then, acts as a dictionary for feature types or classes that can be used in software. The definition of a single international, multilingual catalogue would have tremendous value.
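The dictionary role of a feature catalogue can be sketched as a small data structure. This is a minimal illustration only: the class and field names (`FeatureType`, `labels`, `associations`, and so on) are assumptions chosen for readability and are not the element names defined by ISO 19110.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureAttribute:
    name: str
    value_type: str      # e.g. "string", "integer", "length in metres"
    definition: str


@dataclass
class FeatureType:
    code: str                    # identity of the concept within the catalogue
    labels: dict                 # language code -> human-readable name
    definition: str              # meaning of the concept
    attributes: list = field(default_factory=list)
    associations: dict = field(default_factory=dict)  # role -> related type code


class FeatureCatalogue:
    """In-memory dictionary of feature types (illustrative sketch)."""

    def __init__(self):
        self._types = {}

    def register(self, feature_type):
        if feature_type.code in self._types:
            raise ValueError(f"duplicate feature type: {feature_type.code}")
        self._types[feature_type.code] = feature_type

    def lookup(self, code):
        return self._types[code]

    def label(self, code, language, fallback="en"):
        labels = self._types[code].labels
        return labels.get(language, labels[fallback])


catalogue = FeatureCatalogue()
catalogue.register(FeatureType(
    code="watercourse",
    labels={"en": "Watercourse", "fr": "Cours d'eau", "de": "Gewässer"},
    definition="A natural or artificial channel through which water flows.",
    attributes=[FeatureAttribute("name", "string", "Official geographic name")],
    associations={"flowsInto": "watercourse"},
))

print(catalogue.label("watercourse", "fr"))   # label in French
print(catalogue.label("watercourse", "es"))   # falls back to English
```

Software can then ask the catalogue what a feature type means and which attributes to expect, rather than having that knowledge hard-coded per data source.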

Whether this catalogue was used in all applications or only used as a neutral form when moving data from one application to another, it could simplify the problem of mapping the catalogue of one application to the catalogue of another. However, the feasibility of such a task is in question and will be investigated as a part of this work item in the TC 211 work group. The cataloguing task will use the input from the Rules for Application Schema work item and cannot be completed before that item is completed.
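The simplification offered by a neutral catalogue can be made concrete with a short sketch. The feature-type codes and mapping tables below are invented for illustration. The counting function assumes one directed mapping per ordered pair of applications in the pairwise case, and one mapping to and one from the neutral catalogue per application otherwise.

```python
# Hypothetical feature-type codes for two applications and a neutral catalogue.
A_TO_NEUTRAL = {"RD_MAJ": "road", "RD_MIN": "road", "STRM": "watercourse"}
NEUTRAL_TO_B = {"road": "Strasse", "watercourse": "Gewaesser"}


def translate(code, to_neutral, from_neutral):
    """Translate a feature-type code from one application to another
    by passing through the neutral catalogue. Returns None if either
    step has no mapping."""
    neutral = to_neutral.get(code)
    if neutral is None:
        return None
    return from_neutral.get(neutral)


def mappings_needed(n_apps):
    """Number of directed mapping tables needed for n_apps applications."""
    pairwise = n_apps * (n_apps - 1)   # every application to every other
    via_neutral = 2 * n_apps           # each application to and from neutral
    return pairwise, via_neutral


print(translate("STRM", A_TO_NEUTRAL, NEUTRAL_TO_B))
print(mappings_needed(10))
```

With ten applications this comes to 90 pairwise mappings against 20 through the neutral form. The sketch also shows a limitation: `RD_MAJ` and `RD_MIN` both collapse to `road`, so any distinction the neutral catalogue does not carry is lost in translation.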

Publishing an application schema with a feature catalogue for a given data set of common interest can provide the basis for framework data definitions of use to global, regional, national, and local data. Done carefully, schemas and feature catalogues could be similarly constructed for existing framework-like data in order to enable discussion among participants, and transformation of content into conforming framework data sets.

Several national projects have been undertaken to build standardised framework data content and/or encoding. A project to develop framework specifications in Switzerland, known as InterLIS, has had marked success with this approach. Common definitions of data layers exist as target specifications that are matched to various degrees by participant organisations. As a result, software that is designed to interact with the InterLIS application model will work against data sets from different sources and organisations. The application framework is designed to be a scalable one to allow the participation of minimal data sets with lesser application functionality and more complex data sets with maximal application functionality. The Master Map of the Ordnance Survey in the United Kingdom and the Framework Data Content Standards under development in the United States are also documented as abstract application schemas and include GML encoding guidance to facilitate the exchange of data and development of applications that support the published models.

Common Identities of Real World Objects

In many framework implementations, there will not necessarily be one authoritative geometric representation of a real-world feature. Several national systems have proposed the use of a common or permanent feature identifier to be associated with the object in the real world so that different representations and attributes of that object on maps can be cross-referenced. Having well-known identities of features established with a coding system within a community greatly assists in the association of attribute information to real-world objects where such attributes may not reside in a GIS or spatially-enabled data base. Also, multiple representations of real-world objects may be linked to the identity code, to provide views of an object that has changed over time or that has different degrees of spatial resolution at different scales of data collection or representation. This becomes a logical model for organizing related geospatial information.
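The logical model described above can be sketched as follows. The identifier format `HYD-000123`, the class names, and the sample values are all hypothetical; the sketch only shows how several representations and a non-spatial attribute table can hang off one permanent identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    source: str               # organisation that produced this geometry
    scale_denominator: int    # e.g. 50000 for 1:50 000
    valid_from: str           # ISO 8601 date the representation applies from
    geometry_wkt: str         # geometry as well-known text


@dataclass
class RealWorldFeature:
    feature_id: str           # permanent identifier agreed within the community
    feature_type: str
    representations: list = field(default_factory=list)

    def best_for_scale(self, target_denominator):
        """Return the representation whose scale is closest to the target,
        or None if the feature has no representations."""
        if not self.representations:
            return None
        return min(self.representations,
                   key=lambda r: abs(r.scale_denominator - target_denominator))


river = RealWorldFeature("HYD-000123", "watercourse")
river.representations.append(Representation(
    "national mapping agency", 250000, "1998-01-01",
    "LINESTRING (0 0, 10 10)"))
river.representations.append(Representation(
    "regional water authority", 50000, "2001-06-15",
    "LINESTRING (0 0, 3 4, 7 6, 10 10)"))

# Attribute data held outside any GIS, linked only by the identifier.
water_quality = {"HYD-000123": {"ph": 7.4, "sampled": "2003-09-02"}}

chosen = river.best_for_scale(50000)
print(chosen.source, water_quality[river.feature_id]["ph"])
```

Because the water-quality table refers only to the identifier, it stays valid when a new or more detailed geometry is added for the same river.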

The management of a common or "permanent" feature identity needs to be undertaken within the community, with permission granted to certain participant organisations to create or adjudicate these identities. In Canada, there is an effort to create a data alignment layer of well-known features or intersections of features to help vertically integrate spatial data from different sources. These features and intersections will have published identifiers, some sense of positional accuracy, and source information. In the United States, the National Hydrography Dataset includes a permanent feature identifier for segments of river and water bodies between points of confluence. In other national, regional, and global settings, agreement on management and assignment of feature identifiers -- building upon a sound feature cataloguing approach -- will be essential in building up compatible framework data across political boundaries.

Candidate National Framework Categories

A variable number of data layers may be considered to be common-use and of national or transnational importance as "framework" data. Framework layers commonly nominated in national context include:

  • cadastral information
  • geodetic control
  • geographic feature names
  • orthoimagery
  • elevation
  • transportation
  • hydrography (surface water networks)
  • governmental units

This list is likely to grow as custodians of data identify and promote their data as necessary to increasingly advanced applications and user environments.

Candidate Global Data Categories

The Global Mapping concept was articulated by the Ministry of Construction of Japan as a response to the United Nations Conference on Environment and Development held in Brazil in 1992. Agenda 21 is an action program drawn up by the conference, and it clearly makes the case that global baseline spatial data is important to society's interaction with the environment. The Global Mapping Project, also known as Global Map, is addressing the compilation of suitable spatial data products from existing international and national sources. This provides a public set of reference data at trans-national to global scales to assist decision-makers and society in depicting global environmental concerns.

Progress is being made in selecting and enhancing these general purpose spatial data layers originally based on VMAP Level 0 (also known as Digital Chart of the World) for vector themes, Global Land Cover Characteristics Database from the U.S. Geological Survey (USGS) for land cover, land use and vegetation, and the 30-second GTOPO30 product also hosted by the USGS. Global Map Version 1.0 specifications for data organisation were adopted at the International Steering Committee for Global Mapping (ISCGM) meeting held in conjunction with the Third GSDI Conference in Canberra, Australia in November 1998. As of February 2000, 74 countries are participating in the collection or aggregation of large-scale map products to update and package the above data sources.
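To give a sense of what a 30-second grid means on the ground, the following sketch converts an angular cell size to an approximate distance. It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification introduced here and not something stated in the product specifications.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def arcsec_to_km(arcseconds, latitude_deg=0.0):
    """Approximate ground size of an angular grid cell.

    Returns (north_south_km, east_west_km). The east-west size shrinks
    with the cosine of latitude because meridians converge at the poles.
    """
    angle_rad = math.radians(arcseconds / 3600.0)
    north_south = EARTH_RADIUS_KM * angle_rad
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = arcsec_to_km(30)
ns_60, ew_60 = arcsec_to_km(30, latitude_deg=60)
print(f"equator: {ns_equator:.3f} km x {ew_equator:.3f} km")
print(f"60 deg:  {ns_60:.3f} km x {ew_60:.3f} km")
```

Under these assumptions a 30-second cell is a little under one kilometre on a side at the equator, and about half that in the east-west direction at 60 degrees latitude.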

Recommendations

The development of common data specifications is an arduous task to undertake by oneself or by a single organisation. For the development of the GSDI the following recommendations are made:

  • The Cookbook authors recommend that interested parties participate in or be aware of existing framework initiatives at the sub-national, national, and international scale.

Data appropriate to a given type of geospatial analysis will require information at a range of resolutions and degrees of detail.

  • The Cookbook authors recommend that the Global Map specification be adopted for trans-national applications requiring land cover/use, vegetation, transportation, hydrography, administrative boundaries, populated places, and elevation data.

The Global Map content specification defines a simple content model with a small number of feature types and attributes suitable for the construction of base cartography at regional scales. Evaluate its level of detail with respect to a given GIS or mapping application. It may require extension to suit your base requirements.

  • The Cookbook authors recommend that Core and non-Core data be modeled and shared in the designs of national SDIs using emerging ISO standards by following the rules for application schema, publishing a feature catalogue, and standardising the encoding of the data.

The ISO 19109 and 19110 draft standards and the use of GML per ISO 19136 formalise the description and encoding of features and feature collections for individual applications that can facilitate the proper access and transformation of geospatial data held in online systems in near real time. This extends the capabilities of the individual in working with dynamic information held in distributed locations, as will be discussed in Chapter 6 in greater detail. National and global framework data, as well as non-framework data will be made more accessible and semantically correct through such technologies.

References and Linkages

Harmonised Data Manual - The Harmonised Data Model (Australia) http://www.icsm.gov.au/icsm/harmonised_data_manual/harmonised_data_model.htm

Framework Home Page, U.S. Federal Geographic Data Committee http://www.fgdc.gov/framework/framework.html

Geospatial One-Stop Framework Standards Development (U.S.) http://www.geo-one-stop.gov/Standards/index.html

Global Map Specifications - Version 1.1 http://www.iscgm.org/html4/index_c5_s1.html#doc13_3741

Interlis Project Home Page (Switzerland) http://www.interlis.ch/content/index.php

GSDI Cookbook, Version 2.0 25 January 2004 Page 23

Ordnance Survey Master Map in GML http://www.g-intelligence.co.uk/webadmin/data/files/36.pdf
