Chapter 3
From The SDI Cookbook
Chapter Three: Metadata -- Describing geospatial data
Editor: Mick Wilson, UNEP Mick.Wilson {[at]} unep.org
This document has been developed from input by FGDC, EUROGI, ANZLIC and NGDF and is predominantly based on the various sources cited at the end of the chapter.
Introduction
We often hear the phrase "information is power," but with increasing amounts of data being created and stored (but often not well organised) there is a real need to document the data for future use - to be as accessible as possible to as wide a "public" as possible. Data, plus the context for its use (documentation, metadata) become information. Data without context are not as valuable as documented data. There are significant benefits to such asset management:
- Metadata helps organise and maintain an organisation's investment in data and provides information about an organisation's data holdings in catalogue form
- Coordinated metadata development avoids duplication of effort by ensuring the organisation is aware of the existence of data sets
- Users can locate all available geospatial and associated data relevant to an area of interest
- Collection of metadata builds upon and enhances the data management procedures of the geospatial community
- Reporting of descriptive metadata promotes the availability of geospatial data beyond the traditional geospatial community
- Data providers are able to advertise and promote the availability of their data and potentially link to on line services (e.g. text reports, images, web mapping and ecommerce) that relate to their specific data sets
A number of studies have established that although the value of geospatial data is recognised by both government and society, the effective use of geospatial data is inhibited by poor knowledge of the existence of data, poorly documented information about the data sets, and data inconsistencies. Once created, geospatial data can be used by multiple software systems for different purposes. Given the dynamic nature of geospatial data in a networked environment, metadata is therefore an essential requirement for locating and evaluating available data. Metadata can help the concerned citizen, the city planner, the graduate student in geography, or the forest manager find and use geospatial data, but they also benefit the primary creator of the data by maintaining the value of the data and assuring their continued use over a span of years. Over thirty years ago, humans landed on the Moon. Data from that era are still being used today, and it is reasonable to assume that today's geospatial data could still be used in the year 2020 and beyond to study climate change, ecosystems, and other natural processes. Metadata standards will increase the value of such data by facilitating data sharing through time and space. So when a manager launches a new project, investing a small amount of time and resources at the beginning may pay dividends in the future.
Context and Rationale
The word metadata shares the same Greek root as the word metamorphosis. "Meta-" means change, and metadata, or "data about data", describe the origins of and track the changes to data. Metadata is the term used to describe the summary information or characteristics of a set of data. This very general definition includes an almost limitless spectrum of possibilities, ranging from human-generated textual description of a resource to machine-generated data that may be useful to software applications. More recently, the term metadata has even been applied to services as a description of published service characteristics.
The term metadata has become widely used over the past 15 years, and has become particularly common with the popularity of the World Wide Web. But the underlying concepts have been in use for as long as collections of information have been organised. Library catalogues represent an established variety of metadata that has served for decades as collection management and resource discovery tools. The concept of metadata is also familiar to most people who deal with spatial issues. A map legend is one representation of metadata, containing information about the publisher of the map, the publication date, the type of map, a description of the map, spatial references, the map's scale and its accuracy, among other things. Metadata are also these types of descriptive information applied to a digital geospatial file. They're a common set of terms and definitions to use when documenting and using geospatial data. Most digital geospatial files now have some associated metadata. In the area of geospatial information or information with a geographic component this normally means the What, Who, Where, Why, When and How of the data. The only major difference that therefore exists from the many other metadata sets being collected for libraries, academia, professions and elsewhere is the emphasis on the spatial component - or the where element.
The Benefits of Metadata
Metadata helps people who use geospatial data find the data they need and determine how best to use it. Metadata benefit the data-producing organisation as well. As personnel change in an organisation, undocumented data may lose their value. Later workers may have little understanding of the contents and uses for a digital database and may find they can't trust results generated from these data. Lack of knowledge about other organisations' data can lead to duplication of effort. It may seem burdensome to add the cost of generating metadata to the cost of data collection, but in the long run the value of the data is dependent on its documentation.
Metadata is one of those terms that is conveniently ignored or avoided. However there is an increasing recognition of the benefits and requirement for metadata for our data as we continue to increase the use of digital data. Whereas cartographers rigidly provided metadata within a paper map’s legend, the evolution of computers and GIS has seen a decline in this practice. As organisations start to recognize the value of this ancillary information, they often begin to look at incorporating metadata collection within the data management process.
Organisational Approach
Levels of Metadata
There are different levels that metadata may be used for:
- Discovery metadata - What data sets hold the sort of data I am interested in? This enables organisations to know and publicise what data holdings they have.
- Exploration metadata - Do the identified data sets contain sufficient information to enable a sensible analysis to be made for my purposes? This is documentation to be provided with the data to ensure that others use the data correctly and wisely.
- Exploitation metadata - What is the process of obtaining and using the data that are required? This helps end users and provider organisations to effectively store, reuse, maintain and archive their data holdings.
Each of these purposes, while complementary, requires different levels of information. As such organisations should look at their overall needs and requirements before developing their metadata systems. The important aspect is for agencies to establish their business requirements first, the content specifications second and the technology and implementation methods third.
This is not to say that these levels of metadata are unique. There is a high degree of reuse of the metadata for each level and an organisation will design its metadata schema and implementation based on its business needs to accommodate these three requirements.
Discovery Metadata is the minimum amount of information that needs to be provided to convey to the inquirer the nature and content of the data resource. This falls into broad categories to answer the "what, why, when, who, where and how" questions about geospatial data.
- What - title and description of the data set.
- Why - abstract detailing reasons for the data collection and its uses.
- When - when the data set was created and the update cycles, if any.
- Who - originator, data supplier, and possibly intended audience.
- Where - the geographical extent based on latitude / longitude, co-ordinates, geographical names or administrative areas.
- How - how it was built and how to access the data.
The broad categories are few in number, which reduces the effort required to collect the information while still conveying to the inquirer the nature and content of the data resource.
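As a rough illustration of how few fields a discovery record needs, the six questions can be sketched as a small Python data structure. The class name, field names and sample values below are assumptions made for this example; they are not taken from any metadata standard.

```python
from dataclasses import dataclass, asdict


@dataclass
class DiscoveryRecord:
    """Hypothetical discovery record; field names are illustrative only."""
    title: str          # What
    description: str    # What
    abstract: str       # Why
    created: str        # When: creation date as an ISO 8601 string
    update_cycle: str   # When: e.g. "annual", or "none"
    originator: str     # Who
    extent: tuple       # Where: (west, south, east, north) in decimal degrees
    access: str         # How: how the data were built and how to obtain them


def missing_fields(record):
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items()
            if value in ("", None, ())]


# Invented sample values, for illustration only.
record = DiscoveryRecord(
    title="Example road centrelines",
    description="Road centrelines for an example district",
    abstract="",  # the "why" has not been written yet
    created="1999-06-30",
    update_cycle="annual",
    originator="Example Mapping Agency",
    extent=(33.9, -4.7, 41.9, 5.0),
    access="Digitised from paper sheets; available on request",
)
print(missing_fields(record))
```

A check of this kind only confirms that each question has some answer; whether the answer is adequate still needs human review.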
Online systems for handling metadata need to rely on the metadata (a plural, like data) being predictable in both form and content. The level of metadata detail that will be documented depends on the type of data held and the methods by which the data are accessed and used. Different types of data (e.g. vector, raster, textual, imagery, thematic, boundary, polygon, attribute, point, etc.) will require different levels and forms of metadata to be collected. However, there is still a high degree of compatibility between most of the metadata elements required.
Similarly, organisations will manage their data in mission-defined ways. Some organisations manage information as a data set, tiles of data sets, series of data sets, or manage the information down to the feature level. Again there is still a high level of compatibility between the levels of metadata required, particularly as the data is cascaded from the feature level to the data set or data series level.
Thus, not only can metadata content vary according to purpose; it can also vary according to scope of the data being defined. Discovery metadata usually, but not exclusively, relates to collections of data resources or data set series that have similar characteristics but relate to different geographic extents or times. A map series is the commonest example but it can equally be applied to statistical surveys. More detailed metadata may be applied to a collection or series but may apply to an individual data set (e.g. one map tile). Transfer metadata applies exclusively to that transfer.
Exploration metadata provides sufficient information to enable an inquirer to ascertain that data fit for a given purpose exist, to evaluate their properties, and to reference some point of contact for more information. Thus, after discovery, more detail is needed about individual data sets, and more comprehensive and more specific metadata is required. If the data are transferred as a single data set then quite specific and detailed metadata is needed, possibly down to the feature, object or record level. Exploration metadata include those properties required to allow the prospective end user to know whether the data will meet the general requirements of a given problem.
Exploitation metadata include those properties required to access, transfer, load, interpret, and apply the data in the end application where it is exploited. This class of metadata often includes the details of a data dictionary, the data organisation or schema, projection and geometric characteristics, and other parameters that are useful to human and machine in the proper use of the geospatial data.
These roles form a continuum in which a user cascades through a pyramid of choices to determine what data are available, to evaluate the fitness of the data for use, to access the data, and to transfer and process the data. The exact order in which data elements are evaluated, and the relative importance of data elements, will not be the same for all users.
Linkages between geospatial data and metadata
Until recently, metadata have been created or derived with little or no automation. In fact, it is only with the recent development of metadata standards, and of metadata software based on these standards, that the consistent management of metadata has been given any consideration by those collecting geospatial data. With an increased focus on incorporating geospatial data into corporate information systems, the development of an international standard for metadata, and the OpenGIS catalogue service specifications, new versions of commercial GIS software are now facilitating a close linkage between geospatial data and metadata.
Regardless of style of metadata, there is nominally one collection of properties or metadata associated with a given data set or feature collection. The 1:1 rule expresses the notion that a discrete resource should have a discrete metadata record. Although it seems simple enough, it isn't always so neat because resources are often not so discrete. For example, should each photograph in an article have its own record? How do you manage collections of articles? Can the collection be thought of as a resource? What about multi-media objects? Thus, one of the first tasks in metadata management is the identification of the data product or entity to be documented.
Metadata may exist at the collection level (e.g. satellite series), at a data product level (an image mosaic), at a data unit level (a vector data set), a group of features of a given type (certain roads), or even at a specific feature instance (a single road). Regardless of the level of abstraction, these associations of metadata to data objects should be maintained.
In practice, most metadata are currently collected at the data set level, and a metadata entry in a catalogue refers the user to its location for access. Increasingly sophisticated providers of geospatial data are including metadata at other levels of detail so as to preserve information richness. Metadata standards such as ISO 19115 allow different levels of metadata abstraction, and catalogue services will also need to accommodate this richness without confusing the user in its complexity.
Metadata Standards
Why use Standards?
Ideally, metadata structures and definitions should be referenced to a standard. One benefit of standards is that they have been developed through a consultative process (with other "experts") and provide a basis from which to develop national or discipline-oriented profiles. As standards become adopted within the wider community, software programs will be developed to assist the industry in implementing the standard. Consistency in metadata content and style is recommended so that data users can quickly compare the suitability of data from different sources. This means, for example, that metadata about property or hazardous waste indicate the dates to which the information refers, and that metadata about different map sources show the relevant scales. Without standardization, meaningful comparisons are more difficult to derive without reading and learning many metadata management styles.
Predictability is also encouraged through conformance to standards. However the problem has been that there are a number of “standards” in use or development. Detailed metadata standards that provide for an exhaustive definition of all aspects of various types of geospatial data are currently under preparation by a number of bodies, as are profiles of these standards as reference models to be adopted for international use.
Geospatial Metadata Standards
Considerable debate across the world centres on metadata and those characteristics that should be chosen to best describe the data set. There are discussion groups, seminars and conferences and quantities of paper generated in the debate about the subject. Standards have been generated by a number of organisations all designed to ensure that a degree of consistency exists within a given application community.
Three main metadata standards exist or are in development that are of broad international scope and usage and provide detail for all levels of metadata mentioned earlier:
The Content Standard for Digital Geospatial Metadata, U.S. 1994, revised 1998 http://www.fgdc.gov/
In the USA the Federal Geographic Data Committee (FGDC) approved their Content Standard for Digital Geospatial Metadata in 1994. This is a national spatial metadata standard developed to support the development of the National Spatial Data Infrastructure. The standard has also been adopted and implemented in the United States, Canada, and the United Kingdom through the National Geographic Data Framework (NGDF) and its successor the AGI. It is also in use by the South African Spatial Data Discovery Facility, the Inter-American Geospatial Data Network in Latin America, and elsewhere in Asia.
A CEN Pre-standard adopted in 1998 http://forum.afnor.fr/afnor/WORK/AFNOR/GPN2/Z13C/indexen.htm
In 1992 the Comité Européen de Normalisation (CEN) created technical committee 287 with responsibility for geographic information standards. A family of European Pre-standards have now been adopted including 'ENV (Euro-Norme Voluntaire) 12657 Geographic information - Data description - Metadata'. CEN TC 287 was reconvened in 2003 to address the development of European profiles of ISO TC 211 standards.
A number of national and regional initiatives have also developed metadata standards. These include initiatives managed by The Australian and New Zealand Land Information Council (ANZLIC) and two completed European Commission financed projects (LaClef and ESMI) now being assimilated by the INSPIRE project. These initiatives have taken similar approaches in promoting a limited set of metadata (described as "Core Metadata" or "Discovery Metadata") that organisations should use, as a minimum, to improve the knowledge, awareness and accessibility of the available geospatial data resources.
ISO 19115 (International Standard) and ISO 19139 (Draft Technical Specification)
An ISO standard for metadata was published and approved in 2003 (http://www.isotc211.org). The ISO standard was derived from inputs from the various national bodies and their implementations of the respective metadata standards, assisted by metadata software. Indeed, most of the existing standards already have a great deal in common with each other, and a robust international discussion has ensured that the ISO standard has accommodated most of the various international requirements. ISO 19115 provides an abstract or logical model for the organization of geospatial metadata. It does not provide for rigorous compliance testing, as there is no normative guidance on formatting the metadata included in the standard. A companion specification, ISO 19139, standardises the expression of 19115 metadata using the Extensible Markup Language (XML) and includes the logical model (UML) derived from ISO 19115. In North America, work is beginning to create a North American Profile of Metadata based on ISO 19139 for Canada, the United States, and Mexico. This will allow for the compliance testing of metadata files using XML.
Metadata also forms an important part of the OpenGIS Abstract Specification. The OpenGIS Consortium (OGC) http://www.opengis.org is an international membership organisation engaged in a co-operative effort to create open computing specifications in the area of geoprocessing. As part of its draft 'OpenGIS Abstract Specification' OGC has adopted ISO 19115 as the abstract model for metadata management within the consortium. OGC is working closely with FGDC and ISO/TC 211 to develop formal, global spatial metadata standards. At their plenary meeting in Vienna, Austria in March 1999, ISO/TC 211 welcomed the satisfactory completion of the co-operative agreement between the OpenGIS Consortium and ISO/TC 211 and endorsed the terms of reference for an ISO/TC 211 / OGC co-ordination group.
Each of the initiatives is promoting the standards and use of discovery metadata as a foundation of their respective metadata directory initiatives. This discovery metadata provides sufficient information to enable an inquirer to ascertain that data fit for purpose exist and to reference some point of contact for more information. If, after discovery, more detail is needed about individual data sets then more comprehensive and more specific metadata is required. Organisations may wish to develop metadata at different but complementary levels: discovery metadata for external use, and more detailed metadata for in-house use. To avoid duplication of effort, those elements common to both are flagged. These guidelines have been developed with recognition of the importance of more extensive metadata required for data management, and each of the organisations is promoting the adoption of the ISO Metadata Standard.
General Metadata Standards
Other standards exist in the broader topic of metadata that do not specifically apply to geospatial information. These conventions are listed here for informational purposes. They may be useful references for linking or integrating non-geospatial resources into a geospatial framework.
The Dublin Core is a metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of Web resources, it has attracted the attention of formal resource description communities such as museums, libraries, government agencies, and commercial organisations.
The Dublin Core Workshop Series has gathered participants from the library world, the networking and digital library research communities, and a variety of content specialists in a series of invitational workshops. The building of an interdisciplinary, international consensus around a core element set is the central feature of the Dublin Core. The progress represents the emergent wisdom and collective experience of many stakeholders in the resource description arena. Dublin Core metadata is specifically intended to support general-purpose resource discovery. The elements represent one community's concepts of core elements that are likely to be useful to support resource discovery. Unfortunately, the formal use of the Dublin Core metadata model has not always recognized the inclusion of qualified elements such as “Coverage.” This metadata element thus may contain text that represents a date or time, a description of a place name or time period, or coordinates, without a means to declare what type of content is present in the text element. As such, the Dublin Core unqualified elements are inadequate for even basic geospatial resource description and discovery, though they may be applied to web and library resources with a loose geospatial definition. Qualified Dublin Core elements can be derived from more detailed metadata models (such as ISO 19115) and can support discovery of lightly documented ancillary information such as books, reports, and other Web objects of potential interest to geospatial investigations.
The Spatial Data Transfer Standard (SDTS) and the Vector Product Format (VPF) Digital Exchange Standards (DIGEST) were developed to allow the encoding of digital spatial data sets for transfer between spatial data software. Both of these standards support the inclusion of metadata elements in an exchange, but remarkably have not until recently considered support for the standardised encoding of relevant geospatial metadata standards in their export or archival formats.
While other general-purpose metadata standards exist, it is recommended that a comprehensive geospatial metadata standard should be used to document geospatial data. It is easier to produce simplified metadata from a more robust collection of metadata, but it is impossible to do the opposite. Eventually, the integration of data content and exchange standards will converge with those in metadata content and exchange so that spatial data encoding efforts will provide a comprehensive solution for archive and documentation.
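The one-way nature of simplification can be sketched in a few lines of Python. The detailed record below is invented for illustration and its keys are not drawn from any standard; the simplified keys are loosely named after Dublin Core elements. This is an assumption-laden sketch, not a mapping endorsed by either standard.

```python
# Hypothetical detailed record; keys and values are invented for illustration.
DETAILED = {
    "title": "Example road network",
    "abstract": "Centrelines digitised from 1:50 000 sheets.",
    "originator": "Example Mapping Agency",
    "extent": {"west": 33.9, "south": -4.7, "east": 41.9, "north": 5.0},
    "lineage": "Digitised 1998; edge-matched 1999.",
    "projection": "UTM zone 37S",
}


def simplify(detailed):
    """Collapse a detailed record into a short discovery-style record."""
    e = detailed["extent"]
    return {
        "title": detailed["title"],
        "description": detailed["abstract"],
        "creator": detailed["originator"],
        # The structured extent becomes a single free-text string.
        "coverage": f"{e['west']},{e['south']},{e['east']},{e['north']}",
    }


simple = simplify(DETAILED)
print(simple["coverage"])
print(sorted(set(DETAILED) - {"title", "abstract", "originator", "extent"}))
```

The lineage and projection never reach the simplified record, so no function could rebuild the detailed record from it. The coverage string also loses the labels that said which number was west and which was north.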
Implementation Approach
Who should create metadata?
Data managers tend to be either technically literate scientists or scientifically literate computer specialists. Creating correct metadata is like library cataloguing, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don't assume that every professional needs to be able to create proper metadata. They may complain that it is too hard and they may not recognise the benefits. In this case, ensure that there is good communication between the metadata producer and the data producer; the former may have to ask questions of the latter to collaboratively develop adequate documentation.
The form for maintaining metadata will depend on a number of factors:
- the size of the data holdings,
- the size of an organisation and
- the patterns of data management within an organisation
If the metadata holdings are fairly modest, then it has been the convention to store the metadata in discrete documents by using any available software (e.g. word-processor, spreadsheet, and simple database). Historically, organisations have built up folders of single documents that may be in either paper or digital formats. Many organisations will start to investigate the use of more complex systems as they realise the benefit of the metadata, and as they gain greater data holdings and start to provide broader access to the data.
Indeed many organisations will start with a basic audit of their data holdings that will alert them to the vast wealth of data that they possess and where it is being used, replicated or improved across the organisation. As the data holdings become larger and the access to the data becomes distributed, organisations will look at more advanced methods for maintaining metadata of their data holdings. These advanced tools may consist of commercial or self-developed forms-based systems that may also form part of the operational GI systems to extract aspects of the metadata automatically from the data itself.
How does one deal with people who complain that it's too hard? The solution in most cases is to redesign the workflow rather than to develop new tools or training. People often assume that data producers must generate their own metadata. Certainly they should provide informal, unstructured documentation, but they may not need to go through the rigors of fully structured formal metadata. For scientists or GIS specialists who produce one or two data sets per year it may not be worth their time to fully learn a complex metadata standard. Instead, they might be asked to fill out a less-complicated form or template that will be rendered in the proper format by a data manager or cataloguer who is familiar (not necessarily expert) with the subject and well-versed in the metadata standard. If twenty or thirty scientists are passing data to the data manager in a year, it is worth the data manager's time to learn the complex metadata standard. With good communication, this strategy complements the existing combination of software tools and training.
The first data set documented is always the worst. The other aspect to "It's too hard" is that documenting a data set fully requires a (sometimes) uncomfortably close look at the data and brings home the realisation of how little is really known about its processing history.
"Insufficient time" to document data sets is also a common complaint. This is a situation in which managers who appreciate the value of GIS data sets can set priorities to protect their data investment by allocating time to document it. Spending one or two days documenting a data set that may have taken months or years to develop at thousands of dollars in cost hardly seems like an excessive amount of time.
These 'pain' and 'time' concerns have some legitimacy, especially for agencies that may have hundreds of legacy data sets which could be documented, but for which the time spent documenting them takes away from current projects. At this point in time, it seems much more useful to have a lot of 'shortcut' metadata rather than a small amount of full-blown metadata. So what recommendations can be made to these agencies with regard to a sort of 'minimum metadata' or means to reduce the documentation load?
In some operations, small amounts of metadata, or “notes” are collected sporadically during the data processing flow. These hints can then be assembled more readily later into a clear statement of the history and processing of the dataset. This can present a less daunting task at the end of a project as most of the details are already documented, a little at a time. Increasingly, GIS and image processing software are capable of collecting and reporting quantitative metadata that can be filled-in for the user rather than expecting human input. These procedures can amount to significant savings in overall time and effort over a single manual metadata preparation process at the conclusion of a project.
Don't invent your own standard. Select a supported international standard wherever possible. Try to stay within its constructs. Subtle changes from an international standard, such as the collapse of compound elements, may be costly in the long run: you won't be able to use standard metadata tools, and your metadata may not be directly exchangeable or parseable by software.
Don't confuse the metadata presentation (view) with the metadata itself. There is a temptation to lump form and content into the same bin (e.g. "What I see in my database is what I print"). However, the ability to differentiate the contents of the metadatabase (the columns or fields) from its presentation (writing formatted reports) is now commonplace in desktop database software packages. This allows users to consider more flexibly how to present what information.
There are typically three forms of metadata that should be recognized and supported in systems: the implementation form (within a database or software system), the export or encoding format (a machine-readable form designed for transfer of metadata between computers), and the presentation form (a format suitable to viewing by humans). By recognizing the connections between these dispositions of metadata, one can build systems that support mission requirements, standard encoding for exchange, and permit many “report” views of the metadata to satisfy the needs and experience of different user constituencies.
The Extensible Markup Language (XML) provides two solutions to this metadata problem. First, it includes a capable markup language with structural rules enforced through a control file to validate document structure. Second, through a companion standard (XML Style Language, or XSL), an XML document may be used along with a style sheet to produce standardised presentations of content, allowing the user to shuffle field order, change tag names, or show only certain fields of information. Used together XML and style sheets allow for a structured exchange format and for flexible presentation. Thus, a metadata entry can be rendered in many ways from the same, single structured encoding.
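The separation of a single structured encoding from its many presentations can be sketched in Python. The standard library has no XSL processor, so two plain functions stand in for style sheets here; the XML element names and the label table are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# One structured encoding of a hypothetical record (element names invented).
RECORD_XML = """
<metadata>
  <title>Example road network</title>
  <originator>Example Mapping Agency</originator>
  <scale>50000</scale>
</metadata>
""".strip()


def brief_view(root):
    """One-line presentation showing only selected fields."""
    return f"{root.findtext('title')} ({root.findtext('originator')})"


def full_view(root, labels):
    """Full presentation with tag names swapped for reader-friendly labels."""
    return "\n".join(
        f"{labels.get(child.tag, child.tag)}: {child.text}" for child in root
    )


root = ET.fromstring(RECORD_XML)
print(brief_view(root))
print(full_view(root, {"scale": "Scale denominator"}))
```

Both views read the same parsed record, so a change to the stored metadata appears in every presentation without editing either view.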
XML is a widely accepted encoding methodology with international software support. It is supported by a lot of software, both free and commercial. However, the metadata-producing community doesn't have much experience using it to solve problems yet. Through reference implementations of software and experimentation, local Spatial Data Infrastructures can share their successes and failures in applying this new technology to fullest community benefit.
Consider data granularity. Can you document many of your data sets (or tiles) under an umbrella parent? Prioritise your data. Begin by documenting those data sets that have current or anticipated future use, data sets that form the framework upon which others are based, and data sets that represent your organisation's largest commitment in terms of effort or cost.
Document at a level that preserves the value of the data within your organisation. Consider how much you would like to know about your data sets if one of your senior GIS operators left suddenly in favour of a primitive lifestyle on a tropical island.
How do I create metadata?
First, one should understand both the data being described and the standard itself. Then one must decide how to encode the information. Historically, one creates a single text file for each metadata record; that is, one disk file per data set. Typically a software program is used to assist the entry of information so that the metadata conform to the standard.
Specifically:
- Define exactly what data packaging is to be documented.
- Assemble information about the data set.
- Create a digital file containing the metadata, using a standard format whenever possible.
- Check the syntactical structure of the file. Modify the arrangement of information and repeat until the syntactical structure is correct.
- Review the content of the metadata, verifying that the information describes the subject data completely and correctly.
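The syntax-checking step above might be sketched as follows, assuming the metadata file is XML. The list of required element paths is a hypothetical stand-in for whatever a chosen standard or profile demands, and the check covers structure only. Reviewing whether the content is complete and correct remains a human task.

```python
import xml.etree.ElementTree as ET

# Hypothetical element paths that a local profile might require.
REQUIRED_PATHS = ["title", "abstract", "contact/organisation"]


def check_structure(xml_text, required_paths=REQUIRED_PATHS):
    """Return a list of structural problems; an empty list means none found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The file is not well-formed XML, so no further checks are possible.
        return [f"not well-formed XML: {err}"]
    problems = []
    for path in required_paths:
        element = root.find(path)
        if element is None:
            problems.append(f"missing element: {path}")
        elif not (element.text or "").strip():
            problems.append(f"empty element: {path}")
    return problems


draft = "<metadata><title>Road centrelines</title><abstract></abstract></metadata>"
problems = check_structure(draft)
print(problems)
```

An editor would correct the reported problems and run the check again until the list comes back empty, then move on to reviewing the content.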
A digression on conformance and interoperability
The various metadata standards are truly content standards. They do not dictate the layout of metadata in computer files. Since the standards are so complex, this has the practical effect that almost any metadata can be said to conceptually conform to the standard; the file containing metadata need only contain the appropriate information, and that information need not be easily interpretable or accessible by a person or even a computer. This is the case even with the ISO 19115 International Standard.
This rather broad notion of conformance is not very useful. Unfortunately it is rather common. To be truly useful, the metadata must be clearly comparable with other metadata, not only in a visual sense, but also to software that indexes, searches, and retrieves the documents over the Internet. To accomplish this, there are several encoding standards that specify how the content of a metadata entry is structured for exchange between computers. For real value, metadata must be both parseable, meaning machine-readable, and interoperable, meaning they work with software used in services such as the FGDC Clearinghouse through OpenGIS Catalogue Services. Fortunately, the companion ISO 19139 Technical Specification provides normative guidance in the form of an annotated XML Schema Document (XSD), and by example, as to how the metadata must be structured as XML for validation and exchange.
Parseable
To parse information is to analyse it by disassembling it and recognising its components. Metadata that are parseable clearly separate the information associated with each element from that of other elements. Moreover, the element values are not only separated from one another but are clearly related to the corresponding element names, and the element names are clearly related to each other as they are in the standard.
In practice this means that your metadata are usually arranged in a hierarchy, just as the elements are in the standard, and they must use standard names for the elements as a way to identify the information contained in the element values.
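To make the idea of parseable metadata concrete, the following sketch walks a small hierarchical record and lists each value alongside the full path of element names that identifies it. The record and its element names are hypothetical, and XML is assumed as the encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative only.
RECORD = """<metadata>
  <identification>
    <title>Road centrelines</title>
    <keyword>transport</keyword>
    <keyword>roads</keyword>
  </identification>
  <contact>
    <organisation>Example Mapping Agency</organisation>
  </contact>
</metadata>"""


def flatten(element, prefix=""):
    """Yield (path, value) pairs for every element that carries text."""
    path = f"{prefix}/{element.tag}"
    text = (element.text or "").strip()
    if text:
        yield path, text
    for child in element:
        yield from flatten(child, path)


pairs = list(flatten(ET.fromstring(RECORD)))
for path, value in pairs:
    print(f"{path} = {value}")
```

Because every value is tied to a named position in the hierarchy, indexing or search software can treat a title differently from a keyword without guessing from layout.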
Interoperable
To operate with metadata service software, your metadata must be readable by that software. Generally this means that they must be parseable and must identify the elements in the manner expected by the software.
There is a general consensus that metadata should be exchanged in Extensible Markup Language (XML) conforming to a Document Type Definition (DTD) or, more rigorously, to its more modern successor, the XML Schema Document (XSD). Support for XML in parsing and presentation solutions is widespread on the Web and is presumed in current draft standards of the ISO TC 211 and OpenGIS specifications.
What software is available to create and validate metadata?
No tool can check the accuracy of metadata. Moreover, no tool can determine whether the metadata properly include elements designated by the Standard to be conditional, or 'mandatory if applicable.' Consequently, some level of human review is required. But human review should be simpler in those cases where the metadata is known to have the correct syntactical structure.
Software cannot be said to conform to the Standard. Only metadata records in a given encoding form can be said to conform or not. A program that claimed to conform to the Standard would have to be incapable of producing output that did not conform. Such a tool would have to anticipate all possible data sets. Instead, tools should assist you in entering your metadata, and the output records must be checked for both conformance and accuracy in separate steps. At best one can describe or anticipate compatibility testing among software components.
Issues in Implementation
Vocabularies, Gazetteers and Thesauri
When searching for information, the inquirer may not find any references based on the words used to describe the information sought. This problem can be overcome by use of a thesaurus. In the context of metadata and other electronic documents, a thesaurus is a tool for the organisation and retrieval of information in electronic materials. It allows data to be indexed and retrieved in a consistent manner. It permits the display of hierarchies of concepts and ideas, leading the user, whether as indexer or information seeker, to define his or her search in terms that are most likely to lead to the retrieval of relevant information.
For example, it will allow improved information retrieval by providing successful searching on synonyms - if the user enters the term "farming" the thesaurus will find the term "agriculture". Hierarchies of meaning can be shown - the term "Great Britain" may retrieve data indexed with that term but could also expand the search to retrieve data on England, Wales and Scotland which have been indexed under those three terms. The term "meals on wheels", although in a hierarchy of terms related to food, could also be linked to concepts relating to personal social services and to the different categories of recipients and a user can elect to follow and retrieve these related terms. Consistent searching for metadata will be achieved if all those who prepare metadata use the same thesaurus.
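The search-expansion behaviour described above can be sketched with a miniature thesaurus. The three dictionaries below are hypothetical and hold only the examples from the text. A real thesaurus would be far larger and would usually be maintained as a shared, published resource.

```python
# Hypothetical miniature thesaurus built from the examples in the text.
PREFERRED = {"farming": "agriculture"}  # synonym -> preferred term
NARROWER = {"great britain": ["england", "wales", "scotland"]}
RELATED = {"meals on wheels": ["personal social services"]}


def expand(term, include_related=False):
    """Return the index terms a query should match, preferred term first."""
    key = term.strip().lower()
    key = PREFERRED.get(key, key)
    terms = [key]
    # Walk down the hierarchy so a broader term also retrieves narrower ones.
    queue = list(NARROWER.get(key, []))
    while queue:
        narrower = queue.pop(0)
        if narrower not in terms:
            terms.append(narrower)
            queue.extend(NARROWER.get(narrower, []))
    if include_related:
        # Related terms are only followed when the user elects to do so.
        related = [t for t in RELATED.get(key, []) if t not in terms]
        terms.extend(related)
    return terms


print(expand("farming"))
print(expand("Great Britain"))
print(expand("meals on wheels", include_related=True))
```

Related terms are kept behind an explicit flag because the text describes following them as the user's choice, whereas synonyms and narrower terms are applied automatically.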
Minimum collaboration with users during the definition and implementation phases: a user-friendly focus is needed
For a non-professional user, finding the information wanted is very difficult. Even if 'Help' or 'Tutorial' can be found in some metadata services, it is not very easy to understand what to do and where to type. Efforts must be made to explain what to ask for and to develop user-friendly and multi-lingual interfaces. If it takes too much time to understand how to use a metadata service, users will not stay long and will immediately complain! A dictionary, multilingual thesauri or catalogues with keywords should be provided to users to ensure that the same vocabulary is used. One of the most important things is to develop services that are neither technology dependent nor technology driven. Projects must be done in collaboration with users (who must first be identified).
User-expected content
Given the complex metadata models deployed, we can be reasonably certain that the metadata now presented by catalogue services is almost always more than end users expect. The current tendency seems to be to propose a complex database approach that is very 'data producer oriented'. One can imagine that users are more interested in examples of how to use the proposed data sets, and in the benefits of doing so, than in a detailed description of their structure and content. This can be accomplished through special presentations of metadata.
It is important to separate the content of spatial metadata from its means of presentation. Through technologies such as the Extensible Markup Language (XML), documents with extensive detail can be rendered through different style sheets from one content source into many presentation forms suitable to different audiences. Further work on developing presentation methodologies is required to ease the burden of understanding metadata for all.
Metadata for applications
There is a tendency to adapt the metadata structure and content to applications, for example, electronic commerce or data management within an organisation. Metadata that is created to satisfy a real need, rather than because it is seen as something that should be done in the general interest, is more likely to be well-written and maintained.
The OpenGIS Consortium and ISO TC 211 have developed metadata structures and fields to describe software interfaces, exposed as "services" for external use. ISO 19119 describes the structure of services metadata to help intelligent software, through brokers known as service catalogues, to discover available services that could ultimately be chained together to form new composite operations. The World Wide Web Consortium and OASIS XML groups have specified service and resource discovery mechanisms that exploit a published set of metadata fields. Two of these efforts are ebXML, with its Registry Information Model (ebRIM), and the Universal Description, Discovery, and Integration of Web Services (UDDI). The suggested interaction between ebXML, ISO metadata, and OGC catalogue service interfaces is being harmonised in OGC Catalog Services Version 2.0.
A geographic information product identification mechanism
There is no current mechanism to provide identification numbers (ID) to the different GI products produced and offered to users. This missing element is a very important issue for those who are implementing in parallel a metadata service and an e-commerce solution.
To make e-commerce in GI a reality, a study should be made of how a GI numbering system could be organised and implemented, and by whom. This system could be similar to the ones used for other products, such as books. It would be extremely helpful if the Global Spatial Data Infrastructure activity could develop initial guidance on the technical and political issues involved in establishing a data product identifier system that will work globally on digital and non-digital geospatial information.
Incentives for metadata development
The FGDC (U.S. Federal Geographic Data Committee - http://www.fgdc.gov) provides an impressive list of incentives, including financial resources, knowledge and expertise, standards and tools, to stimulate the creation and maintenance of metadata content and services within the concept of the Clearinghouse. These incentives appear to have been a key success factor of the U.S. metadata initiative. It is important that national and regional governments evaluate, recognise, and provide such incentives to metadata builders and managers. Some have started: France, Canada, Australia, Spain, Ethiopia, the United States and other countries develop and provide free software to metadata builders. It is anticipated that the widespread adoption of the ISO 19115/19139 metadata standards will further encourage the development of an international base of free and commercial tools around a common standard.
Envisage legislation for public sector metadata content
In countries where legislation is the main engine for creating new or adapting existing public sector activities, new laws may be needed to encourage or require the collection and distribution of standards-based metadata by the GI public sector and by commercial enterprises that collect geospatial data for the public sector.
Recommendations
- The Cookbook authors recommend that you don't invent your own standard. Adopt or build a national profile of the ISO 19139 Technical Specification based on the abstract ISO 19115 metadata standard.
Standards are very expensive to create and build implementations for. National standards should be adopted with the intention of supporting the ISO 19115 metadata content standard and its companion, Technical Specification ISO 19139, when it becomes available. This will provide the greatest interoperability rewards in a global environment.
- The Cookbook authors recommend that you prioritise your data.
Begin by documenting those data sets that have current or anticipated future use, data sets that form the framework upon which others are based, and data sets that represent your organisation's largest commitment in terms of effort or cost. Framework layers and special, unique layers of great interest should be adequately documented for use within your organisation and by those on the outside. Of course, all published data warrant documentation this way, but through setting priorities you will know what work you have ahead of you.
- The Cookbook authors suggest collecting metadata a little at a time.
For detailed metadata standards such as FGDC and ISO, an enormous amount of possible information can be collected. Although the fields are never all filled in, the standard structure provides an opportunity to store specific properties in their correct location. This facilitates their storage and discovery in catalogues (See Chapter 4). If certain types of metadata are collected during the data collection process as part of the current workflow, then many 20-second notes can amount to a substantial story later on. This type of information cannot be easily collected after the fact.
- The Cookbook authors recommend the development of a coordinated spatial data product identifier system for use globally
The GSDI Technical Working Group with policy assistance from the Steering Committee should develop initial guidance on the technical and political issues involved in establishing a data product identifier system that will work globally on digital and non-digital geospatial information. Uniquely identifying metadata records themselves is a practice from the library community in which a single metadata record may be shared to reflect its availability in many locations.
- The Cookbook authors suggest that research into a common thematic classification system for geospatial data be conducted by the Technical Working Group of the GSDI.
Whereas ISO TC 211 is developing general specifications and methodologies, and the OpenGIS Consortium is building software interfaces, no convened global organisation is known to be co-ordinating a common classification system for geospatial data. As a result, the use of competing thematic thesauri makes distributed search difficult.
References and Links
Chenez, Christian and Gaël Kermarrec, "On-going Metadata Initiatives in Europe", 1999, 5th EC-GIS Workshop, Stresa, Italy http://www.eurogi.org/geoinfo/publications/5thgeo.html
Metadata Home Page, US Federal Geographic Data Committee http://www.fgdc.gov/metadata/metadata.html
Metadata Home Page Australia and New Zealand Land Information Council http://www.anzlic.org.au/infrastructure_metadata.html
Metadata (MetaGenie) Home Page UK Association for Geographic Information (AGI), http://www.askgiraffe.org.uk/datalocator/metadatatool.html
Reference Data and Metadata, INSPIRE Initiative, European Commission, http://inspire.jrc.it/about/reference.cfm
1 In 1994 the International Organization for Standardization (ISO) created technical committee 211 (ISO/TC 211) with responsibility for Geoinformation/Geomatics. The committee is finalising a family of standards; this process involves a working group, the development of one or more committee drafts, a draft international standard, and finally the international standard. Many common work items now exist between the OpenGIS Consortium and ISO TC 211 that will result in OGC specifications being balloted as International Standards or Technical Specifications.
