Metadata standard
1. Resource organization framework of digital libraries

2. Metadata development and application framework

The basic meaning of metadata is "data about data".

Metadata provides standardized, general-purpose description methods and retrieval tools for digital information units and resource collections of all kinds.

Metadata provides integration tools and links for distributed information systems (such as digital libraries) composed of many kinds of digital resources.

Without metadata, a digital library is an unorganized mass of objects that cannot be retrieved or processed effectively.

3. Metadata application environment

3.1 Application purposes of metadata

(1) Discovery and identification. The focus is on helping people find and confirm the resources they need. Data elements are usually limited to simple information such as author, title, subject and location. Dublin Core is the typical representative.

(2) Cataloging. The aim is a detailed and comprehensive description of data units. Data elements cover content, carrier, location and means of access, production and use, and even related data units. The number of elements is often large. MARC, GILS and FGDC/CSDGM are typical examples of this kind of metadata.

(3) Resource management. This supports the storage and use management of resources. In addition to comprehensive descriptive elements, it usually includes information on rights and privacy management, digital signatures, seals of approval and ratings, access management, payment and accounting.

(4) Preservation and archiving. This supports the long-term preservation of resources. In addition to elements that describe and identify resources, it usually includes detailed format information, production information, protection conditions, migration methods and preservation responsibilities.

3.2 Application of metadata in different fields

Since the 1990s, many metadata formats have appeared in different fields, each shaped by the data characteristics and application requirements of its field.

For example:

Network resources: Dublin Core, IAFA Templates, CDF, Web Collections.

Literature: MARC (including field 856), Dublin Core.

Humanities: TEI Header

Social science data sets: ICPSR SGML Codebook

Museums and works of art: CIMI, CDWA, RLG REACH Element Set, VRA Core.

Government information: GILS

Geospatial information: FGDC/CSDGM

Digital images: MOA2 metadata, CDL metadata, open file format, VRA Core, NISO/CLIR/RLG Technical Metadata for Images.

Archives and resource collections: EAD

Technical report: RFC 1807

Moving images: MPEG-7

3.3 Application degree of metadata format

Metadata in different fields are in different stages of standardization:

For the description of network resources, after years of international effort, the Dublin Core element set has become a widely accepted and widely applied de facto standard.

For government information, vigorous promotion by the United States government and the implementation of the relevant legal standards have made GILS the standard for describing government information, and it has been applied to a considerable extent in several countries. The situation is similar for FGDC/CSDGM, which is used for geospatial information.

In some fields, however, rapid technological change means that many schemes are still competing. Metadata for digital images is the typical case: many of the proposed standards are still at the stage of trial and refinement.

3.4 Degree of "standardization" of metadata formats

Experience with metadata development and application shows that a single unified metadata format can hardly meet the data description needs of all fields. Even within one field, different purposes may require different but interchangeable metadata formats.

At the same time, a single centralized metadata standard is ill suited to the Internet environment and does not make full use of market mechanisms and the contributions of different communities.

Within a single field, however, we should strive for "standardization", and across fields we should properly solve the problem of interoperability between formats.

4. Metadata structure

4.1 Overall structure definition method

A metadata format is defined at several levels:

(1) Content structure, which describes the constituent elements of the metadata and the standards by which they are defined.

(2) Syntactic structure, which defines the format structure of the metadata and how that structure is described.

(3) Semantic structure, which defines how the values of metadata elements are to be described in practice.

4.2 Content structure

Content structure defines the constituent elements of metadata, which can include descriptive elements, technical elements, administrative elements and structural elements (such as links to encoding languages, namespaces and data units).

These data elements may be selected according to particular standards, in which case the content structure needs to say so. Examples are ISBD, on which MARC records are based; ISAD(G), which EAD references; and the ICPSR data preparation manual, on which the ICPSR format is based.

4.3 Syntactic structure

Syntactic structure defines the format structure and how it is described: how elements are divided and grouped, the rules for selecting and using elements, the method of describing elements (for example, Dublin Core adopts the ISO/IEC 11179 standard), the method of describing the element structure (for example, the MARC record structure, an SGML structure or an XML structure), and the language used for structured statements (for example, EBNF).

Sometimes the syntactic structure also needs to indicate whether the metadata is bundled with the data object it describes, or exists as separate data that is linked to the data object in some way. It may also state the definition standards, DTD structures and namespaces used to describe that link.

4.4 Semantic structure

Semantic structure defines how elements are to be described in practice, such as the standards, best practices or custom rules used when giving element values.

Some metadata formats define their own semantic structures, while others leave them to be defined by the institutions that use the format. For example, Dublin Core suggests that the date element follow ISO 8601, that the resource type use the Dublin Core Types, that the data format can use MIME types, and that the identifier be a URL, DOI or ISBN.

OhioLINK, for instance, requires subject elements to use AAT, TGM and TGN, and name elements to use ULAN.
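The value rules suggested for Dublin Core above can be checked mechanically. The following is a minimal sketch only: the helper names, the simplified regular expressions, and the `doi:` and `isbn:` prefixes are assumptions made for illustration, not part of any Dublin Core specification.

```python
import re
from datetime import date

# Deliberately simplified patterns (assumptions for this sketch).
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")   # YYYY, YYYY-MM or YYYY-MM-DD
MIME_TYPE = re.compile(r"^[a-z]+/[A-Za-z0-9.+-]+$")   # e.g. text/html
IDENTIFIER = re.compile(r"^(https?://\S+|doi:\S+|isbn:[\dXx-]+)$", re.IGNORECASE)

def is_iso_date(value: str) -> bool:
    """Accept the common ISO 8601 calendar-date forms."""
    if not ISO_DATE.match(value):
        return False
    parts = [int(p) for p in value.split("-")]
    parts += [1] * (3 - len(parts))   # pad missing month/day so date() can range-check
    try:
        date(*parts)
    except ValueError:
        return False
    return True

# Element name -> check applied to its value; other elements are not checked.
CHECKS = {
    "date": is_iso_date,
    "format": lambda v: bool(MIME_TYPE.match(v)),
    "identifier": lambda v: bool(IDENTIFIER.match(v)),
}

def validate(record: dict) -> list:
    """Return the names of elements whose values fail their check."""
    return [name for name, value in record.items()
            if name in CHECKS and not CHECKS[name](value)]

record = {
    "title": "Metadata standard",
    "date": "2001-13-01",                  # month 13 is invalid
    "format": "text/html",
    "identifier": "isbn:0-000-00000-0",    # placeholder, not a real ISBN
}
print(validate(record))
```

An institution-specific profile such as the OhioLINK one could be expressed the same way, by adding a check that looks subject or name values up in the required vocabulary.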

5. Metadata coding language and production method

5.1 Metadata coding language

Metadata encoding language refers to the specific syntax and semantic rules that define and describe metadata elements and structures, and is usually called Definition Description Language (DDL).

In the early stages of metadata development, people often used custom record languages (such as MARC) or database record structures (such as ROADS). As the number of metadata formats grew and interoperability became a requirement, people began to describe metadata with standardized DDLs such as SGML and XML, of which XML has the most potential.
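As an illustration of describing metadata in XML, the sketch below serializes a small record with Python's standard library. The Dublin Core namespace URI is the published one; the `metadata` wrapper element and the `to_xml` helper are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)           # serialize with the "dc" prefix

def to_xml(record: dict) -> str:
    """Serialize element-name -> value pairs as namespaced XML elements."""
    root = ET.Element("metadata")            # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml({"title": "Metadata standard", "format": "text/html"})
print(xml_text)
```

Any system with an XML parser can read the result back without knowing how the record was produced, which is the interoperability argument made for XML here.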

5.2 Metadata production method

(1) Professional modules (for example, MARC, GILS, FGDC)

(2) Automatic generation during data processing (for example, Dublin Core)

(3) Automatic generation when data is physically processed (for example, some metadata parameters recorded during digital image scanning)

(4) Shared metadata (for example, OCLC/CORC, IMesh)

6. Metadata interoperability

6.1 Metadata interoperability

Because many metadata formats exist in different fields (and even within one field), interoperability problems arise whenever resources are searched, described and used across resource systems that are described in different metadata formats. The problems are:

the interpretation and conversion of the various metadata formats, and transparent retrieval across digital information resource systems described in different metadata formats.

6.2 Metadata format mapping

Different metadata formats can be converted into one another by a dedicated conversion program. This is called metadata mapping, or a crosswalk.

There are a large number of conversion programs that can convert between several popular metadata formats, such as

Dublin Core and USMARC; Dublin Core and EAD

Dublin Core and GILS; GILS and MARC

TEI Header and MARC; FGDC and MARC

An intermediate format can also be used to convert several metadata formats within a single format framework. For example, the UNIverse project uses the GRS format to convert between various MARC formats and other record formats. Format mapping is accurate and efficient for the formats it covers. However, its usefulness is clearly limited in an open environment where many metadata formats coexist.
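The mapping approach can be sketched as a lookup table from source fields to target elements. The table below is a small illustrative subset loosely following commonly published MARC to Dublin Core mappings, not a complete or authoritative crosswalk, and `marc_to_dc` is a hypothetical function name.

```python
# (MARC tag, subfield code) -> Dublin Core element; illustrative subset only.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
    ("856", "u"): "identifier",
}

def marc_to_dc(fields):
    """Convert (tag, subfield, value) triples into a dict of DC element -> values."""
    dc = {}
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            continue   # unmapped fields are dropped, so the conversion is lossy
        dc.setdefault(element, []).append(value)
    return dc

marc_record = [
    ("245", "a", "Metadata standard"),
    ("650", "a", "Metadata"),
    ("650", "a", "Digital libraries"),
    ("300", "a", "12 p."),   # physical description has no target in this table
]
print(marc_to_dc(marc_record))
```

The sketch also shows why this approach scales poorly: direct conversion between every pair of n formats needs n(n-1) one-way tables, whereas routing through one intermediate format needs only 2n.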

6.3 Standard Description Framework

Another way to solve metadata interoperability is to establish a standard resource description framework that can describe all metadata formats. Any system that can parse the standard framework can then interpret the metadata formats expressed in it. XML and RDF play this kind of role from different angles.

Through its standard DTD mechanism, XML allows every system that can interpret XML to recognize a metadata format defined by an XML DTD, which solves the problem of interpreting different formats.

RDF defines a basic model consisting of three kinds of objects: resources, properties (attributes) and statements. The relationship between resources and properties is similar to the E-R model, and statements describe this relationship in detail.

Through this abstract data model, RDF establishes a framework for defining and using metadata, in which metadata elements can be regarded as properties of the resources they describe.

RDF also defines a standard schema, which specifies the mechanism for declaring resource types, their related properties and semantics, and the method for defining relationships between properties and other resources. Finally, RDF provides a mechanism for reusing existing definition specifications through XML namespaces.
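The RDF model described above can be pictured as a set of statements, each linking a resource to a value through a property. This is a plain-Python sketch of the idea, not an RDF library: the `Statement` type, the `properties_of` helper and the example resource URI are hypothetical, and only the Dublin Core namespace URI is a real identifier.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"   # namespace that identifies the properties

class Statement(NamedTuple):
    resource: str   # the thing being described
    prop: str       # the property, identified by a URI
    value: str      # the value of that property for the resource

doc = "http://example.org/doc/1"          # hypothetical resource URI
statements = [
    Statement(doc, DC + "title", "Metadata standard"),
    Statement(doc, DC + "format", "text/html"),
]

def properties_of(resource, stmts):
    """Collect the property -> value pairs stated about one resource."""
    return {s.prop: s.value for s in stmts if s.resource == resource}

print(properties_of(doc, statements))
```

Because each property is named by a full URI, elements from different metadata formats can sit side by side in the same description without colliding, which is the role the namespace mechanism plays.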

6.4 Digital object approach

Establishing a digital object containing metadata and its transformation mechanism may solve the problem of metadata interoperability from another angle.

The Cornell/FEDORA project proposes a composite digital object consisting of a structural kernel and a functional dissemination layer.

The kernel can contain the document content in the form of bit streams, metadata describing the document, and related data for controlling access to the document and its metadata.

In the dissemination layer, primitive disseminators provide the service functions for inspecting the kernel's data types and reading kernel data. There may also be content-type disseminators, in which metadata format conversion mechanisms can be embedded.

For example, suppose the kernel of a digital object holds metadata in MARC format, and the dissemination layer carries a content-type disseminator that refers to the Dublin Core format and its conversion service. When a user of the digital object asks to read its metadata as Dublin Core, the content-type disseminator requests, over the network, the digital object that stores the Dublin Core definition and conversion program. It then converts the MARC metadata in the original object into Dublin Core and returns the result to the user.
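The MARC to Dublin Core example can be mimicked with a toy object model. Everything here is a hypothetical sketch rather than the FEDORA interface: the class, its fields and the converter are invented for illustration, and the converter is a local function standing in for the service that the text says is fetched over the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

@dataclass
class DigitalObject:
    # Kernel: the content bit stream plus metadata records keyed by format name.
    content: bytes
    metadata: Dict[str, dict]
    # Dissemination layer: (source format, target format) -> conversion service.
    converters: Dict[Tuple[str, str], Callable[[dict], dict]] = field(default_factory=dict)

    def get_metadata(self, fmt: str) -> dict:
        """Return metadata in the requested format, converting if necessary."""
        if fmt in self.metadata:
            return self.metadata[fmt]        # stored natively in the kernel
        for (src, dst), convert in self.converters.items():
            if dst == fmt and src in self.metadata:
                return convert(self.metadata[src])
        raise KeyError(f"no metadata or converter for {fmt}")

def marc_to_dc(marc: dict) -> dict:
    """Stand-in conversion service covering two fields only."""
    return {"title": marc.get("245a"), "creator": marc.get("100a")}

obj = DigitalObject(
    content=b"...",
    metadata={"MARC": {"245a": "Metadata standard", "100a": "Unknown"}},
    converters={("MARC", "DC"): marc_to_dc},
)
print(obj.get_metadata("DC"))
```

The caller asks only for a format name. Whether the record was stored natively or produced by conversion is hidden behind the object, which is how this approach addresses interoperability.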

7. Some suggestions

Track the development of metadata, take an active part in formulating metadata standards, accelerate the application of metadata, and pay attention to alignment with international practice.

Accelerate research on mechanisms that use metadata effectively for retrieval (including transparent retrieval across heterogeneous systems), associated learning, personalized processing and similar tasks.

Accelerate research on ways to integrate metadata organically with digital objects and digital resource systems.

Promote research on the use of metadata in knowledge-based data organization and knowledge discovery.