A Registry for controlled vocabularies at the Library of Congress

1 A Registry for controlled vocabularies at the Library of Congress
Rebecca Guenther
Network Development & MARC Standards Office, Library of Congress
October 29, 2008

2 Outline of presentation
Types of controlled vocabularies
Vocabularies maintained at LC
An introduction to SKOS
Establishing concept databases at LC
Examples of concept schemes: ISO 639-2 and PREMIS event type
Providing the registry as a web service
ASIST 2008 Oct. 29, 2008

3 Why establish controlled vocabularies?
Control values that occur in metadata
Document and publish for reuse
Reduce ambiguity
Control synonyms
Establish formal relationships among terms (where appropriate)
Test and validate terms

Many metadata schemes allow for content from other sources. Some data elements may be more useful if a controlled vocabulary is used. Some vocabularies are published formally; others are developed and used locally. Formal controlled vocabularies may be used for testing and validation of terms. This is often done in integrated library systems, where bibliographic records may be validated against authority records. Work is under way on establishing metadata registries for both documentation and machine validation of controlled vocabularies and metadata elements/terms. This could be particularly useful for controlled vocabularies, since their usefulness depends on consistency.

4 Types of Controlled Vocabularies used in metadata standards
Lists of enumerated values
Code lists (e.g. language, country)
Taxonomies
Formal Thesauri
Locally controlled enumerated lists

NISO has a standard for constructing thesauri (free download; listed in the bibliography): ANSI/NISO Z : Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies

5 Enumerated lists
Simple list of terms used in a pull-down menu or Web site pick list
Values enumerated in an XML schema
Little additional information or structure about each value
Examples:
  Code and value from a MARC 21 fixed field, e.g. code “e” in Leader/06 is “cartographic material”
  Enumerated value “MD5” for METS CHECKSUMTYPE
  Enumerated value “born digital” in MODS digitalOrigin

6 Code lists
Some are established as ISO standards and used worldwide in many communities for many purposes
The standard standardizes the code, not a particular name for it
Codes are used as identifiers
Examples (maintained by LC):
  ISO 639-2 (language codes)
  MARC relator codes
  MARC country codes

7 Thesauri
A thesaurus is a controlled vocabulary with multiple types of relationships
Example:
  Rice
    UF paddy
    BT Cereals
    BT Plant products
    NT Brown rice
    RT Rice straw
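
The "Rice" entry above can be sketched as a small data structure. This is only an illustration, not how LC stores its thesauri: the dictionary layout, the `add_reciprocals` helper, and the `USE` back-reference for UF are assumptions made for the example. It shows that each relationship on one term implies a reciprocal relationship on the target term.

```python
# Hypothetical in-memory form of the "Rice" entry from the slide.
# UF = used for, BT = broader term, NT = narrower term, RT = related term.
THESAURUS = {
    "Rice": {
        "UF": ["paddy"],
        "BT": ["Cereals", "Plant products"],
        "NT": ["Brown rice"],
        "RT": ["Rice straw"],
    },
}

# Each relationship implies a reciprocal one on the target term (assumed mapping).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE"}


def add_reciprocals(thesaurus):
    """Return a copy in which every relationship also appears on its target."""
    result = {
        term: {rel: list(targets) for rel, targets in rels.items()}
        for term, rels in thesaurus.items()
    }
    for term, rels in thesaurus.items():
        for rel, targets in rels.items():
            for target in targets:
                entry = result.setdefault(target, {})
                back = entry.setdefault(RECIPROCAL[rel], [])
                if term not in back:
                    back.append(term)
    return result


full = add_reciprocals(THESAURUS)
print(full["Cereals"])  # the narrower-term link back to "Rice"
print(full["paddy"])    # the USE reference back to "Rice"
```

Keeping the reciprocals derived rather than hand-entered is one way to avoid the two directions drifting apart.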

8 Standards maintained at LC that use controlled vocabularies
MARC (including code lists)
MODS
METS
MIX (XML schema for Z39.87 Technical metadata for digital still images)
PREMIS
ISO 639-2 (language codes)
Thesaurus of Graphic Materials
LCSH
… and some others

9 Simple Knowledge Organisation System(s)
SKOS: What is it?
Simple Knowledge Organisation System(s)
SKOS is …
  for declaring and publishing taxonomies, thesauri or classification schemes, for use in a distributed, decentralised information system (i.e. a semantic web)
  for describing Concepts and creating relationships between Concepts and Terms
  a practical application of RDF
  a formal language for representing controlled, structured vocabularies

10 The SKOS data model
…views a knowledge organization system as a concept scheme comprising a set of conceptual resources (concepts).
These concept schemes and conceptual resources are identified by URIs.
The model is multilingual and extensible.

11 Concepts can be…
labeled with any number of strings. One label, in any given language, can be indicated as the "preferred" label for that language, and others as "alternate" labels or "hidden" labels; a concept can also be given a notation:
  skos:prefLabel
  skos:altLabel
  skos:hiddenLabel
  skos:notation
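
The rule that only one label per language can be "preferred" lends itself to a simple check. The following is a minimal sketch under stated assumptions: the tuple layout, the sample labels (including the French label, the misspelled hidden label, and the notation "R01"), and the function name are all hypothetical and are not taken from the registry.

```python
from collections import Counter

# Hypothetical labels for one concept, as (property, language, text) tuples.
# A notation carries no language tag, so its language is None here.
labels = [
    ("skos:prefLabel", "en", "Rice"),
    ("skos:prefLabel", "fr", "Riz"),
    ("skos:altLabel", "en", "paddy"),
    ("skos:hiddenLabel", "en", "ryce"),
    ("skos:notation", None, "R01"),
]


def repeated_pref_label_languages(labels):
    """Return the languages that have more than one skos:prefLabel."""
    counts = Counter(
        lang or "" for prop, lang, _text in labels if prop == "skos:prefLabel"
    )
    return sorted(lang for lang, n in counts.items() if n > 1)


# An empty list means no language has two preferred labels.
print(repeated_pref_label_languages(labels))
```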

12 Concepts can be…
linked to other concepts within the same concept scheme.
Hierarchical links:
  skos:broader and skos:narrower
  skos:broaderTransitive and skos:narrowerTransitive
Associative links:
  skos:related
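
The difference between skos:broader (a direct link) and skos:broaderTransitive (every ancestor reachable through such links) can be sketched as a graph walk. The concept names reuse the thesaurus example; the link from "Cereals" to "Plant products" and the `broader_transitive` function are assumptions added for illustration.

```python
# Hypothetical skos:broader assertions: concept -> directly broader concepts.
BROADER = {
    "Brown rice": ["Rice"],
    "Rice": ["Cereals", "Plant products"],
    "Cereals": ["Plant products"],  # assumed link, not stated on the slide
}


def broader_transitive(concept, broader=BROADER):
    """Return every concept reachable by following skos:broader links."""
    seen = set()
    stack = list(broader.get(concept, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(broader.get(current, []))
    return seen


print(sorted(broader_transitive("Brown rice")))
```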

13 Concepts can be…
grouped into collections, which can be labeled and/or ordered. A concept can be in one or more collections.
  skos:Collection
  skos:OrderedCollection
  skos:member
  skos:memberList

14 Concepts can be…
mapped to other concepts in different concept schemes.
Hierarchical mapping:
  skos:broadMatch
  skos:narrowMatch
Associative mapping:
  skos:relatedMatch
  skos:closeMatch
  skos:exactMatch

15 Advantages to using SKOS
SKOS has a defined element set which is particularly relevant for controlled vocabularies
Relationships between entries in a thesaurus can be expressed (broader, narrower, etc.)
Relationships between entries in different thesauri can be expressed (exactMatch, related)
Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards

16 Controlled vocabularies registry at LC
Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains
Controlled lists are represented using SKOS as well as alternative syntaxes
Lists currently in progress:
  ISO 639-2 and MARC language code list
  MARC geographic area codes
  MARC country code list
  MARC relators
  PREMIS controlled value lists
  Thesaurus of Graphic Materials
Other possibilities:
  Enumerated values in MODS schema
  Coded and uncoded value lists in MARC

17 Reasons for developing a registry
Facilitate development and maintenance process
Make controlled lists openly available
Develop a web service where comprehensive information about controlled terms is available
Experiment with semantic web technologies
Expose vocabularies to wider communities

18

19 Example: ISO 639-2 vocabulary
One in the family of ISO 639 language coding standards
Has a close relationship with other language coding standards (ISO 639-1 and -3, MARC)
LC is maintenance agency
The standard is the CODE, not the language name; multiple names are given

20

21 ISO 639-2 language code example
<rdf:Description rdf:about= "
  <rdf:type rdf:resource=" #Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime"> T08:41: :00</skos:historyNote>
  <skos:exactMatch rdf:resource= "
  <skos:changeNote rdf:datatype="xs:dateTime"> T13:49: :00</skos:changeNote>
</rdf:Description>

This is one type of SKOS expression: RDF/XML. Tags defined by SKOS are wrapped in an RDF wrapper.
skos:prefLabel uses the language code as its value so as not to give preference to a term in any particular language.
skos:altLabel is used here for the various language names.
skos:inScheme identifies the concept scheme in which this entry is included; there can be more than one.
skos:exactMatch gives the URI of another code that means exactly the same thing; in this case it is the 2-character code “pt” (ISO 639-1), which is equivalent to this one.
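
To show how a consumer might read a record like the one above, here is a minimal sketch that parses a small, well-formed SKOS RDF/XML document with Python's standard library. The URIs under `example.org` are placeholders invented for this example (the transcript does not preserve the real ones), and the sample is trimmed to a few elements; `summarize_concept` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard RDF and SKOS namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed sample; every example.org URI is a made-up placeholder.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://example.org/iso639-2/por">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:exactMatch rdf:resource="http://example.org/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>"""


def summarize_concept(rdf_xml):
    """Extract the code, language names and exact matches of the first concept."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "uri": desc.get("{%s}about" % NS["rdf"]),
        "code": desc.findtext("skos:prefLabel", namespaces=NS),
        "names": {
            el.get(XML_LANG): el.text for el in desc.findall("skos:altLabel", NS)
        },
        "exact_match": [
            el.get("{%s}resource" % NS["rdf"])
            for el in desc.findall("skos:exactMatch", NS)
        ],
    }


print(summarize_concept(SAMPLE))
```

A general RDF library would handle other serializations as well; plain XML parsing is used here only to keep the sketch dependency-free.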

22 PREMIS controlled lists
PREMIS Data Dictionary for Preservation Metadata
Some semantic units call for controlled vocabularies and have suggested lists
A central registry could document and make them available
Users could submit their own terms
PREMIS schema could be enhanced with enumerated values for validation generated dynamically
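
The idea of generating enumerated values for schema validation dynamically could look roughly like the sketch below, which turns a list of controlled values into an XML Schema `simpleType` fragment. This is an assumption about one possible approach, not the PREMIS schema itself: the type name `eventTypeValues` and the `enumeration_type` function are hypothetical, and the three values are the event types mentioned in this presentation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # XML Schema namespace


def enumeration_type(type_name, values):
    """Build an xs:simpleType fragment restricting xs:string to the given values."""
    ET.register_namespace("xs", XS)
    simple = ET.Element(f"{{{XS}}}simpleType", {"name": type_name})
    restriction = ET.SubElement(
        simple, f"{{{XS}}}restriction", {"base": "xs:string"}
    )
    for value in values:
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": value})
    return ET.tostring(simple, encoding="unicode")


# In a registry these values would come from the vocabulary database.
event_types = ["creation", "migration", "normalization"]
fragment = enumeration_type("eventTypeValues", event_types)
print(fragment)
```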

23

24

25 PREMIS event type example
<rdf:Description rdf:about= "
  <rdf:type rdf:resource= "
  <skos:prefLabel xml:lang="en-latn"> creation</skos:prefLabel>
  <skos:narrower rdf:resource= "
  <skos:narrower rdf:resource= "
  <skos:definition xml:lang= "en-latn">the act of creating a new object</skos:definition>
  <skos:inScheme rdf:resource= " /preservationEvents"/>
</rdf:Description>

This example is from a concept scheme called “preservation events”. The controlled value described is “creation”, which is in the PREMIS data dictionary as a suggested value under eventType. It is a broader term for two others on the value list: migration and normalization.

26 Registry Web service
User: sends an HTTP request
XML Database using XQuery (eXist): interprets the URI and formulates a SPARQL query
RDF Triple Store (Sesame): runs the query, gets the results, and sends them back to the database and then to the user
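
The "interprets URI, formulates SPARQL query" step can be sketched as follows. The registry itself does this in XQuery on eXist; Python is used here only for illustration. The base URI, the `/scheme/code` path layout, and both function names are assumptions, since the presentation does not give the actual URI pattern or queries.

```python
import re

BASE_URI = "http://example.org/vocabulary"  # hypothetical base URI

# Assumed request path layout: /<scheme>/<code>, e.g. /iso639-2/por
PATH_PATTERN = re.compile(r"/?([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/?")


def concept_uri_from_path(path):
    """Interpret a request path as a concept URI, rejecting anything unexpected."""
    match = PATH_PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"unrecognized request path: {path!r}")
    scheme, code = match.groups()
    return f"{BASE_URI}/{scheme}/{code}"


def build_sparql(concept_uri):
    """Formulate a SPARQL query returning every property and value of the concept."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?property ?value WHERE {\n"
        f"  <{concept_uri}> ?property ?value .\n"
        "}"
    )


uri = concept_uri_from_path("/iso639-2/por")
print(build_sparql(uri))
```

Restricting the path to a narrow character set before building the query keeps request text from being injected into the SPARQL string.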

27 Further development
Consider programming changes to improve speed
Develop mechanisms to output all public documentation from database
Include additional coding about relationships to other concept schemes and controlled vocabularies (facilitating crosswalks)
Encourage experimentation

28 Questions?
Contacts:
Rebecca Guenther: rgue@loc.gov
Clay Redding:

