Module 6a: Intro to Controlled Vocabularies, Taxonomies and Classification IMT530: Organization of Information Resources Winter 2007 Michael Crandall.


1 Module 6a: Intro to Controlled Vocabularies, Taxonomies and Classification IMT530: Organization of Information Resources Winter 2007 Michael Crandall

2 Module 6a Outline
Where we are
Controlled vocabularies
Types of controlled vocabularies
Tagging
Overview of building vocabularies

3 Recap
We looked at the indexing process to see how controlled vocabularies can be used to enhance access to information
  – Different methods of indexing provide different results
  – Need to decide on your approach based on an analysis of your business objectives, the user needs, and the domain
  – A combination of automatic and human indexing is often the best solution

4 Overview of Subject Representation
Subject analysis
  – a technique used to determine the “subject(s)” and disciplinary context exemplified by an object
Subject indexing
  – a technique through which subject terms (words, taxonomic categories, or notation) are added to an object representation to describe the subject content of the object
Controlled vocabularies
  – standards containing controlled subject terms (words, taxonomic categories, or notation) used in the indexing process

5 Controlled Vocabulary: Definition
A controlled vocabulary is a list of terms (words or phrases) or codes (notation) used for indexing
Almost always, controlled vocabularies show relationships among terms

6 Purpose of Controlled Vocabularies
Specific Purposes
  – To provide access to content by subject, through hierarchical and associative relationships and synonym control for the terms used in the domain
  – To increase precision in retrieval and display by controlling homographs (words that are spelled the same but have different meanings)
General Purposes
  – To assist users by conveying meaning, orientation, and structure in a subject area
  – To assist users by providing rich relationships among concepts and terms
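
Synonym control and homograph control can be pictured as a lookup from whatever a user types (an entry term) to the preferred term or terms in the vocabulary. The following is only a minimal sketch under stated assumptions: the terms, the parenthetical qualifiers, and the function name `lookup` are all hypothetical and are not taken from any real vocabulary.

```python
# Hypothetical mini-vocabulary: every term below is illustrative only.
# Synonym control: several entry terms lead to one preferred term.
PREFERRED = {
    "automobiles": "Cars",
    "autos": "Cars",
    "cars": "Cars",
}

# Homograph control: one spelling, several meanings, each given a qualifier.
HOMOGRAPHS = {
    "mercury": ["Mercury (Planet)", "Mercury (Element)"],
}


def lookup(entry_term):
    """Return the preferred terms that an entry term can lead to."""
    key = entry_term.strip().lower()
    if key in HOMOGRAPHS:
        # The searcher has to pick a meaning, so return every candidate.
        return list(HOMOGRAPHS[key])
    if key in PREFERRED:
        return [PREFERRED[key]]
    # Not in the vocabulary at all.
    return []


print(lookup("Autos"))
print(lookup("mercury"))
```

Returning a list in every case keeps the caller's logic the same whether the entry term is a synonym, a homograph, or unknown.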

7 Buckland
Proposes five different vocabularies in any system:
  – Authors
  – Indexers
  – Syndetic structure
  – Searchers
  – Formulated queries
Formal tradition vs. document tradition

8 Types of Controlled Vocabularies
Subject Heading List
Taxonomy
Thesauri
Classification Scheme
More terminology on Leonard Will’s site
  – http://www.willpowerinfo.co.uk/glossary.htm
Zeng, M.L. (2005). Construction of controlled vocabularies: A primer.

9 Subject Heading Lists
General list of terms (words and phrases), not limited by discipline or subject area
Terms are called subject headings
The distinction between thesauri and subject heading lists is largely historical (subject heading lists are older); there are very few subject heading lists because they are so expensive to maintain
Terms are mainly subject attributes, but there are many exemplified attributes used in subdivisions
Example: Library of Congress Subject Headings (LCSH), used in library catalogs
  – Sample terms: “France – Colonies – History – 18th century”; “Time and space – Juvenile fiction”; “Frogs” (notice the use of subdivisions, marked here by dashes; thesauri seldom use subdivisions)

10 Taxonomies
List of terms (words and phrases) that may be general or subject/discipline/domain specific
Terms are called taxons or (simply) terms
Terms represent subjects, disciplines/domains, and exemplified attributes
Used in digital environments only
Examples: Microsoft Corporation intranet taxonomies; Yahoo taxonomy used in the Yahoo directory
  – Sample terms from the Yahoo taxonomy (in Yahoo, you’ll find these at the top of the screen as you browse through the directory): “Education”; “Science > Agriculture > Research > Government Agencies”; “Health > Nursing”; “Health > Education”
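
The breadcrumb-style sample terms above can be read as paths through a tree. As a rough sketch (the function name `build_tree` and the nested-dictionary representation are assumptions made for illustration, not how Yahoo stored its directory), the paths can be folded into a browsable structure like this:

```python
def build_tree(paths, separator=" > "):
    """Build a nested dict from taxonomy paths such as 'Health > Nursing'."""
    tree = {}
    for path in paths:
        node = tree
        for label in path.split(separator):
            # Reuse the child if it already exists, otherwise create it.
            node = node.setdefault(label.strip(), {})
    return tree


# The sample terms quoted on the slide.
paths = [
    "Education",
    "Science > Agriculture > Research > Government Agencies",
    "Health > Nursing",
    "Health > Education",
]
tree = build_tree(paths)
print(sorted(tree))            # top-level categories
print(sorted(tree["Health"]))  # children of "Health"
```

Note that “Education” appears both as a top-level category and under “Health”; in this sketch the two are separate nodes that happen to share a label.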

11 Thesauri
Thesauri (pl.) / Thesaurus (s.)
  – List of terms (words and phrases) that are usually limited to a specific subject or disciplinary area
  – Terms listed in a thesaurus are often called descriptors
  – Thesauri were mostly defined and developed after the advent of the computer and were created for use in a computerized environment (or with computers in mind)
  – Terms are usually subject (about) attributes, but some thesauri also contain exemplified (example of) attributes - http://www.e-government.govt.nz/nzgls/thesauri
  – Example: ERIC Thesaurus (education)
Sample terms from the ERIC Thesaurus: “School community relationship”; “College entrance exams”; “Age grade placement”

12 “Classification” Schemes
Chart of subject categories contextualized by a hierarchical structure
Terms are lists of codes (notation)
Terms are called classes and class numbers
Classification schemes make use of disciplinary, subject, and (sometimes) exemplified attributes
Often used to arrange physical documents; sometimes used in online environments

13 “Classification” Example
Examples: Dewey Decimal Classification (DDC); Universal Decimal Classification (UDC); Colon Classification
Sample entries (DDC):
  – 510 (meaning: “Mathematics” (a discipline and a subject))
  – 512.57 (meaning: “Mathematics / Linear, multilinear, multidimensional algebras / Factor algebras”)
  – 362.582 (meaning: “Social problems and services / Problems of and services to the poor / Financial assistance”)
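
The sample entries suggest how decimal notation carries the hierarchy: shortening a class number generally yields a broader class. The sketch below assumes the notation is strictly hierarchical, which real DDC does not guarantee everywhere, and the function name `broader_classes` is hypothetical.

```python
def broader_classes(class_number):
    """List notations from the main class down to the given class number.

    Assumes each shorter notation is a broader class, which is a
    simplification of how DDC actually behaves.
    """
    integer_part, _, decimal_part = class_number.partition(".")
    # DDC numbers are written with at least three digits before the point.
    integer_part = integer_part.zfill(3)
    chain = [
        integer_part[0] + "00",  # main class, e.g. 500
        integer_part[:2] + "0",  # division, e.g. 510
        integer_part,            # section, e.g. 512
    ]
    # Each extra decimal digit narrows the class one more step.
    for i in range(1, len(decimal_part) + 1):
        chain.append(integer_part + "." + decimal_part[:i])
    # Drop repeats such as 510 being both division and section.
    unique = []
    for notation in chain:
        if notation not in unique:
            unique.append(notation)
    return unique


print(broader_classes("512.57"))
print(broader_classes("510"))
```

A browsing interface could use such a chain to show the path from the main class to a specific number, provided each notation is then looked up in the actual schedules for its caption.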

14 Four Types of Classification
Kwasnik describes four classification systems
  – Hierarchies
  – Trees
  – Paradigms
  – Facets
Paradigms are useful primarily for analysis of subject gaps and relationships in a constrained space
Trees are a poor form of hierarchy with limited relationships
We’ll look at the other two in some detail over the next two weeks

15 Hierarchies
Good for representation of knowledge in mature domains where the nature of the entities and relationships is well known
You’ll see examples of these in the thesauri that we will look at in today’s exercise
Require a model that describes what entities are included, with rules of association and distinction
Tend to be monolithic and cumbersome for large domains

16 Facets
Actually a different approach rather than a different structure
  – May use hierarchies or trees as part of the structure
  – Originated in the work of S.R. Ranganathan
Proposed that any object could be viewed in five ways: personality, matter, energy, space and time (PMEST)
  – Being used more and more in modern information systems because of flexibility in meeting multiple needs
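
The flexibility of facets comes from describing each object along several independent dimensions and letting the user combine them in any order. Here is a minimal sketch, assuming hypothetical records and a hypothetical helper `filter_by_facets`; the facet values are invented for illustration and are not drawn from Ranganathan's Colon Classification.

```python
# Hypothetical records described with the five PMEST facets.
RECORDS = [
    {"title": "Report A", "personality": "Rice", "matter": "Seeds",
     "energy": "Cultivation", "space": "India", "time": "1950s"},
    {"title": "Report B", "personality": "Rice", "matter": "Soil",
     "energy": "Irrigation", "space": "Japan", "time": "1950s"},
    {"title": "Report C", "personality": "Wheat", "matter": "Seeds",
     "energy": "Cultivation", "space": "India", "time": "1960s"},
]


def filter_by_facets(records, **selected):
    """Keep records whose values match every selected facet."""
    return [
        record for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


# Facets can be combined in any order, unlike a single fixed hierarchy.
rice_1950s = filter_by_facets(RECORDS, personality="Rice", time="1950s")
print([record["title"] for record in rice_1950s])
```

The same records can answer a "by place" question or a "by period" question without being rearranged, which is the kind of multiple-need flexibility the slide refers to.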

17 Collaborative Tagging
Points out issues of “basic level” and “collective sensemaking”
Tug of war between personal storage
  – Identifying qualities
  – Self reference
  – Task organizing
and public nature of access
  – What or who it is about
  – What it is
  – Who owns it
  – Categories
Stability emerges from imitation and shared experience
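
One way to look at whether tag usage is stabilizing is to track each tag's share of all tag assignments for a single resource over time. The sketch below computes those shares for one snapshot; the function name `tag_proportions` and the sample tags are hypothetical, and this is an illustration rather than the method used in the readings.

```python
from collections import Counter


def tag_proportions(tag_events):
    """Return each tag's share of all tag assignments for one resource."""
    # Normalize case and whitespace so "Python" and "python" count together.
    counts = Counter(tag.strip().lower() for tag in tag_events)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


# Hypothetical tag assignments by different users for one bookmark:
# "python" and "programming" describe what it is about,
# "toread" is a personal, task-organizing tag.
events = ["Python", "python", "toread", "programming"]
print(tag_proportions(events))
```

Comparing the shares computed at successive points in time would show whether they settle into a steady pattern as more users tag the resource.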

18 Trees vs. Tags
Weinberger’s article postulates three types of vocabularies
  – Trees (hierarchies)
  – Facets
  – Tags
Golder/Huberman and Weinberger both point out that each approach can be useful in particular situations
  – Choosing your approach is part of the process of subject and domain analysis

19 Steps in Constructing CVs
Define your domain
Gather concepts
  – From user interviews, search logs, content analysis, preexisting vocabularies
Select your approach
Extract terminology
Control your terms
Organize your terms
Maintain, maintain, maintain

20 Questions?
If not, take a break!!!

21 Exercise 6a
Purpose is to explore some existing controlled vocabularies to investigate their differences and similarities, how useful they might be for subject access, and to become familiar with the structure of controlled vocabularies in general
Spend the next 45 minutes on Exercise 6a
Ask questions and talk!!!
Be sure to hand in completed work at the end of class for credit!!!

22 Thursday
We’ll start to look at ways to build controlled vocabularies and the rules associated with them
Remember to read assignments BEFORE class

