OCLC Online Computer Library Center Two Paths to Interoperable Metadata Jean Godby, Devon Smith, Eric Childress DC-2003 September 29, 2003.


1 OCLC Online Computer Library Center Two Paths to Interoperable Metadata Jean Godby, Devon Smith, Eric Childress DC-2003 September 29, 2003

2 Outline of this talk –The Metadata Switch project –The Metadata Schema Transformations project –An experiment with XML and XSLT –A new system design –Open issues –Project status

3 The Metadata Switch project An umbrella activity for a set of OCLC Research projects that construct modular services for adding value to metadata. Some examples: –Harvesting –Fusion of metadata from different sources –Name authorities

4 Metadata Schema Transformations: Project goals A robust design for metadata translation –Clean separation of: document data model schema translations machinery –Support for current practice and for foreseeable innovation A metadata translation system/toolkit –Self-contained Web service for metadata translation –A place for human input (intellectual mappings) in an automated system

5 High-level system design [Diagram; component labels: Metadata schema translator, Web services layer, Crosswalk repository client, Record translation client. Data labels: A record, A transformed record, A metadata crosswalk.]

6 XML and XSLT XML (Extensible Markup Language) –A markup language for structured documents. –Like SGML, but designed for use in the Web environment. –Like HTML, but display and structure markup are distinct. –A World Wide Web Consortium (W3C) standard; many supporting tools. XSLT (Extensible Stylesheet Language Transformations) –A tree-oriented language for transforming XML documents. –A W3C recommendation; newer than XML.

7 Our working client

8 A test case The records: –Data streams from the Colorado Digitization Project. –Minimal Dublin Core XML records that describe photographs. The process: –OCLC Research uses an XSLT script to convert DC simple records to MARC XML. A Perl script converts the XML records to MARC 2709. –Records are sent to OCLC production software for correction, validation, and batch loading into the WorldCat database.
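The Dublin Core to MARC XML step in this test case can be pictured with a small sketch. The slides use an XSLT script for this; the Python below is a stand-in, not the project's code. The four-row crosswalk, the sample record, and the function name dc_to_marcxml are all assumptions made for illustration, with MARC tags chosen in the spirit of the commonly used unqualified DC-to-MARC mappings.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical, deliberately tiny crosswalk: DC element -> (MARC tag, subfield code).
CROSSWALK = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
}


def dc_to_marcxml(dc_xml: str) -> ET.Element:
    """Convert one simple DC record into a MARCXML <record> element."""
    source = ET.fromstring(dc_xml)
    record = ET.Element(f"{{{MARC_NS}}}record")
    for child in source:
        # Ignore elements outside the DC namespace.
        if not child.tag.startswith(f"{{{DC_NS}}}"):
            continue
        name = child.tag.split("}", 1)[1]
        text = (child.text or "").strip()
        # Ignore DC elements the crosswalk does not cover, and empty values.
        if name not in CROSSWALK or not text:
            continue
        tag, code = CROSSWALK[name]
        field = ET.SubElement(
            record,
            f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": " ", "ind2": " "},
        )
        subfield = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        subfield.text = text
    return record


# Invented sample record, loosely modeled on a photograph description.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Main Street, looking north</dc:title>
  <dc:creator>Unknown photographer</dc:creator>
  <dc:subject>Streets</dc:subject>
  <dc:format>image/jpeg</dc:format>
</record>"""

marc = dc_to_marcxml(SAMPLE)
print(ET.tostring(marc, encoding="unicode"))
```

Note that the mapping knowledge lives in the CROSSWALK table rather than in the transformation logic, which is the separation the later slides argue for. The dc:format element is silently dropped here because the toy crosswalk has no row for it.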

9 Before and after

10

11 Problems with our XML/XSLT solution Lots of conditions for use –XML records –Supporting XML documentation: schemas, namespaces, DTDs, URIs –XSLT scripts or XSLT programming expertise –Simple structural transforms Not appropriate –for semantic mappings. Element semantics is lost. –when standards and encodings are in flux. Supporting documentation is unmanageable. –for our model of collective intelligence. Knowledge in a set of XSLT scripts can’t be mined.

12 The XSLT solution: reprise Which crosswalks are equivalent? If they’re not equivalent, how do they differ? Which crosswalks have XML schemas that match my data?

13 The long translation path: why? Metadata translation needs a layer of abstraction. –Metadata standards have many versions or encodings, but element definitions and mappings stay the same. –The lack of abstraction leads to a combinatorial explosion of pairwise mappings: (versions × encodings of metadata schema X) × (versions × encodings of metadata schema Y). –The meanings behind the semantic transforms are lost unless they are recorded and associated with element definitions. A full commitment to XML may be premature.
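The counting argument on this slide can be made concrete with a few lines of arithmetic. The version and encoding counts below are invented, and the way the abstraction-layer case is counted (one structural transform per variant plus a single shared semantic map) is an assumption about how the long path would be tallied, not a figure from the talk.

```python
def variants(versions: int, encodings: int) -> int:
    """Number of distinct forms one schema can take."""
    return versions * encodings


def direct_mappings(x_variants: int, y_variants: int) -> int:
    """One-directional mappings needed when every X variant maps straight to every Y variant."""
    return x_variants * y_variants


def abstracted_mappings(x_variants: int, y_variants: int) -> int:
    """Assumed count with an abstraction layer: one structural transform
    per variant, plus one semantic map shared by all of them."""
    return x_variants + y_variants + 1


# Hypothetical numbers: each schema has 3 versions and 2 encodings.
x = variants(3, 2)
y = variants(3, 2)
print(direct_mappings(x, y))      # 36
print(abstracted_mappings(x, y))  # 13
```

The gap widens quickly: the direct count grows with the product of the variant counts, while the abstracted count grows with their sum.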

14 The long translation path
1. File of records in format X
2. Transform to intermediate format (STRUCTURAL TRANSFORM)
3. Transform intermediate format to interoperable core (SEMANTIC TRANSLATION)
4. Transform interoperable core to intermediate format (SEMANTIC TRANSLATION)
5. Transform to output format Y (STRUCTURAL TRANSFORM), giving a file of records in format Y
Semantic maps from Excel tables drive the two semantic translations around the interoperable core.
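The path in this diagram can be sketched as a chain of small functions. Everything concrete here is hypothetical: the "name: value" input syntax standing in for format X, the "tag=value" output standing in for format Y, the element names, and the two dictionaries that play the role of semantic maps kept in spreadsheet tables.

```python
from typing import Dict, List, Tuple

# Normalized intermediate form: a list of (element name, value) pairs.
Record = List[Tuple[str, str]]


def parse_format_x(text: str) -> Record:
    """Structural transform in: 'name: value' lines -> intermediate form."""
    pairs: Record = []
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            pairs.append((name.strip(), value.strip()))
    return pairs


# Semantic maps (hypothetical). In the talk these come from human-maintained tables.
X_TO_CORE: Dict[str, str] = {"ti": "title", "au": "creator"}
CORE_TO_Y: Dict[str, str] = {"title": "245a", "creator": "720a"}


def translate(record: Record, mapping: Dict[str, str]) -> Record:
    """Semantic translation: rename elements, dropping any the map does not cover."""
    return [(mapping[name], value) for name, value in record if name in mapping]


def serialize_format_y(record: Record) -> str:
    """Structural transform out: intermediate form -> 'tag=value' lines."""
    return "\n".join(f"{name}={value}" for name, value in record)


def long_path(text: str) -> str:
    intermediate_x = parse_format_x(text)              # structural transform
    core = translate(intermediate_x, X_TO_CORE)        # semantic translation into the core
    intermediate_y = translate(core, CORE_TO_Y)        # semantic translation out of the core
    return serialize_format_y(intermediate_y)          # structural transform


print(long_path("ti: Main Street\nau: Unknown\nxx: not mapped"))
```

Supporting a new encoding of format X would mean writing one more parser; supporting a new target schema would mean adding one more core-to-target map. Neither change touches the other stages, at the cost of the extra hops the next slide mentions.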

15 What the long path accomplishes The model encodes two sources of abstraction. –Syntactic normalization –Semantic mappings The user interacts with familiar objects. –A set of documents –Human-readable mappings But: Metadata schema translation is indirect. There is more processing overhead for “best-case” XML documents.

16 Open issue 1: Implementing the long path Custom software –Handles “XML-ish” and non-XML data –Normalizes variation in records –Does special handling of data required for complete and robust translations –Translates user-supplied crosswalks Advanced XML –Uses XPointer and XLink to implement the interoperable core and to document the semantics of translations –Works with established standards

17 Open issue 2: The interoperable core A union…or an intersection of elements? An established standard…or a custom design? One…or many interoperable cores?

18 Project status Development is interspersed with testing on third-party data. The custom software for the long translation path is due to be completed in Autumn 2003. The advanced XML solution is being studied.

19 For further information The Metadata Switch Project at OCLC http://www.oclc.org/research/projects/mswitch/

