Taxonomy Strategies


"For a large enterprise to share information across diverse product lines and functions, a common language or taxonomy is required to classify the information. The best way to develop the common taxonomy is to look at the hierarchies currently in use."

- David Lamar Smith, Halliburton Global Technical Services Chief

GLOSSARY

Automated Classification

Use of technology to organize content into groups so it can be retrieved when needed. The result of automated classification is either a content collection clustered into groups (possibly a candidate taxonomy), or content categorized according to a pre-existing taxonomy. The best results come from a business process that combines manual and automated processing, so that technology does the bulk of the work and human editorial input is reserved for the cases that need it.
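As a rough illustration of combining automated and manual processing, the sketch below assigns content to a category from a pre-existing taxonomy by counting keyword matches, and flags weak matches for human review. The category keywords, the `classify` function, and the `min_hits` threshold are all hypothetical; real classifiers typically use statistical or linguistic methods rather than a fixed keyword list.

```python
import re

# Hypothetical taxonomy: category name -> indicative keywords (illustrative only).
TAXONOMY_KEYWORDS = {
    "Snack food": {"chips", "popcorn", "pretzels"},
    "Beverages": {"beer", "juice", "soda"},
}

def classify(text, min_hits=2):
    """Return (category, needs_review) for one piece of content."""
    words = re.findall(r"[a-z]+", text.lower())
    # Count how many words in the text match each category's keywords.
    scores = {
        category: sum(1 for w in words if w in keywords)
        for category, keywords in TAXONOMY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # Nothing matched: leave the item uncategorized for an editor.
        return None, True
    # Weak evidence is routed to a human editor instead of being trusted.
    return best, scores[best] < min_hits

print(classify("Salted pretzels and potato chips on sale"))
print(classify("New beer arrives"))
```

Items returned with `needs_review` set to `True` would go to an editorial queue, which is one simple way to keep human input focused on the uncertain cases.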

Dublin Core

A set of 15 metadata elements (the Dublin Core Metadata Element Set) used to describe and catalog content so it can be discovered and retrieved. The Dublin Core is the de facto standard for cataloging web content.

Information Retrieval Technologies

Automated methods to analyze, classify, search for, and retrieve text. The basic principles of information retrieval (IR) are based on research done in the 1940s and 1950s. The key observation was that word frequency provides a useful measure of significance. Many refinements have been made to this simple observation using statistics, linguistics, logic, and clever combinations of these methods.
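The word-frequency observation can be sketched in a few lines. This is a minimal illustration, not a production IR method: the stop-word list is a tiny hypothetical sample, and the `term_frequencies` helper and the sample sentence are invented for the example.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in"}

def term_frequencies(text):
    """Count how often each non-stop word occurs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

doc = ("The taxonomy organizes content. "
       "A taxonomy is a controlled vocabulary for content metadata.")
frequencies = term_frequencies(doc)

# The most frequent remaining words are treated as the most significant.
print(frequencies.most_common(2))
```

Later refinements typically weight these raw counts, for example by how rare a word is across the whole collection.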

Metadata

A common set of attributes that hold the critical information needed to describe and catalog content. The basic concept behind metadata has been used to organize content since the beginning of clay tablet and papyrus scroll collections 3,000 years ago. Card and book catalogs and bibliographic databases have relied on commonly understood metadata standards to organize large collections.

Dublin Core metadata example:

Dublin Core Elements

Asset metadata (the who, where, and when): Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language

Subject metadata (the what and why): Subject, Description, Coverage

Relational metadata (links between assets): Relation

Use metadata (how to monetize assets): Rights
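The grouping above can be represented as a small data structure and used to check a metadata record. This is a sketch under stated assumptions: the group keys, the `unknown_elements` helper, and the sample record values are hypothetical, and the use-related element appears under its name in the published element set, `rights`.

```python
# The 15 element names of the Dublin Core Metadata Element Set,
# arranged in the four groups used in the example above.
DC_GROUPS = {
    "asset": ["title", "creator", "publisher", "contributor", "date",
              "type", "format", "identifier", "source", "language"],
    "subject": ["subject", "description", "coverage"],
    "relational": ["relation"],
    "use": ["rights"],
}
DC_ELEMENTS = {name for names in DC_GROUPS.values() for name in names}

def unknown_elements(record):
    """Return the keys of a record that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)

# Hypothetical record; the values are illustrative only.
record = {
    "title": "Glossary",
    "creator": "Taxonomy Strategies",
    "date": "2012-02-29",
    "language": "en",
    "subject": "Taxonomy",
}

print(len(DC_ELEMENTS))
print(unknown_elements(record))
```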

 

Taxonomy

Overall scheme for organizing content to solve a business problem, such as improving search, supporting browsing on an enterprise-wide portal, enabling business users to syndicate content, or otherwise providing the basis for content re-use. The basic idea behind a taxonomy is to provide a controlled vocabulary for metadata attributes, and to specify relationships between terms in the controlled vocabulary. The simplest relationships are broader, narrower, and related, but relationships can be much more specific and complex.

UNSPSC Taxonomy example:

Snack food

Broader term: Prepared and preserved foods

Narrower terms: Corn chips, Popcorn, Potato chips, Pretzels

Related term: Beer
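The broader, narrower, and related relationships in this example can be modeled directly. The sketch below is one possible representation, not a standard API: the `Term` class and the helper functions are hypothetical, and it assumes that Beer is a related term of Snack food.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    label: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)

terms = {}

def get(label):
    """Fetch a term by label, creating it on first use."""
    return terms.setdefault(label, Term(label))

def add_narrower(parent, child):
    # Broader and narrower are inverse relationships, so record both sides.
    get(parent).narrower.append(child)
    get(child).broader.append(parent)

def add_related(a, b):
    # Related is symmetric.
    get(a).related.append(b)
    get(b).related.append(a)

add_narrower("Prepared and preserved foods", "Snack food")
for snack in ["Corn chips", "Popcorn", "Potato chips", "Pretzels"]:
    add_narrower("Snack food", snack)
add_related("Snack food", "Beer")

def expand(label):
    """Return the term plus all of its narrower terms, recursively."""
    result = [label]
    for child in get(label).narrower:
        result.extend(expand(child))
    return result

print(expand("Snack food"))
```

Expanding a query term to its narrower terms in this way is one common use of a taxonomy for improving search.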

 

XML Schema

Data models expressed in XML. XML schemas provide a means of defining and implementing a consistent structure (syntax) and meaning (semantics) for XML documents, allowing machines to carry out rules made by people. A faceted taxonomy provides the names of metadata elements and a consistent set of attribute values, or vocabularies, for filling the elements in an XML schema.
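To illustrate how a faceted taxonomy can supply both element names and allowed values, the sketch below builds a small XML record with Python's standard `xml.etree.ElementTree` module and checks its values against a vocabulary. This is a hand-rolled check, not XML Schema validation (the standard library does not provide that); the `record` wrapper element, the `VOCABULARY` facets, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical facets: element name -> allowed values.
VOCABULARY = {
    "subject": {"Snack food", "Beer"},
    "language": {"en", "es"},
}

def build_record(values):
    """Build a <record> element with one dc:* child per entry."""
    root = ET.Element("record")
    for name, value in values.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

def invalid_values(root):
    """List (element, value) pairs that fall outside the vocabulary."""
    problems = []
    for child in root:
        name = child.tag.split("}", 1)[-1]  # strip the namespace prefix
        allowed = VOCABULARY.get(name)
        # Elements without a controlled vocabulary accept free text.
        if allowed is not None and child.text not in allowed:
            problems.append((name, child.text))
    return problems

record = build_record(
    {"title": "Pretzel catalog", "subject": "Snack food", "language": "en"}
)
print(ET.tostring(record, encoding="unicode"))
print(invalid_values(record))
```

In a real deployment the same constraints would more likely be expressed in the schema itself, for example as enumerated values, and enforced by a validating parser.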

 

 

 

[Last updated 2012-02-29]

[Image above: Gaussian Scatter, from Wikipedia, the free encyclopedia (en.wikipedia.org/wiki/File:GaussianScatterPCA.png)]