The Value of Context for Data and Information

Joseph A. Busch

Vice President for Information Product Development

DATAFUSION, Inc. [1]

This paper was originally prepared as a position paper for the NSF Workshop on Data Archiving & Information Preservation. Joseph Busch represented the American Society for Information Science (ASIS) at the workshop.

Some of the key questions that need to be answered to understand the requirements for data archiving and information preservation are:

Mechanisms for selecting, identifying, and organizing data and information are critical to finding and using them later. It is also important to recognize that these later uses may be for the same purposes, but are more often for different purposes, than those for which the data and information were originally created.

Business management demands economic accountability in terms of return on investment or ROI. At first glance archiving and preserving business data and information would appear to be primarily driven by compliance requirements. There are rules governing the requirements for business records retention. When these are followed, business records are routinely saved for a prescribed period of time and then they are destroyed. However, many types of business data and information, such as those related to research and development, fall outside the scope of prescribed areas of compliance. Recent business theories related to issues such as innovation and intellectual asset management are leading to the view that business data and information are assets and that they need to be handled accordingly by the company. [2]

Knowledge Management

New information management theories are placing enormous emphasis on knowledge assets and capturing value from them. [3] The translation of these ideas by the computer systems application business has led to the recent re-labeling of many product offerings as "knowledge management" applications. Companies such as Lotus/IBM [4], Oracle [5], and SAP [6] are suddenly in the knowledge management business. Merrill Lynch recently published an analyst report on Enterprise Information Portals (EIP) -- integrative systems that allow users to find, extract and analyze information in company databases, file servers, and desktops. It is interesting to note that the analysts characterize this trend as a shift in focus "away from the actual content of the information to the context in which the end user consumes the information." [7] In this instance, we are to read the word "context" as equivalent to "value".

How do these ideas of "value" translate into computer and information science theory? What research and what information policies are needed to enable the creation of knowledge assets and, more importantly, to capture value from them?

Data, Information, and Knowledge

If the major information technology companies suddenly think they are in the knowledge management business, then there must be some confusion about the differences between data, information, and knowledge. Consider a simple model that describes the life cycle of "stuff", the evolution of data into knowledge. Data doesn't just exist out there. It is an abstraction of the real world, a human construction that slices reality into a series of temporal chunks so that it can be measured, and so that those measurements can be recorded and used for a variety of purposes.

Data is generated via interfaces with the real world through a variety of input devices such as:

Data is transformed into information when it is formatted and structured in a way that facilitates its use, for example by publishing it.

Information is transformed into knowledge when it is organized, analyzed, communicated, and perhaps preserved in ways so that it can be found and used again. That is, information is transformed into knowledge when it is provided with a context, or is re-used in a way that may or may not have been envisioned when it was first collected. [8]

 

Figure 1. The value of raw data versus knowledge.

At each stage in this life cycle, value is added. If it is not, then the economic (although not necessarily intrinsic) value of data and information degrades over time. Also, while the costs of digital input devices, processing, and storage have been decreasing (and will continue to decrease) over time, the cost of analyzing and preserving information remains as high as ever. In fact, as the costs of generating data and information decrease, more of it is being created, thus increasing the total cost of analyzing and preserving it.

So, as the Merrill Lynch analysts point out, long-term value is created not by the content itself but by its contextualization: the capability to find, extract, and analyze information.

The Value of Categorization

Sorting things into categories is a natural activity. People arrange things alphabetically, chronologically, spatially, by physical attributes, and by topic. Business systems organize information along a variety of dimensions such as product, location, industry, equipment, and problem. Bibliographic systems use formal classification schemes like Dewey Decimal Classification (http://www.oclc.org/oclc/fp/about/brief.htm). The Chemical Abstracts Service (CAS) Registry Number identifies chemical substances (http://www.cas.org/). SNOMED, the Systematized Nomenclature of Medicine, categorizes clinical medicine (http://snomed.org/). Master data systems or metathesauri such as the Unified Medical Language System or UMLS (http://www.nlm.nih.gov/pubs/factsheets/umls.html) aim to cross-reference multiple categorization schemes used in different resources.
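
The cross-referencing idea behind a metathesaurus can be illustrated with a minimal sketch. It assumes a tiny hand-built concept table; the scheme names, concept identifiers, and codes are all hypothetical placeholders and do not correspond to real Dewey, CAS, SNOMED, or UMLS values.

```python
# Hypothetical concept table: one shared concept id per row, with the
# code that each (made-up) categorization scheme assigns to it.
crosswalk = {
    "concept:001": {"scheme_a": "A-10", "scheme_b": "B-731"},
    "concept:002": {"scheme_a": "A-11", "scheme_c": "C-5"},
}


def build_index(crosswalk):
    """Invert the table so a (scheme, code) pair leads to its concept id."""
    index = {}
    for concept, codes in crosswalk.items():
        for scheme, code in codes.items():
            index[(scheme, code)] = concept
    return index


def translate(index, crosswalk, scheme, code, target_scheme):
    """Return the target scheme's code for the same concept, or None."""
    concept = index.get((scheme, code))
    if concept is None:
        return None  # the source code is unknown
    return crosswalk[concept].get(target_scheme)  # None if not mapped


index = build_index(crosswalk)
mapped = translate(index, crosswalk, "scheme_a", "A-10", "scheme_b")
unmapped = translate(index, crosswalk, "scheme_a", "A-11", "scheme_b")
```

Routing every lookup through a shared concept identifier, instead of mapping each scheme directly to every other, keeps the number of mappings proportional to the number of schemes. Gaps remain visible as missing entries.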

Information objects are created (and systems optimized) for a particular purpose. But the same information contains shreds of evidence that can be used for other purposes. The value of categorization schemes is that they facilitate the transfer of knowledge -- finding, extracting, and analyzing content. The more an information object has been codified, the more economically it can be transferred. [9] This is the underlying value proposition of business applications such as Enterprise Resource Planning (ERP) and data warehouses. ERP systems are designed to collect and codify operational information so that it can be re-used. Online analytical processing (OLAP) systems and data mining are designed to support the analysis of such codified information repositories for decision-making purposes.

Figure 2. Explicit knowledge is more expensive to create, but it can be transferred more effectively than implicit knowledge.

However, it is interesting how little time and effort is dedicated by businesses to codifying personal information objects created with desktop application software, such as word-processed documents, spreadsheets, presentations, email, and bookmarks to Web pages. By and large, the model of the paper filing cabinet is being carried forward to the personal hard disk, file server, and intranet. New information management products (such as Plumtree Software http://www.plumtreesoft.com/) that attempt to organize personal information objects try to infer the content of these documents and categorize them in ad hoc ways without reference to even the most rudimentary document filing schemes, nomenclatures, or taxonomies.

Problems and Opportunities

For data archiving and information preservation to result in data and information from which knowledge can more readily be created, research and development is needed to address a variety of issues. A few of these are briefly described below.

Codifying Metadata Values. Ubiquitous date stamping, user identification, and document format identification do facilitate archiving of information objects, but add little to their long-term usefulness. The lack of explicit standards for codifying the metadata values that describe their content and thereby provide an initial context diminishes their value because it is difficult to locate and re-use them. There are few methodologies and vocabulary management tools for building such taxonomies, and a general lack of skilled practitioners to implement them. There are few storage, discovery, search, retrieval, and analysis environments that take real advantage of codified schemes when they exist.

Implicit Categorization. Research is needed on methodologies that take advantage of implicit classification. For example, it is known from citation analysis that grouping documents which cite a common source is a means to build a useful collection of documents. Can a profile be extracted from such a collection of documents to function as a set of categories? Can the set of documents created by an author be used to create a profile of that author?
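
As a rough illustration of grouping documents by a common cited source, the sketch below builds collections from shared citations and counts how many sources each pair of documents has in common. The document identifiers and citation keys are hypothetical sample data, and the pairwise count is only one simple way to measure the overlap, not a method the paper prescribes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: document id -> set of sources it cites.
citations = {
    "doc_a": {"teece1998", "paisley1993"},
    "doc_b": {"teece1998", "borgman1996"},
    "doc_c": {"dodge_atlas"},
    "doc_d": {"paisley1993", "teece1998"},
}


def group_by_common_source(citations):
    """Map each cited source to the sorted documents that cite it."""
    groups = defaultdict(set)
    for doc, sources in citations.items():
        for source in sources:
            groups[source].add(doc)
    # A source cited by a single document does not form a collection.
    return {s: sorted(docs) for s, docs in groups.items() if len(docs) >= 2}


def coupling_strength(citations):
    """Count the sources shared by each pair of documents."""
    strength = {}
    for a, b in combinations(sorted(citations), 2):
        shared = citations[a] & citations[b]
        if shared:
            strength[(a, b)] = len(shared)
    return strength


collections = group_by_common_source(citations)
pairs = coupling_strength(citations)
```

Each key in `collections` can be read as a candidate implicit category. Whether such categories are useful in practice is the open research question raised above.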

Code Scheme Management. The names of concepts change over time; consider, for example, the place name Yugoslavia. It is a big problem to institute a common single version, let alone to manage multiple versions and editions. Research is needed on mechanisms for handling the temporal attributes of the data values in coding schemes.
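
One possible way to attach temporal attributes to the values in a coding scheme is to store each label with a validity interval and resolve the label for a given date. The sketch below assumes that approach; the concept identifier, labels, and dates are hypothetical placeholders, not historical facts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TermVersion:
    concept_id: str
    label: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the label is still current

    def covers(self, day: date) -> bool:
        """True if this label was in force on the given day."""
        return self.valid_from <= day and (
            self.valid_to is None or day < self.valid_to
        )


# Hypothetical scheme: one concept whose preferred label changed once.
versions = [
    TermVersion("place:0001", "Old Name", date(1900, 1, 1), date(1990, 1, 1)),
    TermVersion("place:0001", "New Name", date(1990, 1, 1)),
]


def label_on(versions, concept_id, day):
    """Return the label valid for a concept on a given day, or None."""
    for version in versions:
        if version.concept_id == concept_id and version.covers(day):
            return version.label
    return None


label_then = label_on(versions, "place:0001", date(1950, 6, 1))
label_now = label_on(versions, "place:0001", date(2000, 6, 1))
```

Keeping the concept identifier stable while labels come and go lets older records remain findable under the name that was current when they were created.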

Supporting Ambiguity. Structures that support the capability to represent multiple and sometimes contradictory information related to the same element are needed. [10]
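
A minimal sketch of such a structure follows. It assumes that each statement about an element is stored as a separate assertion tagged with its source, so that contradictory values sit side by side and are not overwritten. The class names, entity, attributes, and values are hypothetical, and this is not the relational model described in note [10].

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    entity: str
    attribute: str
    value: str
    source: str  # who or what made the claim


class AssertionStore:
    """Keeps every assertion; later claims never replace earlier ones."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, assertion):
        key = (assertion.entity, assertion.attribute)
        self._by_key[key].append(assertion)

    def values(self, entity, attribute):
        """Return all recorded assertions for one attribute of one entity."""
        return list(self._by_key.get((entity, attribute), []))

    def conflicts(self):
        """Return the keys whose sources disagree on the value."""
        return {
            key: items
            for key, items in self._by_key.items()
            if len({a.value for a in items}) > 1
        }


store = AssertionStore()
store.add(Assertion("painting:42", "creation_year", "1642", "catalogue_a"))
store.add(Assertion("painting:42", "creation_year", "1645", "archive_b"))
store.add(Assertion("painting:42", "painter", "Unknown", "catalogue_a"))
disputed = store.conflicts()
```

Recording the source with each value lets a later user weigh the competing claims, so a single answer does not have to be chosen at the time of data entry.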

Information Visualization. Efficient and expressive visualization of complex information spaces showing relationships among heterogeneous information objects requires much more research. [11]

Notes

[1] Located in San Francisco, DATAFUSION is an enterprise software company focused on advanced systems for accessing, categorizing, and retrieving information from disparate data sources. DATAFUSION brings information together from multiple sources, into an environment where it can be categorized and re-categorized, so that the information base can be queried, presented, and re-packaged in an intuitive and easy-to-use interface by many different types of users. Integrating a modular concept metathesaurus, a descriptive metadata repository, and expressive graphical Knowledge Maps(TM), the DATAFUSION application environment is used to build and communicate relationships among disparate data sources. With a technical team comprised of highly qualified professionals in computer, information, and library science, DATAFUSION provides a unique combination of expertise for solving complex information management problems.

[2] See for example "Artifacts Scenario: Business Records in the Information Life Cycle." In: C.L. Borgman, and others, Social Aspects of Digital Libraries: Final Report to the National Science Foundation, November 1996. http://dlis.gseis.ucla.edu/DL/UCLA_DL_Report.html

[3] See for example D.J. Teece, "Capturing Value from Knowledge Assets: the New Economy, Markets for Know-How, and Intangible Assets," In: Special Issue on Knowledge and the Firm, California Management Review 40/3 (1998): 55-79.

[4] See for example: "Lotus Takes on Knowledge Management" (http://www.informationweek.com/667/67iulot.htm); "Lotus, Compaq team up to provide knowledge management bundles" (http://www.zdnet.com/pcweek/news/0720/21eknow.html); and "Analysis: For Lotus, Next Generation Groupware is Knowledge Management" (http://www.internetwk.com/news/news0130-10.htm).

[5] See for example, P.J. Gill, "Knowledge Management in the Information Age," Oracle Magazine (May/June 1998) (http://www.oramag.com/oracle/98-May/cov1.html).

[6] See for example, SAP AG, "SAP Knowledge Management Solution" (http://www.sap.com/products/know/knsmover.htm).

[7] C.C. Shilakes and J. Tylman, Enterprise Information Portals: in-Depth Report, [New York]: Merrill Lynch & Co., November 16, 1998.

[8] The emergence of knowledge management in the commercial sector is further evidence of the re-emergence of the knowledge utilization problem solving framework as noted in J. Paisley, "Knowledge Utilization: the Role of New Communication Technologies," Journal of the American Society for Information Science 44/4 (1993): 222-234.

[9] Teece, p. 63.

[10] See for example, J.A. Busch, "Use of a Relational Database System to Model the Variability of Historical Source Information." In: Cognitive Paradigms in Knowledge Organisation, Bangalore: Sarada Ranganathan Endowment for Library Science, 1992, pp. 372-389.

[11] See for example, M. Dodge, "An Atlas of Cyberspace," Cyber-Geography Research, Centre for Advanced Spatial Analysis (CASA), University College London. (http://www.cybergeography.org/atlas/atlas.html)

 

Knowledge Management Webography

@Brint. http://www.brint.com/. Business oriented information portal for knowledge management.

J.S. Brown and P. Duguid, "Stolen Knowledge," 1992. http://www.parc.xerox.com/ops/members/brown/papers/stolenknow.html. Paper on "situated learning" that is particularly interesting in its perceptions on explicit versus implicit learning.

T.M. Jorde and D.J. Teece. "Rule of Reason Analysis of Horizontal Arrangements: Agreements Designed to Advance Innovation and Commercialize Technology." http://www.ftc.gov/opp/global/jorde2.htm. Article that looks at one area of antitrust inquiry -- horizontal arrangements -- focusing on the importance of innovation to competition.

R. Ruggles, "Knowledge Tools: Using Technology to Manage Knowledge Better," Working Paper (April 1997). http://www.businessinnovation.ey.com/mko/html/toolsrr.html. Describes how technological tools can be used to support the automation or augmentation of organizational knowledge management. Defines knowledge management tools in contrast with data and information management tools.

P.A. Strassmann, "The Value Of Computers, Information and Knowledge" (January 30, 1996) http://www.strassmann.com/pubs/cik/cik-value.shtml. Discusses the relationship between corporate profitability and information technology spending, and measurement of return on investment for IT expenditures.

D.J. Teece, "Information Sharing, Innovation and Antitrust," Antitrust Law Journal 2(2): 65 (February 1994). http://www.lib.uconn.edu/Economics/Faculty/Langlois/teece.htm. Examines cooperation in information collection, dissemination, and exchange among "competitors" where markets are experiencing rapid change caused by technological innovation.

D.J. Teece, "Telecommunications in Transition: Unbundling, Reintegration, and Competition," Michigan Telecommunications Technology Law Review 1/4 (1995) http://www.umich.edu/~mttlr/VolOne/Teece.html. This paper outlines technological changes -- microelectronics, optics, and computer science, fully-interactive communications network, transition from analog to digital technologies, etc. -- and explores their implications for competition policy, industry structure, and business organization.