United States Patent5873097
Harris , ; et al.February 16, 1999

Title

Update mechanism for computer storage container manager

Abstract

Methods and data structures which permit information to be stored as objects in target containers and update containers. A target container defines a first state of the information, and the update container, which can point to the target container, identifies changes to the information in the first state which would be sufficient to update the first information state to a second information state. Update containers may be nested to any depth. When an application program opens an update container, the procedure searches down the chain until it finds the ultimate target container. It then creates in-memory structures for providing access to the objects and value data represented in such container. The procedure then works its way back up the chain, performing the changes on the in-memory structure, which are called for in each of the update containers. New modifications made after this process is complete, are recorded in memory, and when committed, are written out into a new update container which references the container that the application program originally opened. The changes which are identified in an update container, if they represent modifications to an object in an underlying container, refer to that object logically rather than physically. Multiple concurrent (parallel) updaters are supported, since more than one update container can refer to the same target container. Thus each updater has an independent view of the information being updated. The mechanism facilitates reconciliation of concurrent updates since it maintains a record of the changes made.


Inventors:Harris; Jared M. (Berkeley, CA), Ruben; Ira L.  (San Jose, CA)
Assignee:Apple Computer, Inc. (Cupertino, CA)
Appl. No.:08/768,339
Filed:December 17, 1996

Current U.S. Class:707/203 
Current International Class:G06F 17/30 (20060101)
Field of Search:707/100,103,200-204

U.S. Patent Documents
4853843August 1989Ecklund
5047918September 1991Schwartz et al.
5155850October 1992Janis et al.
5159669October 1992Trigg et al.
5206951April 1993Khoyi et al.
5278982January 1994Daniels et al.
5280612January 1994Lorie et al.
5317731May 1994Dias et al.
5317733May 1994Murdock
5347653September 1994Flynn et al.
5357631October 1994Howell et al.
5392390February 1995Crozier
5434994July 1995Shaheen et al.
5440730August 1995Elmasri et al.
5463696October 1995Beernink et al.
5499365March 1996Anderson et al.
5513352April 1996Tozuka
5519606May 1996Frid-Nielsen et al.
5535386July 1996Wang
Foreign Patent Documents
0 425 420 A3May., 1991EP
0 458 495 A2Nov., 1991EP
0 520 924 A3Dec., 1992EP
0 578 204 A2Jan., 1994EP
0 578 209 A2Jan., 1994EP
5020151Jan., 1993JP
Other References
Douglis, Fred; Ousterhout, John, "Log-Structured File Systems", Spring compcon89 of the IEEE Computer Society (Feb. 27-Mar. 3, 1989), pp. 124-129. .
Harris, Jed, "Bento Specification", Revision 0.9, Apple Computer, Inc. (Nov. 4, 1991). .
Harris, Jed, "Bento Specification", Revision 0.95, Apple Computer, Inc. (Nov. 15, 1991). .
Harris, Jed, "Bento Specification", Revision 1.0a1, Apple Computer, Inc. (Dec. 10, 1991). .
Harris, Jed, "Bento Specification", Revision 1.0a2, Apple Computer, Inc. (Feb. 16, 1992). .
Harris, Jed, "Bento Specification", Revision 1.0a3, Apple Computer, Inc. (Feb. 17, 1992). .
Harris, Jed; Ruben, Ira, "Bento Specification", Revision 1.0d4, Apple Computer, Inc. (Aug. 17, 1992). .
Harris, Jed; Ruben, Ira, "Bento Specification", Revision 1.0d4.1, Apple Computer, Inc. (Sep. 11, 1992). .
Herlihy, M., "A Methodology for Implementing Highly Concurrent Data Structures", Assn. for Computing Machinery Symp. on Principles and Practice of Parallel Programming Conference (1990), pp. 197-206. .
Kanner, H., "Projector, An Informal Tutorial", Apple Computer, Inc. (Dec., 1989). .
ON Technology, Inc., "Instant Update, User's Guide" (1991). .
ON Technology, Inc., "Instant Update, Administrator's Guide" (1991). .
Simmel, Sergiu S., "Kala.TM.--Main Concepts", Version 1.00, Samsung Software America, Inc. (1990). .
Simmel, Sergiu S., "Kala--Interface Reference, Part I: Kala Facilities", Kala ed. 2.1, Penobscot Research Center, Inc. (1992). .
D'Andrea, Robert J., et al., "Object-Oriented Programming: Concepts and Languages", Proceedings of the IEEE 1990 National Aerospace and Electronics Conference NAECON 1990 (May 21, 1990) vol. 2, pp. 634-639..~
Primary Examiner: Von Buhr; Maria N.
Attorney, Agent or Firm:Fliesler, Dubb, Meyer & Lovejoy LLP

Parent Case Text



CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of Ser. No. 08/177,853, filed Jan. 5, 1994, now abandoned, which is a continuation-in-part of Ser. No. 08/060,809, May 12, 1993, pending and a continuation-in-part of Ser. No. 08/107,449, filed Aug. 16, 1993, now U.S. Pat. No. 5,652,879, issued Jul. 29, 1997.

Claims


We claim:
1. A method for updating information from a first persistently stored information state to a second persistently stored information state, said information being represented in said first information state by a target container containing a plurality of objects, comprising the steps of:
determining container changes sufficient to modify said first information state to said second information state;
committing said second information state by, at least in part, persistently storing, in a structure logically separate from said target container, an update container including objects identifying all of said container changes, such that said first information state remains readable without requiring application of change records relative to said second information state; and
including a table of contents (TOC) in said update container that includes information about every object in the update container.

2. A method according to claim 1, wherein said step of persistently storing comprises the step of persistently storing change instructions in said update container.

3. A method according to claim 1, wherein said container changes include insertion of a new object, and wherein said step of persistently storing comprises the step of persistently storing said new object in said update container.

4. A method according to claim 1, wherein said container changes include deletion of a particular object from said target container, and wherein said step of persistently storing comprises the step of persistently storing in said update container an object identifier persistently associated with said particular object.

5. A method according to claim 1, wherein said container changes include a plurality of deletions of particular objects from said target container, and wherein said step of persistently storing comprises the step of persistently storing in said update container a list of object identifiers persistently associated with said particular objects.

6. A method according to claim 1, wherein said container changes include modification of a particular object in said target container from a first object state being its state in said first information state, to a second object state being its state in said second information state, and wherein said step of persistently storing comprises the step of:
persistently storing in said update container an object identifier persistently associated with said particular object; and
persistently storing in said update container in correspondence with said object identifier, an identification of object changes which would be sufficient to update said particular object from said first object state to said second object state.

7. A method according to claim 6, wherein said particular object in said first object state includes at least one value identifying value data, wherein said object changes include modification of said value data from a first value data state being its state in said first object state, to a second value data state being its state in said second object state, and wherein said step of persistently storing an identification of object changes comprises the step of:
persistently storing in said update container in correspondence with said object identifier, an identification of said value data in said particular object, and an identification of value data changes which would be sufficient to update said value data from said first value data state to said second value data state.

8. A method according to claim 6, wherein each of said objects in said target container includes at least one value header designating value data, and wherein said step of persistently storing an identification of object changes comprises the steps of:
writing to said update container as new value data, change instructions sufficient to modify the value data of said particular object from a first value data state to a second value data state; and
writing to said update container a change instruction calling for said new value data to be inserted into said particular object.

9. A method according to claim 1, wherein said container changes stored in said step of persistently storing include no changes which are in excess of those sufficient to update said information state to said second information state.

10. A method according to claim 1, wherein said target container is stored in a first file, and wherein said step of persistently storing an update container comprises the step of storing said update container in a second file distinct from said first file.

11. A method according to claim 1, wherein said target container is stored in a first file, and wherein said step of persistently storing an update container comprises the step of appending said update container to said first file.

12. A method according to claim 1, wherein said step of determining container changes comprises the steps of:
representing said first information state in non-persistent storage;
manipulating said information in non-persistent storage using at least orne change steps said second information state being defined by the state of said information after said step of manipulating; and subsequently to said step of representing,
recording at least all non-superfluous ones of said change steps,
said container changes beina the change steps recorded in said step of recording.

13. A method according to claim 12, wherein said target container comprises a prior target container and a prior update container, said prior target container representing said information in a prior information state, and wherein said step of representing said first information state in non-persistent storage comprises the steps of:
using said prior target container to represent in non-persistent storage said prior information state; and
manipulating said information in non-persistent storage in response to change steps identified in said prior update container.

14. A method according to claim 1, wherein said step of persistently storing comprises the steps of:
writing change instructions to said update container; and
subsequently writing to said update container an indication that said update container is valid.

15. A method according to claim 14, wherein said change instructions include a first change instruction to be executed, and wherein said step of subsequently writing comprises the step of writing to said update container a label identifying said first change instruction.

16. A method according to claim 1, wherein said step of determining container changes sufficient to modify said first information state to said second information state comprises the steps of:
producing a temporary representation of said information in said first information state in unstable storage in dependence upon said target container;
modifying said temporary representation of said information in said first information state to produce a temporary representation of said information in said second information state in unstable storage;
determining said container changes from said temporary representation of said information in said second information state in unstable storage.

17. A method according to claim 1, further comprising the steps of:
determining second container changes sufficient to modify said first information state to a third information state; and
committing said third information state by, at least in part, persistently storing, in a structure logically separate from said target container and from said update container, a second update container identifying all of said second container changes, such that said first information state remains readable without requiring application of change records relative to said second or third information state, and such that said second information state remains readable without requiring application of change records from said second update container.

18. A method according to claim 1, wherein:
said target container is persistently stored on a first medium; and
said step of persistently storing an update container comprises the step of persistently storing said update container on a second medium different from said first medium.

19. A method according to claim 18, wherein said first medium is a read-only medium, and wherein said second medium is a read/write medium.

20. The method according to claim 1, wherein:
said TOC is maintained logically separate from said container changes; and
all format information about every object stored in said update container is maintained in said TOC.

21. The method according to claim 20, wherein the format of each object stored in said update container is different.

22. A method according to claim 20, wherein said container changes include modification of a particular object in said target container from a first object state being its state in said first information state, to a second object state being its state in said second information state, and wherein said step of persistently storing comprises the step of:
persistently storing in said update container an object identifier persistently associated with said particular object; and
persistently storing in said update container in correspondence with said object identifier, an identification of object changes which would be sufficient to update said particular object from said first object state to said second object state.

23. A method according to claim 20, wherein said step of determining container changes sufficient to modify said first information state to said second information state comprises the steps of:
producing a temporary representation of said information in said first information state in unstable storage in dependence upon said target container;
modifying said temporary representation of said information in said first information state to produce a temporary representation of said information in said second information state in unstable storage;
determining said container changes from said temporary representation of said information in said second information state in unstable storage.

24. A method for preparing information from persistent storage for further processing in temporary storage, said information being represented in said persistent storage in both a first information state and a second information state, comprising the steps of:
accessing a persistently stored target container representing a plurality of objects which identify said information in said first information state;
producing a first temporary representation of said information in said first information state in dependence upon said target container;
accessing a persistently stored update container identifying container changes which are sufficient to modify said information from said first information state to said second information state, including the substeps of,
accessing a table of contents (TOC) containing format information of the container changes, and
utilizing the format information to determine said container changes;
applying said container changes to said first temporary representation of said information to form a temporary representation of said information in said second information state; and
producing a second temporary representation of said information in said first information state in dependence upon said target container, said second temporary representation of said first information state existing simultaneously with said temporary representation of said second information state.

25. A method according to claim 24, wherein said step of accessing a persistently stored target container comprises the steps of:
accessing said persistently stored update container to obtain a reference to said target container; and
accessing said target container in response to said reference.

26. A method according to claim 24, wherein said step of producing a first temporary representation in dependence upon said target container comprises the steps of:
accessing a persistently stored prior target container representing a plurality of objects which identify said information in a prior information state;
producing a prior temporary representation of said information in said prior information state in dependence upon said prior target container;
accessing a persistently stored prior update container identifying prior container changes which are sufficient to update said information from said prior information state to said first information state; and
applying said prior container changes to said prior temporary representation of said information.

27. A method according to claim 24, further comprising the steps of:
modifying said information in temporary storage to produce a third information state for said information; and
persistently storing a second update container identifying container changes which would be sufficient to update said information from said second information state to said third information state.

28. A method according to claim 24, wherein said information is represented in persistent storage also in a third information state, further comprising the steps of:
accessing a persistently stored second update container identifying second container changes which are sufficient to modify said information from said first information state to said third information state; and
applying said second container changes to said second temporary representation of said information to form a temporary representation of said information in said third information state.

29. A method according to claim 28, wherein said temporary representation of said third information state exists simultaneously with said temporary representation of said second information state.

30. A method according to claim 24, wherein:
said step of accessing a persistently stored target container comprises the step of accessing said target container from a first medium; and
said step of accessing a persistently stored update container comprises the step of accessing said update container from a second medium different from said first medium.

31. A method according to claim 30, wherein said first medium is a read-only medium and wherein said second medium is a read/write medium.

32. A method for manipulating information, said information being represented in a first information state by a persistently stored target container containing a plurality of objects, comprising the steps of:
determining container changes sufficient to modify said first information state to a second information state;
committing said second information state by, at least in part, persistently storing, in a structure logically separate from said target container, an update container including objects identifying said container changes such that said first information state remains readable without requiring application of change records relative to said second information state;
including a table of contents (TOC) in the update container that includes information about every object in the update container;
subsequently accessing said target container;
producing a temporary representation of said information in said first information state in dependence upon said subsequently accessed target container;
accessing said update container; and
applying said container changes persistently stored in said update container to said temporary representation of said information.

33. A method according to claim 32, further comprising the steps of:
manipulating said temporary representation of said information using at least one subsequent change step; and
persistently storing a second update container identifying said at least one subsequent change step.

34. A storage medium carrying a collection of software procedures for updating information from a first persistently stored information state to a second persistently stored information state, said information being represented in said first information state by a target container containing a plurality of objects, said procedures including:
first computer instructions which determine container changes sufficient to modify said first information state to said second information state;
second computer instructions which commit said second information state by, at least in part, persistently store, in a structure logically separate from said target container, an update container including objects identifying said container changes, such that said first information state remains readable without requiring application of change records relative to said second information state; and
including a table of contents (TOC) in said update container that includes information about every object in the update container.

35. A storage medium according to claim 34, wherein said second computer instructions comprise computer instructions which persistently store change instructions in said update container.

36. A storage medium according to claim 34, wherein said container changes include insertion of a new object, and wherein said second computer instructions comprise computer instructions which persistently store said new object in said update container.

37. A storage medium according to claim 34, wherein said container changes include deletion of a particular object from said target container, and wherein said second computer instructions comprise computer instructions which persistently store in said update container an object identifier persistently associated with said particular object.

38. A storage medium according to claim 34, wherein said container changes include a plurality of deletions of particular objects from said target container, and wherein said second computer instructions comprise computer instructions which persistently store in said update container a list of object identifiers persistently associated with said particular objects.

39. A storage medium according to claim 34, wherein said container changes include modification of a particular object in said target container from a first object state being its state in said first information state, to a second object state being its state in said second information state, and wherein said second computer instructions comprise:
third computer instructions which persistently store in said update container an object identifier persistently associated with said particular object; and
fourth computer instructions which persistently store in said update container in correspondence with said object identifier, an identification of object changes which would be sufficient to update said particular object from said first object state to said second object state.

40. A storage medium according to claim 39, wherein said particular object in said first object state includes at least one value identifying value data, wherein said object changes include modification of said value data from a first value data state being its state in said first object state, to a second value data state being its state in said second object state, and wherein said fourth computer instructions comprise computer instructions which persistently store in said update container in correspondence with said object identifier, an identification of said value data in said particular object, and an identification of value data changes which would be sufficient to update said value data from said first value data state to said second value data state.

41. A storage medium according to claim 39, wherein each of said objects in said target container includes at least one value header designating value data, and wherein said fourth computer instructions comprise:
computer instructions which write to said update container as new value data, change instructions sufficient to modify the value data of said particular object from a first value data state to a second value data state; and
computer instructions which write to said update container a change instruction calling for said new value data to be inserted into said particular object.

42. A storage medium according to claim 34, wherein said container changes stored by said second computer instructions include no changes which are in excess of those sufficient to update said information state to said second information state.

43. A storage medium according to claim 34, wherein said target container is stored in a first file, and wherein said second computer instructions comprise computer instructions which store said update container in a second file distinct from said first file.

44. A storage medium according to claim 34, wherein said target container is stored in a first file, and wherein said second computer instructions comprise computer instructions which append said update container to said first file.

45. A storage medium according to claim 34, wherein said first computer instructions comprise:
third computer instructions which represent said first information state in non-persistent storage;
fourth computer instructions which manipulate said information in non-persistent storage using at least one change step, said second information state being defined by the state of said information after execution of said sixth computer instructions; and
fifth computer instructions which record at least all non-superfluous ones of said change steps, execution of said third computer instructions preceding execution of said fifth computer instructions,
said container changes being the change steps recorded by said fifth computer instructions.

46. A storage medium according to claim 24, wherein said target container comprises a prior target container and a prior update container, said prior target container representing said information in a prior information state, and wherein said third computer instructions comprise:
computer instructions which use said prior target container to represent in non-persistent storage said prior information state; and
computer instructions which manipulate said information in non-persistent storage in response to change steps identified in said prior update container.

47. A storage mediunm according to claim 34, wherein said second computer instructions comprise:
third computer instructions which write change instructions to said update container; and
fourth computer instructions which, after execution of said third computer instructions, write to said update container an indication that said update container is valid.

48. A storage medium according to claim 47, wherein said change instructions include a first change instruction to be executed, and wherein said fourth computer instructions comprise computer instructions which write to said update container a label identifying said first change instruction.

49. A new storage medium according to claim 34, wherein said first computer instructions include:
computer instructions which produce a temporary representation of said information in said first information state in unstable storage in dependence upon said target container;
computer instructions which modify said temporary representation of said information in said first information state to produce a temporary representation of said information in said second information state in unstable storage; and
computer instruction which determine said container changes from said temporay representation of said information in said second information state in unstable storage.

50. A storage medium according to claim 34, further comprising:
computer instructions which determine second container changes sufficient to modify said first information state to a third information state; and
computer instructions which commit said third information state by, at least in part, persistently storing, in a structure logically separate from said target container and from said update container, a second update container identifying all of said second container changes, such that said first information state remains readable without requiring application of change records relative to said second or third information state, and such that said second information state remains readable without requiring application of change records from said second update container.

51. A storage medium according to claim 34, wherein:
said target container is persistently stored on a first medium; and
said second computer instructions include computer instructions which persistently store said update container on a second medium different from said first medium.

52. A storage medium according to claim 51, wherein said first medium is a read-only medium, and wherein said second medium is a read/write medium.

53. A storage medium carrying a collection of software procedures for retrieving information from persistent storage for further processing in temporary storage, comprising:
first computer instructions which access a persistently stored target container representing a plurality of objects which identify said information in a first information state;
second computer instructions which produce a first temporary representation of said information in said first information state in dependence upon said target container;
third computer instructions which access a persistently stored update container identifying container changes which are sufficient to modify said information from said first information state to a second information state, which include instructions for,
accessing a table of contents (TOC) containing format information of the container changes, and
utilizing the format information to determine said container changes;
fourth computer instructions which apply said container changes to said first temporary representation of said information; and
computer instructions which produce a second temporary representation of said information in said first information state in dependence upon said target container, said second temporary representation of said first information state existing simultaneously with said temporary representation of said second information state.

54. A storage medium according to claim 53, wherein said first computer instructions comprise:
computer instructions which access said persistently stored update container to obtain a reference to said target container; and
computer instructions which access said target container in response to said reference.

55. A storage medium according to claim 53, wherein said second computer instructions comprise:
computer instructions which access a persistently stored prior target container representing a plurality of objects which identify said information in a prior information state;
computer instructions which produce a prior temporary representation of said information in said prior information state in dependence upon said prior target container;
computer instructions which access a persistently stored prior update container identifying prior container changes which are sufficient to update said information from said prior information state to said first information state; and
computer instructions which apply said prior container changes to said prior temporary representation of said information.

56. A storage medium according to claim 53, further comprising:
computer instructions which modify said information in temporary storage to produce a third information state for said information; and
computer instructions which persistently store a second update container identifying container changes which would be sufficient to update said information from said second information state to said third information state.

57. A storage medium according to claim 53, wherein said information is represented in persistent storage also in a third information state, further comprising:
computer instructions which access a persistently stored second update container identifying second container changes which are sufficient to modify said information from said first information state to said third information state; and
computer instructions which apply said second container changes to said second temporary representation of said information to form a temporary representation of said information in said third information state.

58. A storage medium according to claim 57, wherein said temporary representation of said third information state exists simultaneously with said temporary representation of said second information state.

59. A storage medium according to claim 53, wherein said persistently stored target container is stored on a first medium, and wherein said persistently stored update container is stored on a second medium different from said first medium.

60. A storage medium according to claim 59, wherein said first medium is a read-only medium and wherein said second medium is a read/write medium.

61. A storage medium carrying a collection of software procedures for manipulating information, said information being represented in a first information state by a persistently stored target container containing a plurality of objects, comprising:
first computer instructions which determine container changes sufficient to modify said first information state to a second information state;
second computer instructions which commit said second information state by, at least in part, persistently storing, in a structure logically separate from said target container, an update container identifying said container changes such that said first information state remains readable without requiring application of change records relative to said second information state, and a table of contents (TOC) in said update container that includes information about every object in the update container;
third computer instructions which, after execution of said second computer instructions, access said target container;
fourth computer instructions which produce a temporary representation of said information in said first information state in dependence upon said subsequently accessed target container;
fifth computer instructions which access said update container; and
sixth computer instructions which apply said container changes persistently stored in said update container to said temporary representation of said information.

62. A storage medium according to claim 61, further comprising:
computer instructions which manipulate said temporary representation of said information using at least one subsequent change step; and
computer instructions which persistenly store a second update container identifying said at least one subsequent change step.

63. A computer implemented persistent storage having:
a first chain of at least two sequentially associated containers including a top container and a bottom container, wherein,
each of said containers in said chain except said top container being a prior container to a respective next container in said chain,
each of said containers in said chain except said bottom container containing a reference to the respective prior container in said chain and each of said containers representing information in a respective state, and
said bottom container having a first information state stored therein, and
at least one of said containers representing said information as a plurality of objects, including a table of contents (TOC) that includes information about every object in the container, each given one of said containers except said bottom container identifying, container changes sufficient to update said information from the state represented by the prior container to the state represented by the given container.

64. Storage according to claim 63, further comprising a branch chain of at least two sequentially associated branch containers, said branch chain including a top branch container not being in said first chain and a bottom branch container being one of said containers in said first chain, each of said containers in said branch chain except said bottom branch container containing a reference to a respective prior container in said branch chain.

65. Storage according to claim 63, wherein at least two of said containers are stored in different files.

66. Storage according to claim 65, wherein said different files are stored on different media.

67. Storage according to claim 63, wherein at least two sequentially associated ones of said containers are stored in a single file.

68. Storage according to claim 63, wherein said bottom container represents said information as objects, and wherein said container changes in said given container are identified at least in part by a change instruction identifying a change selected from the group consisting of deletion of a specified object and modification of a specified object.

69. Storage according to claim 68, wherein each of said objects has associated therewith at least one property, and each property has associated therewith at least one value, and wherein said change instruction identifies a change which comprises modification of a specified object,
wherein said modification is selected from the group consisting of:
(a) deletion of a specified property of said specified object, and all values associated therewith;
(b) insertion of a specified new value in association with a specified property of said specified object; and
(c) a combined insertion of a specified new property in association with said specified object and insertion of a specified new value in association with said new property.

70. Storage according to claim 69, wherein said modification includes said combined insertion modification, and wherein said specified new value includes at least one value change instruction identifying a modification to be made to said at least one value of said specified object.

Description

BACKGROUND

Application programs in a computer system typically need to manage data in a manner that permits frequent updating. Two broad examples of such application programs are a word processor and a database manager. Word processors need to be able to manipulate sections of text and other related information each time the user modifies a document, and a database program needs to insert, delete and modify entries in accordance with a user's requirements. Updating is an issue for computer manufacturers as well, for example to permit upgrades to operating system routines, including those which are provided in ROM.

The above two related patent applications set forth a variety of issues that often face software application developers. For example, as set forth in more detail in the above-mentioned STORAGE MANAGER FOR COMPUTER SYSTEM patent application, there is a trade-off between storage space and speed of execution. Reduction of the amount of wasted space in a database, for example, often detrimentally impacts the speed with which certain operations are performed, such as searching. Also as set forth in the STORAGE MANAGER FOR COMPUTER SYSTEM, for many types of application programs, the file structure offered by the operating system is not appropriate to the task. Since the smallest unit of information supported by the operating system typically is a file, and since file manipulation operating system calls are slow and inefficient for small pieces of data, many application programs tend to maintain their data in a proprietary format in only one or a few files each containing many small data items. The result is extensive duplication of effort to define and maintain such proprietary formats, efforts that could otherwise be directed toward enhanced functionality.

Also as set forth in the STORAGE MANAGER FOR COMPUTER SYSTEM patent application, software developers often face issues when data is stored in different parts of a data storage apparatus which have different protocols for access. There is a need in the industry to simplify the implementation of application programs by providing a common mechanism by which the application developer can access data regardless of how or where it is stored in the computer system's storage apparatus.

Many application program developers also face yet another issue if the data maintained by the program is intended to be accessible, and modifiable, by more than one user. As used herein, persistent storage of information refers to information which remains after an application program which references or creates it, terminates. Persistent storage is often nonvolatile in that it also survives shutdown of the computer system, but in some situations can be partially or completely volatile. In a word processor, it is often desirable to support the ability of two or more different users to update a single document at the same time. In a database system, it is often desirable to permit different users to update the database data concurrently. Most application programs implement a technique known as "pessimistic concurrency" which, while permitting many users to read and view the data concurrently, permits only one user to modify the data at a time. The system "locks out" all other users from write accesses when one user has the data open for updating.

Pessimistic concurrency can be implemented at a file level or, in sophisticated database programs for example, at a record level. That is, for file level locking, only one user may have the file open at a time for writing. This is the typical manner with which word processors implement concurrency. A database program can implement record level locking if, for example, a backend process is the only process which has the data file open for writing, and all other users issue their commands and queries through the backend process.

Some database programs have implemented "optimistic concurrency", in which two or more users can update data at the same time. To the extent the concurrent updates conflict, only one can successfully be committed. Optimistic concurrency is different from pessimistic concurrency in that users are permitted to make updates concurrently, subject to subsequent detection of conflicts and resulting inability to commit the updates.

These update techniques produce serializability of updates--the characteristic that for a set of committed updates to the data, at least one sequence exists in which those specific updates could have been performed to achieve the resulting state of the information. Optimistic concurrency permits increased performance in some circumstances, but still produces serializable committed updates.

A few programs implement an update model known as "Iversioning," which does challenge the requirement of serializable updates. In a versioning mechanism, several concurrent updaters can each have an independent yet internally consistent view of the information. The views are known as "configurations". The different updaters modify the information concurrently, and can write their updated configurations to persistent storage, all subject to subsequent reconciliation. One example of a program implementing a versioning update model is the Macintosh.RTM. Programming Workshop (MPW) Projector available from Apple Computer, Inc., Cupertino, Calif. MPW Projector is described in the MPW 3.1 Reference Manual, and in H. Kanner, "Projector, An Informal Tutorial", available from Apple Computer, Inc. (1989), incorporated herein by reference.

While MPW Projector is a good first step toward reducing the constraints imposed by strict serializability, significant additional flexibility is highly desirable. For example, Projector's finest level of granularity is still represented by a "file". It would be desirable to support much finer degrees of granularity. As another example, MPW Projector's provisions for reconciling two conflicting versions of a document is limited to a single procedure in which the computer identifies strict text differences, and a user indicates how each text difference should be resolved. Significant additional intelligence will be desirable in the comparison procedure, as would significant increased flexibility and automation in the resolution of conflicts, as well as support for comparisons between non-text information. Accordingly, there is a need for much greater flexibility in the support of non-serialized updates in the maintenance of data.

Increasingly, documents and other collections of stored information are made up of multiple content elements, such as text, tables, images, formatting information, mathematical equations and graphs. Often content is created using one application program and then included in documents created by other applications. Subsequently, content elements may be copied out of a document and used in yet another document, and so on.

In the past, different applications typically had no way to exchange multiple content elements, unless they had a "private contract" about the format to be used. Furthermore, one application typically had no way to find the content elements in another application's document, so typically it was not able to obtain content elements from the other application's documents even if it knew the format. Moreover, every application developer who wanted to store multiple content elements in a document typically had to develop a proprietary object storage mechanism.

The use of multiple content elements in a document implicates at least two difficult issues: where each element is located and what the format of the data is. Regarding the first of these issues, it would be desirable if the data in a particular element could be stored in memory, in a local persistent storage device, across the network, or even created dynamically, all in a manner which is transparent to the application program which is operating on element. In this way the limited resources available to application program developers can be directed toward enhancement of functionality rather than dealing with multiple types of storage devices.

Similarly, with regard to the second issue, it would be desirable if each different content element could have stored in association with it all of the routines which are needed to manipulate it, again, transparently to the application program. This, too, would free up developers' resources for more useful purposes.

In a general way, an individual developer might obtain some of the transparency described above by programming the application using an object-oriented programming language such as C++. Object-oriented rogramming is described in many references, including, for example, G. Booch, "Object-Oriented Design With Applications" (Benjamin/Cummings Publishing Company: 1991), incorporated herein by reference. While these languages can be used to address the problems described above for handling multiple content elements, it is not clear how that can be done. Certainly the languages themselves do not provide guidance on how they can be used for such purposes. For example, the inheritance mechanism in C++ is a compile-time mechanism.

The issues that software developers face regarding update techniques are worthy of additional attention. In particular, it would be desirable to address the following more specific updating issues. First, in the situation where an operating system (or other program) is provided at one time and an update is distributed subsequently, the update mechanism needs to have a way of locating specific portions of the base version to be changed. The problem is referred to herein as patching. One way this was handled in the past was to provide a series of patches, each of which identify a location in the base version, specify something about its expected existing contents (to improve confidence that the update is correct), and one or more operations to be performed at that location. Patches are very fragile and error prone, however. They are usually created by hand, thus requiring a person to generate them who knows the base operating system in extensive detail, often at the binary level. Patches will also fail, or even corrupt the user's system, if some aspect of the user's system configuration or prior update history was not taken into account when the patches were created. Patching can be performed also where the base version is provided in read-only memory, provided the system accesses such information using a level of indirection which is changeable. But this can be even more complicated than direct patches, and often creates additional problems of its own.

Accordingly, there is a need for an update mechanism which does not rely on the physical location of the base information which is to be updated.

Second, as mentioned above, it is often desirable to support the ability of two or more different users to update information at the same time. Most application programs implement pessimistic concurrency, while only a few implement optimistic concurrency. Still fewer implement versioning. As mentioned above, MPW projector is a good first step toward implementing versioning. MPW Projector is an integrated set of tools and scripts whose primary purpose is to maintain control of the development of source code. It preserves in an orderly manner the various revisions of a file, and through the versioning mechanism also prevents one programmer from inadvertently destroying changes made by another. If the underlying data is text, data compression is achieved by storing only one complete copy of a file and storing revisions only as files of differences. Different users of the same set of files can view them differently since each user is given independent control of the mapping between the user's local directory hierarchy, in which the user keeps the files, and the hierarchy used for their storage in the main Projector database. Projector also has a facility for associating a specific set of file revisions with a name, this name being usable as a designator for a particular version, or release, of a product. Thus the name alone can be used to trigger the selection of just those source files that are required to build the desired instance of the product.

MPW Projector maintains versions in a tree structure. When one user desires to modify a file in the main Projector database, the user "checks out" the file, thereby making a copy of the file in the user's own directory. The user can check out a file either as "read-only" or, if no one else has already done so, as "read/write". After modifying the file, the user can then "check in" the file back to the main Projector database, either as a new version in a new branch of the file version tree, or, only if the file was check out as read/write as a new version in the same branch of the version tree. When it is finally desirable to merge a branch of the revision tree back into the main trunk, MPW Projector performs a strict text-based comparison between the two versions of the file and displays the differences in a pair of windows on the computer system display. A user then cuts-and-pastes portions from one window into the other in order to merge them together.

As can be seen MPW projector's technique for reconciling different updates of a base version is useful only when the underlying data is text, and operates only by comparison of the two updates to be reconciled. No record is kept of the individual changes that were made from the base version to each of the updated versions. The reconciliation process for an implementation of version merging can be made significantly more intelligent if such a record was kept.

Third, inherent in the desirability to support multiple concurrent updaters, each updater should be provided with an independent "view" of the information. That is, if a work includes a plurality of modules, each updater should have his "current" view of the information include all of the modules which have not been changed in his or her current version, plus only his or her current version of the modules which have been changed. MPW projector accomplishes this, but only at the file level. Much finer granularity would be desirable. For example, if the information is a document, it would be desirable for the level of granularity to be as small as a paragraph, or even a sentence. If the information is in a database format, it would be desirable for the level of granularity to be possibly as small as a record or less.

Accordingly, it is desirable to have an update mechanism which supports fine degrees of granularity.

Fourth, when updates are made, the update mechanism should perform them atomically. That is, while the storage mechanism should be able to maintain partial updates in nonvolatile storage, so as to reduce memory requirements, the base document should always be recoverable in case of a system or power failure. Most application programs, such as word processors, open a temporary file in nonvolatile memory to store the partially edited version. Alternatively, in virtual memory machines, the partially edited version can remain in "virtual memory". When the user is ready to "commit" the updates, the application program copies any unedited portions of the base information into the temporary file so that the temporary file is complete, then renames the old base file to a second temporary name, then renames the first temporary file to the name of the prior name of the base file, and then deletes the old file. Thus there is never a time when a complete, internally consistent version of the information, does not exist in nonvolatile storage.

This technique becomes problematical, however, when the information file is extremely large. In particular, it can be seen that the technique requires twice as much available space in the nonvolatile storage medium as the ultimate file requires. In addition, the extensive amount of copying required by the mechanism can severely degrade performance.

Another conventional technique for handling atomic updates is sometimes referred to as the "shadow page technique". In this technique, the base file is divided into pages, and an index to the current version of the pages is maintained. Updates are managed atomically at the page level rather than the file level, and are accomplished by writing the new version of a page, then updating the index to point to the new version rather than the old version of that page, and then deleting the old version of the page. In some implementations, the index itself may also be shadowed. The set of old pages, identified by the old index, and the set of new pages identified by the new index, completely describe the base and updated states of the information, respectively.

The shadow page technique avoids the large file problem mentioned above, but can still be inadequate in many situations. For example, the minimum granularity of a page may still be too coarse. Additionally, like other update mechanisms described above, reconciliation of two or more concurrently created updates is difficult in the shadow page technique in part because no record is kept of the changes which were made from the base version to each update version to be reconciled.

The shadow page technique has yet another problem, which arises from the fact that different pages may end up at different places in the file. The indexing mechanism permits random placement of these pages. A consecutive read of the information in the file therefore can cause extensive jumping around inside the file, thereby degrading performance.

Yet another conventional update technique can combine shadow pages with the use of a log file. As changes are made in a temporary version of the information, for example in memory, the individual changes are recorded in a log which is written out to nonvolatile storage. Periodically, or at userspecified times, the current state of the information as represented in memory is written to persistent storage and a marker is written to the log file to indicate that persistent storage is consistent up to that point in the log file. The problem of randomly located pages is minimized in this technique since speed optimizations such as elevator algorithms can be used during the write.

There are many variations of log-enhanced update techniques, including some in which the change log identifies base pages logically rather than physically. An extreme example of log-enhanced updating techniques is set forth in Douglis and Osterhout, "Log Structured File Systems," Compcon '89 (1989), incorporated herein by reference. The Osterhout paper describes a log-structured file system which maintains only the change log in persistent storage. No complete set of the information exists in persistent storage. Rather, it is always merely reconstructed in memory, to the extent needed, by traversing the log file. Persistent storage also maintains a pointer to the last known valid change specified in the log file, and atomicity of updates is accomplished by redirecting that pointer to point to a subsequent position in the log file at the time of each "commit".

Neither the log-enhanced techniques nor the exclusively log-based technique, however, supports any concept of more than one valid consistent state of the information, and therefore does not support nonserialized concurrency well. They also fail to adequately handle situations like the operating system patch example set forth above, and do not permit different users to have independent views of the current state of the information being updated.

Accordingly, it is desirable to provide an update mechanism which can operate at fine granularity, support independent views of the information by multiple updaters, facilitate reconciliation of concurrent updates, and maintain atomicity of updates without requiring large amounts of free disk space or substantially degrading performance. Moreover, it would be desirable if the same update mechanism could be used for all different types of information, including operating systems, databases, documents and so on. Still further, it would be desirable for the update mechanism to be integrated with other container manager mechanisms intended to reduce storage space without degrading performance, handle fine granularity units of information, support a wide variety of different types of content elements, and unify the methods by which the application programs access different kinds of storage media.

SUMMARY OF THE INVENTION

The above-incorporated patent applications described object-oriented techniques for managing information in containers.

According to the invention, roughly described, methods and data structures are defined which permit information to be stored as objects in target containers and update containers. A target container defines a first state of the information, and the update container, which can point to the target container, identifies changes to the information in the first state which would be sufficient to update the first information state to a second information state. Update containers may be nested to any depth. When an application program opens an update container, the procedure searches down the chain until it finds the ultimate target container. In one embodiment, the procedure then creates in-memory structures for providing access to the objects and value data represented in such container. The procedure then works its way back up the chain, performing the changes on the in-memory structure, which are called for in each of the update containers. New modifications made after this process is complete, are recorded, and when committed, are written out into a new update container that the application program originally opened.

The changes which are identified in an update container, if they represent modifications to an object in an underlying container, refer to that object logically rather than physically. For example, they include a reference to a persistent object identifier which is unique within the underlying contianer and its own base containers. Thus certain changes are allowed on existing closed containers, such as rearranging information or deleting superfluous information, as long as the logical view of the closed container remains the same.

The update mechanism described herein supports multiple concurrent (parallel) updaters, since more than one update container can refer to the same target container. Thus each updater has an independent view of the information being updated. Also, the mechanism facilitates reconciliation of concurrent updates since it maintains a record of the changes that were made.

When an update container is committed to persistent storage, the application program can have it appended either to the end of the target container or can have it stored as a separate file. If it is stored as a separate file, then its reference to the target container can be stored as a dynamic value, thereby affording to the update mechanism all the flexibility of dynamic values described in the above mentioned DYNAMIC VALUE MECHANISM FOR COMPUTER STORAGE CONTAINER MANAGER patent application.

In the DYNAMIC VALUE MECHANISM FOR COMPUTER STORAGE CONTAINER MANAGER patent application, roughly described, a set of procedures are defined which permit substantially arbitrary composability of chains of handlers. The procedures follow rules which render them independent of the "type" of the value for which they are called, as viewed by the caller. Thus application programs can be written at only a high level of functionality, without needing to be concerned with the differences in the way different types of values need to be handled.

The procedures determine which handler to call to perform a given operation in dependence upon the type of the object for which the procedure was invoked. The handlers, too, are relatively easy to write because like the application program, the rules permit the handle rs to call the very s ame set of procedures (recursively if the very same procedure is called) as are available to the application program. Thus like the application program, handlers too can be written without knowledge of any characteristics of the object for which they are invoked other than the charact eristics defining the type for which the handler is specifically written. For example, a read handler for a type which defines a data compression/decompression algorithm need never know where the data is physically located since it merely calls the predefined read value procedure to obtain the data to decompress.

Types can be defined in a tree structure . This further simplifies the writing of handlers since the different characteristics of a type can be divided into many small components, each defined by a different sub-type on the tree. Thus each handler can be written to accomplish only a limited objective (for example an I/O redirection o r a data transformation). the predefined procedures automatically follow the chain of handlers defined by a type tree, by knowing where in the chain a given caller of the procedure is. Neither the application program nor the handlers themselves need keep track of this informa tion.

Additionally, the predefined procedures make no assumptions about the types in a type tree. An application developer can define novel types as required by dividing them into subtypes (if desired) and writing handlers for each subtype. As mentioned, the complexity of the handlers depend only on the complexity of the transformation or redirection which they are to individually perform, not on the complexity of either the type tree or the procedures which implement the present invention. So long as the handlers follow certain rules of good behavior, the predefined procedures will be able to follow any such user-defined type tree. Additional procedures are provided for associating the individual handlers to their corresponding types (subtypes), and for building the type trees themselves.

To implement the above procedures, computer apparatus stores a subject value and a chain of sequentially associated value handlers for the subject value. The chain includes a top value handler and a bottom value handler, each of the value handlers in the chain except the bottom value handler invoking the respective next value handler when invoked, the bottom value handler performing an operation on the subject value when invoked. The value operations can be data read operations, data write operations, etc., and the value handlers in the chain can perform data transformations and/or data redirections, transparently to its caller.

The dynamic value chain is not stored in persistent storage; rather it is created when an application program desires to perform a value operation on the subject value. The subject value has a type associated with it which determines the value handlers to be placed in the chain. The chain can have more than one value handler in it for a given value operation if the type associated with the subject value is made up of a hierarchy of sub-types.

Other aspects of the invention will become apparent from the Drawings, Detailed Description and Claims appended hereto.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described with respect to particular embodiments thereof, and reference will be made to the drawings, in which:

FIGS. 1 and 2 illustrate type trees;

FIG. 3 is a block diagram of a hardware computer system platform;

FIG. 4 is an overall block diagram of major data structures which are created in main memory of the computer system of FIG. 3 during the pendency of a session;

FIG. 5 illustrates the structure of in-memory objects which are created by dynamic value mechanism;

FIG. 6 illustrates the same structure as FIG. 5 using a simplified notation;

FIG. 7 illustrates a type object using the notation of FIG. 6;

FIG. 8 is a flowchart of a CMReadValueData( ) routine;

FIGS. 9, 10 and 11 are representations of information stored in a table of contents in persistent storage;

FIGS. 12 and 22 illustrate the structure of in-memory objects which are created to support an update mechanism;

FIG. 13 is a state transition diagram describing a finite state machine;

FIG. 14 is a table defining actions taken and the next state for the finite state machine of FIG. 13;

FIG. 15 illustrates a logical view of certain value data;

FIG. 16 illustrates a physical view of the value data illustrated in FIG. 15;

FIGS. 17, 18 and 19 illustrate ways in which value data may be modified;

FIG. 20 illustrates major structures of an update container in persistent storage; and

FIG. 21 describes the structure of value updating instructions.

DETAILED DESCRIPTION

The embodiment described herein takes the form of a Container Manager and its associated data structures which can be used by developers of a wide variety of types of application programs. The Container Manager includes a number of C language type definitions and a number of procedures for implementing the functionality provided by the Container Manager. Together they provide a common application program interface (API) for the different types of application programs.

The structures are described first with respect to their logical organization and subsequently their physical organization in the storage apparatus managed by the container manager. That is, they will be described first with respect to the view which the container manager software provides to an application programmer via the API, and subsequently with respect to the way that logical organization is actually implemented in the present embodiment. While many of the advantages of the present invention derive from the logical organization, it will be apparent that such logical organization implies certain physical structures which are required to maintain the metaphor as viewed by the application developer. The physical organization described hereinafter includes many inventive aspects of the invention, but it is by no means the only physical structure which can support the logical organization presented to the application developer.

TABLE OF CONTENTS

I. GENERAL OVERVIEW

A. Overview of Container Manager Entities

B. Overview of Types and Dynamic Values

C. Format Overview

D. Format Definition

E. Format Usage Examples

II. IMPLEMENTATION

A. Hardware

B. In-Memory Data Structures

C. Routines

1. Session Operations

2. Container Operations

3. Object Operations

4. Type and Property Operations

5. Value Operations

III. DYNAMIC VALUE HANDLERS

A. Sample Session Flow

B. Sample Value Handlers

IV. UPDATING

A. Updating Algorithm In-Memory Data Structures

B. Criteria for Touching

C. Close Time Processing

D. Updating Container Layout in Persistent Storage

E. Updating Instructions

F. Updating Instruction Handlers

G. Open Time Processing

H. Open Time Processing for Multilayered Updates

I. Other Routines, in the following non-printed appendices:

APPENDIX A--CMContainerops.c

APPENDIX B--Update.h

APPENDIX C--Update.c

APPENDIX D--TOCIO.h

APPENDIX E--TOCIO.c

APPENDIX F--TargetContainerHandlers.h

APPENDIX G--TargetContainerHandlers.c

I. GENERAL OVERVIEW

In the present embodiment, an object is a collection of data that "hangs together" and that can be referenced by other data. Objects can be simple or complex, small (a few bytes) or large (up to 26.sup.64 bytes). Compared with objects in languages such as C++, objects of the Container Manager are typically larger and more complex, because they represent user meaningful content elements, rather than the atoms and molecules used to build this content. For example, a sequence of bytes of data would not by itself be an object, because we can only understand the bytes if we know how they will be used. A paragraph, an image, etc. can be an object if it contains enough information so that we know how to interpret it. Typically an object contains information about what kind of object it is, and some data, which provides the content of the object. In this description, the information "about" the object is called metadata, and the content of the object is called its value.

The Container Manager groups objects in an object container, which is some form of data storage or transmission (such as a file, a piece of RAM, or an inter-application message) that is used to hold one or more objects (both their metadata and their values). These containers are defined by a set of rules for storing multiple objects in a such a container, so that software that understands the rules can find the objects, figure out what kind of objects they are, and use them correctly. The rules accommodate a wide variety of different kinds of objects, different ways that applications want to use objects, and system considerations about how data can be stored.

The Container Manager provides a container definition that can conveniently, efficiently, and reliably hold all the different kinds of objects that users and applications want to group together, store, and exchange. The Container Manager does not define how any given object is structured internally (within its value) so as not to limit the formats which an application developer may want to define. Objects stored in a container can have proprietary or standard formats, they can be designed to use the Container Manager mechanisms or they can be completely ignorant of the existence of the Container Manager.

A. Overview of Container Manager Entities

The Container Manager manipulates and stores data using primary and secondary entities. The primary entities used by the Container Manager are containers, objects, properties, values, and types.

Containers. Every object is in some container. An object consists of a set of properties. The properties are not in any particular order. Each property consists of a set of values with distinct types. The values are not in any particular order. Every object must have at least one property, and that property must have at least one value. Each value consists of a variable length sequence of bytes.

The Container Manager knows very little about a container beyond the objects in it. However, the container always contains a distinguished object, and applications can add arbitrary properties to that object, so applications can specify further information about the container if they wish.

Containers are often files, but they can also be many other forms of storage. For example, in various applications developers already support the following types of containers: blocks of memory, the clipboard, network messages, and Container Manager values. Undoubtedly other types of containers will be useful as well.

Objects. Each Container Manager object has a persistent ID which is unique within its container. Other than that, objects don't really exist independent of their properties. An object contains no information beyond what is stored in its properties.

Properties. A property defines a role for a value. Properties are like field names in a record or struct, with two differences. First, properties can be added freely to an object, so an application should never assume an object only has the properties it knows about. Second, property names are globally unique, so that they can never collide when various different applications add properties to the same object. This also means that the same property name always means the same thing, no matter what object it is in. Properties are distinct from types, just as field names are distinct from the data type of the field.

For example, different properties of an object might indicate the name of an object, the author of the object, a comment, a copyright notice, and so on. These different properties could all have values of the same type: string.

Conversely, a property indicating the date created might have a string, Julian day, or OSI standard date representation. These different formats would not be indicated by the property, but by the type (see below).

Values. Values are where the data is actually stored. In terms of physical location, this data might actually be stored anywhere in a container. In fact, it can be broken up into any number of separate pieces, and the pieces can be stored anywhere. (See the discussion of value segments below.)

Each value may range in size from 0 bytes to 2.sup.64 bytes, although that range can differ in a different embodiment. The overhead per value varies depending on the circumstances. For an object with a single value, the typical overhead will be
21 bytes. For a small value which is one of several values associated with a property, the overhead can be as low as five bytes.

Types. The type of a value describes the format of that value. Types record the structure of a value, whether it is compressed, what its byte ordering is, and so on. The Container Manager provides an open-ended mechanism, so that types can be extended to include whatever metadata is required.

To continue the example above, the type of a string value could indicate the alphabet, whether it was null terminated, and possibly other information (such as the intended language). It might also indicate that the string was stored in a compressed form, and could indicate the compression technique, and the dictionary if one was required. If the string used multi-byte characters, and the byte-ordering was not defined by the alphabet, the type could indicate the byte-ordering within the characters.

The Container Manager defines an inheritance mechanism to make building complex types like this efficient. The structure of types is tied into the mechanism for accessing values, so that the type associated with a value causes the appropriate code to be invoked to access the value, decompress it, byte-swap it, and so on. The specific mechanism for doing this is referred to herein as Dynamic Values.

Secondary Entities. In addition to the primary entities manipulated by the Container Manager, there are several additional entities that play supporting roles in the Container Manager design. These entities are important to fully understand how The Container Manager works, but they do not significantly change the picture given above.

Type and property descriptions. Each property associated with a value is actually a reference to a property description. Similarly, the type of a value is actually a reference to a type description. These type and property descriptions are objects, and their IDs are drawn from the same name-space as other object IDs.

Many type and property descriptions will simply consist of the globally unique name of the type or property. To continue the example above further, the type of a string of 7-bit ASCII, not compressed or otherwise transformed, would simply be described by a globally unique name. This would allow applications to recognize the type.

References to type and property descriptions are distinct from references to ordinary objects in the API to allow language type checking to catch errors in the manipulation of type and property references. However, type and property references can still be passed to the Container Manager routines which manipulate user-defined objects and values, so that value manipulation can be done on types and properties in the same manner as it can be done on user-defined objects.

Globally unique names. Globally unique names are public or private identifiers in a format defined by the ISO 9070 Public Text Identifier standard. They are simply strings written in a subset of 7 bit ASCII. They begin with a name that is assigned by a naming authority designated by ISO (companies can easily register as naming authorities). After this come additional segments, as determined by the naming authority, each of which is unique in the context of the previous segments.

The most common globally unique names will be generated by system vendors or commercial application developers, and may be registered. However, in many cases names will be generated by vertical application developers to record their local types and properties. To meet this need, the naming rules allow for local creation of unregistered unique names, for example by using a product serial number as one of the name segments.

IDs. The Container Manager assigns each object a persistent ID that is unique within the container in which the object is created. These IDs are never reused once they have been assigned, so even if an object is deleted, its ID will never be reassigned.

These IDs are obviously essential to the functioning of the Container Manager format, but they do not appear directly in the API. The only points at which an application actually deals with anything corresponding to an ID is when it needs to store an object reference into a value, or find the object corresponding to a reference retrieved from a value. Even in this case, however, the API does not give the application direct access to an object ID, but only to a token that corresponds to the ID in the context of that particular value. This hiding of actual IDs permits the Container Manager to perform reference tracking.

Refnums. In the API, types, properties, and objects are referred to using opaque reference numbers provided by the Container Manager. The refnums are much more convenient to use than IDs because they are unique within the session, while an ID would need to be used together with a container reference. Since they are opaque, they allow implementations of the API that support caching schemes in which only portions of the container metadata are in memory at any given time.

Refnums have no persistent meaning, so they cannot be stored in values as references to other values. The tokens provided by the reference calls must always be used for persistent references.

Dynamic values. As mentioned above in the discussion on "Types", a Container Manager value can be compressed, encrypted, byte-swapped, etc. during read/write. Furthermore, these transformations can be composed together to form a chain of transformations.

In addition to data transformation, the same mechanism also supports I/O redirection. In this case a value actually stored in a container is a description of how to find the data, rather than the data itself. Such descriptions can be as simple as references to files, or to objects in another container, or as complex as queries that cause data to be retrieved from a database.

Both I/O transformations and I/O redirection are carried out implicitly by the Container Manager library, using "handlers" determined by the type of the value. These handlers are attached to temporary entities called dynamic values created by the library. Dynamic values are never visible to the application, and have no persistent meaning. The Dynamic Value mechanism is described in more detail below.

Value segments. To support interleaving and other uses that require breaking a value up into pieces, The Container Manager allows a value to be stored as multiple segments stored at different locations in the container. These segments are not visible at the API, since the Container Manager routines concatenate them to create a single stream of bytes.

The Container Manager also takes advantage of value segments to represent insertions, deletions, and overwrites of contiguous bytes in a value. This allows the Container Manager to represent these operations directly in recording updates, rather than having to create a new copy of the value.

Handlers. The Container Manager makes use of dynamically linked handlers supplied by the execution environment for two reasons: portability and extensibility. The use of handlers means that the Container Manager library is almost trivially portable, since all the system dependencies are in the handlers. The Container Manager library is also easily extensible, with the addition of newly written handlers, since the handler interfaces are carefully designed to provide cleanly encapsulated abstractions.

The Container Manager employs session handlers, container handlers and value handlers. Session handlers are global to the session as a whole. These include allocating and de-allocating memory, and reporting errors. Container handlers perform all of the actual I/O to containers. These handlers map I/O to the underlying storage in a way that depends on the container type. Container handlers basically provide a stream I/O interface to the container storage.

Value handlers implement both I/O transformations and value indirection. These handlers are determined by the type of each value. New handlers to carry out new types of data transformations or support new types of indirect values can be written at any time.

These handlers are invoked entirely by the library. The accessing application does not need to know that it is using handlers to access the value.

B. Overview of Types and Dynamic Values

The Container Manager provides a very powerful mechanism for transforming values during I/O, and for following indirect references. The Container Manager type mechanisms are probably best explained in terms of some usage examples.

Usage example 1--External File. Suppose an application developer would like to have a value that represents a file. When the application calls the Container Manager's Write Value Data procedure (CMWriteValueData) for writing data to the value, we want to actually perform I/O to the file.

The mechanisms described herein allow us to store a reference to the file in a value. When the value is used, an I/O redirection is set up, without the application being aware of it.

Note that this raises the thorny problem of platform-independent file references. The Container Manager avoids this problem. It allows any number of different types of references, implemented by handlers.

Usage Example 2--Compressed Value. Suppose an application developer would like to compress data as it is written to the value, and decompress it as it is read out. In addition to maintaining the data in the value itself, this compression may depend on a dictionary associated with the type of value. Furthermore, the compression routine may need to maintain a state, since the compression at any point may depend on what has already been written.

The mechanisms described herein allow us to give the value a type that causes the compression/decompression handler to be transparently invoked when the application does I/O. Again, this is an extensible mechanism, so that new compression algorithms (or more generally, arbitrary transformations) can be added without modifying the library.

Usage Example 3--Compressed. Format Converted Array. Suppose the value which an application is dealing with is actually an array of pixels. In addition to decompressing it, on a given platform we want to convert each pixel to a different format.

The mechanisms described herein allow us to take two (or more) data transformations, such as compression and format conversion, and compose them together. Just as the application does not need to be aware of the underlying transformations, the individual transformations do not need to be aware of each other.

Usage Example 4--All of the Above. The next step is to put the compressed pixel array out in a file, and convert it to a different format when it is read in. This is all supported using exactly the same composition as used in the previous example. The interfaces to data transformations and I/O redirection are the same, so no special mechanism is required.

Other Usage Examples. To briefly illustrate further where this leads, here are some further examples:

A value contains a query that is used to look information up in a database. The "I/O redirection" provides access to a table retrieved from the database.

A value contains a file reference that is encrypted because it also holds the file-server password. A decryption stage is required before the I/O redirector can be applied to the file reference.

A value contains a query that is used to generate a file reference, which then becomes the basis for a second level of I/O redirection.

Numerous other usages can be developed which can take advantage of the mechanisms described herein.

All of the above examples are based on the types associated with the values involved. The examples depend on two aspects of Container Manager types.

First, every value handler is bound to values only indirectly through the name of a type. Handlers are associated with type names through the CMSetMetaHandler Container Manager operation. This association is session-wide. Then the handler is bound to a particular type in a given container through the name of that type. This binding is done when the container is opened.

Second, even in the simplest examples above, such as the value that is merely an indirection to a file, or the value that is merely compressed, the value essentially has two types: the type visible to the application, which encodes the format of the data from the application's point of view, and the type used to find the appropriate handler for compression, I/O redirection, etc.

As the more complex examples show, multiple types of a value need to be independent. This leads to a view of a value as having multiple, independent types. By analogy with C++ (an analogy which is not perfect, as described below) we call these "ebase types" of the value's type. Base types can be added to and removed from any Container Manager type using the Container Manager CMAddBaseType() and CMRemoveBaseType() operations.

Base types are normal types, and themselves may have base types. This could be useful, for example, when the combination of file access and decompression is used in a variety of different contexts. The two could be made base types of a new type, and then that new type could be used in various ways, including making it a base type of the "all of the above" type which adds format conversion.

To illustrate the concept of base types, FIG. 1 is a symbolic diagram of a tree having three types 102, 104 and 106. A value may have a "compressed file type" 102 associated with it, but the compressed file type 102 has two base types: a "file access type" 104 and a "compression type" 106. The complex "compressed file type" 102 can be created by first defining the compressed file type 102 object, than calling the Container Manager procedure to add a base type 104 to the type 102, and then by calling the procedure again to add the base type 106 to the compressed file type 102.

FIG. 2 illustrates a more complex type tree. As shown in FIG. 2, the type "format converted compressed file type" 202 has two base types, "compressed file type" 204 and "format conversion type" 206. As with compressed file type 102 in FIG. 1, compressed file type 204 has two base types, "file access type" 206 and "compression" 208.

The addition of base types will always form a tree routed in the original type. If the same type is used as a base type in more than one place in the tree, the separate uses are treated as entirely separate types.

To understand how a given tree of types will behave, the tree is flattened into a linear "chain" of types. In the present embodiment, this is done by performing a depth-first, post-order walk on the tree. Thus, in the case of FIG. 1, the resulting sequence is file access, compression, then compressed file. If an application program calls the Container Manager routine to read data from a value (CMReadValueData), and the value has the type, "compressed file", then the Container Manager will first call the read handler for the compressed file type 102. The read handler for the compressed file type 102 will then (through another call to CMReadValueData) call the read handler for the compression type 106, which in turn calls (through yet another call to CMReadValueData) the read handler for file access type 104. The read handler for file access type 104 obtains (through yet another call to CMReadValueData) the information which is stored in the container in the storage area allocated to the value which the application desires to read, and uses this information to access the actual data on, for example, a hard disk. This data, obtained from the hard disk, is the return value of the read handler for file access type 104. This data gets decompressed by the read handler for compression type 106, and then returned to the caller by the read handler for the compressed file type 102.

The chain formed by the flattened type tree is considered herein to have a "top" and a "bottom" type which are, respectively in FIG. 1, compressed file type 102 and file access type 104. This means that the first handler to be called for any value operation is the value handler associated with the "top" type on the chain. That handler invokes the next handler on the chain, which in turn invokes the next handler on the chain, and so on down to the "bottom" handler on the chain. The handlers then return one by one to their respective calling handlers, until the "top" handler returns to the application program.

In the type tree of FIG. 2, the depth-first, post-order walk of the tree flattens it into the following linear chain: file access type 206, compression type 208, compressed file type 204, format conversion type 206, and format converted compressed file type 202. Format converted compressed file type 202 is the "top" type on the chain, and "file access" type 206 is the "bottom" type on the chain. Note that compressed file type 204 and format converted compressed file type 202 do not have handlers associated with them (let us assume), they will not have any effect on the value.

In order to support the above examples, the present embodiment assumes two design constraints. First, the application, and each handler, must always think that it is dealing with a "normal" value (i.e. one without redirection or transformations); that is, any redirection or transformation must be completely transparent to the caller. Second, in several cases we saw that handlers might have a non-trivial amount of state to manage.

We address these constraints by giving each handler its own "private" value, called a dynamic value. Dynamic values are transient (i.e. not persistent); they are created just to provide an environment for the handlers, and they are never written to the container, saved in the container's Table Of Contents (TOC), etc. However, they do have refnums and from the "outside" (i.e. from any application code or handler code except the handler that "owns" them) they look exactly like normal values. It will be seen that dynamic values have the same "value header" as real values, except that instead of pointing to storage locations which contain actual value data, they point to a vector of "handlers", one for each of a predefined set of "value operations", to be called when a prior caller desires to use the value.

The following value operations are supported by the Container Manager. The Container Manager routines which implement these operations first check whether the specified value is real or dynamic. If real, then the routine simply operates on the real data. If dynamic, then the routine calls the handler which is associated with the specified value for the specified value operation. Thus for a given dynamic value, a handler can be provided to support each of the following value operations:

.COPYRGT.1992 Apple Computer, Inc.

CMSize CMGetValueSize(CMValue value);

CMSize CMReadValueData(CMvalue value, CMPtr buffer, CMCount offset, CMSize maxsize);

void CMWriteValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize size);

void CMInsertValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize size);

void CMDeleteValueData(CMValue value, CMCount offset, CMSize size);

void CMGetValueInfo(CMValue value, CMContainer *container, CMObject *object, CMProperty *property, CMType *type, CMGeneration *generation);

void CMSetValueType(CMValue value, CMType type);

void CMSetValueGeneration(CMValue value, CMGeneration generation);

void CMReleaseValue(CMValue);

As an aside, the present description often uses C-language notation as a shorthand way of describing the steps performed by, or other characteristics of, a Container Manager routine. In this notation, all module names and external data that can possibly be visible to an application programmer begin with the letters "CM" or "cm". The upper case "CM" prefixes all API visible routines and macros. The prefix "kCM" is used for constants. The lower case "cm" is used for all inter-module references within the Container Manager. All other data and modules have no other naming conventions and should not be visible outside of the file in which they occur. Macros used within the Container Manager do not follow these conventions since they are never visible in the generated object modules. Thus names beginning with "cm" or (upper or lower case) are reserved by the API and should not be used by the applications using the API.

Also as an aside, routines or code portions which are not described herein are considered self-documenting either due to commenting or due to the use of self-documenting symbol names. For example, it will be apparent to the reader without further explanations that the CMGetValueSize() operation mentioned above returns the size of the specified value.

Returning again to the above Container Manager value routines, none can be called for a particular value until one of the following preparatory routines are called for that value: CMNewValue() or CMUseValue(). As described below, if the desired value is a dynamic value, these routines set up the chains of dynamic value handlers needed to support the above routines.

When a dynamic value is spawned by CMNewValue() or CMUseValue(), the pointer to the top-most dynamic value header is returned as the refNum. Then, whenever the user passes a retnum to an API value routine, it checks to see if the refNum is a dynamic value. If it is, it initiates the call to the corresponding value handler. That may cause a search up the base value chain to look for an "inherited" value routine. In the limit, we end up using the original API value routine if no handler is supplied and we reach the "real" value in the chain. Thus the handler must be semantically identical to the corresponding API call.

These dynamic values only exist from creation during the CMUseValue() until they are released by CMReleaseValue(). A dynamic value can have its own data, but this data is stored in the value's refcon rather than in the value data itself. Dynamic values do not have associated data in the normal sense.

A dynamic value is created when a value is created by CMNewValue() or used by CMUseValue(), and the following two conditions occur:

1. The type of the value, or any of its base types, have metahandlers which have been registered by the Container Manager CMSetMetaHandler() routine in a session-wide metahandler symbol table (CMSetMetaHandler() is usually called when a container is first opened); and

2. The metahandlers support a Use Value Handler, and in addition for CMNewValue(), a New Value Handler.

The New Value Handlers are used to save initialization data for the Use Value Handlers. The Use Value Handlers are called to set up and return a refCon. Another metahandler address is also returned. This is used to get the address of the value operation handlers corresponding to the standard API CM . . . value routines mentioned above.

When a CMNewValue() or CMUseValue() is almost done, a check is made on the value's type, and all of its base types (if any) to see if it has an associated registered metahandler. If it does it is called with a Use value operation type to see if a Use Value Handler exists for the type. If it does, we spawn the dynamic value.

The spawning is done by calling the Use Value Handler. The Use Value Handler is expected to set up a refCon to pass among the value handlers and a pointer to another metahandler. These are returned to CMNewValue() or CMUseValue() which does the actual creation of the dynamic value. The extensions are initialized, the metahandler pointer and refCon are saved. The pointer to the created dynamic value header is what CMNewValue() or CMUseValue() returns to the user as the refNum.

Now, when the user attempts to do a value operation using this refNum, we will use the corresponding handler routine in its place. The vector entries are set on first use of a value operation. If a handler for a particular operation is not defined for a value, its "base value" is used to get the "inherited" handler. This continues up the chain of base values, up to the original "real" value that spawned the base values from the CMNewValue() or CMUseValue(). Once found, we save the handler in the top layer vector (associated with the refNum) so we don't have to do the search again. Thus, as in C++, dynamic values may be "subclassed" via their (base) types.

Note that if we indeed do have to search up the base value chain then we must save the dynamic value refNum (pointer) along with the handler address. This is very much like C++ classes, where inherited methods are called and the appropriate "this" must also be passed.

The Container Manager supports layering of dynamic values. The best way to describe layering is in terms of C++. Say we have the following class types (using a somewhat abbreviated notation):

______________________________________ .COPYRGT. 1992 Apple Computer, Inc. class Layer1 { // a base class <layer1 data> // possible data (fields) Layer1 (<layer1 args>); // constructor to init the data other methods . . . // value operations in our case }; class Layer2 { // another base class <layer2 data> // possible data (fields) Layer2 (<layer2 args>); // constructor to init the data other methods . . . // value operations in our case }; class T: Layer1, Layer2 { // the class of interest! <T data> // possible data (fields) T(<T args>, <layer1 args>, <layer2 args>); // constructor to init the data and bases other methods . . . // value operations in our case }; ______________________________________

In Container Manager terminology, T is to be a registered type with other registered types as base types (classes). All type objects are created using the standard API call CMRegisterType(). Base types can be added to a type by using CMAddBaseType(). This defines a form of inheritance like the C++ classes.

Type T would be registered with its base types as follows:

______________________________________ .COPYRGT. 1992 Apple Computer, Inc. layer1 = CMRegisterType(container, "Layer1"); layer2 = CMRegisterType(container, "Layer2"); t = CMRegisterType(container, "T"); CMAddBaseType(t, layer1); CMAddBaseType(t, layer2); ______________________________________

For the t object, the global name property and value are created as usual by CMRegisterType(container, "T"). The CMAddBaseType() calls add the base types. These are recorded as the object ID's for each base type in the order created as separate value segments for a special "base type" property belonging to the type object.

As mentioned above, CMNewValue() or CMUseValue() spawn dynamic values if the original type or any of its base types have an associated Use Value Handler. Assume that was done for "T" in the above example. What happens is that CMNewValue() or CMUseValue() will look at its type object (t here) to see if the base type property is present. If it is, it will follow each type "down" to leaf types using a depth-first search.

In the example, "layer1" will be visited, then "layer2", and finally the original type "T" itself. If the "layer1" type object had base types of its own, they would be visited before using "layer1" itself. Hence the depth-first search down to the leaf types.

For each type processed, if it has a Use Value Handler of its own, it will be called to get a refCon and value handler metahandler.

Note that this scheme allows total freedom for the user to mix types. For example, type T1 could have base types T2 and T3. Alternatively, T1 could just have base type T2 and T2 have T3 as its base type!

In the C++ class types shown above, note that each class could have its own data along with its own constructor. The T class has a constructor that calls the constructors of all of its base classes. We can carry this analogy with the Container Manager just so far. Here is where it starts to break down.

The problem here is that C++ class types are declared statically. A C++ compiler can see all the base classes and can tell what data gets inherited and who goes with what class. In the Container Manager, all "classes" (i.e., our type objects) are created dynamically. So the problem is we need some way to tell what data "belongs" to what type.

The solution is yet another special handler, which returns a format specification called metadata. The handler is the Metadata Handler whose address is determined by the Container Manager from the same metahandler that returns the New Value and Use Value Handler addresses.

Metadata is very similar to C-language printf() format descriptions, and is used for similar purposes. The next section will describe the metadata in detail. For now, it is sufficient to know that it tells CMNewValue() how to interpret its ". . . " parameters. The rest of this section will discuss how this is done to dynamically create data.

As with C++ classes, the data is created when a new value is created, i.e., with a CMNewValue() call. The data will be saved in the container, so CMUseValue() uses the type format descriptions to extract the data for each dynamic value layer.

CMNewValue() is defined with the following prototype:

CMValue CMNewValue(CMObject object, CMProperty property, CMe type, . . . );

The ". . . " is an arbitrary number of parameters used to create the data. Metadata, accessed from the Metadata Handler, tells CMNewValue() how to interpret the parameters just like a printf() format tells printf() how to use its arguments.

The order of the parameters is important. Because base types are done with a depth-first search through the types down to their leaves, the CMNewValue() ". . . " parameters must be ordered with the parameters for the first type in the chain occurring first in the parameter list. Note what's happening here is that the user is supplying all the constructor data just like T constructor class example above.

The way the data gets written is with a special handler, called the New Value Handler. After CMNewValue() calls the Metadata Handler, it uses the metadata to extract the next set of CMNewValue() ". . . " parameters. CMNewValue() then passes the parameters along in the form of a data packet to the New Value Handler. The New Value Handler is then expected to use this data, which it can extract with the Container Manager CMScanDataPacket() routine. Once it has the data, it can compute initialization values to write to its base value. It is the data written by the New Value Handler that the Use Value Handler will read to create its refCon.

Only CMNewValue() does this. The New Value Handler is only for new values, but the Use Value Handler is used by both CMNewValue() and CMUseValue().

In the simplest case, with only one dynamic value, it can be seen that the data is written to the "real" value. Now if you layer another dynamic value on to this, the next chunk of data is written using that layer's base value and hence its handlers. The second layer will thus use the first layer's handlers. That may or may not end up writing to the "real" value depending on the kind of layer it is. If it's some sort of I/O redirection handler (i.e., it reads and writes somewhere else), the second layer data will probably not go to the "real" value.

The Use Value Handler is called both for CMNewValue() and CMUseValue(). The Use Value Handler reads the data from its base value to create its refcon. If the user comes back the next day and does a CMUseValue(), only the Use Value Handler is called. Again it reads the data from its base value to construct the refCon and we're back as we were before in the CMNewValue() case.

It should be pointed out here that the Metadata and New Value Handlers will always be executed with a Container Manager running on some particular hardware (obviously). The data packet built from the CMNewValue() ". . . " parameters is stored as a function of the hardware implementation on which it is run (i.e., whatever the sizes are for bytes, words, longs, etc.). How it is stored is a function of the metadata returned from the Metadata Handler. In other terms, the New Value Handler has a contract with both the Container Manager and the Metadata Handler on the meaning of the parameter data.

Note, however, it is not required that you be on the same hardware when you come back the next day and to the CMUseValue() that leads to the Use Value Handler call. The handler writer must keep this in mind. Specifically, the Use Value Handler must know the attributes (bytes size, big/little endian, etc.) of the data written out by the New Value Handler so it knows how to use that info. In other words, the Use Value Handler has a (separate) "contract" with its own New Value Handler on the meaning of the data written to the base value.

There is another, relatively minor, thing to keep in mind. That is that the value handlers for any one layer must take into account the size of its own data when manipulating additional data created by the handlers for CMReadValueData(), CMWriteValueData(), etc. This simply offsets the write and read value data operations by the proper amount. Remember all operations are on their base values. So if a New Value Handler writes data, this basically prefixes the "real" stuff being written by the handler operations.

The Metadata Handler is only needed for CMNewValue() so that the proper number of CMNewValue() ". . . " parameters can be placed into a data packet for the New Value Handler. The Metadata Handler must follow the prototype,

CMMetaData metaData.sub.-- Handler(CbType type); where "type" is the (base) type layer whose metadata is to be defined.

The Metadata Handler simply returns a C string containing the metadata using the format descriptions described above.

The type is passed as a convenience. It may or may not be needed. It is possible for a type object to contain other data for other properties. Types, after all, are ordinary objects. There is nothing prohibiting the creation of additional properties and their values. This fact could be used to add additional (static and private) information to a type to be used elsewhere. For example, the type could contain a compression dictionary.

The New Value Handler must follow the prototype,

CMBoolean newvalue.sub.-- Handler(CMValue basevalue,

cmType type,

CMDataPacket datapacket);

where

basevalue=the base value which is to be used to write the refCon data for the Use Value Handler

type=the type corresponding to this New Value Handler

dataPacket=the pointer to the data packet, created from the CMNewValue() ". . . "

parameters according the types metadata format description.

The type is passed again as a convenience just as in the Metadata Handler. It can also be used here to pass to CMScanDataPacket() to extract the dataPacket back into variables that exactly correspond to that portion of the CMNewValue() ". . . " parameters that correspond to the type. It is not required, however that CMScanDataPacket() be used.

The Use Value Handler is called for both the CMUseValue() and CMNewValue() cases. If its companion New Value Handler wrote data to its base value, the Use Value Handler will probably read the data to create its refCon. The re