Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent Application
20030033441
Kind Code
A1
FORIN, ALESSANDRO ; et al.
February 13, 2003
HIGHLY COMPONENTIZED SYSTEM ARCHITECTURE WITH A DEMAND-LOADING NAMESPACE AND PROGRAMMING MODEL
Abstract
The invention is embodied in software executable on a computer having a working memory with demand-loadable components initially stored outside of the working memory, each component having an entry point including a constructor for an object. Preferably, the demand-loadable components are initially provided in a memory within the computer or a location external of the computer. A Namespace in the working memory provides access in the working memory to the components as they become needed by applications running in the computer. The Namespace provides the access by managing demand-loading and unloading of the components in the working memory.
Inventors:
FORIN; ALESSANDRO
(REDMOND, WA)
, HELANDER; JOHANNES V.
(BELLEVUE, WA
)
, RAFFMAN; ANDREW R.
(WOODINVILLE, WA
)
Correspondence Name and Address:
TWO PRUDENTIAL PLAZA, SUITE 4900 180 NORTH STETSON AVENUE
LEYDIG VOIT & MAYER, LTD
CHICAGO
IL
60601-6780
US
Series Code:
282238
Filed:
March 31, 1999
U.S. Current Class:
709/315;
709/332
U.S. Class at Publication:
709/315;
709/332
Intern'l Class:
G06F 009/44;
G06F 015/163; G06F 009/54; G06F 009/00
Claims
What is claimed is:
1. A method of providing software executable on a computer having a working memory, comprising: providing demand-loadable components initially stored outside of said working memory, each component having an entry point comprising a constructor for an object.
2. The method of claim 1 wherein said demand-loadable components are initially provided in one of: (a) a memory within said computer; (b) a location external of said computer.
3. The method of claim 1 further comprising: providing a Namespace in said working memory which provides in said working memory access to ones of said components as they become needed by applications running in said computer.
4. The method of claim 3 wherein said Namespace provides said access by managing demand-loading and unloading of said components in said working memory.
5. The method of claim 3 further comprising providing applications in said working memory which rely on said Namespace to furnish access to ones of said components in said working memory as they become needed by ones of said applications.
6. The method of claim 5 wherein each component comprises an object, and wherein the step of providing each demand-loadable component comprises: providing an IUnknown interface in the object having the following methods: (a) add reference for incrementing a count of the number of applications requiring the object; (b) release reference for decrementing a count of the number of applications requiring the object; wherein said Namespace is responsive to said count in said managing of said demand-loading and unloading.
7. The method of claim 5 wherein each component comprises an object, and wherein the step of providing a demand-loadable component comprises: providing an IUnknown interface in the object having a QueryInterface method of providing access to the methods of the object to an application invoking QueryInterface.
8. The method of claim 5 wherein said object is a COM object.
9. A method of operating a computer having a working memory wherein applications and objects may be loaded during run time, comprising: providing a Namespace in said computer; running an application in said computer; said application calling a demand-loadable object by causing the name of said object to be presented to said Namespace; in response to being presented with the name of said object, said Namespace returning to said application an IUnknown pointer of said object; upon return of said IUnknown pointer of said object, said application using said IUknown pointer to call a QueryInterface method of said object and request a pointer to a desired interface; said QueryInterface method returning said desired interface, whereby said application can invoke a desired method through said interface.
10. The method of claim 9, wherein said computer comprises a loader, and wherein said Namespace, responds to being presented with the name of said object in that said Namespace: determines whether the name of said object is currently registered in said Namespace, and, if so, carries out the step of returning said pointer; if said name is not currently registered, causes said loader to load said object into said working memory and registers said name in said Namespace, and then carries out the step of returning said pointer.
11. The method of claim 10 wherein said object has a constructor and an entry point, and wherein the loading of said object comprises: said loader invoking said constructor; said constructor finding said entry point of said object and calling an executable at said entry point; said executable causing space in said working memory to be allocated for a VTable, an Interface and an Implementation of said object and producing a pointer to said memory space, said pointer comprising said IUnknown pointer.
12. The method of claim 11 further comprising: loading said VTable, Interface and Implementation in the space in said working memory allocated therefor; initializing the state of said object including said VTable and interface pointers.
13. A computer having a working memory and access to a storage memory, said computer comprising: an application capable of being loaded in to said working memory and running in said computer; at least one object initially stored in said nonworking memory; a Namespace in said working memory; said application being programmed to cause said one object to be identified to said Namespace whenever said application finds a need for said object during the running of said application; said Namespace being programmed to: (a) respond to said application identifying said one object by determining whether said one object is currently registered in Namespace, and if it is not registered, then, (b) causing said one object to be loaded from said storage memory to said working memory and, (c) registering said one object in said Namespace, (d) upon said object being registered in said Namespace, returning to said application a pointer to said object.
14. The computer of claim 13 wherein said object comprises a VTable, plural interfaces and corresponding methods thereof, and plural implementations, one of said interfaces comprising an IUknown interface including a QueryInterface method, and wherein said application finds a need for said one object because said application needs a particular one said interfaces of said one object, wherein: said pointer to said object returned by said Namespace comprises an IUnknown pointer to said IUknown interface of said one object; said application is programmed to use said IUnknown pointer to access said IUknown interface of said one object and to invoke the QueryInterface method thereof to access said particular one interface.
15. The computer of claim 14 wherein said computer further comprises a loader in said working memory capable of loading objects from said storage memory to said working memory, said Namespace causing said one object to be loaded by causing said loader to load said one object.
16. The computer of claim 15 wherein said object has a constructor and an entry point, and wherein said loader is programmed such that said loader: invokes said constructor; said constructor is programmmed to find said entry point of said object and call an executable at said entry point; said executable is programmed to cause space in said working memory to be allocated for a VTable, an Interface and an Implementation of said object and producing a pointer to said memory space, said pointer comprising said IUnknown pointer.
17. The computer of claim 16 wherein: said loader is further programmed to load said VTable, Interface and Implementation in the space in said working memory allocated therefor; said loader is programmed to initialize the state of said object including said VTable and interface pointers.
18. A computer having a working memory and access to a storage memory, said storage memory holding at least one object, said computer comprising: an application in said working memory, said application needing to access to said one object at a particular time during the running of said application, said application being programmed to cause a request for said one object to issue contemporaneously with said particular time; a Namespace which is programmed to respond to a request from said application for said one object by loading said one object from said storage memory into said working memory and then providing said application with a pointer to said one object.
19. The computer of claim 18 wherein: said application is programmed to notify said Namespace whenever it no longer needs access to said one object; said Namespace is further programmed to permit said one object to be unloaded from said working memory when no longer needed.
20. The computer of claim 18 wherein: said application is programmed to notify said one object it no longer needs access to it, whereby said object notifies said NameSpace said object is no longer needed; said Namespace is further programmed to permit said one object to be unloaded from said working memory when no longer needed.
21. The computer of claim 18 wherein said Namespace permits said one object to remain in working memory after being no longer needed by said application in order to permit other applications to access said one object.
22. A system residing in a working memory of a computer and in a storage memory, said system comprising: demand-loadable components initially stored in said storage memory, each component having an entry point comprising a constructor for an object.
23. The system of claim 22 wherein said storage memory comprises one of: (a) a memory within said computer; (b) a memory external of said computer. (c) the output of another software component such as a compiler.
24. The system of claim 22 further comprising: a Namespace in said working memory which provides in said working memory access to ones of said components as they become needed by applications running in said computer.
25. The system of claim 24 wherein said Namespace provides said access by managing demand-loading and unloading of said components in said working memory.
26. The system of claim 24 further comprising applications in said working memory which rely on said Namespace to furnish access to ones of said components in said working memory as they become needed by ones of said applications.
27. The system of claim 26 wherein each component comprises an object, and wherein each demand-loadable component comprises: an IUnknown interface in the object having the following methods: (a) add reference for incrementing a count of the number of applications requiring the object; (b) release reference for decrementing a count of the number of applications requiring the object; wherein said Namespace is responsive to said count in said managing of said demand-loading and unloading.
28. The system of claim 27 wherein each component comprises an object, and wherein each demand-loadable component comprises: an IUnknown interface in the object having a QueryInterface method of providing access to the methods of the object to an application invoking QueryInterface.
29. The system of claim 26 wherein said object is a COM object.
30. The system of claim 27 further comprising a loader in said working memory, and wherein said Namespace is responsive to being presented with the name of one of said objects by: returning a pointer to said object if said object is registered in said Namespace and returning to said application a pointer to said object; if said name is not currently registered, causing said loader to load said object into said working memory.
31. The system of claim 30 wherein said object has a constructor invoked by said loader, an entry point and an working memory space-allocating executable called by said constructor at said entry point.
32. The system of claim 31 wherein said object comprises a VTable, an Interface and an Implementation in locations in said working memory allocated by said executable.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Technical Field
[0002] The invention is related to computer operating systems and in particular to a computer operating system which is highly componentized and has dynamically loadable operating features which may be loaded and unloaded during system run time.
[0003] 2. Background Art
[0004] The progressive computerization of society involves a number of diverse computing platforms beside the general-purpose computer:
[0005] Embedded control systems, including consumer devices, intelligent sensors and smart home controls.
[0006] Communication-oriented devices such as digital cell phones and networking infrastructure.
[0007] Programmable peripherals and microcontrollers.
[0008] In all these cases, the general-purpose platform approach is either not applicable, or it is prohibitively expensive. The microprocessor might be a DSP, a VLIW, or a microcontroller; the memory budget is severely restricted; there might be no MMU; the network connection might be sporadic; and Real-Time support is essential.
[0009] Current operating systems are either inflexible, big, lack Real-Time support, have complex hardware requirements, or are so special purpose that good development tools are unavailable and code reusability is low.
[0010] Microkernels [Black92, Engler95] attempt to modularize the operating system. But they confuse modularity with security by mandating that system services be in separate address spaces. Many of the services moved into separate server processes are still necessary for these systems to function and often the services have to trust each other.
[0011] C++ and Java provide objects at a very fine granularity level, and they are extremely successful with application programmers. Unfortunately, both languages confine their objects to a single address space. Object Linking and Embedding (OLE) [Brockschmidt95] and other similar systems extend objects across address spaces and across machine boundaries. OLE seamlessly integrates independently developed components. When editing an Excel spreadsheet inside a Word document it is in fact the Excel process that operates on objects inside of Word's address space. Unfortunately, it only works for user mode applications.
[0012] Modularity has always been an important paradigm in software design. By breaking a complex system into pieces, the complexity becomes more manageable. Address spaces provide security by installing virtual-memory based firewalls between applications. These two issues are orthogonal, but the distinction has been lost in systems research that has been concentrating on so-called microkernels. These issues have been discussed in the following publications:
[0013] [Bershad95] Brian Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, S. Eggers, C. Chambers. Extensibility, safety and performance in the Spin operating system. In 15.sup.th ACM Symposium on Operating System Principles, pages 267-284, Copper Mountain Resort, Colorado, December 1995.
[0014] [Black92] David Black, David Golub, Daniel Julin, Richard Rashid, Richard Draves, Randall Dean, Alessandro Forin, Joseph Barrera, Hideyuki Tokuda, Gerald Malan, David Bohman. Microkernel Operating System Architecture and Mach. In 1.sup.st USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 11-30, Seattle, April 1992.
[0015] [Brockschmidt95] K. Brockshmidt. Inside OLE, Second ed. Microsoft Press, Redmond Wash., 1995.
[0016] [Cheriton94] David Cheriton, Kenneth Duda. A Caching Model of Operating System Kernel Functionality. In 1.sup.st Symposium on Operating Systems Design and Implementation, Seattle, 1994.
[0017] [Cheriton88] David Cheriton. The V distributed system. In Communications of the ACM, pages 314-333, March 1988.
[0018] [Draves97] Richard Draves, Scott Cutshall. Unifying the User and Kernel Environments. Microsoft Research Technical Report MSR-TR-97-10, 16
pages, March 1997
[0019] [Engler95] D. R. Engler, M. F. Kaashoek, J. O'Toole Jr. Exokernel: an operating system architecture for application-specific resource management. In 15.sup.th ACM Symposium on Operating System Principles, pages 251-266, Copper Mountain Resort, Colorado, December 1995.
[0020] [Ford97] Bryan Ford, Godmar Back, Greg Benson, Jay Lepreau, Albert Lin, Olin Shivers. The Flux OSKit: A Substrate for Kernel and Language Research. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 38-51. ACM SIGOPS, Saint-Malo, France, October 1997.
[0021] [Golub90] David Golub, Randall Dean, Alessandro Forin, Richard Rashid. UNIX as an application program. In USENIX 1990 Summer Conference, pages 87-95, June 1990.
[0022] [Helander94] Johannes Helander. Unix under Mach: The Lites Server. Master's thesis, 71 pages, Helsinki University of Technology, 1994. Available from http://www.cs.hut.fi/.about.jvh/lites.MASTERS.ps
[0023] [Hildebrand92] D. Hildebrand. An architectural overview of QNX. In 1.sup.st USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 113-126, Seattle, April 1992.
[0024] [ISI95] Integrated Systems Inc. pSOSystem System Concepts. Part No. COL0011, May 1995, ISI, Sunnyvale Calif.
[0025] [Jones96] Michael B. Jones, Joseph S. Barrera, III, Richard P. Draves, Alessandro Forin, Paul J. Leach, Gilad Odinak. An Overview of the Rialto Real Time Architecture. In Proceedings of the 7.sup.th ACM SIGOPS European Workshop, pagg. 249-256, September 1996.
[0026] [Jones97] Michael B. Jones et al. CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 198-211. ACM SIGOPS, Saint-Malo, France, October 1997.
[0027] [Jones 97b] Michael B. Jones. The Microsoft Interactive TV System: An Experience Report. Microsoft Research Technical Report MSR-TR-97-18, July, 1997.
[0028] [Julin91] Daniel Julin, Jonathan Chew, Mark Stevenson, Paulo Guedes, Paul Neves, Paul Roy. Generalized Emulation Services for Mach 3.0: Overview, Experiences and Current Status. In Proceedings of the Usenix Mach Symposium, 1991.
[0029] [Lee98] Dennis Lee, Patrick Crowley, Jean-Loup Baer, Tom Anderson, Brian Bershad. Execution characteristics of desktop applications on Windows NT. In Proceedings of the 25.sup.th International Symposium on Computer Architecture, Barcelona, Spain, June 1998.
[0030] [Liedtke95] Jochen Liedtke. On .quadrature.-kernel construction. In 15.sup.th ACM Symposium on Operating System Principles, pages 237-250, Copper Mountain Resort, Colorado, December 1995.
[0031] [Mogul87] Jeffrey Mogul, Richard Rashid, Michael Accetta. The Packet Filter: an Efficient Mechanism for User-level Network Code. In 11.sup.th ACM Symposium on Operating System Principles, November 1987.
[0032] [Rashid87] Richard Rashid. From RIG to Accent to Mach: The evolution of a network operating system. Carnegie Mellon University Technical Report, August 1987.
[0033] [Rozier88] M. Rozier, A. Abrassimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Hermann, C. Kaiser, S. Langlois, P. Leonard, W. Neuhauser. CHORUS distributed operating system. In Computing Systems, pages 305-370, Vol. 1-4, 1988.
[0034] [Young89] Michael Wayne Young. Exporting a User Interface to Memory Management from a Communication-Oriented Operating System. Ph.D. Thesis CMU-CS-89-202, Carnegie Mellon University, November 1989.
[0035] Mach [Black92] defined an interface for external memory managers [Young89] and was able to split virtual memory into functionally distinct parts, allowing part of the functionality to reside outside the privilege-level component (the "kernel"). Mach also separated part of the Unix operating system services out of the kernel [Golub90, Helander94], achieving modularity but limited additional functionality. The multiserver project [Julin91] went further in the modularization by splitting the Unix services into multiple independent servers. The componentization added structure and generality to the services. However, keeping the services in multiple address spaces did not add any security or robustness since components had to be available and trusted in any case. The most interesting new functionality was in the ability to emulate multiple OS interfaces, at the same time.
[0036] Contemporary research systems take the minimization of the "kernel" concept even further by defining even lower level abstractions and demonstrating the ability to split states across address space boundaries. None of these systems defines a new application programming interface (API) different from the Unix they emulate. The API that their predecessors [Rashid87, Cheriton88, Rozier88] did define, based on RPC and message exchanges, were not very successful with programmers.
[0037] The Cache Kernel [Cheriton94] uses Mach's external memory manager metaphor uniformly for the management of all kernel objects. Threads, Address Spaces and User Kernels are all handled through this pagein-pageout logical interface. An actual application is statically linked with a number of libraries, which provide default implementations of the required User Kernel components (VM, scheduling, IPC). This offers some flexibility by letting untrusted applications have their custom application kernel. Overall complexity is not decreased; it seems an application kernel would have to be as complicated as any other operating system. The ability to write your own application kernel would seem useful for a limited number of users, in teaching operating systems for instance.
[0038] Exokernel [Engler95] goes along the same lines demonstrating further ability to run operating system code in user mode. While it is highly successful in this and offers some added flexibility, it is questionable whether the premises differ from that of microkernels. The main contribution is in the mechanisms for application-specific resource management.
[0039] [Liedtke95] argues that microkernels have failed exclusively on performance grounds, and that poor performance is their only cause for inflexibility. Our argument is the opposite: inflexibility is inherent in the design, and leads to unavoidable inefficiencies that can only be mitigated by good implementations, never eliminated.
[0040] Spin [Bershad95] addresses the issue of expensive address space crossings by letting user code compiled by a trusted compiler run inside the kernel. This can be viewed as smart proxies that can do a lot of the work locally that otherwise would require communication. It is similar to loading packet filters into network drivers [Mogul87], to running database application query language inside database engines [reference], or to sandboxing Java applets. Applying these techniques to operating systems is beneficial when a trust boundary must be crossed and the cost would otherwise be high. It does not address the issue of whether or not a trust boundary is necessary. Spin uses an object-based language (Modula3) to provide extensibility. The pointer-safety property of the language is what permits execution of untrusted code in privileged mode. Trust relationships, as in the user-versus-kernel separation, should not dominate system decomposition. It is important to return to a global system view. The present invention addresses the issue of how to minimize the set of base services, and how to dynamically extend them on demand.
[0041] [Ford97] shows how a base set of system components can be composed in different ways to build an operating system kernel. The granularity is fairly coarse, and the techniques are limited to static linking. Components that should be of interest to OS researchers (VM, IPC, scheduling, etc.) cannot be replaced or removed, neither statically nor dynamically. The decomposition is otherwise limited to the "OS" component; it is not meant as a whole-system approach. This does not go far enough in the componentization. It provides a few convenient components, such as bootstrap loader and filesystems, but is mostly concerned with reusing existing device drivers and Unix code. It fails to componentize the core kernel services or extend the paradigm to applications.
[0042] Componentization and location independence has also been studied in the context of filesystems and network protocols [Maeda93] and in a number of existing embedded systems, such as pSOS [ISI95]. In a typical embedded system there is no loader, and components can only be chosen at static link time when the load image is built. Services are extremely limited, sometimes exclusively to the scheduling component. The number and priority of threads might have to be specified statically as well.
[0043] Chorus [Rozier88] can be configured to use either a page-based or a segment-based VM system.
SUMMARY OF THE INVENTION
[0044] A preferred embodiment of the invention is directed to a flexible system architecture that is suitable for a wide range of applications. The system is built out of minimal but flexible components, which can be deployed as needed. Instead of mandating a fixed set of operating system services and hardware requirements, the system preferably provides a menu of well-defined components that can be chosen to compose a complete system depending on hardware capabilities, security needs, and application requirements.
[0045] Dynamic loading and unloading of components provides the flexibility that lets the system adapt to changing requirements.
[0046] The componentization makes it possible to change the implementation of a component without affecting the rest of the system. Minimalism makes it possible to use the system with severely restricted hardware budgets. It also forces the system to be understandable and flexible. Software components, when possible, are not tied to a particular layer of the system, but can be reused. For example, the same code that implements the system physical memory heap is used to provide application heaps over virtual memory. The key system building blocks are componentized. This includes the virtual memory system, IPC, and the scheduler in addition to filesystems, networking, drivers, and protection policies. Preferred embodiments of the present invention extend object-orientation both across address spaces and across protection levels.
[0047] In a preferred embodiment, components are located in separate address spaces only when there is a real reason for it, such as security or specific address space requirements. Thus, the price of multiple address spaces (and transitions thereof) is paid only where needed.
[0048] The invention is embodied in software executable on a computer having a working memory with demand-loadable components initially stored outside of the working memory, each component having an entry point including a constructor for an object. Preferably, the demand-loadable components are initially provided in a memory within the computer or a location external of the computer. A Namespace in the working memory provides access in the working memory to the components as they become needed by applications running in the computer. The Namespace provides the access by managing demand-loading and unloading of the components in the working memory. Applications in the working memory rely on the Namespace to furnish access to ones of the components in the working memory as they become needed by ones of the applications. Each component includes an object, and each demand-loadable component is obtained through an IUnknown interface in the object. The IUnknown interface preferably has the following methods: (a) add reference for incrementing a count of the number of applications requiring the object; (b) release reference for decrementing a count of the number of applications requiring the object. The Namespace is responsive to the count in the managing of the demand-loading and unloading.
[0049] Preferably, the computer includes a loader, and the Namespace determines whether the name of the object is currently registered in the Namespace, and, if so, carries out the step of returning the pointer. If the name is not currently registered, the loader loads the object into the working memory and registers the name in the Namespace. The object has a constructor and an entry point, and the object is loaded by the loader invoking the constructor; the constructor finding the entry point of the object and calling an executable at the entry point; the executable causing space in the working memory to be allocated for a VTable, an Interface and an Implementation of the object and producing a pointer to the memory space, the pointer including the IUnknown pointer.
[0050] In accordance with another aspect of the invention, a computer having a working memory and access to a storage memory includes an application capable of being loaded in to the working memory and running in the computer, at least one object initially stored in the non-working memory, a Namespace in the working memory, the application being programmed to cause the one object to be identified to the Namespace whenever the application finds a need for the object during the running of the application. The Namespace is programmed to (a) respond to the application identifying the one object by determining whether the one object is currently registered in Namespace, and if it is not registered, then, (b) causing the one object to be loaded from the storage memory to the working memory and, (c) registering the one object in the Namespace, (d) upon the object being registered in the Namespace, returning to the application a pointer to the object.
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] FIG. 1A is a block diagram of an exemplary operating environment of the invention.
[0052] FIG. 1B is a block diagram of an operating system embodying the present invention in the computer illustrated in FIG. 1A.
[0053] FIG. 1C illustrates one application of the invention to form stacked virtual memories with a local virtual memory.
[0054] FIG. 2 illustrates a page table registry structure of a virtual memory manager of the operating system of FIG. 1B.
[0055] FIG. 3 illustrates the objects in the virtual memory manager.
[0056] FIG. 4 illustrates the virtual memory manager of FIG. 3 with a set of interfaces.
[0057] FIG. 5 illustrates the structure of an object in the operating system of FIG. 1B.
[0058] FIG. 6 illustrates the structure of the virtual memory view object of the operating system of FIG. 1B.
[0059] FIG. 7 illustrates the objects in a preferred implementation of the virtual memory manager. FIG. 8 illustrates the Load VMM process in the operating system of FIG. 1B.
[0060] FIG. 9 illustrates the method of handling a virtual memory (VMM) fault in the operating system of FIG. 1B.
[0061] FIG. 10 illustrates the operation of the VMM fault handler in the operating system of FIG. 1B.
[0062] FIG. 11 illustrates the method for taking a VMM fault in the operating system of FIG. 1B.
[0063] FIG. 12 illustrates the operation of the context switch process in the operating system of FIG. 1B.
[0064] FIG. 13 illustrates the SwitchTo process in the operating system of FIG. 1B.
[0065] FIG. 14 illustrates the operation for unloading the virtual memory manager in the system of FIG. 1B.
[0066] FIG. 15 illustrates the process for handling a page fault in the system of FIG. 1B.
[0067] FIG. 16 illustrates a process by which a constructor creates a thread in the operating system of FIG. 1B.
[0068] FIG. 17 illustrates multiple views of memory provided by the VMView object of the operating system of FIG. 1B.
[0069] FIG. 18 illustrates multiple views of memory that can be obtained in accordance with FIG. 17.
[0070] FIG. 19 illustrates the basic features of a loadable interprocess communication (IPC) manager in the operating system of FIG. 1B.
[0071] FIG. 20 illustrates the process of loading of the IPC manager of FIG. 19.
[0072] FIG. 21 illustrates an interface between the IPC manager and other threads.
[0073] FIG. 22 illustrates intercommunication provided by the IPC manager between different address spaces.
[0074] FIG. 23 illustrates how an IPC trap is handled in the operating system of FIG. 1B.
[0075] FIG. 24A illustrates the operation of the IPC trap handler in the operating system of FIG. 1B.
[0076] FIG. 24B illustrates objects inside the loadable IPC system of a preferred embodiment of the present invention.
[0077] FIG. 24C illustrates objects in different address spaces connected by the loadable IPC system of FIG. 24B.
[0078] FIG. 25 illustrates the interface Imutate which provides object mutation in the operating system of FIG. 1B.
[0079] FIG. 26 illustrates one application of object mutation in the operating system of FIG. 1B.
[0080] FIG. 27 illustrates another application of object mutation applied to a Vtable.
[0081] FIG. 28 illustrates synchronization of object mutation by mutual exclusion.
[0082] FIG. 29 illustrates synchronization of object mutation by transactional synchronization.
[0083] FIG. 30A illustrates the process of object mutation by swizzling in accordance with a preferred embodiment of the invention.
[0084] FIG. 30B illustrates the structure of a thread relative to external objects prior to swizzling.
[0085] FIG. 30C illustrates the structure of the thread relative to the external objects corresponding to FIG. 30B after swizzling.
[0086] FIG. 31 illustrates an application of object mutation to achieve object interposition.
[0087] FIG. 32 illustrates an application of object mutation to carry out dynamic software upgrading.
[0088] FIG. 33 illustrates an application of object mutation to carry out run-time code generation.
[0089] FIG. 34 illustrates how to achieve object mobility using object mutation.
[0090] FIG. 35 illustrates how proxies may be used with object mutation to communicate across address spaces.
[0091] FIG. 36 illustrates a mutatable structure of the virtual memory manager.
[0092] FIG. 37 illustrates a method embodying the programming model of the invention.
[0093] FIG. 38 illustrates operations carried out with the demand-loading NameSpace in accordance with the programming model of FIG. 37.
[0094] FIG. 39 illustrates the loading of an object in accordance with the programming model of FIG. 37.
[0095] FIG. 40 illustrates an application of the programming model of FIG. 37 to plug-and-play technology.
[0096] FIGS. 41 and 42 illustrate an example of a conventional process for linking an executable image.
[0097] FIG. 43 illustrates an example of a conventional process for linking with shared libraries.
[0098] FIG. 44 illustrates a process in accordance with the present invention for linking an executable image using shared libraries.
[0099] FIG. 45 illustrates a process in accordance with the present invention for forming a dynamically linked library.
[0100] FIGS. 46A and 46B illustrate an example of a jump shortcutting process of the present invention.
[0101] FIGS. 47A and 47B illustrate an example of a jump shortcutting process as applied to data references in accordance with the present invention.
[0102] FIG. 48 illustrates an example of a post-link time compaction process of the present invention.
[0103] FIG. 49 illustrates a load time code synthesis process for virtual memory in accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0104] Exemplary Operating Environment:
[0105] FIG. 1A and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include processes, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including inside various programmable peripheral interface cards such as 126, 128, 130, 144, 158, 148 in FIG. 1A, inside programmable peripherals such as disks, game controllers and accessories, speakers, modems, printers and the like, in hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Thus, for example, the present invention can be an operating system of an optimally minimized configuration, as described below, running inside a network interface card of the network interface 158 of FIG. 1A or in an embedded control system or in a communication-oriented device. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located both in local and in remote memory storage devices.
[0106] With reference to FIG. 1A, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 120, including a processing unit 121, a system memory 122, and a system bus 123 that couples various system components including the system memory to the processing unit 121. The system bus 123 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 124 and random access memory (RAM) 125. A basic input/output system 126 (BIOS), containing the basic process that helps to transfer information between elements within the personal computer 120, such as during start-up, is stored in ROM 124. The personal computer 120 further includes a hard disk drive 127 for reading from and writing to a hard disk, not shown, a magnetic disk drive 128 for reading from or writing to a removable magnetic disk 129, and an optical disk drive 130
for reading from or writing to a removable optical disk 131 such as a CD ROM or other optical media. The hard disk drive 127, magnetic disk drive 128, and optical disk drive 130 are connected to the system bus 123 by a hard disk drive interface 132, a magnetic disk drive interface 133, and an optical drive interface 134, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 120. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 129 and a removable optical disk 131, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.
[0107] A number of program modules may be stored on the hard disk, magnetic disk 129, optical disk 131, ROM 124 or RAM 125, including an operating system 135, one or more application programs 136, other program modules 137, and program data 138. A user may enter commands and information into the personal computer 120 through input devices such as a keyboard 140 and pointing device 142. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 121 through a serial port interface 146 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor 147
or other type of display device is also connected to the system bus 123
via an interface, such as a video adapter 148. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
[0108] The personal computer 120 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 149. The remote computer 149 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 120, although only a memory storage device 150 has been illustrated in FIG. 1A. The logical connections depicted in FIG. 1A include a local area network (LAN) 151
and a wide area network (WAN) 152. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.
[0109] When used in a LAN networking environment, the personal computer 120 is connected to the local network 151 through a network interface or adapter 153. When used in a WAN networking environment, the personal computer 120 typically includes a modem 154 or other means for establishing communications over the wide area network 152, such as the Internet. The modem 154, which may be internal or external, is connected to the system bus 123 via the serial port interface 146. In a networked environment, program modules depicted relative to the personal computer 120, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
[0110] Introduction to the Architecture
[0111] In a preferred embodiment of the invention, the operating system components contain code and other metadata for classes of objects. When a component is loaded into an address space it is instantiated. The instantiated component creates object instances that communicate with other objects, potentially in other components. The objects expose their methods through Component Object Model (COM) [Brockschmidt95] interfaces. Threads execute code and synchronize through Mutexes and Condition variables. System components are typically written in C or C++ but there is no fundamental bias towards any particular language.
[0112] COM enables late binding, version compatibility and checking, transparency through proxies, cross language support, and is reasonably lightweight and efficient. Each object needs a method table pointer and a reference count. Each call adds one indirection for fetching the actual method pointer.
[0113] Component implementations in the preferred embodiment of the invention are rarely aware of their intended system layer. The same code can be used in different address spaces or contexts and can be nested. A filesystem can be applied to a file provided by another filesystem as well as to one provided by a disk driver. A heap can be applied to any memory: physical memory, memory allocated from another heap, or memory provided by a virtual memory manager. The loader loads modules into any address space.
[0114] Selection of System Components
[0115] What components should be part of a deployed system depends upon the application itself and its interface requirements, application memory requirements, security requirements, and the target hardware capabilities. Flexible loading of modules was an important design goal for the operating system described herein. The loading of components can be deferred until they are actually used by an application. Device drivers and runtime services typically fall into this category. Others can be loaded just prior to running an application, such as virtual memory for untrusted applications. Most services will terminate themselves when they are no longer needed.
[0116] Drivers and virtual memory can not be used when the hardware to support them is not present. An application that tries to use them will look them up in the demand-loading namespace. The lookup operation fails, either because the driver is absent or it returns a NULL pointer.
[0117] Execution Model
[0118] Components have code, static data, a stack and a number of dynamic objects. A heap object allows dynamic memory allocations. The stack is pointed to by the stack pointer register; it is allocated from the heap. In a physical memory system the initial size of the stack is also the maximum size of the stack; every byte has to be paid for by real memory. Thus in an embedded application the stack size must be chosen carefully. Most compilers can generate stack checks at function entry, to guard against stack overflows. In a virtual memory system, the stack does not have to be backed by real memory, which can be allocated on demand. The stack only consumes virtual address range and can thus be allocated liberally. A Real-Time application might still want to pre-allocate all memory in order to avoid run time fluctuations. In this case the existence of virtual memory does not affect the stack.
[0119] Memory for code and static data is also allocated from the heap. Code can be placed anywhere in memory if it is either position-independent (pc-relative) or relocatable. The Microsoft Visual C++ compiler, for instance, creates a compressed relocation table that the runtime loader uses to fix any references if the executable was placed in a different place in memory than it was linked for. All compilers for embedded use provide similar functionality, although the specific image formats and relocation schemes differ.
[0120] On the other hand, it is often found that most compilers do not support reentrancy. If the code in an image is not reentrant, it is still possible to execute multiple instances of the same image in the same address space. The code and data are simply loaded multiple times, each time relocated differently.
[0121] If the relocation information is not present, and a component virtually overlaps with another component it cannot be executed in the same address space. In this case a new address space is required, which in turn requires virtual memory.
[0122] System Components
[0123] An exemplary base set of system components in a preferred embodiment of the invention is now described.
[0124] Referring to FIG. 1B, an exemplary operating system in accordance with an embodiment of the invention has a kernel or link-time component 202 and a set of run-time loadable resources 204. The kernel 202 includes a set of software resources including, preferably, a HEAP (physical memory manager) 302, a loader 304, a support library 306, a timer 310, an interrupt control unit 312, a scheduler 314, thread support 316 including synchronization primitives 318, NameSpace 320, filesystem 322 and a startup program 324. The set of run-time loadable resources 204 are available to the system through the loader 304. The resources 204
include, preferably, a virtual memory manager (VMM) 362, inter-process communication 364, drivers 366, applications 368 and a network program 370. A minimal multi-threaded kernel may be provided in accordance with the present invention having only the thread support 316, the scheduler 314, the library 306, the timer 310 and the startup 324. If multi-threading is not desired, the kernel may be further minimized to include only the library 306, the timer 310 and the startup 324.
[0125] As illustrated in FIG. 1B, the ICU (interrupt control unit) 312
preferably includes the following software methods at link time: install VMM (virtual memory manager) trap handler 372, install IPC (inter-process communication) trap handler 374. These resources are preferably included in the interrupt control unit 312 because it is possible for such a system to take a VMM trap or an IPC trap or a page fault whether or not a VMM or IPC has been loaded.
[0126] Any one of the components contained in the set of loadable resources 202 may be fetched by the loader 304 and loaded into the operating system on a demand or as-needed basis during run time. The loader 304 registers the name of any such component that has been so loaded in NameSpace 320 so that all users in the system can find the component by querying NameSpace 320.
[0127] In particular, the VMM 372 is loadable upon demand into the operating system and may be unloaded when all demand for it disappears.
[0128] Different implementations of a virtual memory manager may be selected for different purposes during run time from a VMM library 380
storing a set of VMMs, as shown in FIG. 1B.
[0129] Heap
[0130] The Heap 302 implements (physical) memory management, allowing dynamic memory allocations with specifiable alignments. The constructor allows creating nested heaps or heaps over virtual memory.
[0131] Loader
[0132] The Loader 304 is used to load additional components into a running system. Most embedded systems do not provide a loader, and if not needed it can be eliminated at link time from this system as well. Multiple image formats are supported. The loader loads images into the same address space, or given a flag and a virtual memory system, it creates a new address space and loads the image in there.
[0133] No particular distinction is made herein between executables and DLLs (shared libraries). An executable is simply a DLL that exports no entry points besides main( ).
[0134] Support Library, Machine Initialization
[0135] The library 306 is a shared support library and includes common base utilities like memcpy and other compiler support routines, AtomicAdd, CurrentThread, etc. It is used by many system components and is available to applications.
[0136] Basic machine initialization code is used at startup and system reset. Most of the machine dependent code of the componentized operating system of the invention resides here.
[0137] Timer and Interrupt Drivers
[0138] A driver for the timer 310 is used by the scheduler 314 to keep track of time and for thread pre-emption. A driver for the Interrupt Control Unit (ICU) 312 dispatches interrupts and keeps a registry of interrupt routines, which can be installed and removed by other components. The system has no particular notion of a "device driver" per se.
[0139] It does enforce strict limits as to what an interrupt routine can do: wakeup a thread.
[0140] Scheduler
[0141] The scheduler 314 is a policy module that determines which thread should run at any given time. Low-level management of blocking and switching between threads is handled by the thread and synchronization components 316, 318.
[0142] The timer interrupt and thread and synchronization modules call into the scheduler 314, possibly passing callback functions as arguments.
[0143] Three example schedulers have been implemented: the null scheduler, a round robin scheduler, and a constraint based Real-Time scheduler. The null scheduler is for systems that use only one thread. The round robin scheduler provides time-sharing, it can easily be extended to handle dynamically changing priorities. Constraint scheduling is for consumer Real-Time applications and is described in [Jones97]. The existence of these schedulers proves that the scheduling interface is necessary and sufficient to implement all of the prior art scheduling policies.
[0144] Threads and Synchronization
[0145] The thread support and synchronization components 316, 318 provide basic thread support and synchronization primitives. Each thread is run in one address space. A thread is usually created in the address space of the component in which it is started. If there is no virtual memory, the address space is always the physical address space. Threads can be created in remote components as well as in local components that are part of the same address space. Threads can block on mutexes and conditions. They can inform the scheduler of their time constraints, but these calls will fail if the scheduler is not a constraint scheduler. The constraint scheduler performs priority inheritance when threads block on mutexes. Preferably, the thread support 316 and scheduler 314 are separated so that the scheduler 314 and be changed while maintaining thread support. So, a third party could change the scheduler 314 without affecting applications, so that the applications and the scheduler are isolated.
[0146] NamespacesA simple boot namespace where applications register objects may be provided. The Namespace 320 is a namespace that cooperates with the loader 304 in demand-loading and caching of components. A namespace may be used for displaying the status (e.g. running threads) and performance parameters (e.g. execution times) of a system during development. Filesystems are also namespaces.
[0147] Filesystem
[0148] Filesystem 322 is used to load additional components during runtime, and as permanent data repository. RomFS is a filesystem for read-only in-memory images (arbitrary files and the system can be merged into one image) and FatFS is for reading/writing disks. NetFile is a network filesystem client built on top of sockets.
[0149] Startup Program
[0150] Startup 324 is a program that is started once the system has been initialized. It can be a simple command interpreter that configures the system and launches applications, or the (only) application itself.
[0151] Network
[0152] The Network Program 370 provides the entire BSD4.4Lite network protocol code, with minor adaptations. The interface is a COM interface that provides sockets. The protocols operate the network drivers through another interface.
[0153] Other Components
[0154] A small Win32 compatibility library may be provided to make it easier to use WindowsNT code in some of the drivers and applications.
[0155] How NameSpace is Used to Manage the Loading of Objects
[0156] Namespaces are used to let applications gain access to objects provided by other components. A namespace is like a filesystem directory tree, except it can hold any kind of objects, not just files. Namespaces can themselves be implemented by different components, including a filesystem that exports its directories as sub-namespaces, and files as registered objects. Namespaces can be registered into other namespaces, extending the directory tree. Location transparency of all objects automatically makes namespaces distributed. Therefore it is easily possible to view some other machine's NameSpace as a sub-Namespace of one's own machine. Namespaces can be filtered for access control or for providing different views for different applications. There is no limit as to the number of namespaces. A component can gain access to its namespace through a call to CurrentNamespace( ). In a minimal system, all applications share the same boot namespace.
[0157] When an application looks up a name in the namespace, it obtains a reference to the object: a local direct reference in case the object is local, or an automatically created proxy if the object is remote. (For remote objects, the interprocess communication (IPC) system described below in this specification is responsible for creating proxies, handling the delegation to remote objects, and reference counting.) A namespace is free to only give access to previously registered objects or to create objects on demand, as it sees fit. The namespace only handles the IUnknown interface. It is up to the application to obtain the proper interface from the object, using the QueryInterface method.
[0158] The interface of Namespace includes a method called Bind. The Bind method is used to request an object. The Bind method finds whether the requested object has already been loaded and, if not, Bind obtains the IUnknown interface of the requested object and returns it as an argument. Bind returns a pointer to the IUnknown's pointer to the requested object. Bind is the method that looks up the requested object in Namespace while Register is the method for registering the object in Namespace. After the object has been loaded, the Query Interface method may be used to query the object.
[0159] Objects can be made available to other components by registering them in a namespace. Every COM object has a virtual method table and at least the three methods derived from the base interface (the IUnknown) used with Namespace: QueryInterface for agreeing on the interface protocols, and AddRef and Release for reference counting. Specific interfaces have additional methods to do the actual work. In addition, a constructor is usually provided.
[0160] Garbage collection is done through reference counting. When Release has called for the last reference, the implementation can finalize and deallocate the object. Even if reference counting has limitations, it is convenient in a system environment due to its simplicity.
[0161] Interaction with objects using other garbage collection models can be achieved through proxies that intercept the IUnknown methods to update their root sets.
[0162] Loadable Virtual Memory Manager
[0163] Virtual memory provides virtual address spaces. Threads run in either a virtual address space or in the physical memory space. Components can be loaded into any address space. Component code may be shared between various address spaces like shared libraries. For instance, any code loaded into the physical memory space is made visible to applications regardless of their address space. There is nothing secret in the system's code, so there is no security problem. Virtual memory can be used to protect sensitive application code, for instance to defend it against disassembly and theft of intellectual property. Unlike most existing operating systems, in the preferred embodiment of the invention the support for virtual memory is not an integral part of the system. The system can function with or without it, and it executes the same executable binary images. The virtual memory manager is a component like any other, and is loaded dynamically on demand.
[0164] Loading or unloading of the virtual memory system does not interfere with applications already running, or started later on in the physical memory space. Once the virtual memory system has been started, new components can be loaded into any address space. Component code may be shared between different address spaces, as is the case with shared libraries.
[0165] Virtual memory can be used for:
[0166] Security reasons, when a program is not trusted. The virtual memory system implements firewalls between applications.
[0167] Covering for common programming errors such as NULL pointer references and memory leaks.
[0168] Creating a sparse address space. This often leads to better memory utilization with fragmented heaps.
[0169] Paging. This provides more memory than available, working set adaptation, and mapped files.
[0170] Safe and flexible memory sharing: Copy-on-write for libraries, shared memory windows.
[0171] Running non-relocatable executables as described above. Implementing garbage collection and other protection mechanisms.
[0172] In accordance with preferred embodiments of the invention, the virtual memory manager is a loadable component that provides multiple virtual address spaces. It can be viewed as a driver for MMU hardware. It creates virtual memory mappings using physical memory and MMU hardware. Loading and starting the virtual memory manager executable does not interfere with applications already running. Unloading can be done once all references to objects provided by the manager are released. A new one can be started if needed.
[0173] A virtual memory space looks like the physical memory space, except it can be larger, doesn't have to be contiguous, can be paged, protected, replicated, and can be (recursively) mapped to other objects.
[0174] The virtual memory manager exports a number of control interfaces that are used to create new address spaces (VMSpace), to map address spaces or files to address spaces (VMMap), to link threads to address spaces (VMView), to control state and protections (VMSpace), and to create instances of virtual memory objects (VMFactory).
[0175] Realistically, any MMU driver will need exclusive control of the MMU hardware. However, other objects implementing the virtual memory interfaces can be interposed between an application and the MMU driver. In this way, logical virtual memory systems can be arbitrarily composed or, for instance, stacked as in the Exokernel. A stacked virtual memory system relies on another one for its base functionality, but add some specific functionality of its own. For example, the bottom virtual memory manager in the stack could control the local or host machine, while virtual memory managers stacked in higher layers would control other machines thereby providing an application running on top of multiple machines the illusion of running on a single shared memory multiprocessor system. What the present invention facilitates is multiple levels of virtual memories which are recursively stackable. Furthermore, the invention provides transparency. The Exokernel approach was to have a library that implements a virtual memory, so that there is in effect only a single level, since one library cannot be stacked over another library. In the present invention, the different virtual memories exist in different address spaces, so that there is no corresponding limitation on stacking multiple levels, and the stack is remotable. For example, referring to FIG. 1C, the invention can be used to make a collection of different machines (machine-A and machine-B) appear as a single address space by employing a local virtual memory (VM-1) while other virtual memories (VM-2 and VM-3) relating to the address spaces of the other machines but having the same interface as the local virtual memory are stacked over it.
[0176] Design
[0177] The virtual memory interfaces were designed for flexibility and simplicity:
[0178] VMSpace. Methods: Reserve, Delete, Map, Protect, CacheControl, QueryVM, CreateShadow.
[0179] Reserve( ) reserves regions in the VMSpace, Delete( ) deletes them. Map( ) maps VMMap objects to the space. Protect( ) sets permanent attributes (e.g. read-only), CacheControl( ) controls transient attributes (e.g. dirty) and paging state (e.g. present). QueryVM( ) returns information about ranges, CreateShadow( ) creates a new mapping, atomically moves ranges from the current mapping to the new one and maps the new one into the current one. It facilitates symmetric copy-on-write. A VMSpace also implements the VMMap interface, returned by QueryInterface( ).
[0180] VMMap. Methods: Read, Write, Share, QueryAccess, GetSize, Clone.
[0181] Read( ) and Write( ) are used for copy-paging. Share( ) is used to establish shared pages between two VMSpaces and to return pages to VMView::Fault( ). QueryAccess( ) and GetSize( ) return information about the mapping. Clone( ) creates a restricted mapping from the given VMMap. The constructors are used to turn files into mappings and to create memory with no backing store.
[0182] VMView. Methods: SwitchTo, Fault, SetMapping, GetMapping.
[0183] SwitchTo( ) is called by the context switch path.
[0184] Fault( ) is called at page fault time.
[0185] SetMapping( ) changes the associated VMMap (and indirectly VMSpace).
[0186] GetMapping returns the current associated mapping.
[0187] VMFactory. Methods: CreateVmView, CreateVmSpace, CreateVmMappingFromFile, CreateVmMappingFromZero. VMFactory is an object that contains the constructors of other VMM objects.
[0188] Process creation works as follows. A new VMSpace and a new VMView are created. The VMView is bound to the VMSpace. A temporary file is created as backing store for zero-fill memory allocations. A VMMap is constructed with CreateFromFile( ), and it is mapped into the VMSpace. A Heap is created from the resulting memory. The file for the executable is looked up from a namespace. A VMMap object is constructed from that file. The VMMap is mapped into the VMSpace copy-on-write, with backing store for any copies coming again from the temporary file. The loader relocates the code if necessary. An IPC channel to the newly loaded component is created. A thread is created with the VMView associated to it. The thread is started by handing it to the scheduler.
[0189] Page fault handling works as follows. VMView::Fault( ) is invoked. This in turn calls Share( ) on the associated VMSpace (which also exports VMMap interface). Assuming the page is not present, Share first calls VMSpace::CacheControl( ), which calls VMMap::Read( ) on the file's VMMap, which in turn calls File::Read( ) on the file object. The VMSpace then adds the returned data to its page list and returns a reference to it to the VMView, which adds it to the VTLB (virtual translation look-aside buffer), which makes it available to the hardware.
[0190] Memory wiring works as follows. The application invokes VMSpace::Protect( ) with the wire flag set. Protect first calls VMSpace::CacheControl( ) to page in all the pages. VMSpace::CacheControl( ) may fail if the physical memory is exhausted. In this case Protect fails without wiring any pages. Otherwise it marks the pages wired and returns successfully. Any operation affecting permanent attributes is atomic. Those affecting transient attributes are not.
[0191] FIG. 2 illustrates how page table entries are used in providing translation between virtual memory addresses and physical memory addresses by the run-time loadable VMM 362 of FIG. 1B. Each VMSpace object stores a pointer value which is loadable into a page directory register 400 provided in the host computer's microprocessor. The pointer value, once loaded into the page directory register 400, points to a particular page table 402 which is one of many page tables available to the system. Each page table provides a particular set of address translations between virtual memory addresses and physical memory addresses. Different objects may be stored in different memory locations and therefore may require different address translations. The individual page table 402 has a set of virtual addresses 404 and physical addresses 406 correlated to corresponding ones of the virtual addresses, the addresses constituting the page table entries of the page table 402.
[0192] Certain advantages of virtual memory are realized by controlling the allocation of physical memory. For example, in order to protect certain areas of memory from being written, the page table entry for a particular physical address may include an indication that the address is available for read-only. In order to hold a certain area in reserve or to delay the allocation of physical memory until an actual need arises, the page table entry for a particular physical address may include an indication that the physical address is "invalid". A pointer to the corresponding virtual memory address will cause the system to take a virtual memory trap, as will be described below.
[0193] Referring to FIGS. 3 and 4, the virtual memory manager (VMM) includes the following interfaces: IVMSpace 610, IVMMap 620, IVMView 630, IUnknown 640 and IVMFactory 650. The IUnknown interface preferably is included in every object. The purpose of this is to give every application the ability to query the object to determine if it supports a given interface. IUnknown refers to processes for querying for a given interface ("query"), for adding a reference to the object ("add") and for releasing a reference to the object ("release"). Thus, each of the three primary interfaces of VMM includes the three processes of IUnknown, as illustrated in FIG. 4.
[0194] VMFactory 650 has an interface INamespace 652 which it exports. Inamespace is used to enumerate all the objects that one virtual memory manager handles, regardless of whether they have also been registered in the common Namespace.
[0195] FIG. 4 illustrates the VMM interfaces as containing particular methods. IVMSpace contains the methods Query, Add, Release, Reserve, Delete, Map, Protect, CacheControl, QueryVM, CreateShadow. IVMMap contains the methods Query, Add, Release, Read, Write, Share, Clone, QueryAccess, GetSize. IVMView contains the methods Query, Add, Release, SwitchTo, SetMapping, Fault, GetMapping. IVMFactory contains the methods or constructors CreateVMView, CreateVMSpace, CreateVMappingFromFile, CreateVMMappingFromZero. IUnknown contains the methods Query, Add, Release.
[0196] FIG. 5 illustrates the internal object architecture employed in a preferred embodiment of the invention. The object has a virtual table (V-table) 510 and a state 520. An instance pointer 530 points to the object. The state 520 can include, for example, the page table register contents, the IUnknown reference count and pointer values. The pointers of an object will be discussed below with reference to FIG. 7.
[0197] The V-table 510 points to a particular interface 540, which may be one of many interfaces available to the system. The interface 540 lists a number of processes or methods associated with the particular object. Each method entry listed in the interface 540 points to an implementation 550 of that method. The implementation 550 contains the code for carrying out the algorithm implementing the particular process or method listed in the interface 540. As illustrated in FIG. 5, more than one object may point to the same interface. Of course, it is to be expected that different objects have their own unique interfaces in many instances. For any application calling the object, the object's interface provides a complete list of the processes or methods supported by the object.
[0198] FIG. 6 corresponds to FIG. 5 and illustrates the object architecture of the VMView object of the VMM (virtual memory manager). In addition to the three universal IUnknown processes of query, add and release, VMView further includes the processes of fault and switch to. The "fault" process will be discussed below with reference to FIG. 10
while the "switch to" process of VMView will be discussed below with reference to FIG. 12.
[0199] FIG. 7 illustrates a preferred embodiment of the Virtual Memory Manager (VMM). The VMSpace is implemented as a sequence of regions connected by a skip-list of the type well-known in the art. Each region contains a mapping, a copy mapping (for copy-on-write), a number of page lists for physical memory pages, and a set of attributes. Permanent attributes are kept in the region; any transient state is part of the page list. The VMSpace also exports a VMMap interface so that it can be directly mapped into other address spaces. FIG. 7 illustrates mappings by VMMap of various regions of VMSpace to other objects, as will be discussed in more detail below. A VMSpace is 64 bits wide in the preferred embodiment. The VMView provides a view into a VMSpace for one or more threads. It is limited by the hardware address size (e.g. 32
bits). If the VMSpace it points to is indeed larger (64 bits) then the view is a window into part of that space. A virtual translation look-aside buffer (VTLB) is attached to the VMView. The VTLB contains machine dependent mapping information for the MMU hardware and a translation cache. The VTLB interface is common across all architectures. The rest of the system is unaware of the virtual memory component, with a few exceptions. A thread must hold a pointer to its VMView so that page faults can be resolved within the correct context, and the context switch path must check for a VMView change. If the context switch path detects an address space change, it calls a VMView method to synchronize the MMU hardware with the change. The virtual memory manager registers its interrupt handler with the ICU driver. The Heap may choose to modify its behavior when running over virtual memory. The loader can only create new address spaces when a virtual memory system is present. The IPC system may utilize virtual memory mappings for data transfer.
[0200] Referring to FIG. 7, different threads 710, 715 can contain pointers pointing to the same VMView object 720. The VMView object contains pointers pointing to a page of PTEs (page table entries) 725 and to a VMSpace object 730. The VMSpace object 730 points to (defines) Region1, Region2 and Region3, corresponding to different non-contiguous memory regions linked by the skip-list. These regions are mapped to other objects in the manner illustrated in the example of FIG. 7 by pointers provided by the VMMap interface of VMSpace. Region1 is reserved (no writing permitted) and points to an empty space 735 (labelled Empty). Region2 has one pointer to a VMMap object 740 corresponding to a region in memory containing all zeroes, which is labelled ZeroVMMap. Region2 has another pointer to a page list 745. The page list 745 points to a memory 750 having allocated memory spaces in which writing is permitted. Region3
has one pointer to a VMMap object 755 which is an implementation of VMMap for mapping files and is labelled FileVMMap. The FileVMMap has a pointer pointing to a file 760. Region3 has another pointer pointing to a VMSpace object 765 (which is different from the VMSpace object 730 discussed above). The connection through Region3 between the two VMSpace objects 730, 765 supports a copy process. Region3 has yet another pointer pointing to a PageList 770. The PageList 770 has a pointer pointing to a second PageList 775. The second PageList 775 points to a Memory object 780. The linking of the two successive PageLists supports a copy-on-write function.
[0201] In summary, the VMMap object can map a region to another VMMap object, examples of which are illustrated in FIG. 7 such as the pointer from Region3 to File VMMap 755 and the pointer from Region2 to Zero VMMap 740. The mapping can be to a different portion of the VMMap object itself rather than another VMMap object. The VMMap object can map to the ZeroMap object providing zero-filled physical pages, as illustrated in FIG. 7 by the pointer to the ZeroMap object 740. The VMMap object can map a region to a VMSpace object, as illustrated by the pointer from Region3 to VMSpace 765. The VMMap object can map a region to a file, as illustrated by the pointers from Region3 to the File VMMap 755 and from thence to File 760. This may include the case of the "system pager" which handles the traditional paging file. Finally, the VMMap object can map a region to a mapping filter ("CloneMap") which, for example, restricts the protections allowed of a mapping, as illustrated by the pointer from Region3 to CloneMap.
[0202] A PageList lists what pages have been called by the object. This is useful in case this information is forgotten by the PTE, for example.
[0203] The state of an object, such as the virtual memory object of FIG. 7, consists of the pointer values of the various pointers in the object (such as the pointers illustrated in FIG. 7).
[0204] FIG. 8 illustrates the Load VMM (load virtual memory manager) process. The first step (block 810 of FIG. 8) is to install the VMM Fault Handler method. The last step (block 830) is to register the VMM constructors, which are part of the VMFactory interface. This step is carried out by creating an object of the IVMFactory type and registering it into the NameSpace.
[0205] A local constructor is a method in an object. In order to export it to another address space, the constructor must be wrapped in a factory object of the type well-known in the art. These objects are designated as "XYFactory". For example, VMFactory can create VM objects. There is a correspondence between C++ classes and factory objects. A C++ compiler could automatically create the factory of an object.
[0206] FIG. 9 illustrates how a virtual memory fault is handled. The first step is to save the state of the object or thread that was running at the time the fault or trap was taken (block 910 of FIG. 9). Next, the VMM Fault Handler is called (block 920). Then, the state of the object is restored (block 930).
[0207] FIG. 10 illustrates the operation of the VMM Fault Handler. The first step (block 1010 of FIG. 10) is to call IVMView::Fault. The next step is to determine whether IVMView::Fault can provide a VM mapping (block 1020). If so ("YES" branch of block 1020), the page table entry is loaded (block 1030). Otherwise ("NO" branch of block 1020), an exception is taken (block 1040) and an exception handler is called (block 1050).
[0208] FIG. 11 illustrates how a VM fault is taken. The first step (block 1110 of FIG. 11) is to determine whether the VM fault is due to an error. If so ("YES" branch of block 1110), then an exception is taken (block 1120). Otherwise ("NO" branch of block 1110), a determination is made whether the VM fault is due to a memory allocation postponement, sometimes referred to as "lazy memory allocation" (block 1130 of FIG. 11). Lazy memory allocation is a feature that can be implemented to guard against premature allocation of memory otherwise caused by early requests by an application. The determination of whether the VM fault-is due to lazy memory allocation is made by determining whether there is any reference to the object in the PageList. If the fault was due to lazy memory allocation ("YES" branch of block 1130), then VMSpace is called to perform the memory allocation (block 1140 of FIG. 11). Otherwise ("NO" branch of block 1130), a determination is made whether the VM fault was due to a copy-on-write process (block 1150). This determination is made by asking whether the current PageList points to another PageList (as discussed above with reference to FIG. 7). If not ("NO" branch of block 1150), the reference in PageList is copied to the page table or PTE (block 1160). Otherwise ("YES" branch of block 1150), the pointer to the other PageList is taken (block 1170 of FIG. 11). This entails allocating a new page, copying the content of the old page to the new page, and then entering the mapping for the new page in the PTEs.
[0209] FIG. 12 illustrates the operation of the context switch process. First, the Scheduler (shown in FIG. 1) decides to perform a context switch (block 1210 of FIG. 12). This decision may be occasioned, for example, by a thread running up to a predetermined time limit. As a result, the Scheduler selects a new thread to replace the currently-running thread (block 1212 of FIG. 12). In a preferred embodiment, the next step is to inspect the views of memory provided by IVMView of the current and new threads (block 1214) and determine whether they are different (block 1216). If so ("YES" branch of block 1216), IVMView::SwitchTo is called (block 1218). Otherwise ("NO" branch of block 1216), the system returns to the new thread (block 1220) without altering the contents of the page directory register 400 of FIG. 2.
[0210] FIG. 13 illustrates the SwitchTo process, consisting of the step of loading the page directory register 400 of FIG. 2 with the new page directory value.
[0211] FIG. 14 illustrates the process for unloading the virtual memory manager (VMMUnload). The first step begins whenever the last thread of the last address space using virtual memory requests termination (block 1410 of FIG. 14). The next step is to determine whether the reference count is zero (block 1420). If so ("YES" branch of block 1420), a choice (block 1430) may be made whether to terminate the virtual memory (block 1432) or to mark the VM object as "cached" (block 1434). In the preferred embodiment, the virtual memory manager is terminated (block 1432). Otherwise ("NO" branch of block 1420), the reference is released, which decrements the reference count by one (block 1440).
[0212] FIG. 15 illustrates the page fault handling process. The first step is to call IVMView::Fault upon the occurrence of a page fault (block 1510). IVMView::Fault calls VMMap::Share (block 1520) which in turn calls VMSpace::CacheControl (block 1530), which in turn calls VMMap::Read (block 1540). VMMap::Read calls File::Read (block 1550), which returns data--such as the content of a file, for example--(block 1560 of FIG. 15). Finally, VMSpace adds the returned data to PageList (block 1570).
[0213] FIG. 16 illustrates how a constructor creates a thread. First, VMFactory creates a VMSpace, and VMSpace creates an address space (block 1610). Next, VMFactory creates a VMMap and VMMap maps an object into the address space (block 1620). Then, VMFactory creates a VMView and VMView creates a view of the address space created in the previous step (block 1630). Finally, the scheduler creates a thread associated with the view created in the previous step (block 1640).
[0214] FIG. 17 illustrates how multiple views of the same memory space are provided for multiple threads. In the example of FIG. 17, two different threads, THREAD1 and THREAD 2 are directed by the PTE to two different views, VMView1 and VMView2, respectively, of the same memory space, VMSPACE1 through the PTE. For THREAD1, VMSPACE1 has a pointer to VMMap1
which points to Filel. For THREAD2, VMSPACE1 has a pointer to VMMap2
which points to File2.
[0215] FIG. 18 illustrates possible results of the implementation of FIG. 17. In FIG. 17, the view of memory by THREAD1 may make the memory appear smaller than its actual physical size, which the view of THREAD2 may make the memory appear to have a wider address field than that of the actual physical memory. For example, the actual physical memory may have an address space that is 32 bits wide while it appears to be 64 bits wide in the view of memory provided to THREAD2. On the other hand, the physical memory address space may be 64 bits wide while it appears to be only 32
bits wide in the view provided to THREAD1. In a more complex case, the address space widths of the objects VMView ("THREAD1") and VMSpace ("THREAD2") and of the physical memory may all differ from one another, being in one example 32 bits, 64 bits and 36 bits, respectively. The address space width in VMSpace may be chosen without regard to the address space widths of the virtual memory and the physical memory.
[0216] The virtual memory interfaces described here loosely correspond to the Mach external memory manager scheme [Young89]. VMView and VMSpace replace Mach's task. Memory objects are replaced by VMSpaces. VMMap and VMSpace jointly implement the XMM interface, although with synchronous interfaces only. The VTLB interface is a refinement of the Pmap interface.
[0217] In other implementations of the invention, one could emulate Unix or Windows virtual memory APIs, and the design would easily permit it (including fork).
[0218] Loadable Interprocess Communication (IPC) Manager
[0219] An IPC system is needed if applications are to be run in separate address spaces. Otherwise the applications can not talk to each other or to system services. An IPC system allows:
[0220] Communication between address spaces within a machine.
[0221] Communication between applications in different machines in a distributed environment.
[0222] Graceful termination and cleanup of applications even within one address space.
[0223] Cleanup involves releasing the memory held by an application. It also involves closing all references into and out of the application's objects. A level of indirection is needed for bookkeeping and for providing a cutoff point. This level of indirection is what an IPC system provides.
[0224] A preferred embodiment of the IPC system implements the COM model. It is possible, however, to replace it with another communication model for applications that expect a different model. Components implementing various communication paradigms can be loaded into the system as needed.
[0225] The preferred interprocess communication (IPC) manager is a run-time loadable resource residing outside of the operating system kernel, as illustrated in FIG. 1B. The loader can load the Loadable IPC Manager at any time based upon need. The Loadable IPC Manager can also be unloaded whenever it is no longer required by any applications currently running.
[0226] FIG. 19 illustrates the basic components of the Loadable IPC Manager. A trap component 1910 consists of a program to install the method IPC Trap Handler 1912, which is called whenever the system recognizes a need within the currently-running thread to communicate with another thread or a need to communicate with a thread in another machine. A copy component 1930 can be implemented with either a simple "copy" function or virtual memory or shared memory.
[0227] FIG. 20 illustrates the operation of the Loadable IPC Manager, which may have the file name IPC.EXE in a preferred implementation. The first step is to load the Loadable IPC Manager (block 2010). The next step is to install the IPC Trap Handler (block 2020). Finally, the IPC constructors are registered in NameSpace (block 2030).
[0228] FIG. 21 illustrates the structure of the interface between the loadable IPC and threads that use it. In the exemplary case of two threads 2110, 2120 which use a particular Loadable IPC Manager 2130, each thread has its own FIGURE pointer 2110a, 2120a pointing to the Loadable IPC Manager 2130. The Loadable IPC Manager 2130 has interfaces for the following methods: Query Interface 2140, Add Reference 2150, Release Reference 2160 and IPC Trap Handler 2170. As will be discussed later in this specification, each one of these methods and interfaces is replaceable.
[0229] FIG. 22 illustrates the intercommunication provided by the Loadable IPC Manager between threads in different address spaces. One thread (THREAD1) resides in a first address space 2210, and has a pointer to a loadable IPC Manager 2220.
[0230] At some point in the running of THREAD1, it causes the Loadable IPC Manager to provide communication with another thread (THREAD2) in another address space 2230. Both THREAD1 and THREAD2 have a pointer to the Loadable IPC Manager 2220.
[0231] The Loadable IPC Manager provides the requisite communication between THREAD1 and THREAD2. For example, THREAD1 may need to increment a counter controlled by THREAD2 in the other address space. The magnitude of the increment may need to be communicated to THREAD2 and the value of the count after incrementation may need to be returned to THREAD1, in some cases. The Loadable IPC Manager handles both of these communications, thus providing two-way communication between THREAD1 and THREAD2 as illustrated in FIG. 22.
[0232] The IPC process begins when THREAD1 signifies a need to go to another address space to access some resource in that other address space. In one implementation, THREAD1 makes a remote procedure call directed to a resource controlled by THREAD2 in the other address space 2230. Such a remote procedure call is transparent to the application running. Since THREAD1 is currently running in the current address space 2210, the thread takes an IPC trap.
[0233] FIG. 23 illustrates how an IPC trap is handled. First, the state of the currently running thread (THREAD1) is saved (block 2310). Then, a call is made (block 2320) to the IPC Trap Handler which (unlike the Loadable IPC Manager stored outside of the operating system kernel of FIG. 1) resides within the operating system kernel.
[0234] FIG. 24A illustrates the operation of the IPC Trap Handler. First is the occurrence of an IPC trap (block 2410). The next step is to copy the arguments of the currently running thread (block 2420). These arguments may be, for example, the identity of a counter in another address space. The thread traps into the loadable IPC system so that its VMView is changed to the VMView that points to the new address space (block 2430). The thread continues running in the new address space where it invokes an object or method that can produce desired information or data such as a new incremented counter stored in a register (block 2440). The next step is to return the values of the new data from the thread (block 2450). Then, the thread switches back to the original VMView (block 2460). In contrast, conventional IPC systems (which were not loadable as the IPC of the present invention) typically did not move the thread from one address space to the other. Instead, the desired information in the second address space was obtained by selecting and running a different thread in that second address space. In contrast, the preferred embodiment of the loadable IPC of the invention uses a single thread and moves between among address spaces by changing its VMView. However, an alternative embodiment of the loadable IPC of the present invention could be implemented using the conventional technique employing different threads.
[0235] Structure of a Loadable IPC System
[0236] The loadable IPC system of the invention (hereinafter referred to as the "LIS") differs in structure from a non-loadable one in the handling of the dependencies upon other subsystems (such as the scheduler and the virtual memory). The LIS can only have dynamic dependencies, and therefore cannot use any "backdoor" or make assumptions about the internals of other components. The LIS finds these components in the NameSpace, most likely at initialization time. If a specific component is not found or cannot be loaded, the LIS fails to initialize and unloads itself.
[0237] It is possible for the LIS to require specific interfaces from these other components, which it asks for via the Query Interface method. This could mean that not all virtual memory managers would be suitable for a given LIS, only those that implement the interface(s) the LIS requires. The converse is also true: all virtual memory managers that implement the required interfaces should be suitable for LIS use. In the preferred embodiment of the LIS of the invention, it is an interface definition error for a component to violate this rule.
[0238] Different LISs can depend on different sets of other components. Specifically, it is possible for a simple LIS not to require a virtual memory manager and still be fully functional. For instance, such a LIS would be useful for connecting two simple computers that do not possess a memory management unit(MMU). The LIS provides two classes of services: administration services and communication services. Administrative services support the creation, linking and destruction of communication endpoints and any other ancillary service. Communication services involve the transmission of the input and output arguments of an RPC and the creation and management of proxies and method signatures. An application requests these services from the LIS in an architectural and LIS-dependent way. More specifically, a processor usually provides special instructions to request system services ("system calls"); execution of such an instruction causes a processor trap. The LIS can be viewed as the system handler for such a trap. The LIS is handed the state of the thread at trap time. It is then LIS-dependent what elements of the thread state are intended to convey information to the LIS. For example, a certain processor register might contain the index of the service the application requires. Different embodiments of the LIS could offer different services, and use different register use conventions for argument passing.
[0239] A thread is associated with a specific LIS. Different threads can be associated with different LIS's. If two LIS's do not communicate, then their client threads also cannot communicate. It is not preferable to split a single computer into two or more non-communicating subsystems, except for the extreme case of highly secure systems. Therefore, the most practical case is the one where multiple LIS's are able to communicate among each other. In the present invention, this can be accomplished quite naturally by loading all LIS's in the same address space, making them visible in a common NameSpace, and using normal object invocation for their interaction.
[0240] Interfaces and Methods
[0241] A preferred embodiment of LIS.EXE has five internal (not exported) interfaces: ILISFactory, IPCSpace, IEndPoint, ISignature, and IEndPointTable.
[0242] ILISFactory::CreateIPCSpace( ) creates an IPCSpace with one empty ExportTable and one empty ImportTable. Both tables are IEndPointTables.
[0243] ILISFactory::CreateEndPoint(IPCSpace,ObjectPointer) creates an EndPoint in the given IPCSpace, to represent the object at virtual address ObjectPointer. The EndPoint has the IUnknown default signature associated with it. The EndPoint receives a new IPID, which is a 128 bit universal identifier uniquely associated with the EndPoint.
[0244] ILISFactory::CreateSignature(IID,ParsingString[ ]) creates a type signature given an array of properly formed parsing strings. Each string describes the type signature of one method. There are as many strings as there are methods in the interface. IID is the 128 bit universal identifier for the interface.
[0245] IPCSpace::GetImportTable( ) returns the ImportTable associated with the given IPCSpace.
[0246] IPCSpace::GetExportTable( ) returns the ExportTable associated with the given IPCSpace.
[0247] IEndPoint::SetSignature(ISignature) changes the signature of the given EndPoint.
[0248] IEndPoint::GetSignature( )returns the signature of the given EndPoint.
[0249] IEndPoint::GetIPID( ) returns the IPID of the EndPoint.
[0250] ISignature::GetParsingString(MethodNumber) returns the parsing string for the given method number.
[0251] IEndPointTable::Add(EndPoint)adds an endpoint.
[0252] IEndPointTable::Remove(EndPoint) removes an endpoint.
[0253] IEndPointTable::Lookup(IPID) looks up the ID number of the endpoint.
[0254] In addition to the interfaces described above, LIS.EXE makes use of Proxies inside the application address space. A Proxy is an object like any other, but it merely acts as the representative of some other remote object. The application invokes methods on the proxy, and the proxy's implementation for all methods simply traps to LIS.EXE. Two notable exceptions are the AddRef and Release methods, which maintain a local reference count. Only the Release method traps, and only when the local reference count goes to zero. Other embodiments of LIS might not require the use of proxies.
[0255] Operational Overview of the Loadable IPC System
[0256] FIG. 24B illustrates how instances of these interfaces are related. In the drawing, two threads belong to two different applications. Thread-B was given access to the endpoint EndP-1.
[0257] LIS.EXE handles the NameSpace specially, for three reasons. LIS.EXE is responsible for exporting primitive (`kernel`) objects to its clients. These are the objects that are found in the boot NameSpace, but for which LIS.EXE does not have an EndPoint. Other components that are loaded in the physical address space along with LIS.EXE can create and register objects in the boot NameSpace before and/or while LIS.EXE is loaded and active. When an application looks up one such object in the NameSpace, LIS.EXE automatically creates an EndPoint in the primitive IPCSpace. Secondly, when an application calls IgameSpace::Bind( ) to obtain access to a remote object, LIS.EXE must be able to intervene and create a Proxy for the remote object in the application's address space. Similarly, when an application wants to INameSpace::Register( ) an object, LIS.EXE must be able to intervene and remember which VMSpace was associated with the object.
[0258] Finally, when an application terminates abnormally LIS.EXE cleans up after it. Among other things, LIS.EXE is responsible for removing from the NameSpace all objects that belonged to the dead application.
[0259] An application thread is associated with a given LIS at thread creation time. More specifically, the LIS associates an IPCSpace instance to the VMSpace of the thread. This IPCSpace instance is the object passed to LIS.EXE at trap time. Other per-thread information that is used at trap time is the service requested, object IPID, method number, and the stack pointer register to access the arguments of the method call.
[0260] An application thread can find LIS.EXE in the NameSpace. An IPCSpace contains an ImportTable and an ExportTable, each table containing pointers to EndPoints. The ExportTable points to EndPoints that the IPCSpace exports to other IPCSpaces. As a result, the application associated with that IPCSpace fully implements the object the EndPoint represents. The ImportTable acts as a security filter; it only allows a thread access to those remote objects that it was granted access. The ImportTable can also be used as a renaming table: the application uses local names for remote objects that are only meaningful when used with the application's ImportTable.
[0261] An EndPoint has a unique ISignature. This provides an advantage in the effect of the QueryInterface method: when an application invokes QueryInterface on a remote object, it truly receives a new proxy for a new EndPoint.
[0262] In the example illustrated in FIG. 24B, Thread-A creates an EndPoint EndP-1, passing in a Signature Signature-1 and a virtual address Object-A. The EndPoint is entered in the ExportTable ETable-A of the thread's IPCSpace-A. A reference is taken on the thread's VMSpace-A (not shown). Thread-A now registers the EndPoint in the NameSpace. One reference is taken on the object and one on the EndPoint, to signify that the object is exported and visible in the NameSpace.
[0263] Generally, threads do not explicitly create EndPoints; they register objects in the NameSpace. It is LIS.EXE that automatically creates an EndPoint for the objects as part of its overriding of the INameSpace::Register( ) method. Alternatively, a method invocation might require the passing of an object as argument or as return value. Again, LIS.EXE automatically creates an EndPoint if necessary, inserts it in the ExportTable if not already in there, and inserts it in the remote party's ImportTable if not there already.
[0264] With reference to FIG. 24B, Thread-B can either look up Object-A in the NameSpace, or invoke a method on some other object in VMSpace-A that returns Object-A as result. In either case, LIS.EXE finds that EndP-1 is the associated EndPoint, and enters it in ITable-B, the ImportTable for Thread-B's IPCSpace-B. A Proxy for Object-A is created in Thread-B's VMSpace-B (not shown). In order to create the proxy, Signature-1 is used to find the size of the necessary VTable, and for the loading (or memory mapping) of the proxy's marshalling methods. The proxy holds a copy of the EndPoint's IPID.
[0265] Thread-B can now invoke a method on the proxy. In this case, a pointer to the proxy's state is loaded in a register and a trap is taken. The proxy's IPID is used to find EndP-1 in ITable-B. The remaining arguments are on the stack-A, in VMSpace-A. A new stack-B is created in VMSpace-B and is remapped in VMSpace-A. Arguments are copied from stack-A to stack-B, according to the EndPoint's signature. Thread-B's VMSpace is now changed to VMSpace-B, and the stack pointer changed to point to stack-B. The return address is set to special code that traps back to LIS.EXE. Thread-B now jumps to executing the proper method on the real Object-A.
[0266] The return path is symmetrical, and it includes copying the return arguments back to stack-A, and switching back to VMSpace-A.
[0267] If Thread-B deletes the last reference on its proxy, then a trap is taken to LIS.EXE, which removes EndP-1 from ITable-B. A reference is Release ( )d from EndP-1, and the trap returns. The proxy is deleted.
[0268] Thread-A removes Object-A from the NameSpace. LIS.EXE deletes one reference from the object itself, and one from EndP-1. If this was the last reference on EndP-1, this indicates that the object is no longer in the ImportTable of any other IPCSpace. Moreover, it indicates that the object is not in the NameSpace. Therefore, the EndPoint can be safely destroyed.
[0269] At application cleanup time, LIS.EXE walks the ImportTable and Release( )s all the EndPoints. It then walks the ExportTable and removes all EndPoints from the NameSpace. Each EndPoint is then Release( )d. If the reference count of one such object does not go to zero it means that some application is actively using the EndPoint. There are two equally valid alternatives for handling this case. LIS.EXE could prevent the application from terminating until all references are gone. Alternatively, it could mark the EndPoint as dead and let the application terminate. If some other application tries to use a dead EndPoint, it receives an exception.
[0270] FIG. 24C illustrates the structure involved in the foregoing operations of the LIS. In the example of FIG. 24C, a first object, Object-A, resides in a first address space, Address Space 2, and a proxy object, Proxy-A, has been inserted by the LIS into a second address space, Address Space 1. LIS.EXE (corresponding to the LIS.EXE illustrated in FIG. 24B) provides communication between the the two address spaces. Proxy-A has an import index of "3" to the third entry in the import table 2472 in LIS.EXE. The import table 2472 has a pointer to an endpoint (EndPoint-1) which is exported by Address Space 2. Endpoint-1 includes the type signature and address of (i.e., a pointer to) Object-A in Address Space A. The export table for Address Space 2 in LIS.EXE (not illustrated) would have its own pointer to the same endpoint.
[0271] Endpoint-1 has a list of signatures ("iii") which define the bits to be taken from the top of Proxy-A's stack, Stack-A, in order to obtain all the necessary arguments to be passed in the method call. Information is put onto Stack-A as part of the call to (the proxy of) Object-A. FIG. 24C illustrates Object-A as including a V-Table pointer and a method table, in which the second method points to the implementation code of Method 2. Endpoint-1 contains an "object address" field containing the address ("0X123) of Object-A (i.e., the pointer to Object-A) and a signature ("iii") for Method 2 in Object-A's method table.
[0272] Proxy-A has an index value of "3" into the import table 2472. Upon the occurrence of an IPC trap, IPCSpace points to the import table 2472
as a collection of objects that the IPC imports or exports. Stack-B in address space 2 is a free stack, ready to accept incoming method calls. Upon occurrence of the IPC trap, the IPC looks at Stack-A and finds the import index (i.e., "3") and therefore goes to the third entry in its import table. This entry points to EndPoint-1, and EndPoint-1 has the signature ("iii") for Method 2 of Object-A. The following values are therefore copied to Stack-B as part of the call: arg 0, arg 1, arg 2, method 2, the object address 0X123, and the program counter for the code "call". As a result, Method-2 of Object-A is called with arg 0, arg 1 and arg 2 as specified in Stack-B of Proxy-A, and therefore the resulting communication has the appearance of a local call.
[0273] The Loadable IPC Manager is not only run-time loadable from the run-time resources into the kernel by the loader, but is also unloadable. Moreover, there may be more than one Loadable IPC Manager stored among the run-time loadable resources of the operating system. For example, a very simple Loadable IPC Manager--which takes up less space in memory--may be used in cases where the communication needed is within the same machine. A more powerful Loadable IPC Manager may be called whenever it is necessary to communicate with a thread running in another machine.
[0274] Object Mutation by Applications External to the Object
[0275] An object consists of an interface, an instance pointer, an implementation, and some state. The interface is a list of methods. The instance pointers and interfaces are exposed to other objects; the state and the implementation are not. Worker threads execute implementation code that accesses and modifies the state. Once an object instance has been created, the instance pointer, interface, and implementation are traditionally immutable, only the state can be changed by method calls.
[0276] The preferred embodiment of the invention allows run-time changes to the ordinarily immutable part of an object, even while the object is being usedThe term mutation as used in this specification refers to the act of atomically changing an ordinarily constant part of an object, such as a method implementation. The thread performing the mutation is called a mutator.
[0277] A mutator must translate the state of the object from the representation expected by the old implementation to the one expected by the new implementation. It must also coordinate with worker threads and other mutators through suitable synchronization mechanisms. Transition functions capture the translations that are applied to the object state and to the worker thread's execution state. In order to limit the amount of metadata, execution transitions only happen between corresponding clean points in the old and new implementations.
[0278] A number of mechanisms can be implemented using mutation. Interposition is done via replacement of the object with a filter object that points to a clone of the original object. A dynamic software upgrade would replace the incorrect implementation of a method with the corrected one. Run-time code generation might use a stub implementation as a trigger. Mutation can be used to replace generic code with a specialized version that exploits partial evaluation by treating ordinarily non-constant state as immutable. Once the optimistic conditions are no longer true, mutation allows reverting back to the generic code. Execution profiling might indicate that a different implementation would perform better, and trigger a mutation. Object mobility is realized by turning objects into proxies and vice versa.
[0279] One example where mutation in accordance with the present invention was found to be useful was in device drivers. In one configuration on the x86 the invention was implemented with minimal floppy and disk drivers that called BIOS (ROM) functions to do the work. A loadable driver would later take over and mutate the BIOS driver with a real driver, transparently to the filesystem.
[0280] While only methods within an object can change the object's state in conventional operating systems, the present invention provides a mutation object which, during run time, can dynamically change the state of other objects as desired. The state of an object includes the object's pointers. For example, to change an implementation of a method listed in the object's interface, the pointer from the interface for that method would be changed to point to a different implementation of that method, the change in the pointer value being a change in the object's state relating to the fundamental structure of the object.
[0281] FIG. 25 illustrates the interface of the mutation object, IMutate, which includes the following methods: Query Interface 2510, Add Reference 2520, Release Reference 2530, MutateVTable 2540 and MutateObject 2560. The operation of the Query Interface, Add Reference and Release Reference interfaces 2510, 2520, 2530 have been described above in this specification.
[0282] The MutateObject method 2560 is a general method enabling the user to change any pointer or register in an object. The MutateVTable method 2540 is a special case, the method being directed specifically to changing the VTable pointer. One example of the general MutateObject method 2560 is illustrated in FIG. 26, in which the MutateObject method changes the interface pointer 2610 for method_i in the Object Interface 2620 from Implementation A to Implementation B.
[0283] FIG. 27 illustrates an example of the operation of the MutateVTable method. In this example, the object being altered by the MutateObject method has a VTable 2710 which can point to one of two interfaces 2720, 2730. The two interfaces 2720, 2730 can list different methods or, if the same method is listed in both interfaces 2720, 2730, then their interface pointers 2750 for the corresponding method can point to different implementations, as in the case of Method_A , or to the same implementation, as in the case of Method_C. The synchronization mechanisms suitable for implementing mutation can be divided into three groups:
[0284] Mutual exclusion: Mutation cannot happen while workers are executing methods of the object to be mutated. The implementation can be a read-write lock, disabling preemption on a uniprocessor, or a holding tank [Cowan96] with reference counting. Mutual exclusion is simple in that there is no worker state associated with the object when mutation is allowed to happen.
[0285] Transactional: Roll back the workers that are affected by mutation. Mutators and workers operate on an object transactionally and can be aborted when necessary.
[0286] Swizzling: Modify the state of the workers to reflect mutation. Instead of waiting for workers to exit the object or forcing them out, the third mechanisms just suspends them. The mutator then modifies the state of each worker to reflect the change in the object.
[0287] FIG. 28 illustrates how object mutation is synchronized by mutual exclusion, which is the simplest synchronization embodiment. When the Mutate Object is called, the first step is to prevent any threads from accessing the object or initiating new activity with the object (block 2810). Then a determination is made whether any threads or worker threads that were already running with or within the object are still run