Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent Application
20040162930
Kind Code
A1
Forin, Alessandro ; et al.
August 19, 2004
Highly componentized system architecture with loadable virtual memory manager
Abstract
The present invention is directed to a loadable virtual memory manager, and generally to a computer operating system capable of supporting application programs running in a computer having a working memory, the computer operating system including a kernel resident in the working memory at run time, and a loadable virtual memory manager resident at link time outside of the working memory and dynamically loadable into the working memory at run time upon demand of one of the application programs. The kernel includes a loader for loading the virtual memory manager into the working memory in response to a demand from one of the application programs. The computer is able to access a storage memory separate from the working memory, the loadable virtual memory manager residing at link time in the storage memory. The loader loads the virtual memory manager from the storage memory to the working memory. The loadable virtual memory manager is removable from the working memory upon lack of demand therefor by the application programs.
Inventors:
Forin; Alessandro
(Redmond, WA)
, Helander; Johannes V.
(Bellevue, WA
)
Correspondence Name and Address:
TWO PRUDENTIAL PLAZA, SUITE 4900 180 NORTH STETSON
LEYDIG, VOIT & MAYER, LTD.
CHICAGO
IL
60601-6780
US
Series Code:
754112
Filed:
January 8, 2004
U.S. Current Class:
711/001
U.S. Class at Publication:
711/001
Intern'l Class:
G11C 005/00
Claims
What is claimed is:
1. A computer operating system capable of supporting application programs running in a computer having a working memory, said computer operating system comprising: a kernel resident in said working memory at run time; and a loadable virtual memory manager resident at link time outside of said working memory and dynamically loadable into said working memory at run time upon demand of one of said application programs.
2. The operating system of claim 1 wherein said kernel comprises a loader for loading said virtual memory manager into said working memory in response to a demand from one of said application programs.
3. The operating system of claim 1 wherein said computer is able to access a storage memory separate from said working memory, said loadable virtual memory manager residing at link time in said storage memory.
4. The operating system of claim 1 wherein said loader loads said virtual memory manager from said storage memory to said working memory.
5. The operating system of claim 1 wherein said loadable virtual memory manager is removable from said working memory upon lack of demand therefor by said application programs.
6. The operating system of claim 1 wherein said kernel of said operating system further comprises a virtual memory fault handler for handling virtual memory faults occurring in the absence of said virtual memory manager in said working memory.
7. The operating system of claim 2 wherein said kernel of said operating system comprises a Namespace for registering said virtual memory manager upon said virtual memory manager being loaded into said working memory, whereby said virtual memory manager becomes available to each application program through said Namespace.
8. The operating system of claim 7 wherein said Namespace comprises an object supporting plural interfaces exposable by said application programs, said plural interfaces comprising: a query interface, through which an application program invokes said virtual memory manager; an add reference interface by which said Namespace manages each request for said virtual memory manager from any of said application programs; and a release reference interface, by which Namespace manages termination of said virtual memory manager from said working memory.
9. The operating system of claim 8 wherein said loader is responsive to said Namespace in maintaining said virtual memory manager in said working memory whenever said add reference interface has a reference to said virtual memory manager.
10. The operating system of claim 8 wherein said virtual memory manager comprises an object with plural interfaces exposable to other objects, said plural interfaces of said virtual memory manager comprising: virtual memory space interface(VMSpace); virtual memory map interface(VMMap); virtual memory view interface (VMView); query interface; and virtual memory factory interface (VMFactory).
11. The operating system of claim 8 wherein said virtual memory manager comprises an object with plural interfaces exposable to other objects, one of said plural interfaces of said virtual memory manager comprising a virtual memory space (VMSpace) interface to a VMSpace object, said VMSpace interface providing plural methods which control access by said application programs to virtual memory locations of said virtual memory manager.
12. The operating system of claim 11 wherein said VMSpace object comprises plural memory regions and a skip list linking said plural memory regions across areas of memory described by said regions.
13. The operating system of claim 12 wherein another one of said objects comprises a VMMap object which maps ones of said regions of VMSpace to other objects.
14. The system of claim 13 wherein said VMMap maps to another VMMap object.
15. The system of claim 13 wherein said VMMap maps to a ZeroMap object.
16. The system of claim 13 wherein said VMMap object maps to a VMSpace object.
17. The system of claim 13 wherein said VMMap object maps to a File object.
18. The system of claim 13 wherein said VMMap object maps to a mapping filter.
19. The operating system of claim 11 wherein the methods of said VMSpace object comprise: Reserve; Delete; Map; Protect; CacheControl; QueryVM; CreateShadow.
20. The operating system of claim 8 wherein said virtual memory manager comprises an object with plural interfaces exposable to other objects, one of said plural interfaces of said virtual memory manager comprising a virtual memory map (VMMap) interface, said VMMap interface providing plural methods which control use by said application programs of virtual memory locations of said virtual memory manager.
21. The operating system of claim 20 wherein said methods of VMMap comprise: Read; Write; Share; QueryAccess; GetSize; Clone.
22. The operating system of claim 8 wherein said virtual memory manager comprises an object with plural interfaces exposable to other objects, one of said plural interfaces of said virtual memory manager comprising a virtual memory view (VMView) interface, said VMView interface providing plural methods which control views by said application programs of virtual memory locations of said virtual memory manager.
23. The operating system of claim 22 wherein said methods of VMView comprise: SwitchTo; SetMapping; GetMapping; Fault.
24. The operating system of claim 23 wherein said VMView interface provides different views of the same memory space to different ones of said application programs.
25. The operating system of claim 23 wherein said VMView interface supports context switching.
26. The operating system of claim 25 wherein said VMView interface provides a SwitchTo method.
27. The operating system of claim 10 wherein said VMFactory comprises constructors of objects of said virtual memory manager.
28. The operating system of claim 27 wherein said VMFactory comprises a Namespace interface exposable by applications which enumerates all the objects of said virtual memory manager independently of said Namespace.
29. The operating system of claim 2 further comprising plural virtual memories resident outside of said working memory at link time, said loader being capable of loading plural ones of said plural virtual memories into said working memory at run time, whereby plural virtual memory managers reside in said working memory during run time.
30. The operating system of claim 2 wherein said loadable virtual memory manager comprises a default virtual memory manager associated with a local address space and plural virtual memory managers stacked with said default virtual memory manager, each of said plural virtual memory managers being associated with respective external address spaces through which said default virtual memory manager provides access.
31. The operating system of claim 30 wherein said external address spaces reside on external computers.
32. The operating system of claim 1 wherein said kernel contains a minimal set of objects sufficient to support said application programs.
33. A computer comprising: a working memory containing an operating system kernel; a virtual memory manager stored in a location outside of said working memory; said operating system kernel in said working memory comprising a loader for loading said virtual memory manager from said location outside of said working memory into said working memory, so as to provide said operating system kernel with a virtual memory manager.
34. The computer of claim 33 further comprising a storage memory, said location outside of said working memory in which said virtual memory manager is stored comprising a memory location in said storage memory, wherein said loader loads said virtual memory manager from said storage memory to said working memory.
35. The computer of claim 34 wherein said kernel further comprises a NameSpace, said loader being responsive to said NameSpace receiving a query for said virtual memory manager from an application program to load said virtual memory manager.
36. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting application programs in said computer, said method comprising: in response to a request from at least one of said application programs, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMView; in response to occurrence of a virtual memory fault: calling VMView, determining whether VMView can provide a virtual memory mapping covering the virtual memory fault, loading a corresponding page table entry from IVMView if said virtual memory mapping is provided.
37. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting application programs in said computer, said method comprising: in response to a request from at least one of said application programs, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability; in response to occurrence of a virtual memory fault invoking the installed virtual memory manager for the handling of the fault.
38. A computer readable medium storing computer-executable instructions for carrying out the steps of claim 37.
39. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting application programs in said computer, said method comprising: in response to a request from at least one of said application programs, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMSpace, VMView, VMMap, and VMFactory; in response to occurrence of a virtual memory fault: determining whether said virtual memory fault was due to an error; if said fault was not due to an error, determining whether said fault was due to a postponed allocation of memory. if said fault was due to a postponed allocation of memory, calling VMSpace to allocate memory space in response to said virtual memory fault.
40. The method of claim 39 where the step of determining whether said fault was due to a postponed allocation of memory comprises: determining whether there is any reference to the object in a current Pagelist.
41. The method of claim 40 further comprising: if said fault was not due to postponed allocation of memory, determining whether said fault was due to a copy-on-write procedure; if said fault was not due to a copy-on-write procedure, copying a reference in the current PageList to the Page Table Entry.
42. The method of claim 41 wherein the step of determining whether said fault was due to a copy-on-write procedure comprises: determining whether there has been a change from a current PageList to another PageList.
43. The method of claim 41 further comprising: if said fault was due to a copy-on-write procedure, allocating a new page, copying the contents of the current page to the new page, and then entering the mapping for the new page as a page table entry.
44. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 43.
45. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting threads to run in said computer, said method comprising: in response to a request from at least one of said threads, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMSpace, VMView VMMap, and VMFactory; deciding to perform a context switch to replace one of said threads which is currently running; selecting a new one of said threads to run next; calling VMView and determining whether VMView provides different views of the current thread and the new thread; if VMView provides the same views of the current and new threads, returning to the new thread.
46. The method of claim 45, wherein said computer comprises a page directory register storing a page directory value pointing to a particular page table having page table entries therein providing translation between virtual memory addresses and physical address, wherein: the step of returning to the new thread is carried out without changing the contents of said page directory register.
47. The method of claim 46 further comprising: if VMView provides different views of the current and new threads, calling SwitchTo of VMView; SwitchTo loading a new page directory value into said page directory register.
48. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 47.
49. The method of claim 45, wherein said computer comprises an address space ID register storing a selector of valid page table entries, said entries stored in a Translation Buffer, therein providing translation between virtual memory addresses and physical address for that particular address space ID as indicated by said VMView, wherein: the step of returning to the new thread is carried out without changing the contents of said address space ID register.
50. The method of claim 49 further comprising: if VMView provides different views of the current and new threads, calling SwitchTo of VMView; SwitchTo loading a new page directory value into said page directory register.
51. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 50.
52. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting threads to run in said computer, said method comprising: in response to a request from at least one of said threads, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability; incrementing a reference count with each request from any one of said threads for said virtual memory manager; decrementing said reference count with each request from any thread to terminate said virtual memory manager.
53. The method of 52 further comprising: terminating said virtual memory manager whenever said reference count reaches a minimum value.
54. The method of claim 53 wherein said minimum value of said reference count is zero.
55. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 54.
56. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting threads to run in said computer, said method comprising: in response to a request from at least one of said threads, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMSpace having an object PageList, VMView and VMMap; in response to occurrence of a page fault, calling VMView::Fault; in response to VMView fault, calling VMSpace::CacheControl; in response to VMSpace::CacheControl, calling VMMap::Read; in response to VMMap::Read, calling File::Read; returning data from File::Read; installing the data returned from File::Read into PageList.
57. A method of creating a thread in a computer having a working memory with an operating system kernel installed therein for supporting threads to run in said computer, said kernel including a Scheduler, said method comprising: in response to a request from at least one of said threads, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMSpace, VMView, VMMap, and VMFactory; VMSpace creating a virtual memory address space for thread to be created; VMMap mapping an object into the virtual memory address space for the thread to be created; VMView creating a view of the virtual memory address space for the thread to be created; and said Scheduler creating a thread associated with said view.
58. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 57.
59. A method of operating a computer having a working memory with an operating system kernel installed therein for supporting threads to run in said computer, said kernel including a Scheduler, said method comprising: in response to a request from at least one of said threads, installing a virtual memory manager from a location outside of said working memory into said working memory so as to provide said operating system kernel with a virtual memory capability, said virtual memory having plural interfaces including VMSpace, VMView, VMMap, and VMFactory; providing by VMView different views of a common memory space to respective ones of said threads.
60. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 59.
61. The method of claim 59 further comprising: providing by VMMap respective mappings to respective files in said common memory space to respective ones of said different views for said respective threads.
62. The method of claim 61 wherein said VMMap maps to another VMMap object.
63. The method of claim 61 wherein said VMMap maps to a ZeroMap object.
64. The method of claim 61 wherein said VMMap object maps to a VMSpace object.
65. The method of claim 61 wherein said VMMap object maps to a File object.
66. The method of claim 61 wherein said VMMap object maps to a mapping filter.
67. A computer-readable medium storing computer-executable instructions for carrying out the steps of claim 61.
68. A data structure in a computer, comprising: plural threads; respective VMView objects associated with respective ones of said plural threads, each of said VMView objects providing different views of a common memory space to respective ones of said plural threads, whereby to provide different threads with different views of said common memory space.
69. The data structure of claim 68 wherein said common memory space has an address space width of 32 bits and one of said VMView objects provides a view of to a respective one of said threads in which said memory appears to have an address space width of 64 bits.
70. The data structure of claim 68 wherein said common memory space has an address space width of 64 bits and one of said VMView objects provides a view of to a respective one of said threads in which said memory appears to have an address space width of 32 bits.
71. The data structure of claim 68 wherein said common memory space has an address space width of 36 bits, said VMView object provides an address space width of 32 bits and a VMSpace object has a view of said common memory space with an address space width of 64 bits.
72. The data structure of claim 68 further comprising: respective VMMap objects providing respective mappings for respective ones of said different views to respective files, whereby to map different threads with different files through respective views of said common memory space.
73. A loadable virtual memory comprising: a default virtual memory associated with a local address space; and plural virtual memories stacked with said default virtual memory, each of said plural virtual memories being associated with respective external address spaces through which said default virtual memory provides access.
74. The loadable virtual memory of claim 73 wherein said external address spaces correspond to external computers and said local address space corresponds to a local computer in which said virtual memory resides.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Technical Field
[0002] The invention is related to computer operating systems and in particular to a computer operating system which is highly componentized and has dynamically loadable operating features which may be loaded and unloaded during system run time.
[0003] 2. Background Art
[0004] The progressive computerization of society involves a number of diverse computing platforms beside the general-purpose computer:
[0005] Embedded control systems, including consumer devices, intelligent sensors and smart home controls.
[0006] Communication-oriented devices such as digital cell phones and networking infrastructure.
[0007] Programmable peripherals and microcontrollers.
[0008] In all these cases, the general-purpose platform approach is either not applicable, or it is prohibitively expensive.
[0009] The microprocessor might be a DSP, a VLIW, or a micro-controller; the memory budget is severely restricted; there might be no MMU; the network connection might be sporadic; and Real-Time support is essential.
[0010] Current operating systems are either inflexible, big, lack Real-Time support, have complex hardware requirements, or are so special purpose that good development tools are unavailable and code reusability is low.
[0011] Microkernels [Black92, Engler95] attempt to modularize the operating system. But they confuse modularity with security by mandating that system services be in separate address spaces. Many of the services moved into separate server processes are still necessary for these systems to function and often the services have to trust each other.
[0012] C++ and Java provide objects at a very fine granularity level, and they are extremely successful with application programmers. Unfortunately, both languages confine their objects to a single address space. Object Linking and Embedding (OLE) [Brockschmidt95] and other similar systems extend objects across address spaces and across machine boundaries. OLE seamlessly integrates independently developed components. When editing an Excel spreadsheet inside a Word document it is in fact the Excel process that operates on objects inside of Word's address space. Unfortunately, it only works for user mode applications.
[0013] Modularity has always been an important paradigm in software design. By breaking a complex system into pieces, the complexity becomes more manageable. Address spaces provide security by installing virtual-memory based firewalls between applications. These two issues are orthogonal, but the distinction has been lost in systems research that has been concentrating on so-called microkernels. These issues have been discussed in the following publications:
[0014] [Bershad95] Brian Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, S. Eggers, C. Chambers. Extensibility, safety and performance in the Spin operating system. In 15th ACM Symposium on Operating System Principles, pages 267-284, Copper Mountain Resort, Colorado, December 1995.
[0015] [Black92] David Black, David Golub, Daniel Julin, Richard Rashid, Richard Draves, Randall Dean, Alessandro Forin, Joseph Barrera, Hideyuki Tokuda, Gerald Malan, David Bohman. Microkernel Operating System Architecture and Mach. In 1st USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 11-30, Seattle, April 1992.
[0016] [Brockschmidt95] K. Brockshmidt. Inside OLE, Second ed. Microsoft Press, Redmond Wash., 1995.
[0017] [Cheriton94] David Cheriton, Kenneth Duda. A Caching Model of Operating System Kernel Functionality. In 1.sup.st Symposium on Operating Systems Design and Implementation, Seattle, 1994.
[0018] [Cheriton88] David Cheriton. The V distributed system. In Communications of the ACM, pages 314-333, March 1988.
[0019] [Draves97] Richard Draves, Scott Cutshall. Unifying the User and Kernel Environments. Microsoft Research Technical Report MSR-TR-97-10, 16
pages, March 1997
[0020] [Engler95] D. R. Engler, M. F. Kaashoek, J. O'Toole Jr. Exokernel: an operating system architecture for application-specific resource management. In 15.sup.th ACM Symposium on Operating System Principles, pages 251-266, Copper Mountain Resort, Colorado, December 1995.
[0021] [Ford97] Bryan Ford, Godmar Back, Greg Benson, Jay Lepreau, Albert Lin, Olin Shivers. The Flux OSKit: A Substrate for Kernel and Language Research. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 38-51. ACM SIGOPS, Saint-Malo, France, October 1997.
[0022] [Golub90] David Golub, Randall Dean, Alessandro Forin, Richard Rashid. UNIX as an application program. In USENIX 1990 Summer Conference, pages 87-95, June 1990.
[0023] [Helander94] Johannes Helander. Unix under Mach: The Lites Server. Master's thesis, 71 pages, Helsinki University of Technology, 0.1994. Available from http://www.cs.hut.fi/.about.jvh/lites.MASTERS.ps
[0024] [Hildebrand92] D. Hildebrand. An architectural overview of QNX. In 1.sup.st USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 113-126, Seattle, April 1992.
[0025] [ISI95] Integrated Systems Inc. pSOSystem System Concepts. Part No. COL0011, May 1995, ISI, Sunnyvale Calif.
[0026] [Jones96] Michael B. Jones, Joseph S. Barrera, III, Richard P. Draves, Alessandro Forin, Paul J. Leach, Gilad Odinak. An Overview of the Rialto Real Time Architecture. In Proceedings of the 7th ACM SIGOPS European Workshop, pagg. 249-256, September 1996.
[0027] [Jones97] Michael B. Jones et al. CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 198-211. ACM SIGOPS, Saint-Malo, France, October 1997.
[0028] [Jones 97b] Michael B. Jones. The Microsoft Interactive TV System: An Experience Report. Microsoft Research Technical Report MSR-TR-97-18, July, 1997.
[0029] [Julin91] Daniel Julin, Jonathan Chew, Mark Stevenson, Paulo Guedes, Paul Neves, Paul Roy. Generalized Emulation Services for Mach 3.0: Overview, Experiences and Current Status. In Proceedings of the Usenix Mach Symposium, 1991.
[0030] [Lee98] Dennis Lee, Patrick Crowley, Jean-Loup Baer, Tom Anderson, Brian Bershad. Execution characteristics of desktop applications on Windows NT. In Proceedings of the 25.sup.th International Symposium on Computer Architecture, Barcelona, Spain, June 1998.
[0031] [Liedtke95] Jochen Liedtke. On .quadrature.-kernel construction. In 15.sup.th ACM Symposium on Operating System Principles, pages 237-250, Copper Mountain Resort, Colorado, December 1995.
[0032] [Mogul87] Jeffrey Mogul, Richard Rashid, Michael Accetta. The Packet Filter: an Efficient Mechanism for User-level Network Code. In 11.sup.th ACM Symposium on Operating System Principles, November 1987.
[0033] [Rashid87] Richard Rashid. From RIG to Accent to Mach: The evolution of a network operating system. Carnegie Mellon University Technical Report, August 1987.
[0034] [Rozier88] M. Rozier, A. Abrassimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Hermann, C. Kaiser, S. Langlois, P. Leonard, W. Neuhauser. CHORUS distributed operating system. In Computing Systems, pages 305-370, Vol. 1-4, 1988.
[0035] [Young89] Michael Wayne Young. Exporting a User Interface to Memory Management from a Communication-Oriented Operating System. Ph.D. Thesis CMU-CS-89-202, Carnegie Mellon University, November 1989.
[0036] Mach [Black92] defined an interface for external memory managers [Young89] and was able to split virtual memory into functionally distinct parts, allowing part of the functionality to reside outside the privilege-level component (the "kernel"). Mach also separated part of the Unix operating system services out of the kernel [Golub90, Helander94], achieving modularity but limited additional functionality. The multiserver project [Julin91] went further in the modularization by splitting the Unix services into multiple independent servers. The componentization added structure and generality to the services. However, keeping the services in multiple address spaces did not add any security or robustness since components had to be available and trusted in any case. The most interesting new functionality was in the ability to emulate multiple OS interfaces, at the same time.
[0037] Contemporary research systems take the minimization of the "kernel" concept even further by defining even lower level abstractions and demonstrating the ability to split states across address space boundaries. None of these systems defines a new application programming interface (API) different from the Unix they emulate. The API that their predecessors [Rashid87, Cheriton88, Rozier88] did define, based on RPC and message exchanges, were not very successful with programmers.
[0038] The Cache Kernel [Cheriton94] uses Mach's external memory manager metaphor uniformly for the management of all kernel objects. Threads, Address Spaces and User Kernels are all handled through this pagein-pageout logical interface. An actual application is statically linked with a number of libraries, which provide default implementations of the required User Kernel components (VM, scheduling, IPC). This offers some flexibility by letting untrusted applications have their custom application kernel. Overall complexity is not decreased; it seems an application kernel would have to be as complicated as any other operating system. The ability to write your own application kernel would seem useful for a limited number of users, in teaching operating systems for instance.
[0039] Exokernel [Engler95] goes along the same lines demonstrating further ability to run operating system code in user mode. While it is highly successful in this and offers some added flexibility, it is questionable whether the premises differ from that of microkernels. The main contribution is in the mechanisms for application-specific resource management.
[0040] [Liedtke95] argues that microkernels have failed exclusively on performance grounds, and that poor performance is their only cause for inflexibility. Our argument is the opposite: inflexibility is inherent in the design, and leads to unavoidable inefficiencies that can only be mitigated by good implementations, never eliminated.
[0041] Spin [Bershad95] addresses the issue of expensive address space crossings by letting user code compiled by a trusted compiler run inside the kernel. This can be viewed as smart proxies that can do a lot of the work locally that otherwise would require communication. It is similar to loading packet filters into network drivers [Mogul87], to running database application query language inside database engines [reference], or to sandboxing Java applets. Applying these techniques to operating systems is beneficial when a trust boundary must be crossed and the cost would otherwise be high. It does not address the issue of whether or not a trust boundary is necessary. Spin uses an object-based language (Modula3) to provide extensibility. The pointer-safety property of the language is what permits execution of untrusted code in privileged mode. Trust relationships, as in the user-versus-kernel separation, should not dominate system decomposition. It is important to return to a global system view. The present invention addresses the issue of how to minimize the set of base services, and how to dynamically extend them on demand.
[0042] [Ford97] shows how a base set of system components can be composed in different ways to build an operating system kernel. The granularity is fairly coarse, and the techniques are limited to static linking. Components that should be of interest to OS researchers (VM, IPC, scheduling, etc.) cannot be replaced or removed, neither statically nor dynamically. The decomposition is otherwise limited to the "OS" component; it is not meant as a whole-system approach. This does not go far enough in the componentization. It provides a few convenient components, such as bootstrap loader and filesystems, but is mostly concerned with reusing existing device drivers and Unix code. It fails to componentize the core kernel services or extend the paradigm to applications.
[0043] Componentization and location independence has also been studied in the context of filesystems and network protocols [Maeda93] and in a number of existing embedded systems, such as pSOS [ISI95]. In a typical embedded system there is no loader, and components can only be chosen at static link time when the load image is built. Services are extremely limited, sometimes exclusively to the scheduling component. The number and priority of threads might have to be specified statically as well.
[0044] Chorus [Rozier88] can be configured to use either a page-based or a segment-based VM system.
SUMMARY OF THE INVENTION
[0045] A preferred embodiment of the invention is directed to a flexible system architecture that is suitable for a wide range of applications. The system is built out of minimal but flexible components, which can be deployed as needed. Instead of mandating a fixed set of operating system services and hardware requirements, the system preferably provides a menu of well-defined components that can be chosen to compose a complete system depending on hardware capabilities, security needs, and application requirements.
[0046] Dynamic loading and unloading of components provides the flexibility that lets the system adapt to changing requirements.
[0047] The componentization makes it possible to change the implementation of a component without affecting the rest of the system. Minimalism makes it possible to use the system with severely restricted hardware budgets. It also forces the system to be understandable and flexible. Software components, when possible, are not tied to a particular layer of the system, but can be reused. For example, the same code that implements the system physical memory heap is used to provide application heaps over virtual memory. The key system building blocks are componentized. This includes the virtual memory system, IPC, and the scheduler in addition to filesystems, networking, drivers, and protection policies. Preferred embodiments of the present invention extend object-orientation both across address spaces and across protection levels.
[0048] In a preferred embodiment, components are located in separate address spaces only when there is a real reason for it, such as security or specific address space requirements. Thus, the price of multiple address spaces (and transitions thereof) is paid only where needed.
[0049] The present invention is directed to a loadable virtual memory manager, and generally to a computer operating system capable of supporting application programs running in a computer having a working memory, the computer operating system including a kernel resident in the working memory at run time, and a loadable virtual memory manager resident at link time outside of the working memory and dynamically loadable into the working memory at run time upon demand of one of the application programs. The kernel includes a loader for loading the virtual memory manager into the working memory in response to a demand from one of the application programs. The computer is able to access a storage memory separate from the working memory, the loadable virtual memory manager residing at link time in the storage memory. The loader loads the virtual memory manager from the storage memory to the working memory. The loadable virtual memory manager is removable from the working memory upon lack of demand therefor by the application programs.
[0050] The kernel of the operating system further includes a virtual memory fault handler for handling virtual memory faults occurring in the absence of the virtual memory manager in the working memory. The kernel of the operating system includes a Namespace for registering the virtual memory manager upon the virtual memory manager being loaded into the working memory, whereby the virtual memory manager becomes available to each application program through the Namespace. Preferably, the Namespace includes an object supporting plural interfaces exposable by the application programs, the plural interfaces including a query interface through which an application program invokes the virtual memory manager, an add reference interface by which the Namespace manages each request for the virtual memory manager from any of the application programs, and a release reference interface, by which Namespace manages termination of the virtual memory manager from the working memory. The loader is responsive to the Namespace in maintaining the virtual memory manager in the working memory whenever the add reference interface has a reference to the virtual memory manager.
[0051] The virtual memory manager includes an object with plural interfaces exposable to other objects, the plural interfaces of the virtual memory manager including a virtual memory space interface(VMSpace), a virtual memory map interface(VMMap), a virtual memory view interface (VMView); a query interface, and a virtual memory factory interface (VMFactory). The VMSpace interface provides plural methods which control access by the application programs to virtual memory locations of the virtual memory manager and preferably includes plural memory regions and a skip list linking the plural memory regions across areas of memory described by the regions.
[0052] The VMMap interface provides plural methods which control use by the application programs of virtual memory locations of the virtual memory manager. The VMView interface provides plural methods which control views by the application programs of virtual memory locations of the virtual memory manager. The VMView interface can provide different views of the same memory space to different ones of the application programs and can support context switching.
[0053] The VMFactory interface provides constructors of objects of the virtual memory manager. The VMFactory interface provides a Namespace interface exposable by applications which enumerates all the objects of the virtual memory manager independently of the Namespace.
[0054] In one embodiment, the operating includes plural virtual memories resident outside of the working memory at link time, the loader being capable of loading plural ones of the plural virtual memories into the working memory at run time.
[0055] In one embodiment, the loadable virtual memory manager can include a default virtual memory associated with a local address space and plural virtual memories stacked with the default virtual memory, each of the plural virtual memories being associated with respective external address spaces through which the default virtual memory provides access. The external address spaces can reside on external computers.
BRIEF DESCRIPTION OF THE DRAWINGS
[0056] FIG. 1A is a block diagram of an exemplary operating environment of the invention.
[0057] FIG. 1B is a block diagram of an operating system embodying the present invention in the computer illustrated in FIG. 1A.
[0058] FIG. 1C illustrates one application of the invention to form stacked virtual memories with a local virtual memory.
[0059] FIG. 2 illustrates a page table registry structure of a virtual memory manager of the operating system of FIG. 1B.
[0060] FIG. 3 illustrates the objects in the virtual memory manager.
[0061] FIG. 4 illustrates the virtual memory manager of FIG. 3 with a set of interfaces.
[0062] FIG. 5 illustrates the structure of an object in the operating system of FIG. 1B.
[0063] FIG. 6 illustrates the structure of the virtual memory view object of the operating system of FIG. 1B.
[0064] FIG. 7 illustrates the objects in a preferred implementation of the virtual memory manager.
[0065] FIG. 8 illustrates the Load VMM process in the operating system of FIG. 1B.
[0066] FIG. 9 illustrates the method of handling a virtual memory (VMM) fault in the operating system of FIG. 1B.
[0067] FIG. 10 illustrates the operation of the VMM fault handler in the operating system of FIG. 1B.
[0068] FIG. 11 illustrates the method for taking a VMM fault in the operating system of FIG. 1B.
[0069] FIG. 12 illustrates the operation of the context switch process in the operating system of FIG. 1B.
[0070] FIG. 13 illustrates the SwitchTo process in the operating system of FIG. 1B.
[0071] FIG. 14 illustrates the operation for unloading the virtual memory manager in the system of FIG. 1B.
[0072] FIG. 15 illustrates the process for handling a page fault in the system of FIG. 1B.
[0073] FIG. 16 illustrates a process by which a constructor creates a thread in the operating system of FIG. 1B.
[0074] FIG. 17 illustrates multiple views of memory provided by the VMView object of the operating system of FIG. 1B.
[0075] FIG. 18 illustrates multiple views of memory that can be obtained in accordance with FIG. 17.
[0076] FIG. 19 illustrates the basic features of a loadable interprocess communication (IPC) manager in the operating system of FIG. 1B.
[0077] FIG. 20 illustrates the process of loading of the IPC manager of FIG. 19.
[0078] FIG. 21 illustrates an interface between the IPC manager and other threads.
[0079] FIG. 22 illustrates intercommunication provided by the IPC manager between different address spaces.
[0080] FIG. 23 illustrates how an IPC trap is handled in the operating system of FIG. 1B.
[0081] FIG. 24A illustrates the operation of the IPC trap handler in the operating system of FIG. 1B.
[0082] FIG. 24B illustrates objects inside the loadable IPC system of a preferred embodiment of the present invention.
[0083] FIG. 24C illustrates objects in different address spaces connected by the loadable IPC system of FIG. 24B.
[0084] FIG. 25 illustrates the interface Imutate which provides object mutation in the operating system of FIG. 1B.
[0085] FIG. 26 illustrates one application of object mutation in the operating system of FIG. 1B.
[0086] FIG. 27 illustrates another application of object mutation applied to a Vtable.
[0087] FIG. 28 illustrates synchronization of object mutation by mutual exclusion.
[0088] FIG. 29 illustrates synchronization of object mutation by transactional synchronization.
[0089] FIG. 30A illustrates the process of object mutation by swizzling in accordance with a preferred embodiment of the invention.
[0090] FIG. 30B illustrates the structure of a thread relative to external objects prior to swizzling.
[0091] FIG. 30C illustrates the structure of the thread relative to the external objects corresponding to FIG. 30B after swizzling.
[0092] FIG. 31 illustrates an application of object mutation to achieve object interposition.
[0093] FIG. 32 illustrates an application of object mutation to carry out dynamic software upgrading.
[0094] FIG. 33 illustrates an application of object mutation to carry out run-time code generation.
[0095] FIG. 34 illustrates how to achieve object mobility using object mutation.
[0096] FIG. 35 illustrates how proxies may be used with object mutation to communicate across address spaces.
[0097] FIG. 36 illustrates a mutatable structure of the virtual memory manager.
[0098] FIG. 37 illustrates a method embodying the programming model of the invention.
[0099] FIG. 38 illustrates operations carried out with the demand-loading NameSpace in accordance with the programming model of FIG. 37.
[0100] FIG. 39 illustrates the loading of an object in accordance with the programming model of FIG. 37.
[0101] FIG. 40 illustrates an application of the programming model of FIG. 37 to plug-and-play technology.
[0102] FIGS. 41 and 42 illustrate an example of a conventional process for linking an executable image.
[0103] FIG. 43 illustrates an example of a conventional process for linking with shared libraries.
[0104] FIG. 44 illustrates a process in accordance with the present invention for linking an executable image using shared libraries.
[0105] FIG. 45 illustrates a process in accordance with the present invention for forming a dynamically linked library.
[0106] FIGS. 46A and 46B illustrate an example of a jump shortcutting process of the present invention.
[0107] FIGS. 47A and 47B illustrate an example of a jump shortcutting process as applied to data references in accordance with the present invention.
[0108] FIG. 48 illustrates an example of a post-link time compaction process of the present invention.
[0109] FIG. 49 illustrates a load time code synthesis process for virtual memory in accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0110] Exemplary Operating Environment:
[0111] FIG. 1A and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include processes, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including inside various programmable peripheral interface cards such as 126, 128, 130, 144, 158, 148 in FIG. 1A, inside programmable peripherals such as disks, game controllers and accessories, speakers, modems, printers and the like, in hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Thus, for example, the present invention can be an operating system of an optimally minimized configuration, as described below, running inside a network interface card of the network interface 158 of FIG. 1A or in an embedded control system or in a communication-oriented device. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located both in local and in remote memory storage devices.
[0112] With reference to FIG. 1A, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 120, including a processing unit 121, a system memory 122, and a system bus 123 that couples various system components including the system memory to the processing unit 121. The system bus 123 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 124 and random access memory (RAM) 125. A basic input/output system 126 (BIOS), containing the basic process that helps to transfer information between elements within the personal computer 120, such as during start-up, is stored in ROM 124. The personal computer 120 further includes a hard disk drive 127 for reading from and writing to a hard disk, not shown, a magnetic disk drive 128 for reading from or writing to a removable magnetic disk 129, and an optical disk drive 130
for reading from or writing to a removable optical disk 131 such as a CD ROM or other optical media. The hard disk drive 127, magnetic disk drive 128, and optical disk drive 130 are connected to the system bus 123 by a hard disk drive interface 132, a magnetic disk drive interface 133, and an optical drive interface 134, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 120. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 129 and a removable optical disk 131, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.
[0113] A number of program modules may be stored on the hard disk, magnetic disk 129, optical disk 131, ROM 124 or RAM 125, including an operating system 135, one or more application programs 136, other program modules 137, and program data 138. A user may enter commands and information into the personal computer 120 through input devices such as a keyboard 140 and pointing device 142. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 121 through a serial port interface 146 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor 147
or other type of display device is also connected to the system bus 123
via an interface, such as a video adapter 148. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
[0114] The personal computer 120 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 149. The remote computer 149 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 120, although only a memory storage device 150 has been illustrated in FIG. 1A. The logical connections depicted in FIG. 1A include a local area network (LAN) 151
and a wide area network (WAN) 152. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.
[0115] When used in a LAN networking environment, the personal computer 120 is connected to the local network 151 through a network interface or adapter 153. When used in a WAN networking environment, the personal computer 120 typically includes a modem 154 or other means for establishing communications over the wide area network 152, such as the Internet. The modem 154, which may be internal or external, is connected to the system bus 123 via the serial port interface 146. In a networked environment, program modules depicted relative to the personal computer 120, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
[0116] Introduction to the Architecture
[0117] In a preferred embodiment of the invention, the operating system components contain code and other metadata for classes of objects. When a component is loaded into an address space it is instantiated. The instantiated component creates object instances that communicate with other objects, potentially in other components. The objects expose their methods through Component Object Model (COM) [Brockschmidt95] interfaces. Threads execute code and synchronize through Mutexes and Condition variables. System components are typically written in C or C++ but there is no fundamental bias towards any particular language.
[0118] COM enables late binding, version compatibility and checking, transparency through proxies, cross language support, and is reasonably lightweight and efficient. Each object needs a method table pointer and a reference count. Each call adds one indirection for fetching the actual method pointer.
[0119] Component implementations in the preferred embodiment of the invention are rarely aware of their intended system layer. The same code can be used in different address spaces or contexts and can be nested. A filesystem can be applied to a file provided by another filesystem as well as to one provided by a disk driver. A heap can be applied to any memory: physical memory, memory allocated from another heap, or memory provided by a virtual memory manager. The loader loads modules into any address space.
[0120] Selection of System Components
[0121] What components should be part of a deployed system depends upon the application itself and its interface requirements, application memory requirements, security requirements, and the target hardware capabilities. Flexible loading of modules was an important design goal for the operating system described herein. The loading of components can be deferred until they are actually used by an application. Device drivers and runtime services typically fall into this category. Others can be loaded just prior to running an application, such as virtual memory for untrusted applications. Most services will terminate themselves when they are no longer needed.
[0122] Drivers and virtual memory can not be used when the hardware to support them is not present. An application that tries to use them will look them up in the demand-loading namespace. The lookup operation fails, either because the driver is absent or it returns a NULL pointer.
[0123] Execution Model
[0124] Components have code, static data, a stack and a number of dynamic objects. A heap object allows dynamic memory allocations. The stack is pointed to by the stack pointer register; it is allocated from the heap. In a physical memory system the initial size of the stack is also the maximum size of the stack; every byte has to be paid for by real memory. Thus in an embedded application the stack size must be chosen carefully. Most compilers can generate stack checks at function entry, to guard against stack overflows. In a virtual memory system, the stack does not have to be backed by real memory, which can be allocated on demand. The stack only consumes virtual address range and can thus be allocated liberally. A Real-Time application might still want to pre-allocate all memory in order to avoid run time fluctuations. In this case the existence of virtual memory does not affect the stack.
[0125] Memory for code and static data is also allocated from the heap. Code can be placed anywhere in memory if it is either position-independent (pc-relative) or relocatable. The Microsoft Visual C++ compiler, for instance, creates a compressed relocation table that the runtime loader uses to fix any references if the executable was placed in a different place in memory than it was linked for. All compilers for embedded use provide similar functionality, although the specific image formats and relocation schemes differ.
[0126] On the other hand, it is often found that most compilers do not support reentrancy. If the code in an image is not reentrant, it is still possible to execute multiple instances of the same image in the same address space. The code and data are simply loaded multiple times, each time relocated differently.
[0127] If the relocation information is not present, and a component virtually overlaps with another component it cannot be executed in the same address space. In this case a new address space is required, which in turn requires virtual memory.
[0128] System Components
[0129] An exemplary base set of system components in a preferred embodiment of the invention is now described.
[0130] Referring to FIG. 1B, an exemplary operating system in accordance with an embodiment of the invention has a kernel or link-time component 202 and a set of run-time loadable resources 204. The kernel 202 includes a set of software resources including, preferably, a HEAP (physical memory manager) 302, a loader 304, a support library 306, a timer 310, an interrupt control unit 312, a scheduler 314, thread support 316 including synchronization primitives 318, NameSpace 320, filesystem 322 and a startup program 324. The set of run-time loadable resources 204 are available to the system through the loader 304. The resources 204
include, preferably, a virtual memory manager (VMM) 362, inter-process communication 364, drivers 366, applications 368 and a network program 370. A minimal multi-threaded kernel may be provided in accordance with the present invention having only the thread support 316, the scheduler 314, the library 306, the timer 310 and the startup 324. If multi-threading is not desired, the kernel may be further minimized to include only the library 306, the timer 310 and the startup 324.
[0131] As illustrated in FIG. 1B, the ICU (interrupt control unit) 312
preferably includes the following software methods at link time: install VMM (virtual memory manager) trap handler 372, install IPC (inter-process communication) trap handler 374. These resources are preferably included in the interrupt control unit 312 because it is possible for such a system to take a VMM trap or an IPC trap or a page fault whether or not a VMM or IPC has been loaded.
[0132] Any one of the components contained in the set of loadable resources 202 may be fetched by the loader 304 and loaded into the operating system on a demand or as-needed basis during run time. The loader 304 registers the name of any such component that has been so loaded in NameSpace 320 so that all users in the system can find the component by querying NameSpace 320.
[0133] In particular, the VMM 372 is loadable upon demand into the operating system and may be unloaded when all demand for it disappears.
[0134] Different implementations of a virtual memory manager may be selected for different purposes during run time from a VMM library 380
storing a set of VMMs, as shown in FIG. 1B.
[0135] Heap
[0136] The Heap 302 implements (physical) memory management, allowing dynamic memory allocations with specifiable alignments. The constructor allows creating nested heaps or heaps over virtual memory.
[0137] Loader
[0138] The Loader 304 is used to load additional components into a running system. Most embedded systems do not provide a loader, and if not needed it can be eliminated at link time from this system as well. Multiple image formats are supported. The loader loads images into the same address space, or given a flag and a virtual memory system, it creates a new address space and loads the image in there.
[0139] No particular distinction is made herein between executables and DLLs (shared libraries). An executable is simply a DLL that exports no entry points besides main( ).
[0140] Support Library, Machine Initialization
[0141] The library 306 is a shared support library and includes common base utilities like memcpy and other compiler support routines, AtomicAdd, CurrentThread, etc. It is used by many system components and is available to applications. Basic machine initialization code is used at startup and system reset. Most of the machine dependent code of the componentized operating system of the invention resides here.
[0142] Timer and Interrupt Drivers
[0143] A driver for the timer 310 is used by the scheduler 314 to keep track of time and for thread pre-emption. A driver for the Interrupt Control Unit (ICU) 312 dispatches interrupts and keeps a registry of interrupt routines, which can be installed and removed by other components. The system has no particular notion of a "device driver" per se. It does enforce strict limits as to what an interrupt routine can do: wakeup a thread.
[0144] Scheduler
[0145] The scheduler 314 is a policy module that determines which thread should run at any given time. Low-level management of blocking and switching between threads is handled by the thread and synchronization components 316, 318.
[0146] The timer interrupt and thread and synchronization modules call into the scheduler 314, possibly passing callback functions as arguments.
[0147] Three example schedulers have been implemented: the null scheduler, a round robin scheduler, and a constraint based Real-Time scheduler. The null scheduler is for systems that use only one thread. The round robin scheduler provides time-sharing, it can easily be extended to handle dynamically changing priorities. Constraint scheduling is for consumer Real-Time applications and is described in [Jones97]. The existence of these schedulers proves that the scheduling interface is necessary and sufficient to implement all of the prior art scheduling policies.
[0148] Threads and Synchronization
[0149] The thread support and synchronization components 316, 318 provide basic thread support and synchronization primitives. Each thread is run in one address space. A thread is usually created in the address space of the component in which it is started. If there is no virtual memory, the address space is always the physical address space. Threads can be created in remote components as well as in local components that are part of the same address space. Threads can block on mutexes and conditions. They can inform the scheduler of their time constraints, but these calls will fail if the scheduler is not a constraint scheduler. The constraint scheduler performs priority inheritance when threads block on mutexes. Preferably, the thread support 316 and scheduler 314 are separated so that the scheduler 314 and be changed while maintaining thread support. So, a third party could change the scheduler 314 without affecting applications, so that the applications and the scheduler are isolated.
[0150] NamespacesA simple boot namespace where applications register objects may be provided. The Namespace 320 is a namespace that cooperates with the loader 304 in demand-loading and caching of components. A namespace may be used for displaying the status (e.g. running threads) and performance parameters (e.g. execution times) of a system during development. Filesystems are also namespaces.
[0151] Filesystem
[0152] Filesystem 322 is used to load additional components during runtime, and as permanent data repository. RomFS is a filesystem for read-only in-memory images (arbitrary files and the system can be merged into one image) and FatFS is for reading/writing disks. NetFile is a network filesystem client built on top of sockets.
[0153] Startup Program
[0154] Startup 324 is a program that is started once the system has been initialized. It can be a simple command interpreter that configures the system and launches applications, or the (only) application itself.
[0155] Network
[0156] The Network Program 370 provides the entire BSD4.4Lite network protocol code, with minor adaptations. The interface is a COM interface that provides sockets. The protocols operate the network drivers through another interface.
[0157] Other Components
[0158] A small Win32 compatibility library may be provided to make it easier to use WindowsNT code in some of the drivers and applications.
[0159] How NameSpace is Used to Manage the Loading of Objects
[0160] Namespaces are used to let applications gain access to objects provided by other components. A namespace is like a filesystem directory tree, except it can hold any kind of objects, not just files. Namespaces can themselves be implemented by different components, including a filesystem that exports its directories as sub-namespaces, and files as registered objects. Namespaces can be registered into other namespaces, extending the directory tree. Location transparency of all objects automatically makes namespaces distributed. Therefore it is easily possible to view some other machine's NameSpace as a sub-Namespace of one's own machine. Namespaces can be filtered for access control or for providing different views for different applications. There is no limit as to the number of namespaces. A component can gain access to its namespace through a call to CurrentNamespace( ). In a minimal system, all applications share the same boot namespace.
[0161] When an application looks up a name in the namespace, it obtains a reference to the object: a local direct reference in case the object is local, or an automatically created proxy if the object is remote. (For remote objects, the interprocess communication (IPC) system described below in this specification is responsible for creating proxies, handling the delegation to remote objects, and reference counting.) A namespace is free to only give access to previously registered objects or to create objects on demand, as it sees fit. The namespace only handles the IUnknown interface. It is up to the application to obtain the proper interface from the object, using the QueryInterface method.
[0162] The interface of Namespace includes a method called Bind. The Bind method is used to request an object. The Bind method finds whether the requested object has already been loaded and, if not, Bind obtains the IUnknown interface of the requested object and returns it as an argument. Bind returns a pointer to the IUnknown's pointer to the requested object. Bind is the method that looks up the requested object in Namespace while Register is the method for registering the object in Namespace. After the object has been loaded, the Query Interface method may be used to query the object.
[0163] Objects can be made available to other components by registering them in a namespace. Every COM object has a virtual method table and at least the three methods derived from the base interface (the IUnknown) used with Namespace: QueryInterface for agreeing on the interface protocols, and AddRef and Release for reference counting. Specific interfaces have additional methods to do the actual work. In addition, a constructor is usually provided.
[0164] Garbage collection is done through reference counting. When Release has called for the last reference, the implementation can finalize and deallocate the object. Even if reference counting has limitations, it is convenient in a system environment due to its simplicity.
[0165] Interaction with objects using other garbage collection models can be achieved through proxies that intercept the IUnknown methods to update their root sets.
[0166] Loadable Virtual Memory Manager
[0167] Virtual memory provides virtual address spaces. Threads run in either a virtual address space or in the physical memory space. Components can be loaded into any address space. Component code may be shared between various address spaces like shared libraries. For instance, any code loaded into the physical memory space is made visible to applications regardless of their address space. There is nothing secret in the system's code, so there is no security problem. Virtual memory can be used to protect sensitive application code, for instance to defend it against disassembly and theft of intellectual property. Unlike most existing operating systems, in the preferred embodiment of the invention the support for virtual memory is not an integral part of the system. The system can function with or without it, and it executes the same executable binary images. The virtual memory manager is a component like any other, and is loaded dynamically on demand.
[0168] Loading or unloading of the virtual memory system does not interfere with applications already running, or started later on in the physical memory space. Once the virtual memory system has been started, new components can be loaded into any address space. Component code may be shared between different address spaces, as is the case with shared libraries.
[0169] Virtual memory can be used for:
[0170] Security reasons, when a program is not trusted. The virtual memory system implements firewalls between applications.
[0171] Covering for common programming errors such as NULL pointer references and memory leaks.
[0172] Creating a sparse address space. This often leads to better memory utilization with fragmented heaps.
[0173] Paging. This provides more memory than available, working set adaptation, and mapped files.
[0174] Safe and flexible memory sharing: Copy-on-write for libraries, shared memory windows.
[0175] Running non-relocatable executables as described above.
[0176] Implementing garbage collection and other protection mechanisms.
[0177] In accordance with preferred embodiments of the invention, the virtual memory manager is a loadable component that provides multiple virtual address spaces. It can be viewed as a driver for MMU hardware. It creates virtual memory mappings using physical memory and MMU hardware. Loading and starting the virtual memory manager executable does not interfere with applications already running. Unloading can be done once all references to objects provided by the manager are released. A new one can be started if needed.
[0178] A virtual memory space looks like the physical memory space, except it can be larger, doesn't have to be contiguous, can be paged, protected, replicated, and can be (recursively) mapped to other objects.
[0179] The virtual memory manager exports a number of control interfaces that are used to create new address spaces (VMSpace), to map address spaces or files to address spaces (VMMap), to link threads to address spaces (VMView), to control state and protections (VMSpace), and to create instances of virtual memory objects (VMFactory).
[0180] Realistically, any MMU driver will need exclusive control of the MMU hardware. However, other objects implementing the virtual memory interfaces can be interposed between an application and the MMU driver. In this way, logical virtual memory systems can be arbitrarily composed or, for instance, stacked as in the Exokernel. A stacked virtual memory system relies on another one for its base functionality, but add some specific functionality of its own. For example, the bottom virtual memory manager in the stack could control the local or host machine, while virtual memory managers stacked in higher layers would control other machines thereby providing an application running on top of multiple machines the illusion of running on a single shared memory multiprocessor system. What the present invention facilitates is multiple levels of virtual memories which are recursively stackable. Furthermore, the invention provides transparency. The Exokernel approach was to have a library that implements a virtual memory, so that there is in effect only a single level, since one library cannot be stacked over another library. In the present invention, the different virtual memories exist in different address spaces, so that there is no corresponding limitation on stacking multiple levels, and the stack is remotable. For example, referring to FIG. 1C, the invention can be used to make a collection of different machines (machine-A and machine-B) appear as a single address space by employing a local virtual memory (VM-1) while other virtual memories (VM-2 and VM-3) relating to the address spaces of the other machines but having the same interface as the local virtual memory are stacked over it.
[0181] Design
[0182] The virtual memory interfaces were designed for flexibility and simplicity:
[0183] VMSpace. Methods: Reserve, Delete, Map, Protect, CacheControl, QueryVM, CreateShadow.
[0184] Reserve( ) reserves regions in the VMSpace, Delete( ) deletes them. Map( ) maps VMMap objects to the space. Protect( ) sets permanent attributes (e.g. read-only), CacheControl( ) controls transient attributes (e.g. dirty) and paging state (e.g. present). QueryVM( ) returns information about ranges, CreateShadow( ) creates a new mapping, atomically moves ranges from the current mapping to the new one and maps the new one into the current one. It facilitates symmetric copy-on-write. A VMSpace also implements the VMMap interface, returned by QueryInterrface( ).
[0185] VMMap. Methods: Read, Write, Share, QueryAccess, GetSize, Clone.
[0186] Read( ) and Write( ) are used for copy-paging. Share( ) is used to establish shared pages between two VMSpaces and to return pages to VMView::Fault( ). QueryAccess( ) and GetSize( ) return information about the mapping. Clone( ) creates a restricted mapping from the given VMMap. The constructors are used to turn files into mappings and to create memory with no backing store.
[0187] VMView. Methods: SwitchTo, Fault, SetMapping, GetMapping.
[0188] SwitchTo( ) is called by the context switch path.
[0189] Fault( ) is called at page fault time.
[0190] SetMapping( ) changes the associated VMMap (and indirectly VMSpace).
[0191] GetMapping returns the current associated mapping.
[0192] VMFactory. Methods: CreateVmView, CreateVmSpace, CreateVmMappingFromFile, CreateVmMappingFromZero, VMFactory is an object that contains the constructors of other VMM objects.
[0193] Process creation works as follows. A new VMSpace and a new VMView are created. The VMView is bound to the VMSpace. A temporary file is created as backing store for zero-fill memory allocations. A VMMap is constructed with CreateFromFile( ), and it is mapped into the VMSpace. A Heap is created from the resulting memory. The file for the executable is looked up from a namespace. A VMMap object is constructed from that file. The VMMap is mapped into the VMSpace copy-on-write, with backing store for any copies coming again from the temporary file. The loader relocates the code if necessary. An IPC channel to the newly loaded component is created. A thread is created with the VMView associated to it. The thread is started by handing it to the scheduler.
[0194] Page fault handling works as follows. VMView::Fault( ) is invoked. This in turn calls Share( ) on the associated VMSpace (which also exports VMMap interface). Assuming the page is not present, Share first calls VMSpace::CacheControl( ), which calls VMMap::Read( ) on the file's VMMap, which in turn calls File::Read( ) on the file object. The VMSpace then adds the returned data to its page list and returns a reference to it to the VMView, which adds it to the VTLB (virtual translation look-aside buffer), which makes it available to the hardware.
[0195] Memory wiring works as follows. The application invokes VMSpace::Protect( ) with the wire flag set. Protect first calls VMSpace::CacheControl( ) to page in all the pages. VMSpace::CacheControl( ) may fail if the physical memory is exhausted. In this case Protect fails without wiring any pages. Otherwise it marks the pages wired and returns successfully. Any operation affecting permanent attributes is atomic. Those affecting transient attributes are not.
[0196] FIG. 2 illustrates how page table entries are used in providing translation between virtual memory addresses and physical memory addresses by the run-time loadable VMM 362 of FIG. 1B. Each VMSpace object stores a pointer value which is loadable into a page directory register 400 provided in the host computer's microprocessor. The pointer value, once loaded into the page directory register 400, points to a particular page table 402 which is one of many page tables available to the system. Each page table provides a particular set of address translations between virtual memory addresses and physical memory addresses. Different objects may be stored in different memory locations and therefore may require different address translations. The individual page table 402 has a set of virtual addresses 404 and physical addresses 406 correlated to corresponding ones of the virtual addresses, the addresses constituting the page table entries of the page table 402.
[0197] Certain advantages of virtual memory are realized by controlling the allocation of physical memory. For example, in order to protect certain areas of memory from being written, the page table entry for a particular physical address may include an indication that the address is available for read-only. In order to hold a certain area in reserve or to delay the allocation of physical memory until an actual need arises, the page table entry for a particular physical address may include an indication that the physical address is "invalid". A pointer to the corresponding virtual memory address will cause the system to take a virtual memory trap, as will be described below.
[0198] Referring to FIGS. 3 and 4, the virtual memory manager (VMM) includes the following interfaces: IVMSpace 610, IVMMap 620, IVMView 630, IUnknown 640 and IVMFactory 650. The IUnknown interface preferably is included in every object. The purpose of this is to give every application the ability to query the object to determine if it supports a given interface. IUnknown refers to processes for querying for a given interface ("query"), for adding a reference to the object ("add") and for releasing a reference to the object ("release"). Thus, each of the three primary interfaces of VMM includes the three processes of IUnknown, as illustrated in FIG. 4.
[0199] VMFactory 650 has an interface INamespace 652 which it exports. Inamespace is used to enumerate all the objects that one virtual memory manager handles, regardless of whether they have also been registered in the common Namespace.
[0200] FIG. 4 illustrates the VMM interfaces as containing particular methods. IVMSpace contains the methods Query, Add, Release, Reserve, Delete, Map, Protect, CacheControl, QueryVM, CreateShadow. IVMMap contains the methods Query, Add, Release, Read, Write, Share, Clone, QueryAccess, GetSize. IVMView contains the methods Query, Add, Release, SwitchTo, SetMapping, Fault, GetMapping. IVMFactory contains the methods or constructors CreateVMView, CreateVMSpace, CreateVMMappingFromFile, CreateVMMappingFromZero. IUnknown contains the methods Query, Add, Release.
[0201] FIG. 5 illustrates the internal object architecture employed in a preferred embodiment of the invention. The object has a virtual table (V-table) 510 and a state 520. An instance pointer 530 points to the object. The state 520 can include, for example, the page table register contents, the IUnknown reference count and pointer values. The pointers of an object will be discussed below with reference to FIG. 7.
[0202] The V-table 510 points to a particular interface 540, which may be one of many interfaces available to the system. The interface 540 lists a number of processes or methods associated with the particular object. Each method entry listed in the interface 540 points to an implementation 550 of that method. The implementation 550 contains the code for carrying out the algorithm implementing the particular process or method listed in the interface 540. As illustrated in FIG. 5, more than one object may point to the same interface. Of course, it is to be expected that different objects have their own unique interfaces in many instances. For any application calling the object, the object's interface provides a complete list of the processes or methods supported by the object.
[0203] FIG. 6 corresponds to FIG. 5 and illustrates the object architecture of the VMView object of the VMM (virtual memory manager). In addition to the three universal IUnknown processes of query, add and release, VMView further includes the processes of fault and switch to. The "fault" process will be discussed below with reference to FIG. 10
while the "switch to" process of VMView will be discussed below with reference to FIG. 12.
[0204] FIG. 7 illustrates a preferred embodiment of the Virtual Memory Manager (VMM). The VMSpace is implemented as a sequence of regions connected by a skip-list of the type well-known in the art. Each region contains a mapping, a copy mapping (for copy-on-write), a number of page lists for physical memory pages, and a set of attributes. Permanent attributes are kept in the region; any transient state is part of the page list. The VMSpace also exports a VMMap interface so that it can be directly mapped into other address spaces. FIG. 7 illustrates mappings by VMMap of various regions of VMSpace to other objects, as will be discussed in more detail below. A VMSpace is 64 bits wide in the preferred embodiment.
[0205] The VMView provides a view into a VMSpace for one or more threads. It is limited by the hardware address size (e.g. 32 bits). If the VMSpace it points to is indeed larger (64 bits) then the view is a window into part of that space. A virtual translation look-aside buffer (VTLB) is attached to the VMView. The VTLB contains machine dependent mapping information for the MMU hardware and a translation cache. The VTLB interface is common across all architectures. The rest of the system is unaware of the virtual memory component, with a few exceptions. A thread must hold a pointer to its VMView so that page faults can be resolved within the correct context, and the context switch path must check for a VMView change. If the context switch path detects an address space change, it calls a VMView method to synchronize the MMU hardware with the change. The virtual memory manager registers its interrupt handler with the ICU driver. The Heap may choose to modify its behavior when running over virtual memory. The loader can only create new address spaces when a virtual memory system is present. The IPC system may utilize virtual memory mappings for data transfer.
[0206] Referring to FIG. 7, different threads 710, 715 can contain pointers pointing to the same VMView object 720. The VMView object contains pointers pointing to a page of PTEs (page table entries) 725 and to a VMSpace object 730. The VMSpace object 730 points to (defines) Region1, Region2 and Region3, corresponding to different non-contiguous memory regions linked by the skip-list. These regions are mapped to other objects in the manner illustrated in the example of FIG. 7 by pointers provided by the VMMap interface of VMSpace. Region1 is reserved (no writing permitted) and points to an empty space 735 (labelled Empty). Region2 has one pointer to a VMMap object 740 corresponding to a region in memory containing all zeroes, which is labelled ZeroVMMap.
[0207] Region2 has another pointer to a page list 745. The page list 745
points to a memory 750 having allocated memory spaces in which writing is permitted. Region3 has one pointer to a VMMap object 755 which is an implementation of VMMap for mapping files and is labelled FileVMMap. The FileVMMap has a pointer pointing to a file 760. Region3 has another pointer pointing to a VMSpace object 765 (which is different from the VMSpace object 730 discussed above). The connection through Region3
between the two VMSpace objects 730, 765 supports a copy process. Region3
has yet another pointer pointing to a PageList 770. The PageList 770 has a pointer pointing to a second PageList 775. The second PageList 775
points to a Memory object 780. The linking of the two successive PageLists supports a copy-on-write function.
[0208] In summary, the VMMap object can map a region to another VMMap object, examples of which are illustrated in FIG. 7 such as the pointer from Region3 to File VMMap 755 and the pointer from Region2 to Zero VMMap 740. The mapping can be to a different portion of the VMMap object itself rather than another VMMap object. The VMMap object can map to the ZeroMap object providing zero-filled physical pages, as illustrated in FIG. 7 by the pointer to the ZeroMap object 740. The VMMap object can map a region to a VMSpace object, as illustrated by the pointer from Region3 to VMSpace 765. The VMMap object can map a region to a file, as illustrated by the pointers from Region3 to the File VMMap 755 and from thence to File 760. This may include the case of the "system pager" which handles the traditional paging file. Finally, the VMMap object can map a region to a mapping filter ("CloneMap") which, for example, restricts the protections allowed of a mapping, as illustrated by the pointer from Region3 to CloneMap.
[0209] A PageList lists what pages have been called by the object. This is useful in case this information is forgotten by the PTE, for example.
[0210] The state of an object, such as the virtual memory object of FIG. 7, consists of the pointer values of the various pointers in the object (such as the pointers illustrated in FIG. 7).
[0211] FIG. 8 illustrates the Load VMM (load virtual memory manager) process. The first step (block 810 of FIG. 8) is to install the VMM Fault Handler method. The last step (block 830) is to register the VMM constructors, which are part of the VMFactory interface. This step is carried out by creating an object of the IVMFactory type and registering it into the NameSpace.
[0212] A local constructor is a method in an object. In order to export it to another address space, the constructor must be wrapped in a factory object of the type well-known in the art. These objects are designated as "XYFactory". For example, VMFactory can create VM objects. There is a correspondence between C++ classes and factory objects. A C++ compiler could automatically create the factory of an object.
[0213] FIG. 9 illustrates how a virtual memory fault is handled. The first step is to save the state of the object or thread that was running at the time the fault or trap was taken (block 910 of FIG. 9). Next, the VMM Fault Handler is called (block 920). Then, the state of the object is restored (block 930).
[0214] FIG. 10 illustrates the operation of the VMM Fault Handler. The first step (block 1010 of FIG. 10) is to call IVMView::Fault. The next step is to determine whether IVMView::Fault can provide a VM mapping (block 1020). If so ("YES" branch of block 1020), the page table entry is loaded (block 1030). Otherwise ("NO" branch of block 1020), an exception is taken (block 1040) and an exception handler is called (block 1050).
[0215] FIG. 11 illustrates how a VM fault is taken. The first step (block 1110 of FIG. 11) is to determine whether the VM fault is due to an error. If so ("YES" branch of block 1110), then an exception is taken (block 1120). Otherwise ("NO" branch of block 1110), a determination is made whether the VM fault is due to a memory allocation postponement, sometimes referred to as "lazy memory allocation" (block 1130 of FIG. 11). Lazy memory allocation is a feature that can be implemented to guard against premature allocation of memory otherwise caused by early requests by an application. The determination of whether the VM fault is due to lazy memory allocation is made by determining whether there is any reference to the object in the PageList. If the fault was due to lazy memory allocation ("YES" branch of block 1130), then VMSpace is called to perform the memory allocation (block 1140 of FIG. 11). Otherwise ("NO" branch of block 1130), a determination is made whether the VM fault was due to a copy-on-write process (block 1150). This determination is made by asking whether the current PageList points to another PageList (as discussed above with reference to FIG. 7). If not ("NO" branch of block 1150), the reference in PageList is copied to the page table or PTE (block 1160). Otherwise ("YES" branch of block 1150), the pointer to the other PageList is taken (block 1170 of FIG. 11). This entails allocating a new page, copying the content of the old page to the new page, and then entering the mapping for the new page in the PTEs.
[0216] FIG. 12 illustrates the operation of the context switch process. First, the Scheduler (shown in FIG. 1) decides to perform a context switch (block 1210 of FIG. 12). This decision may be occasioned, for example, by a thread running up to a predetermined time limit. As a result, the Scheduler selects a new thread to replace the currently-running thread (block 1212 of FIG. 12). In a preferred embodiment, the next step is to inspect the views of memory provided by IVMView of the current and new threads (block 1214) and determine whether they are different (block 1216). If so ("YES" branch of block 1216), IVMView::SwitchTo is called (block 1218). Otherwise ("NO" branch of block 1216), the system returns to the new thread (block 1220) without altering the contents of the page directory register 400 of FIG. 2.
[0217] FIG. 13 illustrates the SwitchTo process, consisting of the step of loading the page directory register 400 of FIG. 2 with the new page directory value.
[0218] FIG. 14 illustrates the process for unloading the virtual memory manager (VMMUnload). The first step begins whenever the last thread of the last address space using virtual memory requests termination (block 1410 of FIG. 14). The next step is to determine whether the reference count is zero (block 1420). If so ("YES" branch of block 1420), a choice (block 1430) may be made whether to terminate the virtual memory (block 1432) or to mark the VM object as "cached" (block 1434). In the preferred embodiment, the virtual memory manager is terminated (block 1432). Otherwise ("NO" branch of block 1420), the reference is released, which decrements the reference count by one (block 1440).
[0219] FIG. 15 illustrates the page fault handling process. The first step is to call IVMView::Fault upon the occurrence of a page fault (block 1510). IVMView::Fault calls VMMap::Share (block 1520) which in turn calls VMSpace::CacheControl (block 1530), which in turn calls VMMap::Read (block 1540). VMMap::Read calls File::Read (block 1550), which returns data--such as the content of a file, for example--(block 1560 of FIG. 15). Finally, VMSpace adds the returned data to PageList (block 1570).
[0220] FIG. 16 illustrates how a constructor creates a thread. First, VMFactory creates a VMSpace, and VMSpace creates an address space (block 1610). Next, VMFactory creates a VMMap and VMMap maps an object into the address space (block 1620).
[0221] Then, VMFactory creates a VMView and VMView creates a view of the address space created in the previous step (block 1630). Finally, the scheduler creates a thread associated with the view created in the previous step (block 1640).
[0222] FIG. 17 illustrates how multiple views of the same memory space are provided for multiple threads. In the example of FIG. 17, two different threads, THREAD1 and THREAD 2 are directed by the PTE to two different views, VMView1 and VMView2, respectively, of the same memory space, VMSPACE1 through the PTE. For THREAD1, VMSPACE1 has a pointer to VMMap1
which points to File1. For THREAD2, VMSPACE1 has a pointer to VMMap2
which points to File2.
[0223] FIG. 18 illustrates possible results of the implementation of FIG. 17. In FIG. 17, the view of memory by THREAD1 may make the memory appear smaller than its actual physical size, which the view of THREAD2 may make the memory appear to have a wider address field than that of the actual physical memory. For example, the actual physical memory may have an address space that is 32 bits wide while it appears to be 64 bits wide in the view of memory provided to THREAD2. On the other hand, the physical memory address space may be 64 bits wide while it appears to be only 32
bits wide in the view provided to THREAD1. In a more complex case, the address space widths of the objects VMView ("THREAD1") and VMSpace ("THREAD2") and of the physical memory may all differ from one another, being in one example 32 bits, 64 bits and 36 bits, respectively. The address space width in VMSpace may be chosen without regard to the address space widths of the virtual memory and the physical memory.
[0224] The virtual memory interfaces described here loosely correspond to the Mach external memory manager scheme [Young89]. VMView and VMSpace replace Mach's task. Memory objects are replaced by VMSpaces. VMMap and VMSpace jointly implement the XMM interface, although with synchronous interfaces only. The VTLB interface is a refinement of the Pmap interface.
[0225] In other implementations of the invention, one could emulate Unix or Windows virtual memory APIs, and the design would easily permit it (including fork).
[0226] Loadable Interprocess Communication (IPC) Manager
[0227] An IPC system is needed if applications are to be run in separate address spaces. Otherwise the applications can not talk to each other or to system services. An IPC system allows:
[0228] Communication between address spaces within a machine.
[0229] Communication between applications in different machines in a distributed environment.
[0230] Graceful termination and cleanup of applications even within one address space.
[0231] Cleanup involves releasing the memory held by an application. It also involves closing all references into and out of the application's objects. A level of indirection is needed for bookkeeping and for providing a cutoff point. This level of indirection is what an IPC system provides.
[0232] A preferred embodiment of the IPC system implements the COM model. It is possible, however, to replace it with another communication model for applications that expect a different model. Components implementing various communication paradigms can be loaded into the system as needed.
[0233] The preferred interprocess communication (IPC) manager is a run-time loadable resource residing outside of the operating system kernel, as illustrated in FIG. 1B. The loader can load the Loadable IPC Manager at any time based upon need. The Loadable IPC Manager can also be unloaded whenever it is no longer required by any applications currently running.
[0234] FIG. 19 illustrates the basic components of the Loadable IPC Manager. A trap component 1910 consists of a program to install the method IPC Trap Handler 1912, which is called whenever the system recognizes a need within the currently-running thread to communicate with another thread or a need to communicate with a thread in another machine. A copy component 1930 can be implemented with either a simple "copy" function or virtual memory or shared memory.
[0235] FIG. 20 illustrates the operation of the Loadable IPC Manager, which may have the file name IPC.EXE in a preferred implementation. The first step is to load the Loadable IPC Manager (block 2010). The next step is to install the IPC Trap Handler (block 2020). Finally, the IPC constructors are registered in NameSpace (block 2030).
[0236] FIG. 21 illustrates the structure of the interface between the loadable IPC and threads that use it. In the exemplary case of two threads 2110, 2120 which use a particular Loadable IPC Manager 2130, each thread has its own pointer 2110a, 2120a pointing to the Loadable IPC Manager 2130. The Loadable IPC Manager 2130 has interfaces for the following methods: Query Interface 2140, Add Reference 2150, Release Reference 2160 and IPC Trap Handler 2170. As will be discussed later in this specification, each one of these methods and interfaces is replaceable.
[0237] FIG. 22 illustrates the intercommunication provided by the Loadable IPC Manager between threads in different address spaces. One thread (THREAD1) resides in a first address space 2210, and has a pointer to a loadable IPC Manager 2220. At some point in the running of THREAD1, it causes the Loadable IPC Manager to provide communication with another thread (THREAD2) in another address space 2230. Both THREAD1 and THREAD2
have a pointer to the Loadable IPC Manager 2220. The Loadable IPC Manager provides the requisite communication between THREAD1 and THREAD2. For example, THREAD1 may need to increment a counter controlled by THREAD2 in the other address space. The magnitude of the increment may need to be communicated to THREAD2 and the value of the count after incrementation may need to be returned to THREAD1, in some cases. The Loadable IPC Manager handles both of these communications, thus providing two-way communication between THREAD1 and THREAD2 as illustrated in FIG. 22.
[0238] The IPC process begins when THREAD1 signifies a need to go to another address space to access some resource in that other address space. In one implementation, THREAD1 makes a remote procedure call directed to a resource controlled by THREAD2 in the other address space 2230. Such a remote procedure call is transparent to the application running. Since THREAD1 is currently running in the current address space 2210, the thread takes an IPC trap.
[0239] FIG. 23 illustrates how an IPC trap is handled. First, the state of the currently running thread (THREAD1) is saved (block 2310). Then, a call is made (block 2320) to the IPC Trap Handler which (unlike the Loadable IPC Manager stored outside of the operating system kernel of FIG. 1) resides within the operating system kernel.
[0240] FIG. 24A illustrates the operation of the IPC Trap Handler. First is the occurrence of an IPC trap (block 2410). The next step is to copy the arguments of the currently running thread (block 2420). These arguments may be, for example, the identity of a counter in another address space. The thread traps into the loadable IPC system so that its VMView is changed to the VMView that points to the new address space (block 2430). The thread continues running in the new address space where it invokes an object or method that can produce desired information or data such as a new incremented counter stored in a register (block 2440). The next step is to return the values of the new data from the thread (block 2450). Then, the thread switches back to the original VMView (block 2460). In contrast, conventional IPC systems (which were not loadable as the IPC of the present invention) typically did not move the thread from one address space to the other. Instead, the desired information in the second address space was obtained by selecting and running a different thread in that second address space. In contrast, the preferred embodiment of the loadable IPC of the invention uses a single thread and moves between among address spaces by changing its VMView. However, an alternative embodiment of the loadable IPC of the present invention could be implemented using the conventional technique employing different threads.
[0241] Structure of a Loadable IPC System
[0242] The loadable IPC system of the invention (hereinafter referred to as the "LIS") differs in structure from a non-loadable one in the handling of the dependencies upon other subsystems (such as the scheduler and the virtual memory). The LIS can only have dynamic dependencies, and therefore cannot use any "backdoor" or make assumptions about the internals of other components. The LIS finds these components in the NameSpace, most likely at initialization time. If a specific component is not found or cannot be loaded, the LIS fails to initialize and unloads itself.
[0243] It is possible for the LIS to require specific interfaces from these other components, which it asks for via the Query Interface method. This could mean that not all virtual memory managers would be suitable for a given LIS, only those that implement the interface(s) the LIS requires. The converse is also true: all virtual memory managers that implement the required interfaces should be suitable for LIS use. In the preferred embodiment of the LIS of the invention, it is an interface definition error for a component to violate this rule.
[0244] Different LISs can depend on different sets of other components. Specifically, it is possible for a simple LIS not to require a virtual memory manager and still be fully functional. For instance, such a LIS would be useful for connecting two simple computers that do not possess a memory management unit (MMU).
[0245] The LIS provides two classes of services: administration services and communication services. Administrative services support the creation, linking and destruction of communication endpoints and any other ancillary service. Communication services involve the transmission of the input and output arguments of an RPC and the creation and management of proxies and method signatures. An application requests these services from the LIS in an architectural and LIS-dependent way. More specifically, a processor usually provides special instructions to request system services ("system calls"); execution of such an instruction causes a processor trap. The LIS can be viewed as the system handler for such a trap. The LIS is handed the state of the thread at trap time. It is then LIS-dependent what elements of the thread state are intended to convey information to the LIS. For example, a certain processor register might contain the index of the service the application requires. Different embodiments of the LIS could offer different services, and use different register use conventions for argument passing.
[0246] A thread is associated with a specific LIS. Different threads can be associated with different LIS's. If two LIS's do not communicate, then their client threads also cannot communicate. It is not preferable to split a single computer into two or more non-communicating subsystems, except for the extreme case of highly secure systems. Therefore, the most practical case is the one where multiple LIS's are able to communicate among each other. In the present invention, this can be accomplished quite naturally by loading all LIS's in the same address space, making them visible in a common NameSpace, and using normal object invocation for their interaction.
[0247] Interfaces and Methods
[0248] A preferred embodiment of LIS.EXE has five internal (not exported) interfaces: ILISFactory, IPCSpace, IEndPoint, ISignature, and IEndPointTable.
[0249] ILISFactory::CreateIPCSpace( ) creates an IPCSpace with one empty ExportTable and one empty ImportTable. Both tables are IEndPointTables.
[0250] ILISFactory::CreateEndPoint(IPCSpace,ObjectPointer) creates an EndPoint in the given IPCSpace, to represent the object at virtual address ObjectPointer. The EndPoint has the IUnknown default signature associated with it. The EndPoint receives a new IPID, which is a 128 bit universal identifier uniquely associated with the EndPoint.
[0251] ILISFactory::CreateSignature(IID,ParsingString[ ]) creates a type signature given an array of properly formed parsing strings. Each string describes the type signature of one method. There are as many strings as there are methods in the interface. IID is the 128 bit universal identifier for the interface.
[0252] IPCSpace::GetImportTable( ) returns the ImportTable associated with the given IPCSpace.
[0253] IPCSpace::GetExportTable( ) returns the ExportTable associated with the given IPCSpace.
[0254] IEndPoint::SetSignature(ISignature) changes the signature of the given EndPoint.
[0255] IEndPoint::GetSignature( ) returns the signature of the given EndPoint.
[0256] IEndPoint::GetIPID( ) returns the IPID of the EndPoint.
[0257] ISignature::GetParsingString(MethodNumber) returns the parsing string for the given method number.
[0258] IEndPointTable::Add(EndPoint) adds an endpoint.
[0259] IEndPointTable::Remove(EndPoint) removes an endpoint.
[0260] IEndPointTable::Lookup(IPID) looks up the ID number of the endpoint.
[0261] In addition to the interfaces described above, LIS.EXE makes use of Proxies inside the application address space. A Proxy is an object like any other, but it merely acts as the representative of some other remote object. The application invokes methods on the proxy, and the proxy's implementation for all methods simply traps to LIS.EXE. Two notable exceptions are the AddRef and Release methods, which maintain a local reference count. Only the Release method traps, and only when the local reference count goes to zero. Other embodiments of LIS might not require the use of proxies.
[0262] Operational Overview of the Loadable IPC System
[0263] FIG. 24B illustrates how instances of these interfaces are related. In the drawing, two threads belong to two different applications. Thread-B was given access to the endpoint EndP-1.
[0264] LIS.EXE handles the NameSpace specially, for three reasons. LIS.EXE is responsible for exporting primitive (`kernel`) objects to its clients. These are the objects that are found in the boot NameSpace, but for which LIS.EXE does not have an EndPoint. Other components that are loaded in the physical address space along with LIS.EXE can create and register objects in the boot NameSpace before and/or while LIS.EXE is loaded and active. When an application looks up one such object in the NameSpace, LIS.EXE automatically creates an EndPoint in the primitive IPCSpace. Secondly, when an application calls INameSpace::Bind( ) to obtain access to a remote object, LIS.EXE must be able to intervene and create a Proxy for the remote object in the application's address space. Similarly, when an application wants to INameSpace::Register( ) an object, LIS.EXE must be able to intervene and remember which VMSpace was associated with the object.
[0265] Finally, when an application terminates abnormally LIS.EXE cleans up after it. Among other things, LIS.EXE is responsible for removing from the NameSpace all objects that belonged to the dead application.
[0266] An application thread is associated with a given LIS at thread creation time. More specifically, the LIS associates an IPCSpace instance to the VMSpace of the thread. This IPCSpace instance is the object passed to LIS.EXE at trap time. Other per-thread information that is used at trap time is the service requested, object IPID, method number, and the stack pointer register to access the arguments of the method call.
[0267] An application thread can find LIS.EXE in the NameSpace. An IPCSpace contains an ImportTable and an ExportTable, each table containing pointers to EndPoints. The ExportTable points to EndPoints that the IPCSpace exports to other IPCSpaces. As a result, the application associated with that IPCSpace fully implements the object the EndPoint represents. The ImportTable acts as a security filter; it only allows a thread access to those remote objects that it was granted access. The ImportTable can also be used as a renaming table: the application uses local names for remote objects that are only meaningful when used with the application's ImportTable.
[0268] An EndPoint has a unique ISignature. This provides an advantage in the effect of the QueryInterface method: when an application invokes QueryInterface on a remote object, it truly receives a new proxy for a new EndPoint.
[0269] In the example illustrated in FIG. 24B, Thread-A creates an EndPoint EndP-1, passing in a Signature Signature-1 and a virtual address Object-A. The EndPoint is entered in the ExportTable ETable-A of the thread's IPCSpace-A. A reference is taken on the thread's VMSpace-A (not shown). Thread-A now registers the EndPoint in the NameSpace. One reference is taken on the object and one on the EndPoint, to signify that the object is exported and visible in the NameSpace.
[0270] Generally, threads do not explicitly create EndPoints; they register objects in the NameSpace. It is LIS.EXE that automatically creates an EndPoint for the objects as part of its overriding of the INameSpace::Register( ) method. Alternatively, a method invocation might require the passing of an object as argument or as return value. Again, LIS.EXE automatically creates an EndPoint if necessary, inserts it in the ExportTable if not already in there, and inserts it in the remote party's ImportTable if not there already.
[0271] With reference to FIG. 24B, Thread-B can either look up Object-A in the NameSpace, or invoke a method on some other object in VMSpace-A that returns Object-A as result. In either case, LIS.EXE