Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent
5465365
Winterbottom
November 7, 1995
Title
Apparatus and methods for making a portion of a first name space available as a portion of a second name space
Abstract
Apparatus and methods for incorporating a portion (including all) of a first name space into a second name space. An agent operating in the first name space receives first operation specifiers which use names as they are used in the second name space from an entity operating in the first name space and translates the first operation specifiers into second operation specifiers which specify the same operations and use the names as they are used in the first name space. The apparatus and methods are implemented in the Plan 9 operating system. The first and second name spaces are organized as trees and belong to a first process operating on a first processor and a second process operating on a second processor. The first operation specifiers are Plan 9 file protocols which the first process receives from the second process. The second operation specifiers are Plan 9 file system calls made by the first process. The translation between the file protocols and the system calls is made using a data structure which permits the names in the portion as they are used in the second name space to be mapped onto the names as they are used in the first name space.
Inventors:
Winterbottom; Philip S.
(Gillette,
NJ
)
Assignee:
AT&T Corp.
(Murray Hill,
NJ
)
Appl. No.:
247626
Filed:
May 20, 1994
Current U.S. Class:
707/101
Current International Class:
G06F 17/30 (20060101)
Field of Search:
395/700,425,400,600
U.S. Patent Documents
4853843
August 1989
Ecklund
5014192
May 1991
Mansfield et al.
Foreign Patent Documents
91110042
Dec., 1991
EP
91306291
Jan., 1992
EP
93104967
Oct., 1993
EP
Other References
Tannenbaum, et al, "Distributed Operating Systems", Computing Surveys, vol. 17, No. 4, Dec. 1985. .
Mullender, et al., "Amoeba--A Distributed Operating System for the 1990s", IEEE Comp. Magazine, May, 1990. .
Comer, et al., "An Experimental Implementation of the Tilde Naming System", Comp. Systems, vol. 3, No. 4, Fall 1990, pp. 487-515. .
Cabrera et al., "Quick Silver Distributed File Services: An Architecture for Horizontal Growth", CH2441-4/88/0000/0023, IEEE, 1988. .
Presotto, et al., "The Organization of Networks in Plan 9", Proceedings of Winter Usenix Conf., San Diego, 1993, pp. 271-280. .
R. Pike, et al. "The Use of Name Spaces in Plan 9", Proceedings of the 5th ACM SIGOPS Workshop, Mont. Saint Michel, 1992, paper #34. .
H. C. Rao, "Distributed Application Framework for Large Scale Distributed Systems", The 13th International Conference on Distributed Computing Systems, 25 May 1993, Pittsburgh, Pa., pp. 31-38. .
P. B. Hansen, "A Joyce Implementation", Software Practice & Experience, vol. 17, No. 4, Apr. 1987, Chichester GB, pp. 267-276. .
A. K. Sanwalka, Z. G. Vranesic, S. G. Zaky and V. C. Hamacher, "An Approach to Internetworking for Existing Operation Systems", Proceedings of the IEEE Pacific RIM Conference on Communications, Computers and Signal Processing, 4 Jun. 1987, Victoria, BC, CDN, pp. 55-59..~
Primary Examiner:
Chan; Eddie P.
Attorney, Agent or Firm:
Nelson; Gordon E.
Parent Case Text
This application is a continuation of application Ser. No. 07/999755, filed on Dec. 31, 1992, now abandoned.
Claims
What is claimed is:
1. Apparatus which is implemented in a computer system for making an exported name from a portion of a first name space of user-defined names representing entities available to be used in a portion of a second name space of user-delined names so that use of the exported name in the portion of the second name space results in use of the exported name in the portion of the first name space, the apparatus comprising:
mounting means operating in the second name space for incorporating the portion of the first name space into the second name space and thereby making the exported name available in the second name space;
means operating in the first name space for receiving a first operation specification produced in the second name space, the first operation specification specifying an operation and including the exported name, the exported name being used in the first operation specification as required for the portion in the second name space, the means for receiving operating after operation of the mounting means;
interpretation means operating in the first name space for responding to the first operation specification by producing a second operation specification which specifics the same operation and includes the exported name, the exported name being used in the second operation specification as required for the portion in the first name space; and
means operating in the first name space and responsive to the second operation specification for performing the operation on the entity represented by the exported name.
2. The apparatus set forth in claim 1 wherein:
the means for receiving the first operation specification receives the first operation specification from a source entity which uses the second name space; and
the apparatus further comprises
reply means for receiving a result of the operation from the means responsive to the second operation specification and transmitting a result specification including the result to the source entity.
3. The apparatus set forth in claim 2 further comprising:
a process which is a component in the computer system, the process operating in the second name space and being used to implement the means for receiving, the interpretation means, and the reply means; and
a channel for transferring the first operation specification and the result specification between the process and the source entity.
4. The apparatus set forth in claim 3 wherein:
the source entity is another process; and
the first name space and the second name space are per-process name spaces.
5. The apparatus set forth in claim 1 wherein the computer system further comprises;
a central processing unit and
a local processing unit; and
the first name space, the means for receiving a first operation specification, the interpretation means, and the operation means are implemented in the local processing unit;
the second name space and the mounting means are implemented in the central processing unit; and
the means for receiving a first operation specification receives the first operation specification from the central processing unit.
6. The apparatus set forth in claim 5 wherein:
the local processing unit includes a display and an input device;
there is a plurality of the exported names; and
the exported names include names for the input device and the display.
7. The apparatus set forth in claim 5 wherein:
the first name space and the second name space belong to the same user of the local processing unit and the central processing unit.
8. The apparatus set forth in claim 1 wherein:
the first operation specification specifies one of a plurality of operations, the plurality of operations including a blocking operation, and
the interpretation means responds to the first operation specification when the first operation specification specifies a blocking operation by obtaining a slave process in the computer system for the second operation specification; and
the means responsive to the second operation specification performs the operation on the entity represented by the exported name for the slave process, whereby the interpretation means is free to respond to a further first operation specification while the operation on the entity represented by the exported name is being performed for the slave process.
9. The apparatus set forth in claim 1 wherein the interpretation means includes:
mapping means for mapping the exported name into the second namespace.
10. The apparatus set forth in claim 9 wherein
the means for receiving a first operation specification receives a plurality of operation specifications; and
the mapping means is dynamically constructed in response to certain ones of the received first operation specifications and includes at least means for determining from the exported name how the exported name is to be used in the second operation specification.
11. The apparatus set forth in claim 10 wherein the mapping means comprises:
a dynamically-constructed tree with nodes for names in the first name space, the nodes being arranged such that determining how the exported name is to be used is done by locating the node for the exported name in the tree and traversing the tree from that node to the tree's root.
12. The apparatus set forth in claim 10 wherein:
the mapping means further maps the exported name onto an entity handle which uniquely identifies the entity represented by the exported name in the portion of the second name space but is different from a entity handle employed for the entity in the means for performing the operation, whereby the exported name can be exported to more than one namespace.
13. The apparatus set forth in claim 12 wherein:
the means for receiving the first operation specification receives the first operation specification from a source entity which uses the second name space;
the apparatus further comprises
means for receiving a result of the operation from the means for performing the operation and transmitting a result specification including the result to the source entity;
the first operation specification specifies a status operation for determining a status of an entity represented by the exported name; and
the result specification for the status operation returns the entity handle employed for the entity in the means for performing the operation.
14. A distributed computing system comprising:
a first process which has a first name space of user-defined names representing entities;
a second process which has a second name space of user-defined names;
a channel for transferring messages between the first process and the second process;
means in the first process for associating a portion of the second name space with the channel and binding the channel to a name in the first name space, whereby the names in the portion are exported to the first name space and become available therein;
means in the first process for sending a tint message via the channel which specifies an operation on an entity indicated by one of the exported names, the exported name being used in message as required in the first name space; and
interpretation means in the second process for responding to the first message by producing a first operation specification which specifies the same operation, includes the exported name, and in which the exported name is used as required for the portion of the second name space; and
operation execution means accessed by the second process and responsive to the first operation specification for performing the operation on an entity represented by the exported name.
15. The distributed computing system set forth in claim 14 further comprising:
reply means in the second process for receiving it result of the operation from the operation execution means and transmitting a second message containing the result to the first process.
16. The distribute computing system set forth in claim 15 wherein:
the means for sending the message sends the first message in response to a second operation specification which includes the exported name, the exported name being used in the second operation specification as required for the portion of the second name space, and responds to the second message by providing the result contained therein as the result of the operation specified by the second operation specification.
17. The distributed computing system set forth in claim 14 wherein:
the first and second name spaces are per-process name spaces.
18. The distributed computing system set forth in claim 14 further comprising:
a central processing unit, a local processing unit, and communications means connecting the central processing unit and the local processing unit; and wherein
the first process executes on the central processing unit, the second process executes on the local processing unit, and the channel operates across the communications means.
19. The distributed computing system set forth in claim 18 wherein:
the local processing unit includes a display and an input device; and
the identical names include names for the input device and the display.
20. The distributed computing system set forth in claim 14 wherein:
the first process and the second process belong to the same user.
21. A method employed in a computer system of making an exported name from a portion of a first name space of user-defined names representing entities in the computer system available in a portion of a second name space of user-defined names, the method comprising the steps of:
mounting the portion into the second name space, the exported name thereby becoming available in the second name space;
receiving a first operation specification in the first name space which specifies an operation and includes the exported name from a source entity of the computer system which operates in the second name space, the exported name being used in the first operation specification as required in the second name space;
responding to the first operation specification in the first name space by producing a second operation specification which specifies the same operation and includes the exported name, the exported name being used in the second operation specification as required for the first name space; and
performing the operation specified in the second operation specification on the entity represented by the exported name, in the first name space.
22. The method set forth in claim 21 further comprising the steps of:
obtaining a result of the operation and transmitting a result specification including the result to the source entity.
23. A method employed in a computer system whereby a first entity having a first name space of user-defined names representing entities in the computer system cooperates with a second entity having a second name space of user-defined names in executing a first program, the method comprising the steps of:
in the first entity,
sending a first command specifying the first program to the second entity;
in the second entity,
responding to the first command by sending a second command to the first entity which specifies that the first entity execute a second program which, when executed, makes names from a first portion of the first name space available to the second entity as exported names in a second portion of the second name space;
in the first entity,
responding to the second command by executing the second program, exported names received from the second entity being thereby interpreted as names in the first portion of the first name space;
in the second entity,
incorporating the portion into the second name space; and executing the first program in the second name space.
Description
CROSS-REFERENCE TO RELATED APPLICATION
The techniques described in this application are implemented in the distributed computing system described in U.S. Ser. No. 07/70265 Pike, et al. Distributed Computing System, filed May 17, 1991, now abandoned. FIGS. 1-13 from that application and the portion of the Detailed Description through the section "Obtaining an Open Protocol Channel" have been incorporated into the present application.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The techniques described herein relate generally to name spaces employed in computer systems and more specifically to the incorporation of one name space into another name space.
2. Description of the Prior Art
Many of the entities available to a program executing on a computer system have names, i.e., identifiers which are not addresses but which may nevertheless be used by a program to locate the entity specified by the identifier. The computer system's operating system includes components which permit the user to name entities, which keep track of the relationships between names and entities, and which permit the user to locate the entity by means of its name.
Files are common examples of such named entities. The file system component of the operating system permits the user to give a file a character-string name and to locate the file using the character-string name. Other named entities may be devices such as terminals or printers. The set of names which a program may use to locate entities is termed the name space in which the program executes. The operating system determines how the name space is organized. In some cases, the name space is fiat, that is, all names are on a single level. In other cases, the operating system organizes the names into hierarchies. A common form of hierarchical organization is the single tree. All of the names in the name space are organized into a tree with a single root. Names at interior nodes of the tree represent directories; names at leaf nodes represent ordinary entities. To locate an entity or directory in such a tree, a program specifies the name of the entity or directory and the names of all of the directories between the root of the tree and the entity or directory or the name of the entity or directory and the names of all of the directories between the process's current directory and the entity or directory. The combination of names necessary to locate the entity is termed the entity's path name.
In the past, name spaces have generally been per-system. Any program operating on a system could locate any named entity in the system. An example of such a per-system name space is that provided by a computer running the wellknown UNIX.TM. operating system. All of the files provided by the UNIX operating system are organized into a single tree, and a program may locate any file in the system by specifying the file's path name in the tree. An advantage of this form of organization was that entities used by all programs, such as the executable code for the operating system or other utilities, could be made available to all programs by putting them in at predetermined places in the name space. Indeed, if two systems shared the same naming conventions for those predetermined places, a program which executed on one UNIX system would execute on another. For example, in UNIX systems, commonly-used utility programs are generally kept in the directory which is accessible by the path name/bin.
Originally, computer systems were independent entities. They could be connected by communications media, but the connected systems did not form a single system. As communications media have improved and the price of processors and memory have dropped, distributed computer systems have arisen. In these systems, a set of processors, display devices, and file storage devices which are connected by communications media form a single system. An advantage of a distributed system is that there is no logical limit to system size. While current distributed systems typically consist of a set of work stations which are connected via a local area network to each other and to a file server, i.e., a file storage device with a processor which is specialized to perform file operations, there is no reason why every work station, processor, file storage device, display device, and printer in a large corporation could not be part of a single distributed system.
One problem in the design of distributed systems is how to define the name space. In the distributed computing system described in the Pike et al. patent application cited above, name spaces are defined by processes. A process may have its own name space or share a name space with another process, and a process may modify its name space by rearranging the relationships between names in the name space and by adding file trees provided by services to the name space. A difficulty with the definition of name spaces in the computing system of the Pike et al. patent application was that a first process could not incorporate a portion of a second process's name space into its own name space. As a result, operations such as incorporating the name space of a process executing on a work station into the name space of a process executing on a CPU for the user of the work station were exceedingly cumbersome. This problem is overcome by the techniques disclosed herein, which may be used generally to incorporate a portion of one name space into another name space.
SUMMARY OF THE INVENTION
Among the techniques is apparatus operating in a first name space for exporting a portion of the first name space to a second name space. The apparatus includes:
means for receiving a first operation specification which specifies an operation using a name in the portion as the name is used in the second name space;
interpretation means for responding to the first operation specification by producing a second operation specification which specifies the same operation and uses the name as the name is used in the first name space; and
means responsive to the second operation specification for performing the operation.
The foregoing and other objects and advantages of the invention will be apparent to one of ordinary skill in the art who peruses the following Drawing and Detailed Description, wherein:
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is an overview of the per-process name spaces provided by the operating system described herein;
FIG. 2 is an overview of the mount service and protocol services in the operating system;
FIG. 3 is a diagram of the CHANNEL data structure;
FIG. 4 is a diagram of the data structures used to locate open files;
FIG. 5 is a diagram of the data structures used to send protocols to and receive protocols from protocol services;
FIG. 6 is a diagram of the file tree provided by the root service and of the file tree after certain bind and mount operations have been performed;
FIG. 7 is a diagram of the distributed system in which the operating system of the present invention is implemented;
FIG. 8 is a diagram of the mount table of the present invention;
FIG. 9 is a diagram of the mount table for the second file tree of FIG. 6;
FIG. 10 is a flow chart of the namec function in a preferred embodiment;
FIG. 11 is a flow chart of the walk function in a preferred environment;
FIG. 12 is a flow chart of the mount function in a preferred environment;
FIG. 13 is a diagram of the data structures used to register a protocol service in a preferred embodiment;
FIG. 14 is a diagram of a first process which has provided a portion of its name space to a second process;
FIG. 15 is a diagram of a preferred embodiment of exportfs;
FIG. 16 is a diagram of the data structures employed in protocol interpreter data base 1517; and
FIG. 17 is a diagram of how the data structures are combined in protocol interpreter dam structures 1517.
The reference numbers employed in the Drawing and the Detailed Description have three or more digits. The two least significant digits are a number within a figure; the remaining digits are the figure number. Thus, the element with the reference number "305" is first shown in FIG. 3.
DETAILED DESCRIPTION
The following Derailed Description of a preferred embodiment begins with an overview of the Plan 9 operating system in which the invention is embodied and a description of the manner in which Plan 9 creates a name space for a process. Thereupon, the techniques by means of which one process may export a name space to another process are described in detail, and finally, two Plan 9 services which employ the techniques are described. Material not taken from the Pike et al. patent application begins with the section "Overview of Export Files Service or exportfs" and FIG. 14 of the drawings.
Overview of the Plan 9 Operating System: FIG. 7
Plan 9 is a general-purpose, multi-user, portable distributed system implemented on a variety of computers and networks.
Plan 9 is divided along lines of service function. CPU servers concentrate computing power into large (not overloaded) multiprocessors; file servers provide repositories for storage; and terminals give each user of the system a dedicated computer with bitmap screen and mouse on which to run a window system. The sharing of computing and file storage services provides a sense of community for a group of programmers, amortizes costs, and centralizes and hence simplifies management and administration.
The pieces communicate by a single protocol, built above a reliable data transport layer offered by an appropriate network, that defines each service as a rooted tree of files. Even for services not usually considered as files, the unified design permits some noteworthy and profitable simplification. Each process has a local file name space that contains attachments to all services the process is using and thereby to the files in those services. One of the most important jobs of a terminal is to support its user's customized view of the entire system as represented by the services visible in the name space.
To be used effectively, the system requires a CPU server, a file server, and a terminal; it is intended to provide service at the level of a departmental computer center or larger. The CPU server and file server are large machines best housed in an air conditioned machine room with conditioned power.
The following sections describe the basic components of Plan 9, explain the name space and how it is used, and offer some examples of unusual services that illustrate how the ideas of Plan 9 can be applied to a variety of problems.
CPU Servers
Several computers provide CPU service for Plan 9. The production CPU server is a Silicon Graphics Power Series machine with four 25 MHz MIPS processors, 128 megabytes of memory, no disk, and a 20 megabyte-per-second back-to-back DMA connection to the file server. It also has Datakit and Ethernet controllers to connect to terminals and non-Plan 9 systems. The operating system provides a conventional view of processes, based on fork and exec system calls, and of files, mostly determined by the remote file server. Once a connection to the CPU server is established, the user may begin typing commands to a command interpreter in a conventional-looking environment.
A multiprocessor CPU server has several advantages. The most important is its ability to absorb load. If the machine is not saturated (which can be economically feasible for a multiprocessor) there is usually a free processor ready to run a new process. This is similar to the notion of free disk blocks in which to store new files on a file system. The comparison extends farther: just as one might buy a new disk when a file system gets full, one may add processors to a multiprocessor when the system gets busy, without needing to replace or duplicate the entire system. Of course, one may also add new CPU servers and share the file servers.
The CPU server performs compilation, text processing, and other applications. It has no local storage; all the permanent files it accesses are provided by remote servers. Transient parts of the name space, such as the collected images of active processes or services provided by user processes, may reside locally but these disappear when the CPU server is rebooted. Plan 9 CPU servers are as interchangeable for their task--computation--as are ordinary terminals for theirs.
File Servers
The Plan 9 file servers hold all permanent files. The current server is another Silicon Graphics computer with two processors, 64 megabytes of memory, 600 megabytes of magnetic disk, and a 300 gigabyte jukebox of write-once optical disk (WORM). It connects to Plan 9 CPU servers through 20 megabyte-per-second DMA links, and to terminals and other machines through conventional networks.
The file server presents to its clients a file system rather than, say, an array of disks or blocks or files. The files are named by slash-separated components that label branches of a tree, and may be addressed for I/O at the byte level. The location of a file in the server is invisible to the client. The true file system resides on the WORM, and is accessed through a two-level cache of magnetic disk and RAM. The contents of recently-used files reside in RAM and are sent to the CPU server rapidly by DMA over a high-speed link, which is much faster than regular disk although not as fast as local memory. The magnetic disk acts as a cache for the WORM and simultaneously as a backup medium for the RAM. With the high-speed links, it is unnecessary for clients to cache dam; instead the file server centralizes the caching for all its clients, avoiding the problems of distributed caches.
The file server actually presents several file systems. One, the "main" system, is used as the file system for most clients. Other systems provide less generally-used data for private applications. One service is unusual: the backup system. Once a day, the file server freezes activity on the main file system and flushes the data in that system to the WORM. Normal file service continues unaffected, but changes to files are applied to a fresh hierarchy, fabricated on demand, using a copy-on-write scheme. Thus, the file tree is split into two: a read-only version representing the system at the time of the dump, and an ordinary system that continues to provide normal service. The roots of these old file trees are available as directories in a file system that may be accessed exactly as any other (read-only) system. For example, the file/usr/rob/doe/plan9 .ms as it existed on Apr. 1, 1990, can be accessed through the backup file system by the name/1990/0401/usr/rob/doc/plan9
.ms. This scheme permits recovery or comparison of lost flies by traditional commands such as file copy and comparison routines rather than by special utilities in a backup subsystem. Moreover, the backup system is provided by the same file server and the same mechanism as the original files so permissions in the backup system are identical to those in the main system; one cannot use the backup data to subvert security.
Terminals
The standard terminal for Plan 9 is termed herein a Gnot (with silent `G`). The Gnot is a locally-designed machine of which several hundred have been manufactured. The terminal's hardware is reminiscent of a diskless workstation: 4 or 8
megabytes of memory, a 25 MHz 68020 processor, a 1024.times.1024 pixel display with two bits per pixel, a keyboard, and a mouse. It has no external storage and no expansion bus; it is a terminal, not a workstation. A 2 megabit per second packet-switched distribution network connects the terminals to the CPU and file servers. Although the bandwidth is low for applications such as compilation, it is more than adequate for the terminal's intended purpose: to provide a window system, that is, a multiplexed interface to the rest of Plan 9.
Unlike a workstation, the Gnot does not handle compilation; that is done by the CPU server. The terminal runs a version of the CPU server's operating system, configured for a single, smaller processor with support for bitmap graphics, and uses that to run programs such as a window system and a text editor. Files are provided by the standard file server over the terminal's network connection.
Just like old character terminals, all Gnots are equivalent, as they have no private storage either locally or on the file server. They are inexpensive enough that every member of a research center can have two: one at work and one at home. A person working on a Gnot at home sees exactly the same system as at work, as all the files and computing resources remain at work where they can be shared and maintained effectively.
Networks
Plan 9 has a variety of networks that connect the components. CPU servers and file servers communicate over back-to-back DMA controllers. That is only practical for the scale of, say, a computer center or departmental computing resource. More distant machines are connected by traditional networks such as Ethernet or Datakit. A terminal or CPU server may use a remote file server completely transparently except for performance considerations. As the Datakit network spans the country, Plan 9
systems can be assembled on a large scale. To keep their cost down, Gnots employ an inexpensive network that uses standard telephone wire and a single-chip interface. (The throughput is respectable, about 120 kilobytes per second.) Since the terminal only mediates communication--it instructs the CPU server to connect to the file server but does not participate in the resulting communication--the relatively low bandwidth to the terminal does not affect the overall performance of the system.
FIG. 7 shows the components and topology of Plan 9. Plan 9 system 701 consists of file servers 705, CPUs 703, and Gnots 711. Clusters 717 of CPUs 703 and file servers 705 and clusters 719 of Gnots 711 are connected by a nationwide long-haul communications network 715; Gnots 711 within a cluster 719 are connected by a distribution network 713 such as a LAN and file servers 705 and CPUs 703 within a cluster 717 are connected by a high-speed DMA link 709.
Name Spaces
There are two kinds of name space in Plan 9: the global space of the names of the various servers on the network and the local space of files and servers visible to a process. Names of machines and services connected to Datakit are hierarchical, for example nj/mh/astro/helix, defining (roughly) the area, building, department, and machine in a department. Because the network provides naming for its machines, global naming issues need not be handled directly by Plan 9. However one of Plan 9's fundamental operations is to attach network services to the local name space on a per-process basis. This fine-grained control of the local name space is used to address issues of customizability, transparency, and heterogeneity.
The protocol for communicating with Plan 9 services is file-oriented; all services must implement a file system. That is, each service, local or remote, is arranged into a set of file-like objects collected into a hierarchy called the name space of the server. For a file server, this is a trivial requirement. Other services must sometimes be more imaginative. For instance, a printing service might be implemented as a directory in which processes create files to be printed. Other examples are described in the following sections; for the moment, consider just a set of ordinary file servers distributed around the network.
When a program calls a Plan 9 service (using mechanisms inherent in the network and outside Plan 9 itself) the program is connected to the root of the name space of the service. Using the protocol, usually as mediated by the local operating system into a set of file-oriented system calls, the program accesses the service by opening, creating, removing, reading, and writing files in the name space.
From the set of services available on the network, a user of Plan 9 selects those desired: a file server where personal files reside, perhaps other file servers where data is kept, or a departmental file server where the software for a group project is being written. The name spaces of these various services are collected and joined to the user's own private name space by a fundamental Plan 9 operator, called attach, that joins a service's name space to a user's. The user's name space is formed by the union of the spaces of the services being used. The local name space is assembled by the local operating system for each user, typically by the terminal. The name space is modifiable on a per-process level, although in practice the name space is assembled at log-in time and shared by all that user's processes.
To log in to the system, a user sits at a terminal and instructs it which file server to connect to. The terminal calls the server, authenticates the user (see below), and loads the operating system from the server. It then reads a file, called the profile, in the user's personal directory. The profile contains commands that define what services are to be used by default and where in the local name space they are to be attached. For example, the main file server to be used is attached to the root of the local name space, /, and the process file system is attached to the directory/proc. The profile then typically starts the window system.
Within each window in the window system runs a command interpreter that may be used to execute commands locally, using file names interpreted in the name space assembled by the profile. For computation-intensive applications such as compilation, the user runs a command cpu that selects (automatically or by name) a CPU server to run commands. After typing cpu, the user sees a regular prompt from the command interpreter. But that command interpreter is running on the CPU server in the same name space--even the same current directory--as the cpu command itself. The terminal exports a description of the name space to the CPU server, which then assembles an identical name space, so the customized view of the system assembled by the terminal is the same as that seen on the CPU server. (A description of the name space is used rather than the name space itself so the CPU server may use high-speed links when possible rather than requiring intervention by the terminal.) The cpu command affects only the performance of subsequent commands; it has nothing to do with the services available or how they are accessed.
Although there is a large catalogue of services available in Plan 9, including the service that finds services, a few suffice to illustrate the usage and possibilities of this design.
The Process File System
An example of a local service is the `process file system`, which permits examination and debugging of executing processes through a file-oriented interface.
The root of the process file system is conventionally attached to the directory/proc. (Convention is important in Plan 9; although the name space may be assembled willy-nilly, many programs have conventional names built in that require the name space to have a certain form. It doesn't matter which server the program/bin/rc (the command interpreter) comes from but it must have that name to be accessible by the commands that call on it.) After attachment, the directory/proc itself contains one subdirectory for each local process in the system, with name equal to the numerical unique identifier of that process. (Processes running on the remote CPU server may also be made visible; this will be discussed below.) Each subdirectory contains a set of files that implement the view of that process. For example,/proc/77/mem contains an image of the virtual memory of process number 77. Plan 9's/proc implements other functions through other files. Here is a list of the files provided for each process.
mem The virtual memory of the process image. Offsets in the file correspond to virtual addresses in the process.
ctl Control behavior of the processes. Messages sent (by a write system call) to this file cause the process to stop, terminate, resume execution, etc.
text The file from which the program originated. This is typically used by a debugger to examine the symbol table of the target process, but is in all respects except name the original file; thus one may type `/proc/77/text` to the command interpreter to instantiate the program afresh.
note Any process with suitable permissions may write the note file of another process to send it an asynchronous message for interprocess communication. The system also uses this file to send (poisoned) messages when a process misbehaves, for example divides by zero.
status A fixed-format ASCII representation of the status of the process. It includes the name of the file the process was executed from, the CPU time it has consumed, its current state, etc.
The status file illustrates how heterogeneity and portability can be handled by a file server model for system functions. The command cat/proc/*/status presents (readably but somewhat clumsily) the status of all processes in the system; in fact the process status command ps is just a reformatting of the ASCII text so gathered. The source for ps is a page long and is completely portable across machines. Even when /proc contains files for processes on several heterogeneous machines, the same implementation works.
It is worth noting that the services/proc provides, although varied, do not strain the notion of a process as a file. For example, it is not possible to terminate a process by attempting to remove its process file nor is it possible to start a new process by creating a process file. The files give an active view of the processes, but they do not literally represent them. This distinction is important when designing services as file systems.
The Window System
In Plan 9, user programs, as well as specialized stand-alone servers, may provide file service. The window system is an example of such a program; one of Plan 9's most unusual aspects is that the window system is implemented as a user-level file server.
The window system is a server that presents a file /dev/cons, similar to the /dev/tty or CON: of other systems, to the client processes running in its windows. Because it controls all I/O activities on that file, it can arrange that each window's group of processes sees a private /dev/cons. When a new window is made, the window system allocates a new /dev/cons file, puts it in a new name space (otherwise the same as its own) for the new client, and begins a client process in that window. That process connects the standard input and output channels to /dev/cons using the normal file opening system call and executes a command interpreter. When the command interpreter prints a prompt, it will therefore be written to /dev/cons and appear in the appropriate window.
It is instructive to compare this structure to other operating systems. Most operating systems provide a file like /dev/cons that is an alias for the terminal connected to a process. A process that opens the special file accesses the terminal it is running on without knowing the terminal's precise name. Since the alias is usually provided by special arrangement in the operating system, it can be difficult for a window system to guarantee that its client processes can access their window through this file. This issue is handled easily in Plan 9 by inverting the problem. A set of processes in a window shares a name space and in particular dev/cons, so by multiplexing /dev/cons and forcing all textual input and output to go through that file the window system can simulate the expected properties of the file.
The window system serves several files, all conventionally attached to the directory of I/O devices,/dev. These include cons, the port for ASCII I/O; mouse, a file that reports the position of the mouse; and bitblt, which may be written messages to execute bitmap graphics primitives. Much as the different cons files keep separate clients' output in separate windows, the mouse and bitblt files are implemented by the window system in a way that keeps the various clients independent. For example, when a client process in a window writes a message (to the bitblt file) to clear the screen, the window system clears only that window. All graphics sent to partially or totally obscured windows is maintained as a bitmap layer, in memory private to the window system. The clients are oblivious of one another.
Since the window system is implemented entirely at user level with file and name space operations, it can be run recursively: it may be a client of itself. The window system functions by opening the files /dev/cons,/dev/bitblt, etc., as provided by the operating system, and reproduces--multiplexes--their functionality among its clients. Therefore, if a fresh instantiation of the window system is run in a window, it will behave normally, multiplexing its /dev/cons and other files for its clients. This recursion can be used profitably to debug a new window system in a window or to multiplex the connection to a CPU server. Since the window system has no bitmap graphics code--all its graphics operations are executed by writing standard messages to a file--the window system may be run on any machine that has /dev/bitblt in its name space, including the CPU server.
CPU.about.Command
The cpu command connects from a terminal to a CPU server using a full-duplex network connection and runs a setup process there. The terminal and CPU processes exchange information about the user and name space, and then the terminal-resident process becomes a user-level file server that makes the terminal's private files visible from the CPU server. In a preferred embodiment, the CPU server builds the name space by re-executing the user's profile; in an alternative embodiment, the name space will be exported by a special terminal-resident server that can be queried to recover the terminal's name space. The CPU process makes a few adjustments to the name space, such as making the file /dev/cons on the CPU server be the same file as on the terminal, perhaps making both the local and remote process file system visible in /proc, and begins a command interpreter. The command interpreter then reads commands from, and prints results on, its file /dev/cons, which is connected through the terminal process to the appropriate window (for example) on the terminal. Graphics programs such as bitmap editors also may be executed on the CPU server since their definition is entirely based on I/O to files `served` by the terminal for the CPU server. The connection to the CPU server and back again is utterly transparent.
This connection raises the issue of heterogeneity: the CPU server and the terminal may be, and in the current system are, different types of processors. There are two distinct problems: binary data and executable code. Binary data can be handled two ways: by making it not binary or by strictly defining the format of the data at the byte level. The former is exemplified by the status file in /proc, which enables programs to examine, transparently and portably, the status of remote processes. Another example is the file, provided by the terminal's operating system, /dev/time. This is a fixed-format ASCII representation of the number of seconds since the epoch that serves as a time base for programs requiring time stamps. Processes on the CPU server get their time base from the terminal, thereby obviating problems of distributed clocks.
For files that are t/O intensive, such as /dev/bitblt, the overhead of an ASCII interface can be prohibitive. In Plan 9, such files therefore accept a binary format in which the byte order is predefined, and programs that access the files use portable libraries that make no assumptions about the order. Thus /dev/bitblt is usable from any machine, not just the terminal. This principle is used throughout Plan 9. For instance, the format of the compilers' object files and libraries is similarly defined, which means that object files are independent of the type of the CPU that compiled them.
Having different formats of executable binaries is a thornier problem, and Plan 9 solves it as follows: directories of executable binaries are named appropriately: /mips/bin,/68020/bin, etc., and a program may ascertain, through a special server, what CPU type it is running on. A program, in particular the cpu command, may therefore attach the appropriate directory to the conventional name/bin so that when a program runs, say, /bin/rc, the appropriate file is found. Although this is a fairly clumsy solution, it works well in practice. The various object files and compilers use distinct formats and naming conventions, which makes cross-compilation painless, at least once automated by make or a similar program.
Security
Plan 9 does not address security issues directly, but some of its aspects are relevant to the topic. Breaking the file server away from the CPU server enhances the possibilities for security. As the file server is a separate machine that can only be accessed over the network by the standard protocol, and therefore can only serve files, it cannot run programs. Many security issues are resolved by the simple observation that the CPU server and file server communicate using a rigorously controlled interface through which it is impossible to gain special privileges.
Of course, certain administrative functions must be performed on the file server, but these are available only through a special command interface accessible only on the console and hence subject to physical security. Moreover, that interface is for administration only. For example, it permits making backups and creating and removing files, but it does not permit reading files or changing their permissions. The contents of a file with read permission for only its owner will not be divulged by the file server to any other user, even the administrator.
Of course, this begs the question of how a user proves who he or she is. In a preferred embodiment, this is done using a simple authentication manager on the Datakit network itself, so that when a user logs in from a terminal, the network assures the authenticity of the maker of calls from the associated terminal. The need for trust in a local network may be eliminated by replacing the authentication manager by a Kerberos-like system.
Overview of the Architecture of Plan 9 Services: FIG.1
As already mentioned, Plan 9 services are implemented as file systems, that is, the service appears to a process executing on a computer with a Plan 9 operating system as a set of files. The process can obtain data from the service by performing file read operations on "files" provided by the service and can provide data to the service by performing file write operations on the "files". As already explained in detail with regard to the "file system" provided by the process service, the service need not actually maintain files, but must merely be able to respond to requests for file operations by a process as ff it did maintain them. For example, when a process reads a "file" maintained by the service, the service must provide data to the process reading the "file".
FIG. 1 provides an overview of the relationship between Plan 9 services and processes. Service architecture 101 shows how processes 102 employ file system 109 to access one or more services 123. The processes 102 may be executing either on a Gnot 711 or a CPU 703 and the services may be implemented either locally to the processor where process 102 is executing or in a remote device such as a file server 705.
As shown in FIG. 1, each service 123 provides one or more trees of files 125 to the processes 102 which use the service. The trees of files are made up of data files 131 and directory files 127 and 129. A directory file is a file which contains a list of files. The files listed in the directory files may be either directory files or data files. As may be seen from file tree 125(a), data files 131 B, C, and D are the "leaves" of the file tree, while directory file 129 occupies the point where the tree branches. At the base of the entire tree, there is a single root directory file 127. Each file in a service 123 has a file name. In a preferred embodiment, the file name is a character string.
A process 102 accesses a file in a service 123 by means of calls to file system functions provided by the Plan 9 operating system. There are two main classes of functions: file locator functions (FL) 113, which locate files, and file access functions (FA) 111, which access files after they have been located by file access functions 111. Calls to file locator functions 113 are represented by arrows 107, and those to file access functions 111 are represented by arrows 105.
As mentioned above, each Plan 9 process 102 has a name space associated with it. A process 102's name space determines which files provided by services 123 a process 102 may access. A process 102's name space in a preferred embodiment consists of a single file tree 117 which is assembled out of file trees 125 and/or subtrees of those file trees. Thus name space 115(o) maintained and used by file locator functions 113 is the name space for process 102(a). As may be seen from FIG. 1, process
102(a)'s name space contains file tree 117(o) which includes 125(a) from service 123(a) and file tree 125(k) from service 123(k), but does not include file tree 125(b) from service 123(b). File tree 117(0) has been constructed by making the directory file "X" in process 102's file tree 117(o) equivalent to the root of tree 125(a) and making the directory file "Y" in file tree 117(o) equivalent to the root of file tree 125(k). How "X" and "Y" themselves came to be in name space 115(o) will be explained in detail later.
Within name space 115(o), a file may be located by means of a path name. A path name is a list of file names which includes the name of the file which is to be located and the names of all of the directory files between the point in the file tree from which the file is being located to the file. The point in the file tree from which the file is being located is termed the working directory. Thus, if the working directory is the directory file X, the pathname is A/C. The "/" character in A/C serves to separate the names in the path name. Additionally, any file may be located by specifying its full path name, that is, the "/" character, representing the root directory, the names of all of the directories between the root directory and the file, and the name of the files. The names of the files are again separated by "/". Thus, the full path name of Thus, the complete pathname for file C in name space 115(o) is /X/A/C.
In Plan 9, a number of processes 102 may share a name space. The processes 102 sharing a name space make up a process group 103; thus, the processes in process group 103(o) share name space 115(0) defined by file tree 117(o), while the processes in process group 103(z) share name space 115(z). Processes are created in Plan 9 by means of the "fork" system call. The newly-created process 102 remains associated with the same name space 115 as the process which made the "fork" system call and therefore belongs to the same process group 103 as the process 102 which made the fork system call. A new process group 103 is established when a process 102 makes the "forkpgrp" system call. The call has a single argument: a flag indicating whether the new process group should receive a copy of process 102's old name space 105 or receive a minimal default name space 105. Execution of the system call places the process 102 making the system call in the new process group. Once a new name space 115
has been established for a process 102 by the forkpgrp system call, changes in new name space 115 do not affect the old name space 115 of process group 103 to which the process 102 making the forkpgrp system call formerly belonged.
Any process 102 which belongs to a process group 103 may modify the name space 115 for the process group. To do so, the process uses one of two name space modification functions. The first of these is "bind", which makes a name which is already in the name space equivalent to file specified by a new path name. After the bind, references to the file specified by the old path name will be interpreted as references to the file specified by the new path name. For example, a "bind ("/Y/H","/X/A")" function call in name space 115(o) may be used to replace the subtree whose root is A in name space 115(o) with the subtree whose root is H in that name space. After the bind, the pathname /X/A/K will refer to the file K of subtree H.
The second operation is "mount", which makes a name which is already in the name space equivalent to the root of a service 123. As will be explained in more detail later, the root of the service is represented by a file descriptor, which is a small integer. Thus, the directory name "X" of file tree 117(o) was made equivalent to the root of file tree 125(a) of service 123(a) by the mount function "mount (FD, "/X"). After the mount, the pathname /X/A refers to file "A" in file tree 125(a).
A further refinement of "mount" and "bind" in Plan 9 is that three different kinds of relationships between the pathnames or between the file descriptor and the pathname may be specified. The first relationship is replacement. If that relationship is specified, whatever is referred to by the new name or the file descriptor completely replaces whatever is referred to by the old name. The second and third relationships establish what is termed union directories. When a union directory is established, the old name and the new name used in the bind function must both refer to directory files and the old name in the mount command must refer tp a directory file, while the file descriptor must refer to the root of a file tree in a service
123. The effect of the bind or mount is to establish a new directory which is the union of the files in the old and new directory files. One relationship, before, establishes a union directory in which the new directory is searched before the old directory; the other relationship, after, establishes a union directory in which the new directory is searched after the old directory. Thus, the bind command "bind("/Y/H","/X/A",BEFORE) establishes a directory in which the files I,J,K precede the files B and C, and when file locator functions 113 respond to the pathname /X/A/C, they will first search through the directory H and then through the directory A. By thus determining the order in which the locator functions search through directories for a file, the union directories provide the same kind of control as is provided by search lists in operating systems such as UNIX.
File locator functions 113 which locate files instead of rearranging the name space take a path name as an argument and return a file descriptor. In this context, the file descriptor is a small integer which is used in file access functions 111
to refe to the file. The file locator functions 113 include "chdir", which makes the directory file specified by the pathname into the process 102's working directory, and "open", which opens the file specified by the pathname. Both "chdir" and "open" return file descriptors for the working directory file and the opened file respectively. Additionally, the "create" function works as does "open", except that the file specified in the path name is created. File access functions 105 then use the file descriptor provided by the file locator functions to read the file, write the file, set and obtain the file's status, and to close the file.
In architecture 101, file system 109 translates the file access calls 105 and the file locator calls 107 made by a process 102 into service file operation requests. Each service file operation request requests a service 123 to perform an operation on a file in one of its file trees 125. Again, there are two classes of such requests: service file access requests, indicated by arrow 119, and service file locator requests, indicated by arrow 121. As will be explained in more detail, requests 119 and 121 for some services 123 take the form of function calls; for other services 123, the requests take the form of file protocols. In the case of the function calls, files are represented by file names or file descriptors; in the case of the file protocols, files are represented by file names or file identifiers; for the purpose of the following discussion, both file descriptors and file identifiers will be subsumed in the term file specifiers.
In file system 109, a file descriptor employed by a process 102 is associated with a qid, a file handle provided by service 123. The association may be either direct or indirect, by way of the association of a file descriptor with a file identifier and the association of the file identifier with the qid. When there is an association between a file descriptor used by a process 102 and a qid, the process 102 is said to have a connection to the file represented by the qid. Similarly, in service 123, the qid is associated with one or more file specifiers. Any file system call 105 or 107 which results in a service file request having a file specifier associated with a given qid will result in the file operation being performed on the file identified by the qid.
An advantage of representing a file by a name or a file specifier in file system 109 but by a name or a qid in services 123 is that different services can establish different relationships between file names and qids. For example, in some services, such as the process service explained above, it is advantageous for the file names used by all processes to refer to a single set of files; in other services, such as the 8.5 window service to be described in detail later, the file names used by each process refer to a set of files peculiar to that process. As a result, file names in architecture 101 have properties analogous to those of variables in programming languages such as C. Some file names are like global static variable names: as the global static variable name refers to the same memory location in all code in which the variable name appears, so file names like those provided by the process service refer to the same files in all processes. Other file names are like local variable names: as the local variable name refers to a different memory location in every invocation of a procedure, so file names like those provided by the 8.5 server refer to different files in each process. Which properties a file name has is of course determined by the service 123 which provides the file. In analogy with the terminology used with variables, services 123 are said to provide instances of their files; thus, the process service provides a single instance of each of its files, and any process 102 which has a connection to one of the files provided by the service has access to the same file as all of the other processes 102 which has a connection to that file. The window service, on the other hand, provides separate instances of the files in its file tree to each process 102 which establishes a connection with the root of the file tree; any process 102 which has a connection with a file provided by the window service has a connection to one of the multiple instances of that file.
Each service 123 is able to handle the following classes of service file locator requests 121:
attach: given a file specifier for the root of a file tree in a specified service 123, return a qid for the root to file system 109;
walk: search a directory file specified by a file specifier for a file having a given name, and if the file is found, associate the file specifier with the qid for the file and return the qid;
create: make a new file in a directory file specified by a file specifier, give the file a specified name, associate the file specifier with the qid for the new file; and return the qid;
remove: remove a file specified by a file specifier from the server, disassociate the file specifier from the qid, and return the name of the file being removed.
Each service 123 is further able to handle the following classes of service file access requests:
open: prepare a file specified by a file specifier for file access operations and return the qid associated with the file specifier;
clone: given a first file specifier representing a file and a second file specifier which does not yet represent a file, associate the second file specifier with the file associated with the first file specifier;
read: given a file specifier, an offset in the file, and a count, read the number of bytes specified in the count beginning at the offset;
write: given a file specifier, an offset in the file, a count, and data, write the data into the number of bytes specified in the count beginning at the offset;
clunk: given a file specifier, end the association between the file specifier and the file.
stat: given a file specifier and information required by the service to obtain file status information, return the file's status;
wstat: given a file specifier and information required by the service to change the file's status, change the file's status.
While all services 123 must implement the above operations, in some cases, the operation may in fact have no effect or produce an error. For example, if a service 123 is a printer, and thus can implement only write-only "files", a read operation will produce an error.
Kernel Services and Protocol Services: FIG. 2
There are two types of services 123 in a preferred embodiment of Plan 9. Kernel services are services in which service file operation requests 205 and 207 are implemented by means of calls to Plan 9 kernel functions. Protocol services are services in which service file operation requests 205 and 207 are implemented by means of the Plan 9 file protocol. In the protocol services, each service file operation request is initiated by a tmessage transmitted for a process 102 to a protocol service. The protocol service replies to the tmessage with a rmessage which contains information returned from the protocol service to the process. The tmessage and rmessage making up a service file operation request for a protocol service are termed a transaction. For example the service file read operation request for a protocol service is a read transaction which consists of the following read tmessage and rmessage:
The file identifier is associated with the file descriptor in file system 109 and with the qid in file system 109 and in the protocol service. The tread message includes a type specifier indicating that the message is a tread message, the file identifier, a tag identifying the message, the offset in the file at which the read is to begin, and the number of bytes. The tread message includes a type specifier indicating that the message is an tread message, the file identifier, another tag, the count of bytes actually read, and the data itself. If the read results in an error, protocol service 209 returns an rerror message instead of the tread message. It has the form:
Each protocol service 209 must be able to perform transactions corresponding to each of the service file operation requests listed above. Since this is all that is required of a protocol service 209, a protocol service 209 may be implemented on a processor which has an architecture completely different from the architecture of the processor upon which mount service 203 is executing. All that is required to use a protocol service 209 is a connection to the protocol service 209 over which the protocols may be transmitted and received. Further, in a preferred embodiment, the data sent and returned in the protocols has predetermined formats which are used by all protocol services 209 which provide a given file tree. Thus, a process 102 may use any protocol service which provides the given file tree, regardless of the type of machine the process 102 is executing on and the type of machine the protocol service 209 is executing on.
FIG. 2 shows the relationship between protocol services and the rest of architecture 101. As shown in the figure, protocol services (PS) 209 perform file locator transactions 205 and file access transactions 207 involving file protocols corresponding to the service file locator requests and service file access requests explained above. A special kernel service, mount service 203, receives function calls 217 specifying service file locator requests and function calls 219 specifying service file access requests from file system 109. Mount service 203 then responds to the function calls by performing corresponding transactions with protocol services 209. To perform the transactions, mount service 203 employs a number of communications services 215. There are two types of such communications services: network communications services (NS) 214 and inter-process communications services (IPS) 216. Network communications services employ networks to communicate with remote protocol servers 213 which are connected via the networks to the CPU upon which process 102 is executed. Inter-process communications services 216 employ an interprocess communications system to communicate with local protocol servers 211. In a preferred embodiment, the interprocess communications system employed by IPS 216 is a pipe system like that in the UNIX operating system. When a communications service 215 is connected to a protocol server 209, it is represented by a file descriptor. It is these file descriptors which are employed in the "mount" system call.
As may be seen by the reference numbers in parentheses, network communications services 215 may employ distribution network 713, long haul network 715, and DMA network 705 of FIG. 7 and the remote protocol services 213 may include one or more file services 705. Local protocol services 211 may be devices such as the 8.5 services for GNOT 711 or even other processes 102 which communicate with the given process 102 by means of interprocess communications facilities. As indicated above, the distinction between a protocol service 209 and a kernel service like mount service 203 is not location, but merely that protocol service 209 performs transactions with mount service 203. As will be seen in detail in the following description of the 8.5
service, an advantage of this characteristic of protocol services 209 is that the same protocol service 209 may be used by both a process 102 running on a local processor and a process 102 running on a remote processor such as CPU 703 of FIG. 7.
Implementing File Operations in Plan 9
The following will discuss the implementation of file operations in Plan 9, beginning with the data structures used to represent files and relate them to processes 102 and services 123, continuing with the data structures used to represent process name space 115, and ending with examples of certain typical file operations.
Representing A Connection between a Process 102 and a File: FIGS. 3-5
Any file to which a process 102 has a connection is represented by a channel data structure. At a minimum, the channel data structure for a file specifies the service 123 which is providing the file and associates the qid provided by service 123
with the file descriptor which process 102 uses to refer to the file. If service 123 is a protocol service 209, the channel further associates the file identifier used for the file in the file protocols and a protocol channel for the file protocols with the file descriptor.
FIG. 3 is a diagram of the channel data structure. Channel data structure 301 has the following components:
lock 303, which bars more than one process from having access to channel 301 at the same time;
reference count (ref) 307, which indicates whether the channel is pointed to by other kernel data structures;
next/offset field 309: when channel 301 is not in use, it is stored in a list of unused channels 301, and in that case, the next/offset field points to the next unused channel 301; when the file represented by channel 301 is being accessed, next/offset specifies the current offset into the file.
Kernel service information (KSINFO) 311: The next two fields contain information which specifies the kemel service which provides the file represented by the channel 301;
type 313: this field specifics the kemel service by means of which the file represented by the channel 301 is accessed; for protocol services 209, type field 313 specifies mount service 203;
device (dev) 315: this field specifies a specific device represented by the kernel service; in the case of mount service 203, the device field 315 specifies the protocol service 209 which provides the file represented by channel 301; p1 Access information (AINFO) 317: the next three fields contain information needed to perform file accesses;
mode 319 indicates how the file represented by the channel may be accessed;
flag 319 contains status information about the file;
qid 323 is the qid which the sen,ice which provides the file has assigned to the file;
Mount Information (MINFO) 325 indicates where information concerning binds and mounts affecting the pathname of the file represented by the channel 301 may be found;
mountptr 327 is a pointer to a data structure from which the information concerning the binds and mounts may be found;
mountid 329 is an identifier for the data structure pointed to by mountptr 327;
file identifier (FID) 331 is the file identifier which mount service 203 provides to protocol service 209 providing the file represented by channel 301 and which protocol service 209 associates with qid 323 for the file as long as the file is connected; in a preferred embodiment, file identifier 331 is set to a unique value for each channel 301 when the channel 301 is created;
auxiliary information 333 is information whose meaning depends on device type 313;
protocol service information (PSINFO) 335 is information indicating how file protocols may be sent to the protocol service providing the file represented by the channel;
protocol channel (PCHAN) 337 is a pointer to a channel structure 301 which represents the communications service 215 used for transactions with the file represented by the present channel structure;
mqid 339 is the qid of the root directory of the file tree in the protocol service which contains the file represented by the channel 301.
Each process 102 is associated with a set of files with which it has connections. The set of files may include files with which connections were established by ancestors of process 102 and which were "inherited" by process 102 when it was created by the "fork" system call and files with which the connections were established by process 102 itself. The association is achieved by means of the data structures 401 shown in FIG. 4. Data structures 401 further associate each file in the set with the file descriptor which process 102 uses to refer to the file.
The starting point for access to data structures 401 is user dam structure 403 for process 102. A pointer 411 in user data structure 403 points to file descriptor array (FDA) 413. File descriptor array 413 is an array of pointers 423 to channels 401 which is indexed by file descriptor (FD) 417. If a file belongs to the set of files connected to process 102, the entry (FDAE) 415 in array 413 corresponding to the file descriptor for the file will contain a pointer 423 to a channel structure 301 for the file. Such channel structures appear in FIG. 4 as file channel structures 419. There is a set 421 of such file channels 419 corresponding to the set of files with which process 102 is connected. In a preferred embodiment, file descriptor array 413 has 100 elements, and consequently, each process 102 may be connected to up to 100 files.
As previously indicated, file locator functions such as "bind", "mount", "chdir", or "open" take a path name as an argument and return the file descriptor 417 for the file. As will be explained in more detail below, access to file channels 419
for purposes of path name interpretation is provided by a pointer 407 in user structure 403 to file channel 419 for the root of process 102's file tree and another pointer 409 to file channel 419 for process 102's working directory. Another component of user 403 which is employed in path name interpretation is element (ELEM) 405, which contains the name in the list of names making up the pathname which is currently being interpreted.
When the file represented by file channel 419 is provided by a protocol service 209, file channel 419 is associated with a protocol channel which represents a connection via a communications service with a protocol service 209. Protocol channels are also implemented using channel structures 301. In a channel structure 301 serving as a protocol channel, TYPE field 313 specifies either inter-process communications services 216 or network services 214 (in a preferred embodiment, a pipe service or a DATAKIT.TM. network communications service). Fields 327, 329, and 337 have no meaning in a protocol channel. When the connection represented by the protocol channel is opened by a process 102, a pointer to the protocol channel is placed in file descriptor array 413 for the process 102 and the protocol channel receives the corresponding file descriptor. A connection to a local protocol service by means of a pipe is opened when the pipe system call creating the pipe is executed by the process
102, and a connection to a remote protocol service 209 by means of a DATAKIT connection is opened when the file representing the remote protocol service 209 in the srv service is opened.
FIG. 5 shows the data structures 501 employed in a preferred embodiment to associate a file channel 419 with a protocol channel and with data structures representing the tmessages and rmessages. As previously indicated, PCHAN field 337 in file channel 419 points to protocol channel 517, which is implemented using a channel structure 301. The data structures representing the tmessages and rmessages are located using mount service array (MNTA) 502. There is an entry (MNTE) 503 in the array for every file channel 419 which represents a file provided by a protocol service 209. Each entry 503 includes an identifier and two pointers: one to file channel 419 corresponding to entry 503 and one to a mount service queue structure (MNTQ) 509. File channel 419 in turn contains the identifier for entry 503 as part of the information in dev field 315. Mount service queue structure 509 contains a pointer 511 to a data structure representing the procedure to which file channel 419 belongs, a pointer
515 to protocol channel 517 representing the connection to the protocol service 209 and file tree 125 containing the file represented by file channel 419, and a pointer 519 to a queue of messages 520. The messages in the queue are of course tmessages concerning the file to protocol service 209 and rmessages from protocol service 209. Mount service queue 509 thus associates the queue of messages with protocol channel 517 and with process 102.
Each message 520 is made up at least of a mount service header (MNTHDR) 521. Since there may be more than one outstanding message for the combination of file and process represented by file channel 419, mount service header 521 contains pointers
527 and 529 linking header 521 to headers for other messages for the process-file combination represented by channel 419. Mount service header 521 further contains a pointer 526 to the process data structure for the process to which file channel 419
belongs.
If the message 520 is a rmessage, the parts of the message other than the data are placed in rhdr 525; when the message 520 is a tmessage, the parts of the received message other than the data are placed in thdr 523; these fields thus contain the type of the message, the file identifier which mount service 203 has associated with the file, and other information as required. For example, if the message is a twrite message, thdr 523 contains the message type, the file identifier for the file, the number of bytes to be written, and the offset at which the write is to begin. Additionally, thdr 523 and rhdr 525 contain pointers 531 to mount buffers 533 which contain any data sent or received in the message.
A message 720 is transmitted by a function which takes entry 503 corresponding to file channel 419 and mount header 521 as arguments; the function uses entry 503 to locate protocol channel 517, and message 520 specified by mount header 521 is output to the connection specified by protocol channel 517. Upon transmission, the process 102 sending the message waits for the reply. When the reply arrives, it is placed in a message 520, the process is awakened, and the message 520 is read by the process 102. In a preferred embodiment, the tag in the rmessage is used to locate the proper mount header data structure 521. Functions in Plan 9 which must locate a structure 501 for a given file channel 419 do so by working through mount service array 502 until they find an entry 503 which contains a pointer 505 to the given file channel 419; the file channel is identifiable by the fact that its dev field 315 contains the identifier for the corresponding entry 503.
As is apparent from the foregoing discussion, the data structures shown in FIGS. 3-5 permit a process 102 which has a file descriptor 417 for a file to perform file access functions such as read and write on the file. They are, however, not sufficient to perform file locator functions, since these functions involve path names as well as file descriptors.
Representing Name Space 115: FIGS. 6,8,9
As previously indicated, a new process 102's name space 115 is either inherited from the process 102's parent or is built from a "stub" name space which Plan 9 makes available to a process 102 which does not inherit its parent's name space. FIG.
6 shows conceptually how a process 102's name space 115 is built from the "stub" name space. "Stub" name space 601 in FIG. 6 is provided by the root service, a kernel service which provides a root directory 603 with a single level of "files". Root directory 603 and files 605 to 615 serve only as points to which the roots of trees 125 belonging to other services 123 may be bound. In a preferred embodiment, the set of files 605 through 615 serves to divide up the kinds of services by function. The file names correspond to functions as follows:
dev 605 is the primary location at which services are bound. Services which provide I/O functions and other services having miscellaneous functions are all bound to dev 605;
boot 607 is the location at which the service which boots the CPU upon which Plan 9 is executed is bound;
fd 609 is the location at which the service which provides duplicate file descriptors for a process's connected files is bound;
proc 610 is the location at which the service which provides information about Plan 9 processes is bound;
env 611 is the location at which the service which provides environmental files for a process 102 is bound. These environmental files perform the same functions as the UNIX environmental variables;
bin 613 is the location at which the service which contains files of programs executed by process 102 is bound; and
srv 615 is the location at which the service is bound which provides files representing protocol channels 5 17 for the protocol servers 209 available to process 102.
Plan 9 provides a set of built-in services which can be bound to the file names in stub tree 601; some of the services may only be bound by Plan 9 kernel functions executed by process 102; others may be bound by any function executed by process
102. In Plan 9 path names, the names of the built-in services are preceded by the "#" character. The built-in services which may be bound only by kernel functions are the following:
#/: the kernel root service which provides stub tree 601;
#: the kernel pipe service, which provides UNIX-style pipes for inter-process communication; and
#M: mount service 603, which handles transactions with protocol servers 209.
The kernel binds the root service to the root of process 102's name space.
The built-in services which may be bound by any function executed by the process 102 include:
#b: the boot service. The service has files which, when written to, cause the kernel to branch to a given address and which permits the kernel to write to a given location in virtual memory. The boot service is generally bound to boot 607;
#b: the bit service. The service has files which represent a bit-mapped screen and a mouse. The bit service is generally bound to dev 605;
#c: the console service. The service has files which represent a system console. The console service is generally bound to dev 605;
#d: the duplicate service. The files provided by the service have names which are the file descriptors for connected files belonging to the process 102. The duplicate service is generally bound to fd 609;
#e: the environment service. The files provided by the service are generally bound to env 611;
#kname: the datakit service. The files provided by the service represent conversations on the datakit hardware represented by name. The service is generally bound to dev 605;
#p: the process service described in the general overview; it is generally bound to proc 610; and
#s: the service registry service. The files provided by the service represent already-open protocol channels 517 which are connected to protocol services 209.
In a preferred embodiment of Plan 9, when a process 102 which has not inherited its name space 115 begins running on a Gnot 711, the Plan 9 kernel executing on the Gnot 711 performs bind operations which give its name space 115 the form shown as tree 617. Where a name in tree 609 has had a service bound to it, that fact is indicated by "=" followed by the name of the service; if the bind operation created a union directory, that fact is indicated by listing the components of the union directory in parentheses following the "=". The list is in the order in which the components will be searched. Tree 617 has been produced by the following sequence of bind and mount operations:
bind ("#/","/",MREPL), which binds the tree provided by the root service to the "/" representing the root in the path names interpreted by process 102; references to "/" are now interpreted as references to "/" 603;
mount (FD,"/",MAFTER,""), which mounts a protocol channel 517 belonging to mount service 203 on "/", and thereby creates a union directory consisting of the files accessible from the root of the root service and the root of the protocol service
209 represented by protocol channel 517. In Plan 9, that protocol service 209 provides a default file tree 125 in a default file server 705.
bind ("/68020/bin", "/bin",MREPL), which binds the tree whose root is the directory/68020/bin in default file tree 125 to the name bin 613 in stub file tree 601. The directory /68020/bin contains executable code for Gnot 711, which in a preferred embodiment has a Motorola 68020 microprocessor;
bind ("/lib/rc","/bin",MAFTER), which binds the tree whose root is the directory/lib/rc in default tree 125 to the name bin 613. Since/68020/bin has already been bound to bin 613 and MAFTER was specified, the result is a union directory in which the directory/68020/bin will be searched before the directory/lib/rc.
bind ("#c","/dev",MREPL), which binds the tree provided by the console service #c to the name der 605; and
bind ("#d","/fd",MREPL), which binds the tree provided by the duplicate service #d to the name fd in stub tree 601.
As a result of these bindings, process 102 can refer to a file called "cc" in the directory/68020/bin of the protocol service 209 to which #M1 is connected by the path name "/bin/cc" and can refer to a file called "pwd" in/lib/rc of that same protocol service by the path name "/bin/pwd", unless, of course, there is a file "pwd" in/68020/bin. Similarly, the file "pid" provided by the #c service may be referred to by the path name "/dev/pid" and the file "0" provided by the service #d may be referred to by the path name "/fd/0".
As is apparent from the foregoing, name space 115 for a Plan 9 process 102 must contain a record of all of the bind and mount operations which have been performed on names in name space 115. That record is termed herein a mount table. The mount table of the preferred embodiment takes advantage of the fact that a file channel 419 for a file implicitly represents the path name for the file. Because this is the case, a record of the bind and mount operations may be made by establishing relationships between the file channels 419 representing the files represented by the old pathnames employed in the bind operations or mount operations and the file channels 419 representing the files specified by the new path names employed in the bind operations or the file channels 419 corresponding to the file descriptors employed in the mount operations. Of course, all of these file channels 419 represent files connected to some process 102 in a process group 103 from which the current process group 103 inherited its name space or by some process 102 in the current process group 103.
FIG. 8 shows a portion of mount table (MT) 801 as it is implemented in a preferred embodiment. The components of mount table 801 are located from mount table array 802, which in turn is located from user data structure 403. As shown in FIG. 8, user structure 403 includes a pointer 824 to process (PROC) data structure 825. Each process 102 has a user structure 403 and a process data structure 825. Pointer 824 is user structure 403 points to that process 102's process data structure. Process data structure 825 is further pointed to by pointer 511 in mount service queue structure 509 and itself contains a pointer to process group (PRGRP) structure 827, which represents the process group 103 to which process 102 belongs. Process group structure 827, finally, contains a pointer 829 to mount table array 802, and thereby permits location of mount table 801 for process 102 to which user structure 403 belongs.
Mount table array 802 contains an element (MTE) 803 for each old path name to which a new path name or file descriptor has been bound. A given mount table entry 803 contains two pointers. Pointer 805 is to a file channel data structure 419
which represents the file specified by the old path name. In the context of mount table 801, such data structures 419 are termed left-hand channels (LCHAN) 802. Pointer 807 is to a mount structure 809. Each mount structure 809 represents the results of one bind or mount operation. The contents of a mount structure 809 include the following:
reference field (REF) 811: This indicates whether left-hand channels other than the one pointed to by mount table entry 803 are using the mount structure;
termination field 813: this indicates whether mount structure 809 is the last in a list of mount structures 809 which define a union directory;
mount identifier field 815: this is a unique identifier for this mount structure 809 and thus for the results of the bind or mount operation represented by mount structure 809;
pointer 817: if mount structure 809 is part of a list of mount structures 809 defining a union directory and is not the last mount structure 809 on the list, pointer 817 points to the next mount structure on the list;
pointer 819: this pointer points to a file channel 419 which represents the file identified by the new path name (or in the case of mount, by the file descriptor); in the mount table context, such a file channel is termed a right-hand channel (RCHAN) 804.
As may be seen by the foregoing, each mount table entry 803 establishes a relationship between a left-hand channel 802 and one or more right-hand channels 804. Additionally, in any file channel 419 which represents a file whose path name's meaning has been altered because of a mount or bind operation, mount ptr field 327 points to mount structure 809 representing the mount or bind operation and mount identifier field 329 contains the mount identifier 815 for that mount structure 809. Furthermore, any file channel 419 which represents a file provided by a protocol server 209 includes a pointer in field 337 to protocol channel 517 for the connection to the protocol service 209 providing the file.
FIG. 8 also illustrates how mount table 801 permits implementation of the replace, before, and after options of the "mount" and "bind" functions. With the replace option, the mount structure 809 which points to right channel 804 representing the new path name simply replaces any list of mount structures 809; the before option adds the mount structure 809 to the head of any list of mount structures 809; an error results if there is no list of mount structures 809. With the after option, the mount structure 809 is added to the end of the list; again, an error results if there is no list. When a union directory is searched, the search begins with the directory represented by right-hand channel 804 pointed to by the first mount structure 809
in the list; the next directory to be searched is the directory represented by the right-hand channel 804 pointed to by the second mount structure 809 in the list, and so forth. Thus, if left-hand channel 802(a)represents the path name /M and the right-hand channel 804(a) represents a first root directory from a service 123 bound to /M and the right hand channel 804(b) represents a second root directory from a second service 123 bound to /M, then a file locator function such as "chdir("/M/.")" will result in a search for "O" first in the first root directory and if it is not found there, then in the second root directory.
As previously mentioned, a new name space 115 for a process 102 is created whenever the process 102 executes a "forkpgrp" system call; "forkpgrp" takes a single argument; if the argument has any value other than 0, the existing name space for the process 102 executing forkgrp is not to be copied into the new name space 115; if the argument has the value 0, the existing name space is to be copied When "forkgrp" is executed with a non- "0" argument, forkgrp simply makes a new process group structure 827 and a new mount table array 802 for process 102; the mount table entries 803 are then created as required for the bind and mount operations which define the new name space 115. If forkpgrp is executed with an "0" argument, mount table array entries 803 in the mount table array 802 for the process 102 executing the "forkpgrp" system call are copied into the new mount table array 802; when this is done, ref field 307 in each left-hand channel 802 pointed to by a given mount table entry
803 and ref field 811 in each mount structure 809 pointed to by that entry is incremented. As may be seen from the foregoing, the implementation of mount table 801 in a preferred embodiment is such that the creation of a new name space 105 for a process is a relatively efficient and inexpensive operation.
FIG. 9 shows the entire mount table 901 for file tree 617. In the figure, channel structures 301 are represented by long oblongs; the label on a channel structure 301 indicates the path name of the file represented by the channel 301. The first mount array entry 803(b) records the results of the bind and mount operations on the root, "/". LCHAN 802(b) representing the root, "/", is connected via entry 803(b) and two mount structures 809(c) and 809(d) to two right-hand channels, one, 804(c), for the root of the file tree provided by the kernel root service, and one, 804(d), for the root of the file tree provided by the protocol service 209 to which #M1 is connected. A tachart pointer 337 in right-hand channel 804(d) points to protocol channel 517 representing the connection to the root.
The second mount array entry 803(c) records the results of the bind operations on #/bin, represented by left-hand channel 802(c). Channel 802(c) is connected to two right-hand channels, 804(e), representing #M1/68020/bin, and 804(f), representing #M1/lib/rc. Since the files represented by the right-hand channels 804(d) and (e) are provided by the protocol service 209 to which #M1 is connected, both right-hand channels contain mchan pointers 337 to protocol channel 517 which provides the connection to service 209. Further, the path name #/bin of the file represented by left-hand channel 802(c) includes the path name #/. That directory file is represented by right-hand channel 804(c), and consequently, left-hand channel 802(c)'s mount pointer field 327 contains a pointer to mount structure 809(c) which points to right-hand channel 804(c).
As may be seen from the remainder of mount table 901, every file channel 419 in mount table 901 which represents a path name which includes a component which is represented by a right-hand channel 804 includes a pointer in mount pointer field 327
to mount structure 809 which points to the right-hand channel 804 which represents the component. Thus, left-hand channels 802(c), 802(d), and 802(e), all of which represent files with path names including #/, all point to mount structure 809(c), while right-hand channels 804(e) and 804(f), both of which represent files with path names including #M1, all point to mount structure 809(d). Of course, when mount pointer field 327 points to a mount structure 809, mount id field 329 in the channel 301 is set to the mount identifier for the mount structure 809.
The remaining entries 803(d) and 803(e) record bindings of built-in kernel services to the "stub" directories "dev" and "fd" provided by the kernel root service #/. In these cases, each entry 803 relates the left-hand channel 802 representing the stub directory to the right-hand channel 804 representing the built-in kernel service. Since the built-in kernel services are not protocol services, the righthand channels have no pointers to protocol channels 517.
Resolving Path names in Name Space 115: FIGS. 10 and 11
As already pointed out, mount table 801 takes advantage of the fact that every Plan 9 file channel 419 implicitly represents the path name of the the file represented by the file channel 419. When the Plan 9 operating system is presented with a path name in a file locator function such as "bind", "mount", "chdir", or "open", it resolves the path name by obtaining a file channel 419 representing the file specified by the path name and returning a pointer to the file channel 419. Since a process
102's file tree 117 is made up of file trees provided by services 123, the resolution of a path name involves both mount table 801 and the service requests which each service 123 responds to. The specific service requests are the walk service file locator request and in some cases the attach service file locator request.
The Plan 9 kernel function which resolves path names is termed "namec" (for name to channel). A flow chart for the function appears in FIG. 10. As shown at call oval 1003, namec takes as arguments a path name, a code for the kind of file access required by the system call which uses namec, and a code indicating how a file being opened by namec is to be opened. namec returns a pointer to a channel. namec handles the first character in the path name specially, but otherwise treats the path name as a list of names separated by "/" characters. Each of the names is termed an element.
The first part of namec (1067) obtains a channel 301 which represents the point at which the path name begins. The manner in which the channel is obtained depends on how the path name begins. There are three possibilities:
If the path name is a full path name, it begins with "/";
If the path name specifies a file provided by a built-in kernel service and the service has not yet been bound to another file name, the path name begins with "#";
If the path name begins at the working directory, the path name begins with a file name.
If the path name begins with "/", branch y of decision block 1005 is taken and the contents of the file channel 419 for the root are copied to a channel 301 "nchan" (1009). As previously indicated, the root file channel 419 can be located from user data structure 403 for the process.
If the path name begins with anything else, the n branch of block 1005 is taken and the first element is then tested at block 1007 to determine whether it is "#". If it is, an attach service file locator request is executed by the kernel built-in service specified by the character following the "#". The result of the attach locator request is a channel 301 which represents the root of a file tree 125 provided by the built-in service. The manner in which a given service handles the attach request depends of course on the service. For example, the attach function for the root service invokes a generic attach function, devattach, which takes the name of the built-in service as an argument, in this case, "/". devattach gets a new channel data structure 301, sets qid field 323 to the value specifying the root, and sets type field 315 to a value derived from the "/" argument. If the first element is neither "/" nor "#", the path name begins at the working directory, and as shown in block 1013, the file channel 419 representing the working directory is copied to nchan. That file channel can of course be located from user data structure 403 for the process.
The next stage, indicated by 1069, is finding out whether the path name for the file represented by nchan has been used as the old path name in a bind or mount function call. First, the next element in the path name is obtained. The function that does this sets the field ELEM 405 in user 403 to the file name which is the next element. Thereupon, mount table array 802 for the process group 103 is located from user data structure 403 for process 102. The left-hand channel 802 for each mount table entry 803 is compared with nchan in turn, until one is found that is "equal" to nchan. "Equal" in this contexts means that the two channels 301 represent the same file in the same service 123, i.e., that they have equal values in their type fields
313, their dev fields 315, and their qid fields 323.
If such a left-hand channel 802 is not found, the n branch of decision block 1019 is taken; otherwise, the y branch is taken, and the contents of right-hand channel 804 pointed by by entry 803 whose left-hand channel 802 is equal to nchan are copied to nchan (1021); thereupon mountptr field 327 in nchan is set to point to mount 809 which points to right-hand channel 804 and mountid field 329 is set to the value of mount 809's mountid. Thus, if the path name has been used as the old path name in a bind or mount call, nchan now represents the file represented by right-hand channel 804. Otherwise, it is unchanged.
Continuing with FIG. 10B, portion 1071 of namec consists of a loop 1025 which examines each remaining element in the path name. Each time through the loop, a call to a "walk" kernel function is made. The arguments used in the call are nchan, the current element (block 1029), and a flag indicating whether the current name can have a left-hand channel 802 representing its file (this is impossible only in the case of kernel built-in service names). As will be explained in more detail below, the walk function returns a channel 301 which represents the file specified by the element. That channel, termed tchan in FIG. 10, is assigned to nchan, as shown by block 1031; then, the next element is obtained from the path name (block 1033). As before, ELEM field 405 in USER structure 403 is set to the next element. When the loop is finished, the last element of the path name remains to be processed.
The manner in which the last element is processed depends on which system call invoked namec. If it was a create system call, the last element represents the name of the new file; otherwise, it represents the file to be accessed. Which system call invoked namec is indicated by the access mode argument, and as shown in decision block 1055, further processing depends on that argument. For purposes of the present discussion, it need only be noted, as shown in block 1057, that when the access mode argument indicates that a file is to be created, a create service file request is made using the last element and tchan, which at this point is a copy of the file channel 419 for the directory in which the new file is to be created. tchan consequently contains values in its type field 313 and its dev field 315 which specify the service 123 which contains the directory and the qid for the directory in its qid field 323. As a result of the create service file request, the new file is created and assigned the last element as its name and tchan's qid field is set to the qid of the new file (1057).
Otherwise, the last element represents the name of a file which already exists. In this case, the walk function is invoked again using the last element to obtain tchan, which represents the file referred to by the last element (1059). Thereupon, fields in tchan are set as required by the access mode and open mode arguments of namec (1061) and tchan is copied to nchan (1063). As shown in return oval 1065, nchan is then returned as the channel 301 representing the file specified in the path name argument.
FIG. 11 is a flow chart of the walk function. As is apparent from start oval 1103 of FIG. 11, walk takes a channel data structure 301, an element of a path name, and a flag indicating whether a mount on the element is legal. The channel data structure 301 represents a directory to be searched for a file having the element name. If the element name is found in the directory or a directory which is unioned with the directory represented by the channel data structure, the function returns a channel data structure 301 which represents the file specified by the element of the path name.
In overview, the algorithm is the following: First, a walk file service request is made to the service specified by type 313 and device 3 15 fields of channel 301 used as an argument. The walk file service request includes the element used as an argument. If the file service request succeeds, it returns a channel 301 for the file specified by the element; thereupon a check is made of mount table 308 to determine whether the channel 321 returned by the walk request is a left-hand channel 802; if it is not, the channel returned by the walk request is returned by the walk function; if it is, the right-hand channel 804 corresponding to the left-hand channel 802 is copied into the channel 301 to be returned by the walk function and the mount ptr and mountid fields of the channel 301 are set as required for the mount data structure 809 pointed to by the mount array element 803 which points to left-hand channel 802.
The walk file service request does not succeed if the file service cannot find a file having the name specified by the element in the directory specified by the channel. There are two reasons why this might be the case:
the directory has had another name bound to it; or
the file corresponding to the element does not exist in the directory.
In the first case, mount ptr 327 and mount id 329 specify mount structure 809 created when the other name was bound, and the walk service file locator request is attempted again with the service specified in right-hand channel 804 pointed to by mount structure 809. If the walk file service request succeeds, processing continues as indicated above. If it fails, either of the above possibilities may again hold. Assuming this time that the problem is that the file corresponding to the element does not exist in the directory, there are still two possibilities: that the file in fact does not exist, or that the directory which was searched is part of a union directory and other directories which are pan of the union directory remain to be searched. If the file in fact does not exist, term field 813 in mount structure 809 for the directory which was just searched will indicate that mount structure 809 is the last one in the list of mount structures 809 making up the union directory; if not, next field 817 in mount structure 809 will point to the mount structure 809 for the next directory in the union directory, and from that next mount structure, the right-hand channel 804 to be searched can be determined. In this case, the walk service file locator request is repeated using the right-hand channel 804 pointed to by the next mount structure.
In more detail, in block 1105, a flag first, indicating that it is the first time the service file locator request will be done, is set, and the channel provided as an argument is copied into cchan. In block 1107, the walk service file locator request is performed using cchan and the element. The action which the service performs in response to the request depends on the service. Two examples may suffice here. The root service responds to the walk request by determining whether cchan has a special value in qid field 323 which is reserved for the root of the root service and if it does, whether the element is one of the names which the root service provides as "stubs". If the latter is the case, cchan's qid field 323 is set to the special value reserved for that stub name; otherwise, the walk service locator request fails.
Mount service 203 responds to the walk request as follows: as previously mentioned, part of der field 315 of channels 301 representing files provided by protocol services 209 contains an identifier specifying a mount service array entry 503. Using that identifier, mount service 203 locates mount service queue 509 for the protocol channel 517 for the protocol service 209; allocates a mount header 521 for the tmessage required for the walk service locator request, and places the file identifier from the channel and the element in the treessage. If the protocol service 209 has a file with the name specified by the element in the directory specified by the file identifier from the channel, it returns an message containing a qid for the file, and that qid is placed in qid field 323 of cchan.
If the walk request succeeds, flag 321 in cchan is set to indicate that the channel 301 is not the result of a mount or bind operation (1119). The next step is to determine whether the walk request has been done more than once (1121). If that is the case, the channel received as an argument will not be the one returned, and the channel received as an argument may be closed (1123). The close operation decrements the count in the channel's ref field, and if the ref field then has the value 0, sends a clunk service request to the service indicated by the channel's dev field 315 and returns the channel structure 301 to the free list. If it's still the first time that the walk request has been made, that step ks skipped. The next step (decision block 1133) is to determine whether cchan represents a component of a pathname which may have another component mounted on it. This is true for any component other than "#" and is indicated by the "mountok" argument of walk. If there can be no mounted component, there is no need to look in mount table 801, and cchan is returned as it is (1135). Otherwise, mount table 801 is examined for a left-hand channel 802 which is equal to cchan, and if such a left-hand channel 802 is found, the value of right-hand channel 804 corresponding to left-hand channel 802 is copied into cchan, along with the mount ptr and the mount identifier for mount structure 809 pointing to the right-hand channel 804. cchan is the returned as thus modified (blocks 1137
to 1143).
Returning to walk service request block 1107, if the request fails, the first step is to determine whether cchan is the result of a mount or bind request. If it is, the search has failed and as may be seen at connector "B", the walk function returns 0, indicating failure (oval 1149). As shown in blocks 1145 and 1147, if the walk request has been done more than once, cchan is closed. Next, the value in cchan's mount ptr field 327 is used to locate a mount structure 809 (block 111). If the mount structure 809's term field 813 indicates that the mount structure is at the end of the list of mount structures 809 defining a union directory, the search has failed, and walk ends as shown at connector B. Otherwise, the pointer next 817 in the current mount structure 809 is followed and the right-hand channel 804 pointed to by the next mount structure 809 is copied into a temporary channel, tchan (block 1115). The mount ptr 327 field in tchan is then set to point to the next mount structure (block 1125), tchan is copied to cchan (1127), first is set to 0, and as shown by the connector "D", the walk service request in block 1107 is repeated. As indicated above, the loop 1131 defined by connector D is repeated until either the walk request in block 1107 succeeds or it is clear that the file corresponding to the element has not been found.
Implementing File Locator Operations
Once name resolution in Plan 9 is understood, the implementation of file locator system calls such as "open", "bind", and "mount" is straightforward. Beginning with open, the system call takes a path name and an integer specifying a mode of opening as argument and returns a file descriptor 417 for the opened file. The system call first locates an element 415 in file descriptor array 413 which does not presently point to a channel 301; the number of that element is the file descriptor for the newly-opened file. Then the system call invokes namec. The arguments are the path name, an access mode specifier specifying open, and the mode of opening. As indicated above, namec resolves the path name, specifies that the service indicated by the path name performs an operation on the file specified by the path name, and returns a channel 301 with fields set such that the channel 301 is now a file channel 419 representing the file. The open system call places a pointer to the file channel
419 in the file descriptor array element 415 specified by the file descriptor and returns the file descriptor.
The "bind" system call in a preferred embodiment takes as arguments a pointer to the new pathname, a pointer to the old path name, and a flag indicating whether the bind is being done with the replace, before, or after options. It returns an integer which has the value "1" if the bind succeeded and "0" if it did not. The system call calls a kernel function called "bindmount" which does both binds and mounts. A flag provided by the "bind" or "mount" system call indicates which is to be done. When a "bind" system call is indicated, bindmount calls namec with the first path name to obtain a first channel 301 representing the file specified by the first path name. Then it calls namec with the second path name to obtain a second channel
301 representing th