United States Patent 5,283,868
Baker et al. February 1, 1994

Title

Providing additional system characteristics to a data processing system through operations of an application program, transparently to the operating system

Abstract

The functions of two virtual operating systems (e.g., S/370 VM, VSE or IX370, and the S/88 OS) are merged into one physical system. Partner pairs of S/88 processors run the S/88 OS and handle the fault-tolerant and single-system-image aspects of the system. One or more partner pairs of S/370 processors are coupled to corresponding S/88 processors both directly and through the S/88 bus. Each S/370 processor is allocated from 1 to 16 megabytes of contiguous storage from the S/88 main storage. Each S/370 virtual operating system treats its memory allocation as starting at address 0, and it manages that memory through normal S/370 dynamic memory allocation and paging techniques. S/370 storage accesses are limit checked to prevent the S/370 from accessing S/88 memory space. The S/88 operating system is the master over all system hardware and I/O devices. The S/88 processors access the S/370 address space in direct response to an S/88 application program, so that the S/88 can move I/O data into the S/370 I/O buffers and process the S/370 I/O operations. The S/88 and S/370 peer processor pairs execute their respective operating systems in a single system environment without significant rewriting of either operating system. Neither operating system is aware of the other operating system or of the other processor pairs.
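The storage scheme in the abstract amounts to base-plus-limit relocation. Each S/370 sees a zero-based address space. Every access is checked against the size of its contiguous allocation and then offset into S/88 main storage. The Python sketch below illustrates only that idea, under stated assumptions. The names `S370Partition`, `translate`, and `LimitCheckError` are hypothetical. The sketch does not describe how the actual hardware performs the mapping.

```python
from dataclasses import dataclass

MB = 1 << 20  # one megabyte in bytes


class LimitCheckError(Exception):
    """Raised when an S/370 address falls outside its allocated storage."""


@dataclass(frozen=True)
class S370Partition:
    # Hypothetical model of one S/370 allocation inside S/88 main storage.
    base: int  # byte offset of the contiguous allocation in S/88 storage
    size: int  # allocation size in bytes (1 to 16 MB per the abstract)

    def __post_init__(self) -> None:
        if not (1 * MB <= self.size <= 16 * MB):
            raise ValueError("allocation must be between 1 and 16 megabytes")

    def translate(self, s370_addr: int) -> int:
        """Map a zero-based S/370 address to an S/88 main-storage address."""
        # Limit check: the S/370 may only touch addresses 0 .. size-1,
        # so it can never reach S/88 memory outside its allocation.
        if not (0 <= s370_addr < self.size):
            raise LimitCheckError(
                f"S/370 address {s370_addr:#x} exceeds limit {self.size:#x}"
            )
        return self.base + s370_addr


# Example: a 4 MB S/370 allocation placed at the 32 MB mark of S/88 storage.
partition = S370Partition(base=32 * MB, size=4 * MB)
s88_addr = partition.translate(0x1000)  # S/370 address 0x1000 -> 32 MB + 0x1000
```

Because the S/370 side only ever produces zero-based addresses, its operating system needs no knowledge of where its allocation actually sits in S/88 storage. This is consistent with the abstract's point that neither operating system is aware of the other.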


Inventors: Baker; Ernest D. (Boca Raton, FL), Dinwiddie, Jr.; John M. (West Palm Beach, FL), Grice; Lonnie E. (Boca Raton, FL), Joyce; James M. (Boca Raton, FL), Loffredo; John M. (Deerfield Beach, FL), Sanderson; Kenneth R. (West Palm Beach, FL)
Assignee: International Business Machines Corp. (Armonk, NY)
Appl. No.: 353111
Filed: May 17, 1989

Current U.S. Class: 709/227; 709/246
Field of Search: 364/DIG.1, DIG.2; 395/200, 325, 275, 500

U.S. Patent Documents
4,004,277 | January 1977 | Gayril
4,099,234 | July 1978 | Woods et al.
4,214,305 | July 1980 | Tokita et al.
4,228,496 | October 1980 | Katzman et al.
4,244,019 | January 1981 | Anderson et al.
4,245,344 | January 1981 | Richter
4,315,321 | February 1982 | Parks et al.
4,316,244 | February 1982 | Grondalski
4,325,116 | April 1982 | Kranz et al.
4,354,225 | October 1982 | Frieder et al.
4,356,550 | October 1982 | Katzman et al.
4,365,295 | December 1982 | Katzman et al.
4,368,514 | January 1983 | Persaud
4,400,775 | August 1983 | Nozaki et al.
4,412,281 | October 1983 | Works
4,414,620 | November 1983 | Tsuchimoto et al.
4,533,996 | August 1985 | Hartung et al.
4,563,737 | January 1986 | Nakamura et al.
4,564,903 | January 1986 | Guyette et al.
4,591,975 | May 1986 | Wade et al.
4,628,508 | December 1986 | Sager et al.
4,654,779 | March 1987 | Kato et al.
4,674,038 | June 1987 | Brelsford et al.
4,677,546 | June 1987 | Freeman et al.
4,679,166 | July 1987 | Berger et al.
4,722,048 | January 1988 | Hirsch et al.
4,727,480 | February 1988 | Albright et al.
4,747,040 | May 1988 | Blanset et al.
4,750,177 | June 1988 | Hendrie et al.
Other References
IBM Systems Journal, vol. 27, no. 2, 1988, p. 93.
Selwyn, Parallel Processing and Expert Systems, pp. 311-314.
Weiser et al., Status and Performance of the Z Mob Parallel Processing System, Spring Comp. Con 85, IEEE, Feb. 25-28, pp. 71-74.
Ramachandran et al., Hardware Support for Interprocess Communication, 14th International Symposium on Computer Architecture, IEEE, Jun. 2-5, 1987.
Peacock, Application Dictates Your Choice of a Multiprocessor Model, EDN, Jun. 25, 1987, pp. 241-246, 248.
Golkar et al., IBM-Compatible Mainframe in 20,000-Gate CMOS Arrays, VLSI Systems Design, May 20, 1987.
Inselberg, Multiprocessor Architecture Ensures Fault-Tolerant Transaction Processing, Mini-Micro Systems, Apr. 1983.
Primary Examiner: Lee; Thomas C.
Assistant Examiner: Von Buhr; Maria N.
Attorney, Agent or Firm: Black, Jr.; John C.; Terrile; Stephen A.; Downey; Joseph T.

Claims


What is claimed is:
1. A method of processing I/O functions of application programs being executed on a first central processing element and its associated first operating system in a first data processing system which is lacking a predetermined characteristic comprising the steps of
directly coupling the first central processing element to a second central processing element which forms a part of a second data processing system having said characteristic and including I/O devices and facilities coupled to the second processing element and addressable by an associated second operating system and indirectly addressable by a programmer via requests to the second operating system;
transferring, under the control of a predetermined second application program executed on the second processing element, I/O commands and data of first application programs being executed on the first processing element from the first processing element to the second processing element via said direct coupling in a manner which is transparent to the second operating system; and
converting in one of said data processing systems said I/O commands and data to commands executable by and data useable by said second processing element and its second operating system so that I/O functions of programs being executed on the first processing element and its first operating system are available for processing in the second data processing system under control of the second operating system having said predetermined characteristic.

2. The method of claim 1 further comprising the step of
processing said converted I/O commands and data in said second data processing system under control of said second operating system.

3. The method of claim 2 wherein said transferring step comprises
sending an interrupt request from the first processing element to the second processing element,
processing the interrupt request under control of the predetermined second program,
uncoupling the second processing element from the second data processing system under control of the predetermined second program, during the transfer step, and
transferring the I/O commands and data to the second processing element while it is uncoupled.

4. A method of providing a single system image for application programs executing on a first central processing element under the control of a first operating system lacking a single system image characteristic, comprising the steps of
directly coupling the first processing element to a second central processing element which forms a part of a complex of interconnected second processing elements having associated therewith respective second operating systems;
said second processing elements, their respective second operating systems and I/O resources in the complex providing a single system image within the complex for application programs executed on the second processing elements under the control of the respective second operating systems;
transferring I/O commands and data of application programs executed on the first processing element from the first processing element to the coupled second processing element transparent to the respective second operating systems of the coupled second processing element;
converting in said complex said I/O commands and data to commands executable by and data useable by said coupled processing element and its respective second operating system so that I/O functions of application programs executing on the first processing element and its first operating system are available for processing by the coupled second processing element under control of its respective second operating system with a single system image of the complex.

5. The method of claim 4 further comprising the step of
executing said converted I/O commands to process said converted data within said complex under control of the operating system of the coupled second processing element.

6. The method of claim 5 further comprising the step of
storing said converted data selectively throughout main storage and I/O device units within the complex.

7. A data processing method comprising the steps of
executing first application programs on a first central processing element and first hardware connected to the first processing element under the control of a first operating system,
executing second application programs on a second central processing element and second hardware including I/O devices connected to the second processing element under the control of a second operating system with a predetermined I/O operating characteristic,
directly coupling the first processing element and first hardware to the second processing element,
uncoupling said second processing element from its hardware and transferring I/O commands and data from the first processing element and first hardware to the second processing element while the second processing element is uncoupled from its hardware under the control of a predetermined one of said second application programs when I/O commands are encountered in the first application programs,
recoupling the second processing element with its hardware after completing the transferring of I/O commands and data from the first processing element and hardware,
converting said I/O commands and data to commands executable by and data useable by said second processing element and its second operating system so that I/O functions of the first processing element and its first operating system are available for processing by the second processing element and its connected second hardware under control of its second operating system.

8. The method of claim 7 further comprising the step of
executing said converted I/O commands on said connected second processing element and second hardware to process said converted data under control of said second operating system.

9. In a data processing system of the type in which a plurality of modules are interconnected by a high speed data interconnection, in which each module includes at least one each of a first central processing element, a main storage and an I/O device coupled to each other and managed and controlled by a respective first operating system to operate as a single processing unit, and in which the respective first operating systems assign corresponding unique object names to all I/O devices and sets of data resident in the modules so that means controlled by each one of the operating systems can access sets of data resident in all of the interconnected modules via said object names without an active intervention of the other operating systems, thereby providing a single system image to users of each of the modules; the improvement comprising
at least one additional processing element in one of the modules, alien to the respective operating system of a respective first central processing element of said one module, and managed and controlled by an operating system dissimilar to the first operating systems;
means for tightly coupling said additional processing element to the respective first central processing element within said one module;
means controlled by a predetermined application program executed on the respective first processing element for transferring I/O commands and data of application programs executed on the additional processing element from the additional processing element to said respective first processing element transparent to the operating system of the respective processing element; and
means in said one module for converting said I/O commands and data to commands executable by and data useable by the respective first processing element so that I/O functions of said additional processing element are available for processing within the plurality of modules under control of the operating system of the respective processing element without an active intervention by other first operating systems.

10. The data processing system of claim 9 further comprising
means for processing said converted I/O commands and data within said data processing system under the control of the respective first operating system of the one module.

11. The data processing system of claim 9 wherein said transferring means comprises
means for isolating said respective first processing element from its storage, I/O device and operating system and coupling said respective first processing element to said tight coupling means, and
means for transferring commands and data to said respective first processing element via said coupling means while the first processing element is isolated.

12. A data processing system comprising
a plurality of modules interconnected by a high speed data interconnection;
each module including at least one each of a first central processing element, a main storage and an I/O resource connected to each other and managed and controlled by a respective first operating system to operate as a single processing unit;
means including the first operating systems assigning corresponding unique object names to all I/O resources and stored data sets within all of the modules so that each one of the operating systems can access said resources and data sets resident in any of the interconnected modules via object names without an active intervention of the other operating systems, thereby providing a single system image to users of each of the modules;
at least one additional processing element in one of the modules, alien to the first operating systems, and managed and controlled by an operating system dissimilar to the first operating systems;
means tightly coupling said additional processing element to a respective first central processing element within said one module;
means controlled by a predetermined application program executed on the respective first processing element for transferring I/O commands and data of programs executed on the additional processing element from the additional processing element to said respective first processing element in a manner transparent to the first operating system of said respective first processing element; and
means in said one module for converting said I/O commands and data to commands executable by and data useable by the respective first processing element so that said converted I/O commands and data are available for processing by the respective first processing element under control of its first operating system to access said resources and data sets via said unique object names.

13. The data processing system of claim 12 further comprising
means including a first central processing element in said one module for executing said converted I/O commands to process said converted data within said data processing system under the control of the respective first operating system in said one module.

14. A mechanism for providing an I/O environment having a predetermined characteristic for first application programs being executed on a first processing element and connected hardware under the control of an associated first operating system which are lacking said characteristic, comprising
a second processing element which forms a part of a data processing system with an I/O environment having said characteristic, said I/O environment including I/O devices and facilities connected to the second processing element and addressable by an associated second operating system and indirectly addressable by a programmer via requests to the second operating system, second application programs being executed on the second processing element and I/O environment under the control of the second operating system;
means for directly coupling the first processing element and hardware to the second processing element;
means effective when I/O commands are encountered during the execution of said first application programs for transferring the encountered I/O commands and related data from the first processing element and hardware to the second processing element via said direct coupling means in a manner transparent to the second operating system; and
means controlled by a predetermined application program executed on one of the processing elements for converting said I/O commands and data to commands executable by and data useable by said second processing element and its second operating system so that said converted I/O commands are available for processing by the second processing element under control of its second operating system within said data processing system and its I/O environment having said selected characteristic.

15. The mechanism of claim 14 further comprising
means for processing said converted I/O commands and data in said data processing system with said I/O environment under control of said second operating system.

16. The mechanism of claim 14 wherein said transferring means comprises
means in the first processing element and hardware for initiating an interrupt request to the second processing element,
means including an application program routine executed on the second processing element for processing the interrupt request,
means including an application program running on the second processing element thereafter effective during program execution for uncoupling the second processing element from the I/O devices and facilities, and
means for transferring the I/O commands and data to the second processing element while it is uncoupled via said direct coupling means.

17. The mechanism of claim 14 further comprising
a plurality of additional processing elements substantially identical to said second processing element,
means coupling the additional processing elements to said second processing element, I/O devices and facilities,
said second operating system effective for causing a sharing of I/O load tasks among the second and additional processing elements, whereby the I/O functions of the first processing element and its first operating system may be processed on any one of the second or additional processing elements.

18. A mechanism for processing I/O commands of System/370 application programs being executed on a System/370 processing element and connected hardware under the control of a System/370 operating system comprising
a System/88 data processing system including a System/88 processing element, associated System/88 I/O devices and facilities connected to the System/88 processing element and addressable by an associated System/88 operating system and indirectly addressable by a programmer via requests to the System/88 operating system;
means directly coupling the System/370 processing element and hardware to the System/88 processing element;
means effective when I/O commands are encountered during the execution of said System/370 application programs for transferring the encountered I/O commands and related data from the System/370 processing element and hardware to the System/88 processing element via said direct coupling means in a manner transparent to the System/88 operating system; and
means controlled by an application program executed on one of the processing elements for converting said encountered I/O commands and related data to commands executable by and data useable by said System/88 processing element and its operating system so that I/O functions of the System/370 application programs are available for processing by the System/88 processing element under control of its operating system.

19. The mechanism of claim 18 further comprising
means for processing said converted I/O commands and data under control of said System/88 operating system.

20. The mechanism of claim 19 wherein said transferring means comprises
means in the System/370 processing element and hardware for initiating interrupt requests to the System/88 processing element,
means including a System/88 application program routine for processing the interrupt requests, and
means including a System/88 application program thereafter effective for selectively uncoupling the System/88 processing element from its associated I/O devices and facilities during program execution and for transferring the System/370 I/O commands and data to the System/88 processing element, while it is uncoupled, via said direct coupling means.

21. The mechanism of claim 18 further comprising
a plurality of additional System/88 processing elements coupled to said first-mentioned System/88 processing element and to said I/O devices and facilities, and
means including said System/88 operating system effective for causing a sharing of I/O load tasks among the System/88 processing elements, whereby the System/370 I/O functions may be processed on any one of the System/88 processing elements.

22. A data processing method comprising the steps of
executing first application programs on a first central processing element and hardware connected to the processing element under the control of a first operating system in a first data processing system which lacks characteristics of a single system image,
directly coupling the first central processing element to a predetermined second central processing element of a second data processing system including a plurality of interconnected second central processing elements, the second central processing elements being connected to respective I/O devices and facilities, and having associated therewith respective second operating systems, said second data processing system including means for providing a single system image environment for application programs executed on the second central processing elements under the control of the second operating systems;
transferring I/O commands and data of application programs being executed on the first central processing element from the first central processing element to the predetermined second central processing element in a manner which is transparent to the respective predetermined second operating system; and
converting in one of said data processing systems alternatively before or after said transfer of I/O commands and data to commands executable by and data useable by said predetermined second processing element and its respective predetermined second operating system so that I/O functions of application programs executing on the first central processing element are available for processing by the second processing element under control of its second operating system within the single system image environment of the second data processing system.

23. Data processing apparatus comprising
a plurality of modules interconnected with each other to form a network;
each module including at least one each of said first central processing element, a main storage and an I/O device coupled to each other and managed and controlled by a respective first operating system to operate as a single processing unit;
the respective first operating systems assigning corresponding unique object names to each of said I/O devices and to data sets resident in the modules so that means controlled by each one of the operating systems can access sets of data resident in all of the interconnected modules via object names without an active intervention of other operating systems, thereby providing the characteristics of a single image to users of each of the modules;
a data processing means including at least one additional central processing element in one of the modules managed and controlled by an operating system dissimilar to the first operating systems and lacking single system image characteristics;
means for tightly coupling said additional processing element to a respective first central processing element within said one module;
means controlled by a predetermined application program executed on the respective first central processing element for isolating the respective first central processing element from its main storage, I/O device and operating system and for transferring I/O commands and data of application programs executed on the additional processing element from the data processing means to said isolated respective first processing element via said coupling means in a manner which is transparent to the operating system of the respective processing element;
means in said one module for converting said I/O commands and data to commands executable by and data useable by the respective first processing element alternatively before or after said transferring step; and
means in said modules for processing the converted I/O commands and data of said additional processing element under control of the operating systems of the respective first processing element without an active intervention by the other first operating system, thereby providing the characteristic of a single system image to users of the data processing means.

24. A data processing method comprising the steps of
executing a first application program on a first central processing element and first hardware connected to the first central processing element under the control of a first operating system in a first data processing system;
providing a second data processing system including a second central processing element and second hardware connected to the second processing element for executing second application programs and for processing I/O commands and data of said second application programs under the control of a second operating system dissimilar to the first operating system and having a predetermined I/O processing characteristic that is lacking in the first operating system;
directly coupling the first central processing element and first hardware to the second central processing element;
sending interrupt requests to the second processing element when I/O commands are encountered during the execution of the first application program on the first processing element;
processing said interrupt requests in the second central processing element under the control of a predetermined second application program;
during the execution of at least certain of the instructions of the predetermined second application program on the second processing element, operationally isolating the second processing element from the second hardware and second operating system and operating the second processing element with the first processing element and first hardware for completing the execution of the certain instructions with said first processing element and first hardware;
transferring said encountered I/O commands and related data of the first application program from the first data processing system to the second element during said execution of the certain instructions while the second processing element is isolated from the second hardware and second operating system and operating with the first processing element and first hardware;
converting said encountered I/O commands and data to commands executable by and data useable by the second processing element and the second operating system in one of said data processing systems alternatively before or after said transferring step; and
processing said converted I/O commands and data in said second data processing system under the control of the second operating system with said I/O characteristic.

25. A data processing method comprising
processing first application programs on a first data processing system including a first processing element connected to first hardware under the control of a first operating system lacking a predetermined characteristic for processing I/O functions of application programs;
processing second application programs on a second data processing system including a second processing element connected to second hardware including I/O devices under the control of a second operating system dissimilar to the first operating system and having said characteristic;
directly coupling the second processing element to the first processing element and first hardware;
configuration tables of the second operating system lacking a definition of said first system and its first processing element and said second operating system lacking a device driver for controlling data transfers between the first and second processing elements;
operating the second processing element under the control of a predetermined one of said second application programs to isolate itself from the second hardware and second operating system and to interact with the first hardware for transferring I/O commands and data of said first application programs being processed on the first processing element from the first data processing system to the second processing element via said direct coupling in a manner transparent to the second operating system; and
converting in one of said data processing systems said I/O commands and data to commands executable by and data usable by the second processing element and second operating system so that I/O functions of said first application programs are available for processing by the second processing element and second hardware under the control of the second operating system having said characteristic.

26. A data processing method comprising the steps of
executing first application programs on a first central processing element and first hardware connected to the element under the control of a first operating system;
executing second application programs on a second central processing element and second hardware including I/O devices connected to the second processing element under the control of a second operating system;
directly coupling the first processing element to the second processing element;
temporarily isolating the second central processing element from the second hardware and second operating system and operating the second processing element with the first processing element and first hardware under the control of a predetermined second application program executed on the second processing element in response to requests from the first processing element to allow transfer of I/O commands and data commands;
transferring I/O commands and data of first application programs being executed on the first processing element from the first processing element to the second processing element via said direct coupling in a manner which is transparent to the second operating system while the second processing element is isolated; and
converting said I/O commands and data to commands executable by and data useable by said second processing element and its second operating system so that I/O functions of programs being executed on the first processing element and its first operating system are available for processing by the second processing element under control of its second operating system.
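Claims 7, 24 and 26 recite an ordered sequence:

1. The first (S/370) element raises an interrupt request.
2. A predetermined S/88 application program services it by uncoupling the S/88 element from its own hardware and operating system.
3. The program transfers the I/O commands and data over the direct coupling.
4. The S/88 element is recoupled.
5. The converted commands are processed under the S/88 operating system.

The Python sketch below models only that ordering, as an illustration under stated assumptions. The class `S88Element`, the functions `service_io_interrupt` and `convert`, and the string-based "conversion" are hypothetical stand-ins, not the patented hardware or microcode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class S88Element:
    """Hypothetical model of the second (S/88) processing element."""

    # True while the element runs normally under its own hardware and OS.
    coupled_to_own_hardware: bool = True
    # Converted commands handed to the second operating system for processing.
    os_queue: List[str] = field(default_factory=list)
    # Trace of protocol steps, kept only so the sequence can be inspected.
    log: List[str] = field(default_factory=list)

    def uncouple(self) -> None:
        self.coupled_to_own_hardware = False
        self.log.append("uncouple")

    def recouple(self) -> None:
        self.coupled_to_own_hardware = True
        self.log.append("recouple")


def convert(cmd: str) -> str:
    """Hypothetical placeholder for converting an S/370 I/O command to an S/88 request."""
    return f"s88:{cmd}"


def service_io_interrupt(s88: S88Element, pending: List[str]) -> None:
    """Run by the (hypothetical) predetermined S/88 application program."""
    s88.log.append("interrupt")          # interrupt request from the first element
    s88.uncouple()                       # isolate from S/88 hardware and OS
    try:
        if s88.coupled_to_own_hardware:
            raise RuntimeError("transfer attempted while still coupled")
        transferred = list(pending)      # move commands/data over the direct coupling
        s88.log.append(f"transfer {len(transferred)}")
    finally:
        s88.recouple()                   # always restore normal operation
    # The S/88 OS sees only the converted commands, after recoupling, so the
    # transfer itself stays transparent to it. Conversion is shown after the
    # transfer here; claims 22-24 also allow it before.
    s88.os_queue.extend(convert(c) for c in transferred)


# Example with hypothetical command strings.
element = S88Element()
service_io_interrupt(element, ["read dev 0190", "write dev 0191"])
```

The `finally` clause reflects the recoupling step of claim 7. Whatever happens during the transfer, the element is returned to its own hardware before the S/88 operating system resumes control.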

Description

The subject application is related to other applications having different joint inventorships filed on the same day and assigned to a common assignee. These other applications are:

07/353116, pending: Fault Tolerant Data Processor System. Inventors: E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson, G. A. Suarez.

07/353114, now U.S. Pat. No. 5,155,809: Uncoupling A Central Processing Unit From Its Associated Hardware For Interaction With Data Handling Apparatus Alien To The Operating System Controlling Said Unit And Hardware. Inventors: E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson.

07/353117, pending: Servicing Interrupt Requests In A Data Processing System Without Using The Services Of An Operating System. Inventors: J. M. Dinwiddie, L. E. Grice, J. M. Loffredo, K. R. Sanderson.

07/353113, now U.S. Pat. No. 5,144,692: A Single Physical Main Storage Shared By Two Or More Processors Executing Respective Operating Systems. Inventors: E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Loffredo, K. R. Sanderson, G. A. Suarez.

07/353115, now abandoned, continued as application Ser. No. 07/957745, pending: Method And Apparatus For The Direct Transfer Of Information Between Application Programs Running On Distinct Processors Without Utilizing The Services Of One Or Both Operating Systems. Inventors: E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson.

07/353112, now U.S. Pat. No. 5,113,522: Data Processing System With System Resource Management For Itself And For An Associated Alien Processor. Inventors: J. M. Dinwiddie, B. J. Freeman, L. E. Grice, J. M. Loffredo, K. R. Sanderson, G. A. Suarez.

TABLE OF CONTENTS

Background of the Invention

Field of the Invention

Prior Art

Summary of the Invention

Brief Description of the Drawings

Description of the Preferred Embodiment

Introduction

1. Operating a Normally Non-Fault Tolerant Processor in a Fault Tolerant Environment

2. Uncoupling a Processor from Its Associated Hardware to Present Commands and Data from Another Processor to Itself

3. Presentation of Interrupts to a System Transparent to the Operating System

4. Sharing a Real Storage Between Two or More Processors Executing Different Virtual Storage Operating Systems

5. Single System Image

6. Summary

Prior Art System/88 Detail Fault Tolerant S/370 Module 9 Interconnected via Links,

General Description of Duplexed Processor Partner Units 21, 23

Coupling of S/370 and S/88 Processor Elements 85, 62

Processor to Processor Interface 89

1. I/O Adapter 154 (Note: Uses FIG. 18 re IOA)

2. I/O Adapter Channel 0 and Channel 1 Bus

3. The Bus Control Unit 156--General Description

4. Direct Memory Access Controller 209

5. Bus Control Unit 156--Detailed Description

(a) Interface Registers for High Speed Data Transfer

(b) BCU Uncouple and Interrupt Logic 215, 216

(c) BCU Address Mapping

(d) Local Address and Data Bus Operations

(e) S/88 Processor 62 and DMAC 209 Addressing To/From Local Storage 210

(f) BCU Basic Storage Module (BSM) RD/WR Byte Counter Operation

(g) Handshake Sequences BCU 156/Adapter 154

S/370 Processor Element 85

Processor Bus 170 and Processor Bus Commands

S/370 Storage Management Unit 81

1. Cache Controller 153

2. STCI 155

(a) Introduction

(b) System Bus Phases

(c) STCI Features

(d) Data Store Operations

(e) Data Fetch Operations

S/370 I/O Support

S/370 I/O Operations, Firmware Overview

System Microcode Design

1. Introduction

2. ETIO/EXEC370 Program Interface

3. EXEC370, S/370 Microcode Protocol

4. Instruction Flows Between S/370 Microcode and EXEC370

Operation of the Bus Control Unit (BCU) 156

1. Introduction

2. S/370 Start I/O Sequence Flow, General and Detailed Description

3. S/370 I/O Data Transfer Sequence Flow, General Description

(a) I/O Write Operations:

(b) I/O Read Operation:

(c) S/370 High Priority Message Transfer Sequence Flow

(d) BCU Status Command

(e) Programmed BCU Reset

Count, Key, and Data Track Format Emulation

1. The Object System

2. The Target System

3. The Emulation Format

4. Emulation Functions

Sharing of Real Storage 16 by S/88 and S/370

1. Introduction

2. Mapping S/88 Storage 16

3. Startup Procedure

4. Start S/370 Service Routine

5. Unthread Chosen String of MMC's From Free List

6. Writing Storage Base and Size to STCI

Initialization Functions for Uncoupling S/88 Interrupts Initiated by S/370

Gain Freedom Without Modifying the S/88 Operating System

Stealing Storage Without Modifying S/88 OS

Power on and Synchronization of Simplexed and Partner Units 21, 23

(S/88 Processing Unit as a Service Processor for S/370 Processing Unit)

1. Introduction

2. Fault Tolerant Hardware Synchronization

3. A Simplexed Processing Unit 21 is Powered On

(a) Hardware Implementation

(b) Microcode--Only Implementation

4. Duplexed Processing Units 21, 23 are Powered On

(a) Hardware Implementation

(b) Microcode--Only Implementation

5. A Partner 23 Is Inserted While The Other Unit 21 Processes Normally

(a) Hardware Implementation

(b) Microcode--Only Implementation

6. A Partner Detects A Compare Failure

(a) Hardware Implementation

(b) Microcode--Only Implementation

Alternative Embodiments

1. Use In Other (Non-S/88) Fault-Tolerant Systems

2. Direct Data Transfers Between S/88 I/O Controllers and S/370 Main Storage

3. Uncoupling Both Processors of a Directly Connected Pair

BACKGROUND OF THE INVENTION

1. Field of the Invention

The improvements of the present application relate to adding characteristics to a data processing system in which the central processing unit and associated operating system lack such characteristics.

2. Prior Art

Certain of today's more recently developed data processing systems offer many advanced features (or characteristics) that are not available on the older mainframe systems or that are not supported by the mainframe operating systems. Some of these features include: a single system image presented across a distributed computing network; the capability to hot plug processors and I/O controllers (remove and install cards with power on); instantaneous error detection, fault isolation and electrical removal from service of failed components without interruption to the computer user; customer replaceable units identified by remote service support; and dynamic reconfiguration resulting from component failure or adding additional devices to the system while the system is continuously operating.

However, the breadth of their customer base, the maturity of their operating systems, and the number and extent of the available user programs are not as great as those of the significantly older mainframe systems of several manufacturers, such as the System/370 (S/370) system marketed by International Business Machines Corporation.

One example of such a recent system, a fault tolerant system, is the System/88 (S/88) system marketed by International Business Machines Corporation. One model of this IBM S/88 and one model of an IBM S/370 form an integral part of the preferred form of the present improvement.

Such fault tolerant systems have typically been designed from the bottom up for fault tolerant operation. The processors, storage, I/O apparatus and operating systems have been specifically tailored for a fault tolerant environment.

Proposals for incorporating the above features into the System/370 environment and architecture might typically consist of a major rewrite of the System/370 operating system(s) and user application programs and/or new hardware developed from scratch. However, the major rewrite of an operating system such as VM, VSE, IX370, etc. is considered by many to be a monumental task, requiring a large number of programmers and a considerable period of time. It usually takes more than five years for a complex operating system such as IBM System/370 VM or MVS to mature, and until it does, most system crashes are a result of operating system errors. Also, many years are required for users to develop proficiency in the use of an operating system. Unfortunately, once an operating system has matured and has developed a large user base, it is not a simple effort to modify the code to introduce new functions such as fault tolerance, dynamic reconfiguration, single system image, and the like.

Because of the complexities and expense of migrating a mature operating system into a new machine architecture, the designers will usually decide to develop a new operating system which may not be readily accepted by the using community. It may prove impractical to modify the mature operating system to incorporate the new features exemplified by the newly developed operating system; however, the new operating system may never develop a substantial user base, and will take many years of field usage before most problems are resolved.

Accordingly, it is a primary object of the present improvement to provide one or more additional features or characteristics, especially a single system image, for a data processing system lacking such characteristics without a major rewrite of the operating system.

SUMMARY OF THE INVENTION

This object is achieved in a preferred embodiment by uniquely coupling at least one first central processing unit (CPU) lacking a desired characteristic with at least one respective second different CPU operated under control of an operating system which provides the desired characteristic. The second CPU is controlled to operate as an I/O controller for the first CPU without the operating system of the second CPU being aware of the existence of the first CPU. In addition, commands and/or data are transferred between the first and second CPUs without the use of the operating system services of the second CPU.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 diagrammatically illustrates the standard interconnection of computer systems utilizing a communication line;

FIG. 2 shows diagrammatically the interconnection of S/88 processors in a fault tolerant environment;

FIG. 3 shows diagrammatically the interconnection of S/370 processors with S/88 processors in the preferred embodiment;

FIG. 4 shows diagrammatically a S/370 system coupled to a S/88 system in the manner of the preferred embodiment;

FIG. 5 shows diagrammatically the uncoupling of a S/88 processor to provide data exchange between the S/370 and the S/88 of the preferred embodiment;

FIGS. 6A, 6B and 6C diagrammatically illustrate the prior art IBM System/88 module, plural modules interconnected by high speed data interconnections (HSDIs) and plural modules interconnected via a network in a fault tolerant environment with a single system image;

FIG. 7 diagrammatically illustrates one form of the improved module of the present invention which provides S/370 processors executing S/370 application programs under control of a S/370 operating system which are rendered fault tolerant by virtue of the manner in which the processors are connected to each other and to S/88 processors, I/O and main storage;

FIG. 8 diagrammatically illustrates in more detail the interconnection of paired S/370 units and S/88 units with each other to form a processor unit and their connection to an identical partner processor unit for fault tolerant operation;

FIGS. 9A and 9B illustrate one form of physical packaging of paired S/370 and S/88 units on two boards for insertion into the back panel of a processing system enclosure;

FIG. 10 conceptually illustrates S/88 main storage and sections of that storage dedicated to S/370 processor units without knowledge by the S/88 operating system;

FIG. 11 shows diagrammatically certain components of the preferred form of a S/370 processor and means connecting it to a S/88 processor and storage;

FIG. 12 shows the components of FIG. 11 in more detail and various components of a preferred form of a S/88 processor;

FIG. 13 diagrammatically illustrates the S/370 bus adapter;

FIGS. 14A, 14B and 15A-C illustrate conceptually the timing and movement of data across the output channels of the S/370 bus adapter;

FIG. 16 diagrammatically illustrates the direct interconnection between a S/370 and a S/88 processor in more detail;

FIG. 17 conceptually illustrates data flow between a S/370 bus adapter and a DMA controller of the interconnection of FIG. 16;

FIG. 18 shows DMAC registers for one of its four channels;

FIGS. 19A, 19B and 19C (with layout FIG. 19) are a schematic/diagrammatic illustration showing in more detail than FIG. 16 a preferred form of the bus control unit interconnecting a S/370 processor with a S/88 processor and main storage;

FIG. 20 is a schematic diagram of a preferred form of the logic uncoupling the S/88 processor from its associated system hardware and of the logic for handling interrupt requests from the alien S/370 processor to the S/88 processor;

FIG. 21 conceptually illustrates the modification of the existing S/88 interrupt structure for a module having a plurality of interconnected S/370-S/88 processors according to the teachings of the present application;

FIGS. 22, 23 and 24 are timing diagrams for Read, Write and Interrupt Acknowledge cycles of the preferred form of the S/88 processors;

FIGS. 25 and 26 show handshake timing diagrams for adapter bus channels 0, 1 during mailbox read commands, Q select up commands, BSM read commands and BSM write commands;

FIG. 27 is a block diagram of a preferred form of a S/370 central processing element;

FIGS. 28 and 29 illustrate certain areas of the S/370 main storage and control storage;

FIG. 30 shows a preferred form of the interface buses between the S/370 central processing element, I/O adapter, cache controller, storage control interface and S/88 system bus, and processor;

FIG. 31 is a block diagram of a preferred form of a S/370 cache controller;

FIGS. 32A and 32B (with layout FIG. 32) schematically illustrate a preferred form of the storage control interface in greater detail;

FIG. 33 is a timing diagram illustrating the S/88 system bus phases for data transfer between units on the bus;

FIG. 34 is a fragmentary schematic diagram showing the "data in" registers of a paired storage control interface;

FIG. 35 shows formats of the command and store data words stored in the FIFO of FIG. 32B;

FIG. 36A-D illustrate store and fetch commands from the S/370 processor and adapter which are executed in the storage control interface;

FIG. 37 illustrates conceptually the preferred embodiment of the overall system of the present application from a programmer's point of view;

FIGS. 38, 39 and 40 illustrate diagrammatically preferred forms of the microcode design for the S/370 and S/88 interface, the S/370 I/O command execution and the partitioning of the interface between EXEC 370 software and the S/370 I/O driver (i.e. ETIO+BCU+S/370 microcode) respectively;

FIGS. 41A and 41B illustrate conceptually interfaces and protocols between EXEC 370 software and S/370 microcode and between ETIO microcode and EXEC 370 software;

FIGS. 41C-H illustrate the contents of the BCU local store including data buffers, work queue buffers, queues, queue communication areas and hardware communication areas including a link list and the movement of work queue buffers through the queues, which elements comprise the protocol through which S/370 microcode and EXEC 370 software communicate with each other;

FIG. 42 illustrates conceptually the movement of work queue buffers through the link list and the queues in conjunction with the protocols between the EXEC 370, ETIO, S/370 microcode and the S/370-S/88 coupling hardware;

FIG. 43 illustrates conceptually the execution of a typical S/370 Start I/O instruction;

FIGS. 44A-L illustrate diagrammatically the control/data flows for S/370 microcode and EXEC 370 as they communicate with each other for executing each type of S/370 I/O instruction;

FIGS. 45A-AG illustrate data, command and status information on the local address and data buses in the BCU during data transfer operations within the BCU;

FIGS. 46A-K illustrate conceptually a preferred form of disk emulation process whereby the S/88 (via the BCU, ETIO and EXEC 370) stores and fetches information on a S/88 disk in S/370 format in response to S/370 I/O instructions;

FIG. 47 illustrates conceptually the memory mapping of FIG. 10 together with a view of the S/88 storage map entries, certain of which are removed to accommodate one S/370 storage area;

FIGS. 48A-K illustrate a preferred form of virtual/physical storage management for the S/88 which can interact with newly provided subroutines during system start-up and reconfiguration routines to create S/370 storage areas within the S/88 physical storage;

FIGS. 49 and 50 are fragmentary diagrams illustrating certain of the logic used to synchronize S/370-S/88 processor pairs and partner units; and

FIGS. 51 and 52 illustrate alternative embodiments of the present improvement.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Introduction

The preferred embodiment for implementing the present invention comprises a fault tolerant system. Fault tolerant systems have typically been designed from the bottom up for fault tolerant operation. The processors, storage, I/O apparatus and operating systems have been specifically tailored to provide a fault tolerant environment. However, the breadth of their customer base, the maturity of their operating systems, and the number and extent of the available user programs are not as great as those of the significantly older mainframe systems of several manufacturers, such as the System/370 (S/370) system marketed by International Business Machines Corporation.

Today's fault tolerant data processing systems offer many advanced features that are not normally available on the older non-fault tolerant mainframe systems or that are not supported by the mainframe operating systems. Some of these features include: a single system image presented across a distributed computing network; the capability to hot plug processors and I/O controllers (remove and install cards with power on); instantaneous error detection, fault isolation and electrical removal from service of failed components without interruption to the computer user; customer replaceable units identified by remote service support; and dynamic reconfiguration resulting from component failure or adding additional devices to the system while the system is continuously operating. One example of such fault tolerant systems is the System/88 (S/88) system marketed by International Business Machines Corporation.

Proposals for incorporating the above features into the S/370 environment and architecture might typically consist of a major rewrite of the operating system(s) and user application programs and/or new hardware developed from scratch. However, the major rewrite of an operating system such as VM, VSE, IX370, etc. is considered by many to be a monumental task, requiring a large number of programmers and a considerable period of time. It usually takes more than five years for a complex operating system such as IBM S/370 VM or MVS to mature, and until it does, most system crashes are a result of operating system errors. Also, many years are required for users to develop proficiency in the use of an operating system. Unfortunately, once an operating system has matured and has developed a large user base, it is not a simple effort to modify the code to introduce new functions such as fault tolerance, dynamic reconfiguration, single system image, and the like.

Because of the complexities and expense of migrating a mature operating system into a new machine architecture, the designers will usually decide to develop a new operating system which may not be readily accepted by the using community. It may prove impractical to modify the mature operating system to incorporate the new features exemplified by the newly developed operating system; however, the new operating system may never develop a substantial user base, and will take many years of field usage before most problems are resolved.

Accordingly, it is intended that the present improvement will provide a fault tolerant environment and architecture for a normally non-fault-tolerant processing system and operating system without major rewrite of the operating system. In the preferred embodiment a model of IBM System/88 is coupled to a model of an IBM S/370.

One current method of coupling distinct processors and operating systems is to add some kind of communications controller to each system, append device drivers to the operating systems, and use some kind of communication code such as Systems Network Architecture (SNA) or OSI to transport data. Normally, to accomplish data communications between end-node computers in a network, it is necessary that the end nodes each understand and apply a consistent set of services to data that is to be exchanged.

To reduce their design complexity, most networks are organized as a series of layers or levels, each one built upon its predecessor. The number of layers, the name of each layer, and the function of each layer differ from network to network. However, in all networks, the purpose of each layer is to offer certain services to the higher layers, shielding those layers from the details of how the offered services are actually implemented. Layer n on one machine carries on a conversation with layer n on another machine. The rules and conventions used in this conversation are collectively known as the layer n protocol. The entities comprising the corresponding layers on different machines are called peer processes, and it is the peer processes that are said to communicate using the protocol.

In reality, no data are directly transferred from layer n on one machine to layer n on another machine (except in the lowest or physical layer). That is, there can be no direct coupling of application programs operating on distinct or alien systems. Instead, each layer passes data and control information to the layer immediately below it, until the lowest layer is reached. At the lowest layer there is physical communication with the other machine, as opposed to the virtual communication used by the higher layers.

Definitions of these sets of services have existed in a number of different networks as mentioned above, and more recently, interest has centered on the provision of protocols to ease interconnection of systems from different vendors. A structure for development of these protocols is the framework defined by the seven-layer OSI (Open Systems Interconnection) model of the International Organization for Standardization (ISO). Each of the layers in this model is responsible for providing networking services to the layer above it while requesting services from the layer below it. The services provided at each layer are well defined so that they can be applied consistently by each station in the network. This is said to allow for the interconnection of different vendors' equipment. Implementation of layer-to-layer services within a node is implementation-specific and allows vendor differentiation on the basis of services provided within a station.

It is important to note that the entire purpose of implementing such a structured set of protocols is to perform end-to-end transfer of data. The major divisions within the OSI model can be better understood if one realizes that the user node is concerned with the delivery of data from the source application program to the recipient application program. To deliver this data, the OSI protocols act upon the data at each level to furnish frames to the network. Each frame is built up from the data together with the headers applied at each OSI level. These frames are then provided to the physical medium as a set of bits which are transmitted through the medium. At the receiving station they undergo the reverse set of procedures to deliver the data to the application program.
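The send-and-receive cycle just described, in which each level wraps the data in its own header and the receiving station removes the headers in reverse order, can be sketched in Python as below. This is an illustration only: the bracketed header strings are placeholders rather than real protocol formats, trailers are omitted, and the physical layer is modeled simply as carrying the finished frame.

```python
# Minimal sketch of layered encapsulation; header contents are placeholders.
# Layers above the physical layer, listed from the top down.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]


def encapsulate(data: bytes) -> bytes:
    """Sending station: each layer prepends its header to what it receives."""
    frame = data
    for layer in LAYERS:                 # application first, data link last
        frame = f"[{layer}]".encode() + frame
    return frame                         # handed to the physical medium as bits


def decapsulate(frame: bytes) -> bytes:
    """Receiving station: strip the headers in the reverse order."""
    for layer in reversed(LAYERS):       # outermost (data link) header first
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"hello")
print(frame[:20])           # b'[data link][network]'
print(decapsulate(frame))   # b'hello'
```

Only the bytes produced by `encapsulate` cross the medium; the peer layers never exchange data directly, which is the distinction drawn above between virtual and physical communication.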

As stated earlier, one current method of coupling distinct processors and operating systems is to add some kind of communications controller to each system, append device drivers to the operating systems, and use some kind of communication code such as Systems Network Architecture (SNA) or OSI to transport data. FIG. 1 shows a standard interconnection of two computer systems by means of a Local Area Network (LAN). In particular, an IBM S/370 architecture system is shown connected to an IBM System/88 architecture system. It will be observed that in each architecture an application program operates through an interface with the operating system to control a processor and access an I/O channel or bus. Each system has a communications controller to exchange data. In order to communicate, a multi-layered protocol must be utilized to allow data to be exchanged between the corresponding application programs.

An alternative method to exchange data would be a coprocessor method in which the coprocessor resides on the system bus, arbitrates for the system bus, and uses the same I/O as the host processor. One disadvantage of the coprocessor method is the amount of code rewrite required to support non-native (alien) host I/O. Another disadvantage is that the user must be familiar with both system architectures to switch back and forth between the coprocessor and host operating systems, an unfriendly user environment.

A prior art fault tolerant computer system has a processor module containing a processing unit, a random access memory unit, peripheral control units, and a single bus structure which provides all information transfers between the several units of the module. The system bus structure within each processor module includes duplicate partner buses, and each functional unit within a processor module also has a duplicate partner unit. The bus structure provides operating power to units of a module and system timing signals from a main clock.

FIG. 2 shows in the form of a functional diagram the structure of the processor unit portion of a processor module. By using identical paired processors mounted on a common replacement card and executing identical operations in synchronization, comparisons can be made to detect processing errors. Each card normally has a redundant partnered unit of identical structure.

The computer system provides fault detection at the level of each functional unit within the entire processor module. Error detectors monitor hardware operations within each unit and check information transfers between units. The detection of an error causes the processor module to isolate the unit which caused the error and to prohibit it from transferring information to other units, and the module continues operation by employing the partner of the faulty unit.

Upon detection of a fault in any unit, that unit is isolated and placed off-line so that it cannot transfer incorrect information to other units. The partner of the now off-line unit continues operating and thereby enables the entire module to continue operating. A user is seldom aware of such a fault detection and transition to off-line status, except for the display or other presentation of a maintenance request to service the off-line unit. The card arrangement allows easy removal and replacement.
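The compare-and-isolate behavior of a partnered pair of cards can be modeled as a small Python sketch. It assumes, for simplicity, that each card's two processors produce a single integer result per step and that the module takes its output from whichever card is still on-line; the class and function names are hypothetical and do not describe the actual S/88 logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Card:
    """A replaceable card with two lockstepped processors (simplified)."""
    name: str
    online: bool = True

    def check(self, outputs: Tuple[int, int]) -> bool:
        """Compare both processors' results; go off-line on any mismatch."""
        if self.online and outputs[0] != outputs[1]:
            self.online = False          # isolated: may no longer drive the bus
        return self.online


def module_step(card: Card, partner: Card,
                card_out: Tuple[int, int],
                partner_out: Tuple[int, int]) -> Optional[int]:
    """Return the value the module delivers, or None if both cards are off-line."""
    card_ok = card.check(card_out)
    partner_ok = partner.check(partner_out)
    if card_ok:
        return card_out[0]
    if partner_ok:
        return partner_out[0]
    return None


a, b = Card("A"), Card("B")
print(module_step(a, b, (5, 5), (5, 5)))   # 5: both cards agree
print(module_step(a, b, (5, 6), (5, 5)))   # 5: card A isolated, B continues
print(a.online, b.online)                  # False True
```

In this model an isolated card stays off-line even if its processors agree again on a later step, mirroring the text's point that the faulty unit remains out of service until it is replaced.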

The memory unit is also assigned the task of checking the system bus. For this purpose, the unit has parity checkers that test the address signals and that test the data signals on the bus structure. Upon determining that either bus is faulty, the memory unit signals other units of the module to obey only the non-faulty bus. The power supply unit for the processor module employs two power sources, each of which provides operating power to only one unit in each pair of partner units. Upon detecting a failing supply voltage, all output lines from the affected unit to the bus structure are clamped to ground potential to prevent a power failure from causing the transmission of faulty information to the bus structure.
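Bus checking by parity can be illustrated the same way. The passage does not say whether odd or even parity is used, so the sketch below assumes even parity (the parity bit makes the total count of 1 bits even) and represents a sample from each of the two partner buses as a hypothetical (address, address parity, data, data parity) tuple.

```python
from typing import Tuple

BusSample = Tuple[int, int, int, int]   # (address, addr parity, data, data parity)


def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(value).count("1") & 1


def bus_ok(sample: BusSample) -> bool:
    """Check both the address and the data signals against their parity bits."""
    address, addr_parity, data, data_parity = sample
    return (even_parity_bit(address) == addr_parity
            and even_parity_bit(data) == data_parity)


def buses_to_obey(bus_a: BusSample, bus_b: BusSample) -> str:
    """What the checking unit would tell the other units to obey."""
    a_ok, b_ok = bus_ok(bus_a), bus_ok(bus_b)
    if a_ok and b_ok:
        return "both"
    if a_ok:
        return "A only"
    if b_ok:
        return "B only"
    return "neither"


good = (0x1234, even_parity_bit(0x1234), 0xFF, even_parity_bit(0xFF))
bad = (0x1234, 1 - even_parity_bit(0x1234), 0xFF, even_parity_bit(0xFF))
print(buses_to_obey(good, bad))   # A only
```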

FIG. 3 shows in the form of a functional diagram the interconnection of paired S/370 processors with paired S/88 processors in the manner of a fault tolerant structure to enable the direct exchange of data. The similarity to the prior S/88 structure (FIG. 2) is intentional, but it is the unique interconnection by means of both hardware and software that establishes the operation of the preferred embodiment. It will be observed that the S/370 processors are coupled to storage control logic and bus interface logic in addition to the S/88 type compare logic. As will be described, the compare logic will function in the same manner as the compare logic for the S/88 processors. Moreover, the S/370 processors are coupled to the corresponding S/88 processors both directly and through the system bus. As with the S/88 processors, the S/370 processors are coupled in pairs, and the pairs are intended to be mounted on field replaceable, hot-pluggable circuit cards. The detailed interconnections of the several drivers will be described in greater detail later.

The preferred embodiment interconnects plural S/370 processors for executing the same S/370 instructions concurrently under control of a S/370 operating system. These are coupled to corresponding plural S/88 processors, I/O apparatus and main storage, all executing the same S/88 instructions concurrently under control of a S/88 operating system. As will be described later, means are included to asynchronously uncouple the S/88 processors from their I/O apparatus and storage, to pass S/370 I/O commands and data from the S/370 processors to the S/88 processors while the latter are uncoupled, and to convert the commands and data to a form useable by the S/88 for later processing by the S/88 processors when they are recoupled to their I/O apparatus and main storage.
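The uncouple, transfer, convert and recouple sequence can be pictured as a toy state model. Everything in it is hypothetical: the class name, the two-entry command table and the tuple format of a converted command stand in for details the text leaves to later sections. The sketch shows only the ordering constraint, namely that commands are accepted while the S/88 processor is uncoupled and are processed only after it is recoupled.

```python
from collections import deque

# Hypothetical stand-in for converting S/370 I/O commands to an S/88 form.
COMMAND_MAP = {"READ": "s88_read", "WRITE": "s88_write"}


class S88Processor:
    def __init__(self):
        self.coupled = True       # attached to its I/O apparatus and main storage
        self.pending = deque()    # converted commands awaiting S/88 processing
        self.completed = []

    def uncouple(self):
        self.coupled = False

    def recouple(self):
        self.coupled = True

    def accept_s370_command(self, opcode: str, data: bytes):
        """Receive and convert an S/370 command; allowed only while uncoupled."""
        if self.coupled:
            raise RuntimeError("S/370 commands are passed only while uncoupled")
        self.pending.append((COMMAND_MAP[opcode], data))

    def process_pending(self):
        """Later processing, which needs the I/O apparatus and main storage."""
        if not self.coupled:
            raise RuntimeError("processing requires the recoupled hardware")
        while self.pending:
            self.completed.append(self.pending.popleft())


p = S88Processor()
p.uncouple()
p.accept_s370_command("WRITE", b"record")
p.recouple()
p.process_pending()
print(p.completed)   # [('s88_write', b'record')]
```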

1. Operating a Normally Non-Fault Tolerant Processor in a Fault Tolerant Environment

The previously listed fault tolerant features are achieved in a preferred embodiment by coupling normally non-fault-tolerant processors such as S/370 processors in a first pair which execute the same S/370 instructions simultaneously under control of one of the S/370 operating systems. Means are provided to compare the states of various signals in one processor with those in the other processor for instantaneously detecting errors in one or both processors.

A second partner pair of S/370 processors with compare means are provided for executing the same S/370 instructions concurrent with the first pair and for detecting errors in the second pair. Each S/370 processor is coupled to a respective S/88 processor of a fault-tolerant system such as the S/88 data processing system having first and partner second pairs of processors, S/88 I/O apparatus and S/88 main storage. Each S/88 processor has associated therewith hardware coupling it to the I/O apparatus and main storage.

The respective S/370 and S/88 processors each have their processor buses coupled to each other by means including a bus control unit. Each bus control unit includes means which interacts with an application program running on the respective S/88 processor to asynchronously uncouple the respective S/88 processor from its associated hardware and to couple it to the bus control unit (1) for the transfer of S/370 commands and data from the S/370 processor to the S/88 processor and (2) for conversion of the S/370 commands and data to commands executable by and data useable by the S/88.

The S/88 data processing system subsequently processes the commands and data under control of the S/88 operating system. The S/88 data processing system also responds to error signals in either one of the S/370 processor pairs or in their respectively coupled S/88 processor pair to remove the coupled pairs from service and permit continued fault tolerant operation with the other coupled S/370, S/88 pairs. With this arrangement, S/370 programs are executed by the S/370 processors (with the assistance of the S/88 system for I/O operations) in a fault tolerant (FT) environment with the advantageous features of the S/88, all without significant changes to the S/370 and S/88 operating systems.

In addition, the storage management unit of the S/88 is controlled so as to assign dedicated areas in the S/88 main storage to each of the duplexed S/370 processor pairs and their operating system without knowledge by the S/88 operating system. The processors of the duplexed S/370 processor pairs are coupled individually to the common bus structure of the S/88 via a storage manager apparatus and S/88 bus interface for fetching and storing S/370 instructions and data from their respective dedicated storage area.
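Setting aside a dedicated area in S/88 main storage amounts to finding a run of free storage large enough for one S/370 unit and marking it as used. The sketch below models storage as a list of free/used frames with an assumed 4 KB frame size and enforces the contiguous 1 to 16 megabyte allocation given in the abstract; the function name is hypothetical, and the sketch does not reproduce how the preferred embodiment removes the area from the S/88 free list without the operating system's knowledge.

```python
MB = 1 << 20
FRAME_SIZE = 4096          # assumed frame size, for illustration only


def reserve_s370_area(free: list, size_bytes: int) -> int:
    """Reserve a contiguous run of free frames; return its real base address."""
    if not MB <= size_bytes <= 16 * MB:
        raise ValueError("each S/370 unit gets 1 to 16 megabytes")
    needed = -(-size_bytes // FRAME_SIZE)      # ceiling division
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == needed:
            start = i - needed + 1
            for j in range(start, i + 1):
                free[j] = False                # frame now belongs to the S/370 unit
            return start * FRAME_SIZE
    raise MemoryError("no contiguous free area large enough")


frames = [True] * 1024     # 4 MB of storage
frames[0] = False          # frame 0 already used by S/88 software
base1 = reserve_s370_area(frames, 1 * MB)
base2 = reserve_s370_area(frames, 1 * MB)
print(hex(base1), hex(base2))   # 0x1000 0x101000
```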

The preferred embodiment provides a method and means of implementing fault tolerance in the S/370 hardware without rewriting the S/370 operating system or S/370 applications. Full S/370 CPU hardware redundancy and synchronization is provided without custom designing a processor to support fault tolerance. A S/370 operating system and a fault tolerant operating system (both virtual memory systems) are run concurrently without a major rewrite of either operating system. A hardware/microcode interface is provided in the preferred embodiment between peer processor pairs, each processor executing a different operating system. One processor is a microcode controlled IBM S/370 engine executing an IBM Operating System (e.g., VM, VSE, IX370, etc.). The second processor of the preferred embodiment is a hardware fault tolerant engine executing an operating system capable of controlling a hardware fault tolerant environment (e.g., IBM System/88), executing S/88 VOS (virtual operating system).

The hardware/microcode interface between the processor pairs allows the two operating systems to coexist in an environment perceived by the user as a single system environment. The hardware/microcode resources (memory, system buses, disk I/O, tape, communications I/O terminals, power and enclosures) act independently of each other while each operating system handles its part of the system function. The words memory, storage and store are used interchangeably herein. The FT processor(s) and operating system manage error detection/isolation and recovery, dynamic reconfiguration, and I/O operations. The NFT processor(s) execute native instructions without any awareness of the FT processor. The FT processor appears to the NFT processor as multiple I/O channels.

The hardware/microcode interface allows both virtual memory processors to share a common fault tolerant memory. A contiguous block of storage from the memory allocation table of the FT processor is assigned to each NFT processor. The NFT processor's dynamic address translation feature controls the block of storage that was allocated to it by the FT processor. The NFT processor perceives that its memory starts at address zero through the use of an offset register. Limit checking is performed to keep the NFT processor within its own storage boundaries. The FT processor can access the NFT storage and DMA I/O blocks of data in or out of the NFT address space, whereas the NFT processor is prevented from accessing storage outside its assigned address space. The NFT storage size can be altered by changing the configuration table.
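The offset-and-limit scheme can be modeled in a few lines. The following is a minimal Python sketch, not the actual hardware design: the names `NFTStorageWindow`, `to_real`, `in_bounds` and `LimitCheckError` are hypothetical, and the base address in the usage line is an arbitrary illustrative value.

```python
class LimitCheckError(Exception):
    """Raised when the NFT processor addresses storage outside its block."""


class NFTStorageWindow:
    """Hypothetical model of the offset register plus limit check.

    base: real address in FT storage where the NFT block begins.
    size: length of the NFT block in bytes.
    """

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.size = size

    def in_bounds(self, nft_addr: int) -> bool:
        # Limit check: the NFT processor may only use addresses 0..size-1.
        return 0 <= nft_addr < self.size

    def to_real(self, nft_addr: int) -> int:
        if not self.in_bounds(nft_addr):
            raise LimitCheckError(f"address {nft_addr:#x} is outside the NFT block")
        # The offset register makes NFT address 0 land on the block's base.
        return self.base + nft_addr


# Usage: a 16 MB NFT block placed at an arbitrary (illustrative) real address.
window = NFTStorageWindow(base=0x0200_0000, size=16 * 1024 * 1024)
print(hex(window.to_real(0x1000)))  # 0x2001000
```

The FT processor is not constrained by such a window; in the arrangement described, it addresses the NFT block at its real addresses when moving I/O data.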

2. Uncoupling a Processor from Its Associated Hardware to Present Commands and Data from Another Processor to Itself.

Adding a new device to an existing processor and operating system generally requires hardware attachment via a bus or channel, and the writing of new device driver software for the operating system. The improved "uncoupling" feature allows two distinct processors to communicate with each other without attaching one of the processors to a bus or channel and without arbitrating for bus mastership. The processors communicate without significant operating system modification or the requirements of a traditional device driver. It can give a user the image of a single system when two distinct and dissimilar processors are merged, even though each processor is executing its own native operating system.

This feature provides a method and means of combining the special features exhibited by a more recently developed operating system with the user's view and reliability of a mature operating system. It couples the two systems (hardware and software) together to form a new third system. It will be clear to those skilled in the art that while the preferred embodiment shows a S/370 system coupled to a S/88 system, any two distinct systems could be coupled. The design criteria of this concept are little or no change to the mature operating system, so that it maintains its reliability, and minimal impact on the more recently developed operating system, so as to limit the time needed to develop code.

This feature involves a method of combining two dissimilar systems each with its own characteristics into a third system having characteristics of both. A preferred form of the method requires coupling logic between the systems that functions predominantly as a direct memory access controller (DMAC). The main objective of this feature is to give an application program running in a fault tolerant processor (e.g., S/88 in the preferred embodiment) and layered on the fault tolerant operating system, a method of obtaining data and commands from an alien processor (e.g., S/370 in the preferred embodiment) and its operating system. Both hardware and software defense mechanisms exist on virtually every processor to prevent intrusion (e.g., supervisor versus user state, memory map checking). Typically, operating systems tend to control all system resources such as interrupts, DMA Channels, and I/O devices and controllers. Therefore, coupling two different architectures and transferring commands and data between these machines, without having designed this function in from the ground up, is considered by many to be a monumental task, if not impractical.

FIG. 4 shows diagrammatically a S/370 processor coupled to a S/88 processor in the environment of the preferred embodiment. By contrast with the S/370 processor shown in FIG. 1, the memory has been replaced by S/88 bus interface logic and the S/370 channel processor has been replaced by a bus adapter and bus control unit. Particular attention is directed to the interconnection between the S/370 bus control unit and the S/88 processor which is shown by a double broken line.

This feature involves attaching the processor coupling logic to the S/88 fault tolerant processor's virtual address bus, data bus, control bus and interrupt bus structure, and not to the system bus or channel as most devices are attached. The strobe line indicating that a valid address is on the fault tolerant processor's virtual address bus is activated a few nanoseconds after the address signals are activated. The coupling logic comprising the bus adapter and the bus control unit determines whether a preselected address range is presented by a S/88 application program before the strobe signal appears. If this address range is detected, the address strobe signal is blocked from going to the S/88 fault tolerant processor hardware. This missing signal will prevent the fault tolerant hardware and operating system from knowing a machine cycle took place. The fault tolerant checking logic in the hardware is isolated during this cycle and will completely miss any activity that occurs during this time. All cache, virtual address mapping logic and floating point processors on the processor bus will fail to recognize that a machine cycle has occurred. That is, all S/88 CPU functions are `frozen,` awaiting the assertion of the Address Strobe signal by the S/88 processor.
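The address decode that precedes the strobe can be sketched as a pure function. This is a hedged Python model, not the coupling-logic circuit: the window `COUPLING_BASE`..`COUPLING_LIMIT` is a hypothetical value, since the preselected range is not specified here, and `route_address_strobe` and `StrobeRouting` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical preselected address range reserved for the coupling logic.
COUPLING_BASE = 0xFF00_0000
COUPLING_LIMIT = 0xFF0F_FFFF


@dataclass(frozen=True)
class StrobeRouting:
    to_s88_hardware: bool     # AS reaches caches, mapping logic, FT checking
    to_coupling_logic: bool   # AS is redirected to the coupling logic


def route_address_strobe(virtual_addr: int) -> StrobeRouting:
    """Decide where Address Strobe goes, based on the already-valid address."""
    selected = COUPLING_BASE <= virtual_addr <= COUPLING_LIMIT
    # When the coupling logic is selected, the S/88 hardware never sees AS,
    # so from its point of view no machine cycle takes place.
    return StrobeRouting(to_s88_hardware=not selected, to_coupling_logic=selected)
```

Because the decision is made per address, an application can alternate between ordinary S/88 cycles and coupling-logic cycles from one access to the next, which corresponds to the per-cycle switching described for this feature.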

The address strobe signal that was blocked from the fault tolerant processor logic is sent to the coupling logic. This gives the S/88 fault tolerant processor complete control over the coupling logic which is the interface between the fault tolerant special application program and the attached S/370 processor. The address strobe signal and the virtual address are used to select local storage, registers and the DMAC which are components of the coupling logic. FIG. 5 shows diagrammatically the result of the detection of an interrupt from the S/370 bus control logic which is determined to be at the appropriate level and corresponding to an appropriate address. In its broadest aspect therefore, the uncoupling mechanism disconnects a processor from its associated hardware and connects the processor to an alien entity for the efficient transfer of data with said entity.

The coupling logic has a local store which is used to queue incoming S/370 commands and store data going to and from the S/370. The data and commands are moved into the local store by multiple DMA channels in the coupling logic. The fault tolerant application program initializes the DMAC and services interrupts from the DMAC, which serves to notify the application program when a command has arrived or when a block of data has been received or sent. To complete an operation, the coupling logic must return the data strobe acknowledge lines prior to the clocking edge of the processor, to ensure that both sides of the fault tolerant processor stay in sync.
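The queue-then-notify pattern can be illustrated with a small software model. This Python sketch is a deliberate simplification under stated assumptions: `CouplingLocalStore`, `dma_in_command`, `next_command` and the string event name are hypothetical, and a real DMAC signals completion by a hardware interrupt rather than a function call.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class CouplingLocalStore:
    """Toy model of the coupling logic's local store and DMA completion."""

    def __init__(self, on_interrupt: Callable[[str], None]) -> None:
        self.command_queue: deque[bytes] = deque()
        # Handler registered by the S/88 application when it initializes the DMAC.
        self.on_interrupt = on_interrupt

    def dma_in_command(self, command: bytes) -> None:
        # A DMA channel deposits an incoming S/370 command in the local store...
        self.command_queue.append(command)
        # ...and the DMAC then notifies the application that a command arrived.
        self.on_interrupt("command-arrived")

    def next_command(self) -> bytes:
        # The application drains queued commands in arrival order.
        return self.command_queue.popleft()
```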

The application program receives S/370 channel type commands such as Start I/O, Test I/O, etc. The application program then converts each S/370 I/O command into a fault tolerant I/O command and initiates a normal fault tolerant I/O command sequence.
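The conversion step amounts to a table-driven translation. The Python sketch below uses hypothetical names throughout: `OPCODE_MAP`, the fault-tolerant operation names, and the device-to-port table are illustrative stand-ins, since the actual command formats and S/88 I/O calls are not given in this section.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class S370IOCommand:
    opcode: str           # e.g. "SIO" (Start I/O) or "TIO" (Test I/O)
    device_address: int   # S/370 channel and unit address


@dataclass(frozen=True)
class FTIORequest:
    operation: str        # hypothetical fault-tolerant I/O operation name
    port: str             # hypothetical S/88 I/O port bound to the device


# Hypothetical translation table from S/370 commands to FT operations.
OPCODE_MAP: Dict[str, str] = {
    "SIO": "start_io",
    "TIO": "query_status",
    "HIO": "cancel_io",
}


def convert(cmd: S370IOCommand, device_ports: Dict[int, str]) -> FTIORequest:
    """Translate one S/370 I/O command into a fault-tolerant I/O request."""
    if cmd.opcode not in OPCODE_MAP:
        raise ValueError(f"unsupported S/370 command {cmd.opcode!r}")
    # Each emulated S/370 device is assumed to be backed by an S/88 I/O port.
    return FTIORequest(OPCODE_MAP[cmd.opcode], device_ports[cmd.device_address])
```

In the scheme described, the resulting request would then be issued through the ordinary fault-tolerant I/O sequence, so the S/88 operating system sees only a normal application I/O call.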

This is believed to be a new method of getting a block of data around an operating system and to an application. It is also a way of allowing an application to handle an interrupt which is a function usually done by an operating system. The application program can switch the fault tolerant processor from its normal processor function to the I/O controller function at will, and on a per cycle basis, just by the virtual address it selects.

Thus, two data processing systems having dissimilar instruction and memory addressing architectures are tightly coupled so as to permit one system to effectively access any part of the virtual memory space of the other system without the other system being aware of the one system's existence. Special application code in the other system communicates with the one system via hardware by placing special addresses on the bus. Hardware determines if the address is a special one. If it is, the strobe is blocked from being sensed by the other system's circuits, and redirected such that the other system's CPU can control special hardware, and a memory space, accessible to both systems.

The other system can completely control the one system when necessary, as for initialization and configuration tasks. The one system cannot in any way control the other system, but may present requests for service to the other system in the following manner:

The one system stages I/O commands and/or data in one system format in the commonly accessible memory space and, by use of special hardware, presents an interrupt to the other system at a special level calling the special application program into action.

The special application program is directed to the memory space containing the staged information and processes it to convert its format to the other system's native form. Then the application program directs the native operating system of the other system to perform native I/O operations on the converted commands and data. Thus, all of the foregoing occurs completely transparently to the native operating systems of both systems and with no significant change to either.

3. Presentation of Interrupts to a System Transparent to the Operating System

Most current programs execute in one of two (or more) states, a supervisor state or a user state. Application programs run in user state, and functions such as interrupt handling run in supervisor state.

An application attaches an I/O port, opens the port, and issues an I/O request in the form of a read, write or control. At that time the processor will take a task switch. When the operating system receives an interrupt signifying an I/O completion, the operating system puts this information into a ready queue and sorts it by priority for system resources.

The operating system reserves all interrupt vectors for its own use; none are available for new features such as an external interrupt signifying an I/O request from another machine.

In the S/88 of the preferred embodiment, a majority of the available interrupt vectors are actually unused, and these are set up to cause vectoring to a common error handler for `uninitialized` or `spurious` interrupts, as is the common practice in operating systems. The preferred embodiment of this improvement replaces a subset of these otherwise unused vectors with appropriate vectors to special interrupt handlers for the S/370 coupling logic interrupts. The modified S/88 Operating System is then rebound for use with the newly-integrated vectors in place.
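Reusing otherwise-unused vectors can be sketched as a table patch that refuses to overwrite any vector the operating system already uses. In this Python model, the 256-entry table size is an assumption (it matches the 68000-family processors whose bus terminology, such as autovectors and IACK cycles, this section uses), and all function names and vector numbers are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[], str]


def spurious_interrupt() -> str:
    # Common handler for 'uninitialized' or 'spurious' interrupts.
    return "spurious"


def build_vector_table(size: int, os_vectors: Dict[int, Handler]) -> List[Handler]:
    # Every vector defaults to the spurious handler unless the OS claims it.
    table: List[Handler] = [spurious_interrupt] * size
    for number, handler in os_vectors.items():
        table[number] = handler
    return table


def install_coupling_handlers(table: List[Handler], handlers: Dict[int, Handler]) -> None:
    """Replace unused vectors with coupling-logic handlers, never used ones."""
    for number, handler in handlers.items():
        if table[number] is not spurious_interrupt:
            raise ValueError(f"vector {number} is already used by the OS")
        table[number] = handler
```

In the text, the equivalent change is made in the operating system's vector definitions and the system is then rebound; the run-time patch here only illustrates the rule that just the spurious-interrupt entries are taken over.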

The System/88 of the preferred embodiment has eight interrupt levels and uses autovectors on all levels except level 4. The improvement of the present application uses one of these autovector levels, level 6, which has the second-highest priority. This level 6 is normally used by the System/88 for A/C power disturbance interrupts.

The logic which couples the System/370 to the System/88 presents interrupts to level 6 by ORing its interrupt requests with those of the A/C power disturbance. During system initialization, appropriate vector numbers to the special interrupt handlers for the coupling logic interrupts are loaded into the coupling logic (some, for example, into DMAC registers) by an application program, transparent to the S/88 operating system.

When any interrupt is received by the System/88, it initiates an interrupt acknowledge (IACK) cycle using only hardware and internal operations of the S/88 processor to process the interrupt and fetch the first interrupt handler instruction. No program instruction execution is required. However, the vector number must also be obtained and presented in a transparent fashion. This is achieved in the preferred embodiment by uncoupling the S/88 processor from its associated hardware (including the interrupt presenting mechanism for A/C power disturbances) and coupling the S/88 processor to the S/370-S/88 coupling logic when a level 6 interrupt is presented by the coupling logic.

More specifically, the S/88 processor sets the function code and the interrupt level at its outputs and also asserts Address Strobe (AS) and Data Strobe (DS) at the beginning of the IACK cycle. The Address Strobe is blocked from the S/88 hardware, including the A/C power disturbance interrupt mechanism, if the coupling logic interrupt presenting signal is active; and AS is sent to the coupling logic to read out the appropriate vector number, which is gated into the S/88 processor by the Data Strobe. Because the Data Strobe is blocked from the S/88 hardware, the machine cycle (IACK) is transparent to the S/88 Operating System relative to obtaining the coupling logic interrupt vector number.

If the coupling logic interrupt signal had not been active at the beginning of the IACK cycle, a normal S/88 level 6 interrupt would have been taken.
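The level 6 IACK decision reduces to a two-way choice. In this hedged Python model, `LEVEL6_AUTOVECTOR = 30` assumes 68000-family autovector numbering (interrupt levels 1 through 7 map to vectors 25 through 31), and `iack_level6` and `IackResult` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: 68000-family autovectors, where level 6 uses vector 30.
LEVEL6_AUTOVECTOR = 30


@dataclass(frozen=True)
class IackResult:
    vector: int                   # vector number gated into the S/88 processor
    s88_hardware_saw_cycle: bool  # False when AS was diverted to coupling logic


def iack_level6(coupling_irq_active: bool, coupling_vector: int) -> IackResult:
    """Model the level 6 IACK cycle described in the text."""
    if coupling_irq_active:
        # AS is blocked from the S/88 hardware and sent to the coupling logic,
        # which supplies the vector number loaded at initialization.
        return IackResult(coupling_vector, s88_hardware_saw_cycle=False)
    # Otherwise a normal S/88 level 6 (A/C power disturbance) interrupt is taken.
    return IackResult(LEVEL6_AUTOVECTOR, s88_hardware_saw_cycle=True)
```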

4. Sharing a Real Storage Between Two or More Processors Executing Different Virtual Storage Operating Systems.

This feature couples a fault tolerant system to an alien processor and operating system that does not have code to support a fault tolerant storage, i.e. code to support removal and insertion of storage boards via hot plugging, instantaneous detection of corrupted data and its recovery if appropriate, etc.

This feature provides a method and means whereby two or more processors each executing different virtual operating systems can be made to share a single real storage in a manner transparent to both operating systems, and wherein one processor can access the storage space of the other processor so that data transfers between these multiple processors can occur.

This feature combines two user-apparent operating systems environments to give the appearance to the user of a single operating system. Each operating system is a virtual operating system that normally controls its own complete real storage space. In this invention there is only one real storage space, which is shared by both processors via a common system bus. Neither operating system is substantially rewritten and neither operating system knows the other exists, or that the real storage is shared. This feature uses an application program running on a first processor to search through the first operating system's storage allocation queue. When a contiguous storage space is found, large enough to satisfy the requirements of the second operating system, then this storage space is removed, by manipulating pointers, from the first operating system's storage allocation table. The first operating system no longer has use (e.g., the ability to reallocate) of this removed storage unless the application returns the storage back to the first operating system.

The first operating system is subservient to the second operating system from an I/O perspective and responds to the second operating system as an I/O controller. The first operating system is the master of all system resources, and in the preferred embodiment is a hardware fault tolerant operating system. The first operating system initially allocates and de-allocates storage (except for the storage which is "stolen" for the second operating system), and handles all associated hardware failures and recovery. The objective is to combine the two operating systems without altering the operating system code to any major degree. Each operating system must believe it is controlling all of system storage, since it is a single resource being used by both processors.

When the system is powered up, the first operating system and its processor assume control of the system, and hardware holds the second processor in a reset condition. The first operating system boots the system and determines how much real storage exists. The operating system eventually organizes all storage into 4 KB (4096 bytes) blocks and lists each available block in a storage allocation queue. Each 4 KB block listed in the queue points to the next available 4 KB block. Storage used by the first system is removed from, or returned to, the top of the queue in 4 KB blocks, and the block pointers are adjusted accordingly. As users request memory space from the operating system, the requests are satisfied by assigning the required number of 4 KB blocks of real storage from the queue. When the storage is no longer needed, the blocks are returned to the queue.

Next the first operating system executes a list of functions called module-start-up that configures the system. One application that is executed by the module-start-up is a new application used to capture storage from the first operating system and allocate the storage to the second operating system. This program scans the complete storage allocation list and finds a contiguous string of 4 KB blocks of storage. The application program then alters the pointers in the portion of the queue corresponding to the contiguous string of blocks, thereby removing a contiguous block of storage from the first operating system's memory allocation list. In the preferred embodiment, the pointer of the 4 KB block preceding the first 4 KB block removed is changed to point to the 4 KB block immediately following the removed contiguous string of blocks.
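The pointer manipulation can be sketched with a singly linked free-block queue. This Python model assumes that each block is identified by its block number (real address divided by 4096) and that the stolen run must be consecutive both in the queue and in real addresses; `FreeBlockQueue` and `steal_contiguous` are hypothetical names, not the actual module-start-up application.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 4096  # 4 KB blocks, as described in the text


class FreeBlockQueue:
    """Free-block queue in which each free block points to the next one."""

    def __init__(self, blocks: List[int]) -> None:
        self.head: Optional[int] = blocks[0] if blocks else None
        self.next: Dict[int, Optional[int]] = {}
        for i, blk in enumerate(blocks):
            self.next[blk] = blocks[i + 1] if i + 1 < len(blocks) else None

    def blocks(self) -> List[int]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.next[cur]
        return out

    def steal_contiguous(self, count: int) -> int:
        """Unlink `count` consecutive blocks and return the first block number."""
        if count < 1:
            raise ValueError("count must be at least 1")
        prev: Optional[int] = None      # block just before `cur` in the queue
        run_prev: Optional[int] = None  # block just before the current run
        run_start = run_len = 0
        cur = self.head
        while cur is not None:
            if run_len and cur == prev + 1:
                run_len += 1            # cur extends the current contiguous run
            else:
                run_prev, run_start, run_len = prev, cur, 1
            if run_len == count:
                after = self.next[cur]
                # Point the preceding block past the stolen run.
                if run_prev is None:
                    self.head = after
                else:
                    self.next[run_prev] = after
                for blk in range(run_start, run_start + count):
                    del self.next[blk]  # the OS no longer knows these blocks
                return run_start
            prev, cur = cur, self.next[cur]
        raise ValueError(f"no run of {count} contiguous free blocks")


queue = FreeBlockQueue([0, 1, 5, 6, 7, 8, 12])
first = queue.steal_contiguous(3)
print(first, queue.blocks())  # 5 [0, 1, 8, 12]
print(first * BLOCK_SIZE)     # 20480: real base address of the stolen run
```

As in the preferred embodiment, only one pointer changes: block 1, which preceded the run, now points to block 8, the block that followed it.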

The first operating system at this point has no control or knowledge of this real memory space unless the system is rebooted or the application returns the storage pointers. In effect, the segment of real storage is treated as if it were allocated to a process running on the first operating system, except that it cannot be reallocated, because its blocks have been removed from the table rather than merely assigned to a user.

The removed address space is then turned over to the second operating system. There is hardware offset logic that makes the address block, stolen from the first operating system and given to the second operating system, appear to start at address zero to the second operating system. The second operating system then controls the storage stolen from the first operating system as if it is its own real storage, and controls the storage through its own virtual storage manager, i.e. it translates virtual addresses issued by the second system into real addresses within the assigned real storage address space.

An application program running on the first operating system can move I/O data into and out of the second processor's storage space; however, the second processor cannot read or write outside its allocated space, because the second operating system does not know of the additional storage. If a malfunction occurs in the second operating system, a hardware trap will prevent it from inadvertently writing into the first operating system's space.

The amount of storage space allocated to the second operating system is defined by the user in a table in the module-start-up program. If the user wants the second processor to have 16 megabytes, he defines that in the module-start-up table and the application acquires that much space from the first operating system. A special SVC (service call) allows the application program to gain access to the supervisor region of the first operating system so that the pointers can be modified.
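The size entered in the module-start-up table translates directly into a count of 4 KB blocks to capture. A minimal Python sketch, assuming sizes are given in whole megabytes of 1,048,576 bytes and that the 1 to 16 megabyte range stated elsewhere in this description is enforced at this point (`blocks_for_megabytes` is a hypothetical name):

```python
BLOCK_SIZE = 4096        # bytes per storage block
MEGABYTE = 1024 * 1024   # bytes per megabyte


def blocks_for_megabytes(megabytes: int) -> int:
    """Number of 4 KB blocks to remove from the free queue for one S/370."""
    if not 1 <= megabytes <= 16:
        raise ValueError("S/370 storage must be between 1 and 16 megabytes")
    return megabytes * MEGABYTE // BLOCK_SIZE


print(blocks_for_megabytes(16))  # 4096
```

For the 16 megabyte case in the text, the application must therefore find a contiguous run of 4096 blocks (2^24 bytes).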

An important reason why it is desirable for both operating systems to share the same storage is that the storage is fault tolerant on the first processor; and the second processor is allowed to use fault tolerant storage and I/O from the first processor. The second processor is made to be fault tolerant by replicating certain of the hardware and comparing certain of the address, data, and control lines. Using these techniques the second processor is, in fact, a fault tolerant machine even though the second operating system has no fault tolerant capabilities. More than one alien processor and operating system of the second type can be coupled to the first operating system with a separate real storage area provided for each alien processor.

In the preferred embodiment, the first operating system is that of the fault tolerant S/88 and the second operating system is one of the S/370 operating systems, and the first and second processors are S/88 and S/370 processors respectively. This feature not only enables a normally non-fault-tolerant system to use a fault tolerant storage which is maintained by a fault tolerant system but also enables the non-fault-tolerant system (1) to share access to fault tolerant I/O apparatus maintained by the fault tolerant system and (2) to exchange data between the systems in a more efficient manner without the significant delays of a channel-to-channel coupling.

5. Single System Image

The term single system image is used to characterize computer networks in which user access to remote data and resources (e.g., printer, hard file, etc.) appears to the user to be the same as access to data and resources at the local terminal to which the user's keyboard is attached. Thus, the user may access a data file or resource simply by name and without having to know the object's location in the network.

The concept of "derived single system image" is introduced here as a new term, and is intended to apply to computer elements of a network which lack facilities to attach directly to a network having a single system image, but which use hardware and software resources of that network to attach directly to it with an effective single system image.

For purposes of this discussion, direct attachment of a computer system, for developing effects of "derived single system image," can be effectuated with various degrees of coupling between that system and elements of the network. The term "loose coupling" as used here means a coupling effectuated through I/O channels of the deriving computer and the "native" computer which is part of the network. "Tight coupling" is a term presently used to describe a relationship between the deriving and native computers which is established through special hardware allowing each to communicate with the other on a direct basis (i.e., without using existing I/O channels of either).

A special type of tight coupling presently contemplated, termed "transparent tight coupling," involves the adaptation of the coupling hardware to enable each computer (the deriving and native computers) to utilize resources of the other computer in a manner such that the operating system of each computer is unaware of such utilization. Transparent tight coupling, as just defined, forms a basis for achieving cost and performance advantages in the coupled network. The cost of the coupling hardware, notwithstanding complexity of design, should be more than offset by the savings realized by avoiding the extensive modifications of operating system software which otherwise would be needed. Performance advantages flow from faster connections due to the direct coupling and reduced bandwidth interference at the coupling interface.

The term "network" as used in this section is more restricted than the currently prevalent concept of a network which is a larger international teleprocessing/satellite connection scheme to which many dissimilar machine types may connect if in conformance to some specific protocol. Rather "network" is used in this section to apply to a connected complex of System/88 processors or alternatively to a connected complex of other processors having the characteristics of a single system image.

Several carefully defined terms will be used to further explain the concept of a single system image as contemplated herein; and it will be assumed that the specific preferred embodiment of the improvement will be used as the basis for the clarification:

a. High Speed Data Interconnection (HSDI) refers to a hardware subsystem (and cable) for data transfer between separate hardware units.

b. Link refers to a software construct or object which consists entirely of a multi-part pointer to some other software object and which has much of the character of an alias name.

c. MODULE refers to a free-standing processing unit consisting of at least one each of: enclosure, power supply, CPU, memory, and I/O device. A MODULE can be expanded by bolting together multiple enclosures to house additional peripheral devices creating a larger single module. Some I/O units (terminals, printers) may be external and connected to the enclosure by cables; they are considered part of the single MODULE. A MODULE may have only one CPU complex.

d. CPU COMPLEX refers to one or more single or dual processor boards within the same enclosure, managed and controlled by Operating System software to operate as a single CPU. Regardless of the actual number of processor boards installed, any user program or application is written, and executed, as if only one CPU were present. The processing workload is roughly shared among the available CPU boards, and multiple tasks may execute concurrently, but each application program is presented with a `SINGLE-CPU IMAGE.`

e. OBJECT refers to a collection of data (including executable programs) stored in the system (disk, tape) which can be uniquely identified by a hierarchical name. A LINK is a uniquely-named pointer to some other OBJECT, and so is considered an OBJECT itself. An I/O PORT is a uniquely-named software construct which points to a specific I/O device (a data source or target), and thus is also an OBJECT. The Operating System effectively prevents duplication of OBJECT NAMES.

Because the term `single system image` is not used consistently in the literature, it will be described in greater detail for clarification of the present improvement of a "derived single system image." In defining and describing the term SINGLE-SYSTEM IMAGE, the `image` refers to the application program's view of the system and environment. `System,` in this context, means the combined hardware (CPU complex) and software (Operating System and its utilities) to which the application programmer directs his instructions. `Environment` means all I/O devices and other connected facilities which are addressable by the Operating System and thus accessible indirectly by the programmer, through service requests to the Operating System.

A truly single, free-standing computer with its Operating System, then, must provide a SINGLE-SYSTEM IMAGE to the programmer. It is only when we want to connect multiple systems together in order to share I/O devices and distribute processing that this `image` seen by the programmer begins to change; the ordinary interconnection of two machines via teleprocessing lines (or even cables) forces the programmer to understand--and learn to handle--the dual environment, in order to take advantage of the expanded facilities.

Generally, in order to access facilities in the other environment, he must request his local Operating System to communicate his requirements to the `other` Operating System, and specify those requirements in detail. He must then be able to accept the results of his request asynchronously (and in proper sequence) after an arbitrarily long delay. The handling and control of the multiple messages and data transfers between machines constitute significant processing overhead in both machines; it can be unwieldy, inefficient, and difficult for the programmer in such a DUAL-SYSTEM environment. And when the number of conventionally-connected machines goes up, the complexity for the programmer can increase rapidly.

The System/88 original design included the means to simplify this situation and provide the SINGLE-SYSTEM IMAGE to the programmer, i.e., the HSDI connection between MODULEs, and HSDI driver software within the Operating System in each MODULE. Here, in a two-MODULE system for example, each of the two Operating Systems `knows about` the entire environment, and can access facilities across the HSDI without the active intervention of the `other` Operating System. The reduction in communications overhead is considerable.

A large number of MODULEs of various sizes and model types can be interconnected via HSDI to create a system complex that appears to the programmer as one (expandable) environment. His product, an application program, can be stored on one disk in this system complex, executed in any of the CPUs in the complex, controlled or monitored from essentially any of the terminals of the complex, and can transfer data to and from any of the I/O devices of the complex, all without any special programming considerations and with improved execution efficiency over the older methods.

The operating system and its various features and facilities are written in such a way as to natively assume the distributed environment and operate within that environment with the user having no need to be concerned with or have control over where the various entities (utilities, applications, data, language processors, etc.) reside. The key to making all of this possible is the enforced rule that each OBJECT must have a unique name; and this rule easily extends to the entire system complex since the most basic name-qualifier is the MODULE name, which itself must be unique within the complex. Therefore, locating any OBJECT in the entire complex is as simple as correctly naming it. Naming an OBJECT is in turn simplified for the programmer by the provision of LINKs, which allow the use of very short alias pointers to (substitute names for) OBJECTS with very long and complicated names.
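The unique-name rule and LINK aliases can be illustrated with a small catalog keyed by fully qualified names. In this Python sketch, names are modeled as tuples whose first element is the MODULE name; that representation, the `resolve` function, the hop limit, and the example module and file names are assumptions for illustration and do not reproduce the S/88 path-name syntax.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Name = Tuple[str, ...]  # (module, directory..., object); module qualifies all


@dataclass(frozen=True)
class Link:
    target: Name        # a LINK is an OBJECT that points at another OBJECT


@dataclass(frozen=True)
class DataObject:
    module: str         # MODULE on which the object actually resides
    contents: str


# Dictionary keys are unique, mirroring the rule that OBJECT names are unique.
Catalog = Dict[Name, Union[Link, DataObject]]


def resolve(catalog: Catalog, name: Name, max_hops: int = 8) -> DataObject:
    """Follow LINKs until a data object is reached; its location never matters."""
    for _ in range(max_hops + 1):
        obj = catalog[name]          # KeyError if no object has this name
        if isinstance(obj, DataObject):
            return obj
        name = obj.target
    raise RuntimeError("too many LINK hops")


catalog: Catalog = {
    ("m2", "payroll", "master_file_1989_q2"): DataObject("m2", "records"),
    ("m1", "pay"): Link(("m2", "payroll", "master_file_1989_q2")),
}
print(resolve(catalog, ("m1", "pay")).module)  # m2
```

A user on module m1 names only the short LINK; the fully qualified name, and with it the object's location on module m2, is resolved without any action by the user.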

To achieve the concept of a "derived single system image" within this complex of interconnected S/88 modules, a plurality of S/370 processors are coupled to S/88 processors in such a manner as to provide the S/370 processor users with at least some aspects of the S/88 single system image features, even though the S/370 processors and operating systems do not themselves provide these features.

One or more S/370 processors are provided within the S/88 MODULE. A S/88 processor is uniquely coupled to each S/370 processor. As will be seen, each S/370 processor is replicated and controlled by S/88 software for fault-tolerant operation. The unique direct coupling of the S/88 and S/370 processors, preferably by the uncoupling and interrupt function mechanisms described above, render data transfers between the processors transparent to both the S/370 and S/88 operating systems. Neither operating system is aware of the existence of the other processor or operating system.

Each S/370 processor uses the fault-tolerant S/88 system complex to completely provide the S/370 main storage, and emulated S/370 I/O Channel(s) and I/O device(s). The S/370s have no main memory, channels, or I/O devices which are not part of the S/88, and all of these facilities are fault-tolerant by design.

At system configuration time, each S/370 processor is assigned a dedicated contiguous block of 1 to 16 megabytes of main storage from the S/88 pool; this block is removed from the configuration tables of the S/88 so that the S/88 Operating System cannot access it, even inadvertently. Fault-tolerant hardware registers hold the storage block pointer for each S/370, so that the S/370 has no means to access any main storage other than that assigned to it. The result is an entirely conventional, single-system view of its main memory by the S/370; the fault-tolerant aspect of the memory is completely transparent. An application program (EXEC370) in the S/88 emulates S/370 Channel(s) and I/O device(s) using actual S/88 devices and S/88 Operating System calls. It has the SINGLE-SYSTEM IMAGE view of the S/88 complex, since it is an application program; thus this view is extended to the entire S/370 `pseudo-channel.`

From the opposite point of view, that of the S/370 Operating System (and, by extension, its application programs), it may help to visualize a `window` (the channel) through which all I/O operations take place. The window is not altered in character (no S/370 programs need be changed), but the `view` through the window is broadened to include the SINGLE-SYSTEM IMAGE attributes. Taking one small conceptual step further, one can picture a large number of S/370s efficiently sharing a single database managed by the S/88.

One consequence of this connection technique is that each S/370 can be dynamically reconfigured relatively simply and quickly. The channel `window` is two-way, and the S/88 control program EXEC370 is on the other side of it; EXEC370 has full capability to stop, reset, reinitialize, reconfigure, and restart the S/370 CPU. Thus, by transparent emulation of S/370 I/O facilities using other facilities which possess the SINGLE-SYSTEM IMAGE attribute (S/88 I/O and Operating System), this attribute is extended and afforded to the S/370.
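Because EXEC370 sits on the S/88 side of the channel `window`, a reconfiguration can be driven entirely from there. The sketch below is a hypothetical model: the `S370Cpu` class, its methods, the `change_storage` helper, and the stop, reset, reconfigure, reinitialize, restart ordering are all assumptions rather than the patent's interface. It shows how such a sequence might change a S/370's storage allocation without any cooperation from the S/370 operating system.

```python
class S370Cpu:
    """Hypothetical model of one S/370 CPU as controlled by EXEC370."""

    def __init__(self, storage_mb: int) -> None:
        self.storage_mb = storage_mb
        self.running = False
        self.log: list[str] = []  # record of control actions, in order

    def _require_stopped(self, op: str) -> None:
        if self.running:
            raise RuntimeError(f"{op} requires the S/370 CPU to be stopped")

    def stop(self) -> None:
        self.running = False
        self.log.append("stop")

    def reset(self) -> None:
        self._require_stopped("reset")
        self.log.append("reset")

    def reconfigure(self, storage_mb: int) -> None:
        self._require_stopped("reconfigure")
        if not 1 <= storage_mb <= 16:
            raise ValueError("S/370 storage must be 1 to 16 MB")
        self.storage_mb = storage_mb
        self.log.append(f"reconfigure {storage_mb}MB")

    def reinitialize(self) -> None:
        self._require_stopped("reinitialize")
        self.log.append("reinitialize")

    def restart(self) -> None:
        self._require_stopped("restart")
        self.running = True
        self.log.append("restart")


def change_storage(cpu: S370Cpu, storage_mb: int) -> None:
    """EXEC370-side sequence; the S/370 operating system takes no part in it."""
    cpu.stop()
    cpu.reset()
    cpu.reconfigure(storage_mb)
    cpu.reinitialize()
    cpu.restart()
```

Requiring the CPU to be stopped before each disruptive step is a design choice of this sketch; it keeps the model from changing storage under a running S/370.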

The S/370 is therefore provided with object location independence. Its users may access a data file or other resource by the name assigned to it in the S/88 operating system directory; they need not know where the data file resides in the complex of S/370-S/88 modules.
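Location independence means that a program supplies only a name and the directory, not the caller, knows where the resource lives. The following sketch assumes a hypothetical `Directory` that maps names to a (system, module, disk) `Location`; all of these names are illustrative. It shows that a file can be moved to another module while the code that reads it by name stays unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    system: str
    module: str
    disk: str


class Directory:
    """Hypothetical S/88 directory mapping resource names to locations."""

    def __init__(self) -> None:
        self._entries: dict[str, Location] = {}

    def register(self, name: str, location: Location) -> None:
        self._entries[name] = location

    def move(self, name: str, new_location: Location) -> None:
        if name not in self._entries:
            raise KeyError(name)
        self._entries[name] = new_location

    def resolve(self, name: str) -> Location:
        return self._entries[name]


def read_by_name(directory: Directory, name: str) -> str:
    """Caller-side code: it supplies only a name, never a location."""
    loc = directory.resolve(name)
    return f"{name} read from {loc.system}/{loc.module}/{loc.disk}"
```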

S/370 I/O commands issued by one S/370 processing unit in one module 9 are processed by an associated S/88 processing unit tightly coupled to the S/370 processing unit in the same module (or by other S/88 processing units interconnected in the module 9 and controlled by the same copy of the S/88 virtual operating system, which supports multiprocessing) to access data files and the like resident in the same or other connected modules. The S/88 processing unit may return the accessed files to the requesting S/370 processing unit or send them to other modules, for example, to merge with other files.
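The I/O path described above can be modeled as EXEC370 servicing a S/370 read request by moving data into the S/370 buffer. This is a minimal sketch under several assumptions: `ReadCommand`, `Exec370`, the device-to-file map, and the in-memory `files` dictionary (standing in for S/88 Operating System file calls) are all hypothetical, and a `bytearray` stands in for S/88 main storage. The buffer address is interpreted relative to the S/370's own block, which starts at address 0 from the S/370's point of view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCommand:
    device: int       # S/370 device address
    buffer_addr: int  # buffer address in the S/370's own 0-based storage
    count: int        # number of bytes requested


class Exec370:
    """Hypothetical model of EXEC370 servicing one S/370's read requests."""

    def __init__(self, main_storage: bytearray, base: int, size: int,
                 device_map: dict[int, str], files: dict[str, bytes]) -> None:
        self.main_storage = main_storage  # stands in for S/88 main storage
        self.base = base                  # start of this S/370's storage block
        self.size = size                  # length of that block
        self.device_map = device_map      # S/370 device address -> S/88 file name
        self.files = files                # stands in for S/88 OS file services

    def handle_read(self, cmd: ReadCommand) -> int:
        """Copy file data into the S/370 buffer; return the number of bytes moved."""
        name = self.device_map[cmd.device]
        data = self.files[name][: cmd.count]
        if cmd.buffer_addr < 0 or cmd.buffer_addr + len(data) > self.size:
            raise ValueError("buffer lies outside the S/370 storage block")
        start = self.base + cmd.buffer_addr
        self.main_storage[start:start + len(data)] = data
        return len(data)
```

The S/370 only ever sees its buffer fill; which S/88 file or module supplied the bytes stays on the S/88 side of the channel.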

6. Summary

Thus, the functions of two virtual operating systems (e.g., S/370 VM, VSE or IX370 and S/88 OS) are merged into one physical system. The S/88 processor runs the S/88 OS and handles the fault tolerant aspects of the system. At the same time, one or more S/370 processors are plugged into the S/88 rack and are allocated by the S/88 OS anywhere from 1 to 16 megabytes of contiguous memory per S/370 processor. Each S/370 virtual operating system thinks its memory allocation starts at address 0 and it manages its memory through normal S/370 dynamic memory allocation and paging techniques. The S/370 is limit checked to prevent the S/370 from accessing S/88 memory space. The S/88 must access the S/370 address space since the S/88 must move I/O data into the S/370 I/O buffers. The S/88 Operating System is the master over all system hardware and I/O devices. The peer processor pairs execute their respective Operating Systems in a single system environment without significant rewriting of either operating system.

Introduction--Prior Art System/88

The improvements of the present application will be described with respect to a preferred form in which IBM System/370 (S/370) processing units (executing S/370 instructions under the control of any one of the S/370 operating systems such as VM, VSE, IX370, etc.) are tightly coupled to IBM System/88 (S/88) processing units (executing S/88 instructions in a fault tolerant manner under control of a S/88 operating system in a fault tolerant environment) in a manner which permits fault tolerant operation of the S/370 processing units with the System/88 features of single system image, hot pluggability, instantaneous error detection, I/O load distribution, fault isolation, and dynamic reconfigurability.

The IBM System/88 marketed by International Business Machines Corp. is described generally in the IBM System/88 Digest, Second Edition, published in 1986 and other available S/88 customer publications. The System/88 computer system including module 10, FIG. 6A, is a high availability system designed to meet the needs of customers who require highly reliable online processing. System/88 combines a duplexed hardware architecture with sophisticated operating system software to provide a fault tolerant system. The System/88 also provides horizontal growth through the attachment of multiple System/88 modules 10a, 10b, 10c, through the System/88 high speed data interconnections (HSDIs), FIG. 6B, and modules 10d-g through the System/88 Network, FIG. 6C.

The System/88 is designed to detect a component failure when and where it occurs, and to prevent errors and interruptions caused by such failures from being introduced into the system. Since fault tolerance is a part of the System/88 hardware design, it does not require programming by the application developer. Fault tolerance is accomplished with no software overhead or performance degradation. The System/88 achieves fault tolerance through the duplication of major components, including processors, direct access storage devices (DASDs) or disks, memory, and controllers. If a duplexed component fails, its duplexed partner automatically continues processing and the system remains available to the end users. Duplicate power supplies with battery backup for memory retention during a short-term power failure are also provided. System/88 and its software products offer ease of expansion, the sharing of resources among users, and solutions to complex requirements while maintaining a single system image to the end user.

A single system image is a distributed processing environment, consisting of many processors, each with its own files and I/O, interconnected via a network or LAN, that gives the user the impression that he is logged on to a single machine. The operating system allows the user to move from one machine to another simply by changing directories.

With proper planning, the System/88 processing capacity can be expanded while the System/88 is running and while maintaining a single-system image to the end user. Horizontal growth is accomplished by combining multiple processing modules into systems using the System/88 HSDI, and combining multiple systems into a network using the System/88 Network.

A System/88 processing module is a complete, stand-alone computer as seen in FIG. 6A of the drawings. A System/88 system is either a single module or a group of modules connected in a local network with the IBM HSDI as seen in FIG. 6B. The System/88 Network, using remote transmission facilities, is the facility used to interconnect multiple systems to form a single-system image to the end user. Two or more systems can be interconnected by communications lines to form a long haul network. This connection may be through a direct cable, a leased telephone line, or an X.25 network. The System/88 Network detects references to remote resources and routes messages between modules and systems completely transparently to the user.
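The transparent routing described above can be sketched as a decision made from a full path name. The sketch assumes a path format of `%system#module>directory>file`; that format is an assumption about S/88 naming, not something stated in this description, and the `route` function is hypothetical. It classifies a reference as local, as needing the HSDI (same system, another module), or as needing the System/88 Network (another system).

```python
def route(path: str, local_system: str, local_module: str) -> tuple[str, str, str, str]:
    """Classify a full path name as local, HSDI (same system), or network."""
    if not path.startswith("%"):
        raise ValueError(f"not a full path name: {path!r}")
    head, gt, rest = path[1:].partition(">")
    system, hash_sign, module = head.partition("#")
    if not gt or not hash_sign:
        raise ValueError(f"not a full path name: {path!r}")
    if system != local_system:
        via = "network"  # another system: System/88 Network
    elif module != local_module:
        via = "hsdi"     # same system, another module: HSDI
    else:
        via = "local"    # this module
    return via, system, module, rest
```

The caller names the resource the same way in every case; only the routing decision, made below the user, differs.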

Hot pluggability allows many hardware replacements to be done without interrupting system operation. The System/88 takes a failing component out of service, continuing service with its duplexed partner, and lights an indicator on the failing component--all without operator intervention. The customer or service personnel can remove and replace a failed duplexed board while processing continues. The benefits to a customer include timely repair and reduced maintenance costs.

Although the System/88 is a fault-tolerant, continuous operation machine, there are times when machine operation will need to be stopped. Some examples of this are to upgrade the System/88 Operating System, to change the hardware configuration (add main storage), or to perform certain service procedures.

The duplexed System/88 components and the System/88 software help maintain data integrity. The System/88 detects a failure or transient error at the point of failure and does not propagate it throughout the application or data. Data is protected from corruption and system integrity is maintained. Each component contains its own error-detection logic and diagnostics. The error-detection logic compares the results of parallel operations at every machine cycle.
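Comparing the results of parallel operations at every cycle can be illustrated in software with two copies of a function run in lock step. In this hypothetical sketch, `run_lockstep` stops at the first cycle where the copies disagree, so the disputed result is never delivered. Real S/88 hardware performs this comparison in logic within each component, not in a software loop; the sketch only shows why an error is caught at the point of failure instead of propagating.

```python
from typing import Callable, Optional


def run_lockstep(unit_a: Callable[[int], int], unit_b: Callable[[int], int],
                 inputs: list[int]) -> tuple[list[int], Optional[int]]:
    """Run two copies of a unit side by side and compare every cycle.

    Returns the results delivered before any mismatch, plus the cycle at
    which a mismatch was detected (None if the copies always agreed).
    """
    delivered: list[int] = []
    for cycle, x in enumerate(inputs):
        a, b = unit_a(x), unit_b(x)
        if a != b:
            return delivered, cycle  # fault caught; the bad result is not passed on
        delivered.append(a)
    return delivered, None
```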

If the system detects a component malfunction, that component is automatically removed from service, and processing continues on its duplexed partner while the error-detection functions automatically run internal diagnostics on the failed component. If the diagnostics determine that certain components need to be replaced, the System/88 can automatically call a support center to report the problem. The customer benefits from quick repairs and low maintenance costs.

The System/88 is based generally upon processor systems of the type described in detail in U.S. Pat. No. 4,453,215, entitled "Central Processing Apparatus for Fault Tolerant Computing", issued Jun. 5, 1984 to Robert Reid and related U.S. Pat. Nos. 4,486,826, 4,597,084, 4,654,857, 4,750,177, and 4,816,990; and said patents are hereby incorporated herein by reference in their entirety as if they were set forth fully herein. Portions of the '215 Reid patent are shown diagrammatically in FIGS. 7 and 8 of the present application.

This computer system of FIGS. 7 and 8 of the present application has a processor module 10 with a processing unit 12, a random access storage unit 16, peripheral control units 20, 24, 32, and a single bus structure 30 which provides all information transfers between the several units of the module. The bus structure within each processor module includes duplicate partner buses A, B, and each functional unit 12, 16, 20, 24, 32 has an identical partner unit. Each unit, other than control units which operate with asynchronous peripheral devices, normally operates in lock-step synchronism with its partner unit. For example, the two partner memory units 16, 18 of a processor module normally both drive the two partner buses A, B, and are both driven by the bus structure 30, in full synchronism.

The computer system provides fault detection at the level of each functional unit within a processor module. To attain this feature, error detectors monitor hardware operations within each unit and check information transfers between the units. The detection of an error causes the processor module to isolate the bus or unit which caused the error from transferring information to other units, and the module continues operation. The continued operation employs the partner of the faulty bus or unit. Where the error detection precedes an information transfer, the continued operation can execute the transfer at the same time it would have occurred in the absence of the fault. Where the error detection coincides with an information transfer, the continued operation can repeat the transfer.
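The two recovery cases above (an error detected before a transfer versus one detected as the transfer occurs) can be modeled as follows. `PartnerPair`, `Detect`, and `complete_transfer` are hypothetical names, and the assumption that a repeated transfer completes exactly one cycle later is illustrative only; the description says only that the transfer is repeated.

```python
from enum import Enum
from typing import Optional


class Detect(Enum):
    NONE = "none"      # no error this cycle
    BEFORE = "before"  # error found before the transfer
    DURING = "during"  # error found as the transfer takes place


class PartnerPair:
    """Hypothetical pair of partnered units performing one function."""

    def __init__(self, name: str) -> None:
        self.online = {f"{name}-0", f"{name}-1"}

    def isolate(self, unit: str) -> None:
        """Place a unit off-line so it can no longer drive the bus."""
        self.online.discard(unit)
        if not self.online:
            raise RuntimeError("no partner left to continue operation")


def complete_transfer(pair: PartnerPair, cycle: int,
                      faulty: Optional[str] = None,
                      detected: Detect = Detect.NONE) -> int:
    """Return the cycle in which a valid transfer completes."""
    if detected is Detect.NONE:
        return cycle
    if faulty is None:
        raise ValueError("a detected error must name the faulty unit")
    pair.isolate(faulty)
    if detected is Detect.BEFORE:
        return cycle      # the partner drives the transfer on schedule
    return cycle + 1      # the one questionable transfer is repeated (assumed next cycle)
```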

The computer system can effect the foregoing fault detection and remedial action rapidly, i.e. within a fraction of an operating cycle. At most a single information transfer is of questionable validity and must be repeated to ensure total data validity.

Although a processor module has significant hardware redundancy to provide fault-tolerant operation, a module that has no duplicate units is nevertheless fully operational.

The functional unit redundancy enables the module to continue operating in the event of a fault in any unit. In general, all units of a processor module operate continuously, and with selected synchronism, in the absence of any detected fault. Upon detection of an error-manifesting fault in any unit, that unit is isolated and placed off-line so that it cannot transfer information to other units of the module. The partner of the off-line unit continues operating, normally with essentially no interruption.

In addition to the partnered duplication of functional units within a module to provide fault-tolerant operation, each unit within a processor module generally has a duplicate of hardware which is involved in a data transfer. The purpose of this duplication, within a functional unit, is to test, independently of the other units, for faults within each unit. Other structure within each unit of a module, including the error detection structure, is in general not duplicated.

The common bus structure which serves all units of a processor module preferably employs a combination of the foregoing two levels of duplication and has three sets of conductors that form an A bus, a B bus that duplicates the A bus, and an X bus. The A and B buses each carry an identical set of cycle-definition, address, data, parity and other signals that can be compared to warn of erroneous information transfer between units. The conductors of the X bus, which are not duplicated, in general carry module-wide and other operating signals such as timing, error conditions, and electrical power. An additional C bus is provided for local communication between partnered units.
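Carrying identical signals on the A and B buses means a mismatch can be caught and, combined with parity, attributed to one bus. The sketch below assumes a 32-bit word with a single even-parity bit per bus; `parity` and `check_buses` are hypothetical helpers, not S/88 signal names, and the real comparison is done in hardware.

```python
from typing import Optional


def parity(word: int) -> int:
    """Even-parity bit for a 32-bit word: 1 if the word has an odd number of ones."""
    return bin(word & 0xFFFFFFFF).count("1") % 2


def check_buses(a_word: int, a_par: int,
                b_word: int, b_par: int) -> tuple[int, Optional[str]]:
    """Compare the A and B copies of a transfer; use parity to blame one bus."""
    a_ok = parity(a_word) == a_par
    b_ok = parity(b_word) == b_par
    if a_word == b_word and a_ok and b_ok:
        return a_word, None   # buses agree: accept the transfer
    if a_ok and not b_ok:
        return a_word, "B"    # B is faulty: continue on A
    if b_ok and not a_ok:
        return b_word, "A"    # A is faulty: continue on B
    raise RuntimeError("A/B disagreement that parity cannot localize")
```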

A processor module detects and locates a fault by a combination of techniques within each functional unit including comparing the operation of duplicated sections of the unit, the use of parity and further error checking and correcting codes, and by monitoring operating parameters such as supply voltages. Each central processing unit has two redundant processing sections and, if the comparison is invalid, isolates the processing unit from transferring information to the bus structure. This isolates other functional units of the processor module from any faulty information which may stem from the processing unit in question. Each processing unit also has a stage for providing virtual memory, operation which is n